# **Programming Languages and Systems**

**28th European Symposium on Programming, ESOP 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6–11, 2019, Proceedings**

# Lecture Notes in Computer Science 11423

Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

### Editorial Board Members

David Hutchison, UK Josef Kittler, UK Friedemann Mattern, Switzerland Moni Naor, Israel Bernhard Steffen, Germany Doug Tygar, USA

Takeo Kanade, USA Jon M. Kleinberg, USA John C. Mitchell, USA C. Pandu Rangan, India Demetri Terzopoulos, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, Peking University, Beijing, China Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407

Luís Caires (Ed.)


Editor Luís Caires Universidade NOVA de Lisboa Caparica, Portugal

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-17183-4 ISBN 978-3-030-17184-1 (eBook) https://doi.org/10.1007/978-3-030-17184-1

Library of Congress Control Number: 2019936299

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

# ETAPS Foreword

Welcome to the 22nd ETAPS! This is the first time that ETAPS took place in the Czech Republic in its beautiful capital Prague.

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security.

Organizing these conferences in a coherent, highly synchronized conference program enables participation in an exciting event, offering the possibility to meet many researchers working in different directions in the field and to easily attend talks of different conferences. ETAPS 2019 featured a new program item: the Mentoring Workshop. This workshop is intended to help students early in the program with advice on research, career, and life in the fields of computing that are covered by the ETAPS conference. On the weekend before the main conference, numerous satellite workshops took place and attracted many researchers from all over the globe.

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac Flanagan (University of California at Santa Cruz). Invited tutorials were provided by Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS 2019 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles University. Charles University was founded in 1348 and was the first university in Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Jan Vitek and Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek.

The ETAPS SC consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst (Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz), Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller (Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg), Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local organization team for all their enormous efforts enabling a fantastic ETAPS in Prague!

February 2019 Joost-Pieter Katoen ETAPS SC Chair ETAPS e.V. President

# Preface

This volume contains the papers presented at the 28th European Symposium on Programming (ESOP 2019) held April 8–11, 2019, in Prague, Czech Republic. ESOP is one of the European Joint Conferences on Theory and Practice of Software (ETAPS). It is devoted to fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

The 28 papers in this volume were selected from 86 submissions based on originality and quality. Each submission was reviewed by at least three Program Committee (PC) members and external reviewers, with an average of 3.2 reviews per paper. Authors were given the opportunity to respond to the reviews of their papers during the rebuttal period, January 11–14, 2019.

Each paper was assigned a guardian in the PC, who was in charge of making sure that additional reviews were solicited if necessary and of presenting a summary of the reviews, author responses, and decision proposals at the physical PC meeting. All submissions, reviews, and author responses were considered during online discussion, which identified 52 submissions to be further discussed at the physical PC meeting held in Cascais, Portugal, January 19, 2019. All non-conflicted PC members participated in the discussion of each paper's merits.

The PC wrote summaries based on online discussions and on discussions during the physical PC meeting, to help authors understand decisions and improve the final version of their papers. Papers co-authored by members of the PC were held to a higher standard and were discussed first at the physical PC meeting. There were 11 such submissions, of which five were accepted. Papers for which the PC chair had a conflict of interest were kindly handled by Zhong Shao.

I would like to thank all who contributed to the success of the conference: the authors who submitted papers for consideration, the external reviewers who provided expert reviews, and the Program Committee members, who worked hard to provide detailed reviews and engaged in deep discussions about the submissions. I am also grateful to past ESOP PC chairs Amal Ahmed and Jan Vitek for sharing their experience, and to the ESOP Steering Committee chairs, Giuseppe Castagna and Peter Thiemann, who provided essential advice on numerous procedural issues. I would also like to thank the ETAPS Steering Committee chair, Joost-Pieter Katoen, for his dedicated work and blazing-fast responsiveness.

EasyChair was used to handle submissions, online discussions, and proceedings editing. Finally, I would like to thank the NOVA Laboratory for Computer Science and Informatics and OutSystems SA for supporting the physical PC meeting and Joana Dâmaso for assisting with the organization.

February 2019 Luís Caires

# Organization

### Program Committee

| Name | Affiliation |
|---|---|
| Nada Amin | Ecole Polytechnique Fédérale de Lausanne, Switzerland |
| Stephanie Balzer | CMU |
| Lars Birkedal | Aarhus University, Denmark |
| Johannes Borgström | Uppsala University, Sweden |
| Luís Caires | Universidade NOVA de Lisboa, Portugal |
| Ugo Dal Lago | Università di Bologna, Italy, and Inria Sophia Antipolis, France |
| Constantin Enea | IRIF, University Paris Diderot, France |
| Deepak Garg | Max Planck Institute for Software Systems, Germany |
| Simon Gay | University of Glasgow, UK |
| Alexey Gotsman | IMDEA Software Institute, Spain |
| Atsushi Igarashi | Kyoto University, Japan |
| Bart Jacobs | Katholieke Universiteit Leuven, Belgium |
| Isabella Mastroeni | Università di Verona, Italy |
| J. Garrett Morris | The University of Kansas, USA |
| Markus Müller-Olm | Westfälische Wilhelms-Universität Münster, Germany |
| Tim Nelson | Brown University, USA |
| Scott Owens | University of Kent, UK |
| Luca Padovani | Università di Torino, Italy |
| Brigitte Pientka | McGill University, Canada |
| Zhong Shao | Yale University, USA |
| Alexandra Silva | University College London, UK |
| David Walker | Princeton University, USA |

# Additional Reviewers

Andersen, Kristoffer Just Asai, Kenichi Atkey, Robert Avanzini, Martin Berger, Martin Bernardi, Giovanni Bocchi, Laura Bracevac, Oliver Byrd, William Cano, Mauricio

Cohen, Liron Contrastin, Mistral D'Osualdo, Emanuele Dahlqvist, Fredrik Delbianco, Germán Andrés Dezani, Mariangiola Docherty, Simon Felleisen, Matthias Frumin, Dan

Fränzle, Martin Genestier, Guillaume Ghyselen, Alexis Gratzer, Daniel Gregersen, Simon Gutsfeld, Jens Oliver Hackett, Jennifer Hamza, Jad Heo, Kihong Hirai, Yoichi

Hirokawa, Nao Jung, Ralf Kammar, Ohad Kappé, Tobias Katsumata, Shin-Ya Kenter, Sebastian Krebbers, Robbert Kuchen, Herbert Laird, James Lammich, Peter Lanese, Ivan Levy, Paul Blain Liu, Fengyun Mackie, Ian Martres, Guillaume Mazza, Damiano McLaughlin, Craig Meyer, Roland

Miltner, Anders Momigliano, Alberto Mutluergil, Suha Orhun Nakazawa, Koji Norman, Gethin Novotný, Petr Ohlenbusch, Marit Ohrem, Christoph Pavlogiannis, Andreas Peressotti, Marco Rogalewicz, Adam Sacerdoti Coen, Claudio Sammartino, Matteo Scalas, Alceste Sekiyama, Taro Sieczkowski, Filip Sighireanu, Mihaela Singer, Jeremy

Sjöberg, Vilhelm Staton, Sam Stiévenart, Quentin Sutherland, Julian Tanter, Éric Tate, Ross Thibodeau, David Timany, Amin Tsukada, Takeshi Ulbrich, Mattias Voorneveld, Niels Wang, Yuting Weber, Tjark Yamada, Akihisa Zdancewic, Steve Zinkov, Rob

# From Quadcopters to Helicopters: Formal Verification to Eliminate Exploitable Bugs (Abstract of Invited Talk)

Kathleen Fisher

Computer Science Department, Tufts University

For decades, formal methods have offered the promise of software that does not have exploitable bugs. Until recently, however, it has not been possible to verify software of sufficient complexity to be useful. That situation has now changed. seL4 [1] is an open-source operating system microkernel efficient enough to be used in a wide range of practical applications. It has been proven to be fully functionally correct, ensuring the absence of buffer overflows, null pointer exceptions, use-after-free errors, etc., and to enforce integrity and confidentiality properties.

The CompCert Verifying C Compiler [2] maps source C programs to provably equivalent assembly language, ensuring the absence of exploitable bugs in the compiler. A number of factors have enabled this revolution in the formal methods community, including increased processor speed, better infrastructure like the Isabelle/HOL and Coq theorem provers, specialized logics for reasoning about low-level code, increasing levels of automation afforded by tactic languages and SAT/SMT solvers, and the decision to move away from trying to verify existing artifacts and instead focus on co-developing the code and the correctness proof.

In this talk I will explore the promise and limitations of current formal methods techniques for producing useful software that provably does not contain exploitable bugs. I will discuss these issues in the context of DARPA's HACMS program, which had as its goal the creation of high-assurance software for vehicles, including quadcopters, helicopters, and automobiles. This talk summarizes the goals and results of the HACMS program, which are described in more detail in a recent paper written by the speaker and the two other DARPA program managers who oversaw the HACMS program [3].

#### References

1. Klein, G., et al.: Comprehensive formal verification of an OS microkernel. ACM Trans. Comput. Syst. 32(1), 2:1–2:70 (2014). http://doi.acm.org/10.1145/2560537




# Program Verification

# Time Credits and Time Receipts in Iris

Glen Mével<sup>1</sup>, Jacques-Henri Jourdan<sup>2</sup>, and François Pottier<sup>1</sup>

<sup>1</sup> Inria, Paris, France

<sup>2</sup> CNRS, LRI, Univ. Paris Sud, Université Paris Saclay, Orsay, France jacques-henri.jourdan@lri.fr

Abstract. We present a machine-checked extension of the program logic Iris with time credits and time receipts, two dual means of reasoning about time. Whereas time credits are used to establish an upper bound on a program's execution time, time receipts can be used to establish a lower bound. More strikingly, time receipts can be used to prove that certain undesirable events—such as integer overflows—cannot occur until a very long time has elapsed. We present several machine-checked applications of time credits and time receipts, including an application where both concepts are exploited.

"Alice: How long is forever? White Rabbit: Sometimes, just one second." *—* Lewis Carroll, *Alice in Wonderland*

# 1 Introduction

A program logic, such as Hoare logic or Separation Logic, is a set of deduction rules that can be used to reason about the behavior of a program. To this day, considerable effort has been invested in developing ever-more-powerful program logics that control the *extensional* behavior of programs, that is, logics that guarantee that a program safely computes a valid final result. A lesser effort has been devoted to logics that allow reasoning not just about safety and functional correctness, but also about *intensional* aspects of a program's behavior, such as its time consumption and space usage.

In this paper, we are interested in narrowing the gap between these lines of work. We present a formal study of two mechanisms by which a standard program logic can be extended with means of reasoning about time. As a starting point, we take Iris [11–14], a powerful evolution of Concurrent Separation Logic [3]. We extend Iris with two elementary time-related concepts, namely *time credits* [1, 4, 9] and *time receipts*.

Time credits and time receipts are independent concepts: it makes sense to extend a program logic with either of them in isolation or with both of them simultaneously. They are dual concepts: every computation step *consumes one time credit* and *produces one time receipt*. They are purely static: they do not exist at runtime. We view them as Iris assertions. Thus, they can appear in the correctness statements that we formulate about programs and in the proofs of these statements.

Time credits can be used to establish an upper bound on the execution time of a program. Dually, time receipts can be used to establish a lower bound, and (as explained shortly) can be used to prove that certain undesirable events cannot occur until a very long time has elapsed.

Until now, time credits have been presented as an ad hoc extension of some fixed flavor of Separation Logic [1,4,9]. In contrast, we propose a construction which in principle allows time credits to be introduced on top of an arbitrary "base logic", provided this base logic is a sufficiently rich variety of Separation Logic. In order to make our definitions and proofs more concrete, we use Iris as the base logic. Our construction involves *composing* the base logic with a program transformation that inserts a *tick*() instruction in front of every computation step. As far as a user of the composite logic is concerned, the *tick*() instruction and the assertion \$1, which represents one time credit, are abstract: the only fact to which the user has access is the Hoare triple {\$1} *tick*() {True}, which states that "*tick*() consumes one time credit".
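To build intuition for the tick translation, the following minimal Python sketch models time credits as a runtime counter. This is only an illustration: in the logic, credits are purely static and do not exist at runtime, and the names `CreditMeter`, `OutOfCredits`, and `instrumented_sum` are assumptions introduced here, not part of the paper's development.

```python
class OutOfCredits(Exception):
    """Raised when tick() is called although no credit is left."""


class CreditMeter:
    """Illustrative runtime model of time credits (hypothetical name)."""

    def __init__(self, credits):
        if credits < 0:
            raise ValueError("credits must be non-negative")
        self.credits = credits

    def tick(self):
        # Mirrors the triple {$1} tick() {True}: each call consumes one credit.
        if self.credits == 0:
            raise OutOfCredits("no time credit left")
        self.credits -= 1


def instrumented_sum(meter, xs):
    """Sum a list, with a tick() placed in front of every loop step,
    in the way a tick translation would insert it."""
    total = 0
    for x in xs:
        meter.tick()
        total += x
    return total
```

With three credits, summing a three-element list succeeds and leaves no credit; with two credits, the same call raises `OutOfCredits`, which is the runtime analogue of a proof that cannot be completed.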

There are two reasons why we choose Iris [12] as the base logic. First, in the proof of soundness of the composite logic, we must exhibit concrete definitions of *tick* and \$1 such that {\$1} *tick*() {True} holds. Several features of Iris, such as ghost state and shared invariants, play a key role in this construction. Second, at the user level, the power of Iris can also play a crucial role. To illustrate this, we present the first machine-checked reconstruction of Okasaki's debits [19] in terms of time credits. The construction makes crucial use of both time credits and Iris' ghost monotonic state and shared invariants.

Time receipts are a new concept, a contribution of this paper. To extend a base logic with time receipts, we follow the exact same route as above: we compose the base logic with the *same* program transformation as above, which we refer to as "the tick translation". In the eyes of a user of the composite logic, the *tick*() instruction and the assertion ⧗1, which represents one time receipt, are again abstract: this time, the only published fact about *tick* is the triple {True} *tick*() {⧗1}, which states that "*tick*() produces one time receipt".

Thus far, the symmetry between time credits and time receipts seems perfect: whereas time credits allow establishing an upper bound on the cost of a program fragment, time receipts allow establishing a lower bound. This raises a pragmatic question, though: why invest effort, time and money into a formal proof that a piece of code is slow? What might be the point of such an endeavor? Taking inspiration from Clochard *et al*. [5], we answer this question by turning slowness into a quality. If there is a certain point at which a process might fail, then by showing that this process is slow, we can show that failure is far away into the future. More specifically, Clochard *et al*. propose two abstract types of integer counters, dubbed "one-time" integers and "peano" integers, and provide a paper proof that these counters cannot overflow in a feasible time: that is, it would take infeasible time (say, centuries) for an execution to reach a point where overflow actually occurs. To reflect this idea, we abandon the symmetry between time credits and time receipts and publish a fact about time receipts which has no counterpart on the time-credit side. This fact is an implication: ⧗N ⇛ False, that is, "N time receipts imply False". The global parameter N can be adjusted so as to represent one's idea of a running time that is infeasible, perhaps due to physical limitations, perhaps due to assumptions about the conditions in which the software is operated. In this paper, we explain what it means for the composite program logic to remain sound in the presence of this axiom, and provide a formal proof that Iris, extended with time receipts, is indeed sound. Furthermore, we verify that Clochard *et al*.'s ad hoc concepts of "one-time" integers and "peano" integers can be reconstructed in terms of time receipts, a more fundamental concept.
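The dual reading can be sketched in the same style: a counter that grows with every step and is never allowed to reach N. This is a rough runtime analogy under stated assumptions, not the paper's construction; the class name `ReceiptMeter`, the exception used, and the default value of `N` are chosen here for illustration only.

```python
N = 2 ** 63  # assumed value; the text leaves N as an adjustable global parameter


class ReceiptMeter:
    """Illustrative runtime model of exclusive time receipts (hypothetical name)."""

    def __init__(self, bound=N):
        self.bound = bound
        self.receipts = 0

    def tick(self):
        # Mirrors {True} tick() {one receipt}: each call produces one receipt.
        # "N receipts imply False" is modeled by refusing to ever hold N of them.
        if self.receipts + 1 >= self.bound:
            raise RuntimeError("execution reached the infeasible horizon")
        self.receipts += 1
```

In this model the invariant `receipts < bound` holds at all times, so any quantity bounded by the number of receipts held is also below `bound`.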

Finally, to demonstrate the combined use of time credits and receipts, we present a proof of the Union-Find data structure, where credits are used to express an amortized time complexity bound and receipts are used to prove that a node's integer rank cannot overflow, even if it is stored in very few bits.

In summary, the contributions of this paper are as follows:


All of the results reported in this paper have been checked in Coq [17].

# 2 A User's Overview of Time Credits and Time Receipts

#### 2.1 Time Credits

A small number of axioms, presented in Fig. 1, govern time credits. The assertion \$n denotes n time credits. The splitting axiom, a logical equivalence, means that *time credits can be split and combined*. Because Iris is an affine logic, it is implicitly understood that *time credits cannot be duplicated, but can be thrown away*.

The axiom timeless(\$n) means that time credits are independent of Iris' step-indexing. In practice, this allows an Iris invariant that involves time credits to be acquired without causing a "later" modality to appear [12, §5.7]. The reader can safely ignore this detail.

The last axiom, a Hoare triple, means that *every computation step requires and consumes one time credit*. As in Iris, the postconditions of our Hoare triples are λ-abstractions: they take as a parameter the return value of the term. At this point, *tick*() can be thought of as a pseudo-instruction that has no runtime effect and is implicitly inserted in front of every computation step.
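The three laws just described can be written out as follows. This is a transcription of the prose above, not a reproduction of Fig. 1; the exact form of the triple's postcondition is an assumption taken from the introduction, where it is stated as {\$1} *tick*() {True}.

$$\begin{array}{ll} \$(n_1 + n_2) \equiv \$n_1 \ast \$n_2 & \text{(credits can be split and combined)} \\ \mathit{timeless}(\$n) & \text{(credits are independent of step-indexing)} \\ \{\$1\}\ \mathit{tick}()\ \{\mathrm{True}\} & \text{(every step consumes one credit)} \end{array}$$

For instance, reading the splitting law from left to right with n<sub>1</sub> = 2 and n<sub>2</sub> = 3 gives \$5 ≡ \$2 ∗ \$3: a holder of five credits can pass two of them to one subcomputation and three to another.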


Fig. 1. The axiomatic interface *TCIntf* of time credits

Fig. 2. The axiomatic interface of exclusive time receipts (further enriched in Fig. 3)

Time credits can be used to express *worst-case time complexity guarantees*. For instance, a sorting algorithm could have the following specification:

$$\begin{array}{c} \{ array(a, xs) \ast n = |xs| \ast \$(6n \log n) \} \\ sort(a) \\ \{ array(a, xs') \land xs' = \dots \} \end{array}$$

Here, *array*(*a*, xs) asserts the existence and unique ownership of an array at address *a*, holding the sequence of elements xs. This Hoare triple guarantees not only that the function call *sort*(*a*) runs safely and has the effect of sorting the array at address *a*, but also that *sort*(*a*) runs in at most 6n log n time steps, where n is the length of the sequence xs, that is, the length of the array. Indeed, only 6n log n time credits are provided in the precondition, so the algorithm does not have permission to run for a greater number of steps.
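As a rough runtime analogy of such a specification, the sketch below runs a merge sort under a budget of 6n log n credits. Several simplifications are assumed and are not from the paper: only comparisons are counted as steps, the logarithm is taken in base 2, and the names `merge_sort`, `sort_with_budget`, and `OutOfCredits` are hypothetical.

```python
import math


class OutOfCredits(Exception):
    """Raised when a step is attempted with an empty budget."""


def merge_sort(xs, spend):
    """Return a sorted copy of xs, calling spend() once per comparison."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], spend)
    right = merge_sort(xs[mid:], spend)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        spend()  # one credit per comparison (the only steps counted here)
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def sort_with_budget(xs):
    """Sort xs with 6 * n * log2(n) credits; return the result and unused credits."""
    n = len(xs)
    budget = [int(6 * n * math.log2(n)) if n > 1 else 0]

    def spend():
        if budget[0] == 0:
            raise OutOfCredits("budget exhausted")
        budget[0] -= 1

    return merge_sort(xs, spend), budget[0]
```

Because merge sort performs at most n⌈log₂ n⌉ comparisons, the budget is never exhausted in this model; an algorithm with quadratic cost would run out of credits on large inputs.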

#### 2.2 Time Receipts

In contrast with time credits, time receipts are a new concept, a contribution of this paper. We distinguish two forms of time receipts. The most basic form, *exclusive time receipts*, is the dual of time credits, in the sense that *every computation step produces one time receipt*. The second form, *persistent time receipts*, exhibits slightly different properties. Inspired by Clochard *et al*. [5], we show that time receipts can be used to *prove that certain undesirable events, such as integer overflows, cannot occur unless a program is allowed to execute for a very, very long time*—typically centuries. In the following, we explain that exclusive time receipts allow reconstructing Clochard *et al*.'s "one-time" integers [5, §3.2], which are so named because they are not duplicable, whereas persistent time receipts allow reconstructing their "peano" integers [5, §3.2], which are so named because they do not support unrestricted addition.

Exclusive time receipts. The assertion ⧗n denotes n time receipts. Like time credits, these time receipts are "exclusive", by which we mean that they are not duplicable. The basic laws that govern exclusive time receipts appear in Fig. 2. They are the same laws that govern time credits, with two differences. The first difference is that time receipts are the dual of time credits: the specification of *tick*, in this case, states that *every computation step produces one time receipt*.<sup>1</sup> The second difference lies in the last axiom of Fig. 2, which has no analogue in Fig. 1, and which we explain below.

In practice, how do we expect time receipts to be exploited? They can be used to prove lower bounds on the execution time of a program: if the Hoare triple {True} p {⧗n} holds, then the execution of the program p cannot terminate in less than n steps. Inspired by Clochard *et al*. [5], we note that time receipts can also be used to *prove that certain undesirable events cannot occur in a feasible time*. This is done as follows. Let N be a fixed integer, chosen large enough that a modern processor cannot possibly execute N operations in a feasible time.<sup>2</sup> The last axiom of Fig. 2, ⧗N ⇛ False, states that N time receipts imply a contradiction.<sup>3</sup> This axiom informally means that *we won't compute for* N *time steps*, because we cannot, or because we promise not to do such a thing. A consequence of this axiom is that ⧗n implies n < N: that is, *if we have observed* n *time steps, then* n *must be small.*

Adopting this axiom weakens the guarantee offered by the program logic. A Hoare triple {True} p {True} no longer implies that the program p is forever safe. Instead, it means that p is (N − 1)-safe: the execution of p cannot go wrong until at least N − 1 steps have been taken. Because N is very large, for many practical purposes, this is good enough.
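To get a feeling for how large N is in practice, the short computation below checks the example value N = 2<sup>63</sup> given in the footnote, at the rate of one billion operations per second mentioned there. The function name and the 365.25-day year are assumptions made for this sketch.

```python
N = 2 ** 63                       # example value of N from the footnote
OPS_PER_SECOND = 10 ** 9          # one billion operations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year


def years_to_execute(steps, ops_per_second=OPS_PER_SECOND):
    """Wall-clock years needed to execute `steps` operations at the given rate."""
    return steps / ops_per_second / SECONDS_PER_YEAR


MAX_INT64 = 2 ** 63 - 1           # largest signed 64-bit integer
```

Here `years_to_execute(N)` is a little over 292, in line with the figure quoted from Clochard *et al*., and `N` equals `MAX_INT64 + 1`.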

How can this axiom be exploited in practice? We hinted above that it can be used to prove the absence of certain integer overflows. Suppose that we wish to use signed w-bit machine integers as a representation of mathematical integers. (For instance, let w be 64.) Whenever we perform an arithmetic operation, such as an addition, we must prove that no overflow can occur. This is reflected in the specification of the addition of two machine integers:

$$\begin{array}{c} \{\iota(x_1) = n_1 \ast \iota(x_2) = n_2 \ast -2^{w-1} \le n_1 + n_2 < 2^{w-1}\} \\ \operatorname{add}(x_1, x_2) \\ \{\lambda x.\ \iota(x) = n_1 + n_2\} \end{array}$$

Here, the variables x<sub>i</sub> denote machine integers, while the auxiliary variables n<sub>i</sub> denote mathematical integers, and the function ι is the injection of machine integers into mathematical integers. The conjunct −2<sup>w−1</sup> ≤ n<sub>1</sub> + n<sub>2</sub> &lt; 2<sup>w−1</sup> in the precondition represents an obligation to prove that no overflow can occur.
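The side condition of this specification can be mimicked at runtime, as in the sketch below. In the logic the condition is a proof obligation discharged statically; the dynamic check, the function name `add_checked`, and the default word size are assumptions made for illustration.

```python
W = 64  # assumed word size, as in the example in the text


def add_checked(x1, x2, w=W):
    """Add two signed w-bit integers, rejecting any sum outside [-2^(w-1), 2^(w-1))."""
    lo, hi = -(2 ** (w - 1)), 2 ** (w - 1)
    for x in (x1, x2):
        if not lo <= x < hi:
            raise ValueError("operand is not a signed w-bit integer")
    total = x1 + x2
    # Runtime counterpart of the conjunct -2^(w-1) <= n1 + n2 < 2^(w-1).
    if not lo <= total < hi:
        raise OverflowError("sum does not fit in a signed w-bit integer")
    return total
```

A caller who cannot establish the bound in advance pays for it with a check on every addition; the point of the following development is to obtain the bound from time receipts instead.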

<sup>1</sup> For now, we discuss time credits and time receipts separately, which is why we have different specifications for *tick* in either case. They are combined in Sect. 6.

<sup>2</sup> For a specific example, let *N* be 2<sup>63</sup>. Clochard *et al*. note that, even at the rate of one billion operations per second, it takes more than 292 years to execute 2<sup>63</sup> operations. On a 64-bit machine, 2<sup>63</sup> is also the maximum representable signed integer, plus one.

<sup>3</sup> The connective ⇛ is an Iris view shift, that is, a transition that can involve a side effect on ghost state.

Suppose now that the machine integers x<sub>1</sub> and x<sub>2</sub> represent the lengths of two disjoint linked lists that we wish to concatenate. To construct each of these lists, we must have spent a certain amount of time: as proofs of this work, let us assume that the assertions ⧗n<sub>1</sub> and ⧗n<sub>2</sub> are at hand. Let us further assume that the word size w is sufficiently large that it takes a very long time to count up to the largest machine integer. That is, let us make the following assumption:

$$N \le 2^{w-1} \tag{large word size assumption}$$

(E.g., with N = 2<sup>63</sup> and w = 64, this holds.) Then, we can prove that the addition of x<sub>1</sub> and x<sub>2</sub> is permitted. This goes as follows. From the separating conjunction ⧗n<sub>1</sub> ∗ ⧗n<sub>2</sub>, we get ⧗(n<sub>1</sub> + n<sub>2</sub>). The existence of these time receipts allows us to deduce 0 ≤ n<sub>1</sub> + n<sub>2</sub> &lt; N, which implies 0 ≤ n<sub>1</sub> + n<sub>2</sub> &lt; 2<sup>w−1</sup>. Thus, the precondition of the addition operation *add*(x<sub>1</sub>, x<sub>2</sub>) is met.

In summary, we have just verified that the addition of two machine integers satisfies the following alternative specification:

$$\begin{array}{c} \{\iota(x_1) = n_1 \ast ⧗n_1 \ast \iota(x_2) = n_2 \ast ⧗n_2\} \\ \operatorname{add}(x_1, x_2) \\ \{\lambda x.\ \iota(x) = n_1 + n_2 \ast ⧗(n_1 + n_2)\} \end{array}$$

This can be made more readable and more abstract by defining a "clock" to be a machine integer x accompanied by ι(x) time receipts:

$$\mathit{clock}(x) \triangleq \exists n.\ (\iota(x) = n \ast ⧗n)$$

Then, the above specification of addition can be reformulated as follows:

$$\begin{array}{c} \{ \operatorname{clock}(x_1) \ast \operatorname{clock}(x_2) \} \\ \operatorname{add}(x_1, x_2) \\ \{ \lambda x.\ \operatorname{clock}(x) \ast \iota(x) = \iota(x_1) + \iota(x_2) \} \end{array}$$

In other words, clocks support unrestricted addition, without any risk of overflow. However, because time receipts cannot be duplicated, neither can clocks: *clock*(x) does not entail *clock*(x) ∗ *clock*(x). That is, a clock is uniquely owned. One can think of a clock x as a *hard-earned integer*: the owner of this clock has spent x units of time to obtain it.
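The clock discipline can be illustrated with a small dynamic model. The following Python sketch is only an analogy under stated assumptions: the names `Machine`, `Clock`, `live`, and the 8-bit word size are hypothetical, and unique ownership is approximated by a runtime flag, whereas in the logic it is enforced statically. Clocks are meant to be obtained only through `tick` and `add`.

```python
class Machine:
    """Tracks elapsed steps and refuses to reach N steps (hypothetical model)."""

    def __init__(self, word_size: int):
        self.word_size = word_size
        self.N = 2 ** (word_size - 1)  # chosen so that N <= 2**(w-1) holds
        self.elapsed = 0

    def tick(self) -> "Clock":
        """Spend one step; the receipt for it is returned as a clock of value 1."""
        if self.elapsed + 1 >= self.N:
            raise RuntimeError("the model never runs for N steps")
        self.elapsed += 1
        return Clock(self, 1)


class Clock:
    """A machine integer owned together with `value` exclusive time receipts."""

    def __init__(self, machine: Machine, value: int):
        self.machine = machine
        self.value = value
        self.live = True  # dynamic stand-in for unique ownership


def add(c1: Clock, c2: Clock) -> Clock:
    """Consume two separately owned clocks and return their sum as a new clock."""
    if c1 is c2 or not (c1.live and c2.live):
        raise RuntimeError("clocks are uniquely owned and cannot be reused")
    c1.live = c2.live = False
    total = c1.value + c2.value
    # Live clocks are backed by distinct ticks, so total <= elapsed < N <= 2**(w-1).
    assert total <= c1.machine.elapsed < 2 ** (c1.machine.word_size - 1)
    return Clock(c1.machine, total)


machine = Machine(word_size=8)
x1, x2 = machine.tick(), machine.tick()
x = add(x1, x2)  # x1 and x2 are consumed; x carries both receipts
```

Because every live clock is backed by distinct ticks, the sum computed by `add` can never exceed the number of elapsed steps, which is the informal content of the overflow argument above.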

Clocks are a reconstruction of Clochard *et al*.'s "one-time integers" [5], which support unrestricted addition, but cannot be duplicated. Whereas Clochard *et al*. view one-time integers as a primitive concept, and offer a direct paper proof of their soundness, we have just reconstructed them in terms of a more elementary notion, namely time receipts, and in the setting of a more powerful program logic, whose soundness is machine-checked, namely Iris.

Persistent time receipts. In addition to exclusive time receipts, it is useful to introduce a persistent form of time receipts.<sup>4</sup> The axioms that govern both exclusive and persistent time receipts appear in Fig. 3.

<sup>4</sup> Instead of viewing persistent time receipts as a primitive concept, one could define them as a library on top of exclusive time receipts. Unfortunately, this construction leads to slightly weaker laws, which is why we prefer to view them as primitive.


Fig. 3. The axiomatic interface *TRIntf* of time receipts

We write ⊠n for a persistent receipt, a witness that at least n units of time have elapsed. (We avoid the terminology "n persistent time receipts", in the plural form, because persistent time receipts are not additive. We view ⊠n as one receipt whose face value is n.) This assertion is persistent, which in Iris terminology means that once it holds, it holds forever. This implies, in particular, that it is duplicable: ⊠n ≡ ⊠n ∗ ⊠n. It is created just by observing the existence of n exclusive time receipts, as stated by the following axiom, also listed in Fig. 3: from ■n, one obtains ■n ∗ ⊠n. Intuitively, someone who has access to the assertion ⊠n is someone who knows that n units of work have been performed, even though they have not necessarily "personally" performed that work. Because this knowledge is not exclusive, the conjunction ⊠n₁ ∗ ⊠n₂ does not entail ⊠(n₁ + n₂). Instead, we have the following axiom, also listed in Fig. 3: ⊠(max(n₁, n₂)) ≡ ⊠n₁ ∗ ⊠n₂.
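A tiny numeric model may help to see why persistent receipts combine by maximum rather than by sum. In the Python sketch below, a persistent receipt of face value n is modeled, as an assumption made for illustration only, by the predicate "n steps have elapsed"; the function name and the concrete numbers are hypothetical.

```python
def persistent_receipt_holds(n: int, elapsed: int) -> bool:
    """Model: the persistent receipt of face value n holds once n steps have elapsed."""
    return n <= elapsed


elapsed = 5   # hypothetical number of steps taken so far
n1, n2 = 3, 4

both_hold = persistent_receipt_holds(n1, elapsed) and persistent_receipt_holds(n2, elapsed)
max_holds = persistent_receipt_holds(max(n1, n2), elapsed)  # follows from both_hold
sum_holds = persistent_receipt_holds(n1 + n2, elapsed)      # 7 steps have not elapsed
```

Knowing separately that 3 steps and 4 steps have elapsed only tells us that 4 steps have elapsed, not 7.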

More subtly, the specification of *tick* in Fig. 3 is stronger than the one in Fig. 2. According to this strengthened specification, *tick*() does not just produce an exclusive receipt ■1. In addition to that, if a persistent time receipt ⊠n is at hand, then *tick*() is able to increment it and to produce a new persistent receipt ⊠(n + 1), thus reflecting the informal idea that a *new* unit of time has just been spent. A user who does not wish to make use of this feature can pick n = 0 and recover the specification of *tick* in Fig. 2 as a special case.

Finally, because ⊠n means that n steps have been taken, and because we promise never to reach N steps, we adopt the axiom that ⊠N leads to False, also listed in Fig. 3. It implies the earlier axiom that ■N leads to False, which is therefore not explicitly shown in Fig. 3.

In practice, how are persistent time receipts exploited? By analogy with clocks, let us define a predicate for a machine integer x accompanied with a persistent time receipt whose face value is ι(x):

$$\mathit{snapclock}(x) \triangleq \exists n. (\iota(x) = n \ast \boxtimes n)$$

By construction, this predicate is persistent, therefore duplicable:

$$\mathit{snapclock}(x) \equiv \mathit{snapclock}(x) \ast \mathit{snapclock}(x)$$

We refer to this concept as a "snapclock", as it is not a clock, but can be thought of as a snapshot of some clock. Thanks to the axiom that turns ■k into ■k ∗ ⊠k, we have:

$$\mathit{clock}(x) \Rrightarrow \mathit{clock}(x) \ast \mathit{snapclock}(x)$$

Furthermore, snapclocks have the valuable property that, by performing just one step of extra work, a snapclock can be incremented, yielding a new snapclock that is greater by one. That is, the following Hoare triple holds:

$$\begin{array}{c} \{\mathit{snapclock}(x)\} \\ \mathit{tick}();\ \mathit{add}(x, 1) \\ \{\lambda x'.\ \mathit{snapclock}(x') \ast \iota(x') = \iota(x) + 1\} \end{array}$$

The proof is not difficult. Unfolding *snapclock*(x) in the precondition yields ⊠n, where ι(x) = n. As per the strengthened specification of *tick*, the execution of *tick*() then yields ■1 ∗ ⊠(n + 1). As in the case of clocks, the assertion ⊠(n + 1) implies 0 ≤ n + 1 < 2<sup>w−1</sup>, which means that no overflow can occur. Finally, ■1 is thrown away and ⊠(n + 1) is used to justify *snapclock*(x′) in the postcondition.

Adding two arbitrary snapclocks x₁ and x₂ is illegal: from the sole assumption *snapclock*(x₁) ∗ *snapclock*(x₂), one cannot prove that the addition of x₁ and x₂ won't cause an overflow, and one cannot prove that its result is a valid snapclock. However, snapclocks do support a restricted form of addition. The addition of two snapclocks x₁ and x₂ is safe, and produces a valid snapclock x, provided it is known ahead of time that its result does not exceed some preexisting snapclock y:

$$\begin{array}{c} \{\mathit{snapclock}(x_1) \ast \mathit{snapclock}(x_2) \ast \iota(x_1 + x_2) \le \iota(y) \ast \mathit{snapclock}(y)\} \\ \mathit{add}(x_1, x_2) \\ \{\lambda x.\ \mathit{snapclock}(x) \ast \iota(x) = \iota(x_1) + \iota(x_2)\} \end{array}$$

Snapclocks are a reconstruction of Clochard *et al*.'s "Peano integers" [5], which are so named because they do not support unrestricted addition. Clocks and snapclocks represent different compromises: whereas clocks support addition but not duplication, snapclocks support duplication but not addition. They are useful in different scenarios: as a rule of thumb, if an integer counter is involved in the implementation of a mutable data structure, then one should attempt to view it as a clock; if it is involved in the implementation of a persistent data structure, then one should attempt to view it as a snapclock.
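The two operations on snapclocks can be sketched in the same dynamic style. In this Python model, which is an illustration under assumptions rather than the verified development, a snapclock is a plain integer known not to exceed the number of elapsed steps; the names `Machine`, `is_snapclock`, `snap_incr`, and `snap_add` are hypothetical.

```python
class Machine:
    """Tracks elapsed steps and refuses to reach N steps (hypothetical model)."""

    def __init__(self, word_size: int):
        self.N = 2 ** (word_size - 1)
        self.elapsed = 0

    def tick(self) -> None:
        if self.elapsed + 1 >= self.N:
            raise RuntimeError("the model never runs for N steps")
        self.elapsed += 1

    def is_snapclock(self, x: int) -> bool:
        """A snapclock is an integer that does not exceed the elapsed time."""
        return 0 <= x <= self.elapsed


def snap_incr(m: Machine, x: int) -> int:
    """Model of `tick(); add(x, 1)` applied to a snapclock x."""
    assert m.is_snapclock(x)
    m.tick()          # elapsed grows by one, so x + 1 <= elapsed afterwards
    return x + 1


def snap_add(m: Machine, x1: int, x2: int, y: int) -> int:
    """Restricted addition: allowed only when the sum is bounded by a snapclock y."""
    assert m.is_snapclock(x1) and m.is_snapclock(x2) and m.is_snapclock(y)
    if x1 + x2 > y:
        raise ValueError("restricted addition needs x1 + x2 <= y")
    return x1 + x2    # <= y <= elapsed < N, hence again a valid snapclock


m = Machine(word_size=8)
x = 0
for _ in range(3):
    x = snap_incr(m, x)
s = snap_add(m, 1, 2, x)
```

Snapclocks are ordinary integers here, so they can be copied freely; what is lost, compared with clocks, is the ability to add two of them without an external bound.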

### 3 HeapLang and the Tick Translation

In the next section (Sect. 4), we extend Iris with time credits, yielding a new program logic Iris\$. We do this *without modifying* Iris. Instead, we *compose* Iris with a program transformation, the "tick translation", which inserts *tick*() instructions into the code in front of every computation step. In the construction of Iris■, our extension of Iris with time receipts, the tick translation is exploited in a similar way (Sect. 5). In this section, we define the tick translation and state some of its properties.

Iris is a generic program logic: it can be instantiated with an arbitrary calculus for which a small-step operational semantics is available [12]. Ideally, our extension of Iris should take place at this generic level, so that it, too, can be instantiated for an arbitrary calculus. Unfortunately, it seems difficult to define the tick translation and to prove it correct in a generic manner. For this reason, we choose to work in the setting of HeapLang [12], an untyped λ-calculus equipped with Booleans, signed machine integers, products, sums, recursive functions, references, and shared-memory concurrency. The three standard operations on mutable references, namely allocation, reading, and writing, are available. A compare-and-set operation CAS(e₁, e₂, e₃) and an operation for spawning a new thread are also provided. As the syntax and operational semantics of HeapLang are standard and largely irrelevant to this paper, we omit them. They appear in our online repository [17].

The tick translation transforms a HeapLang expression e to a HeapLang expression ⟨⟨e⟩⟩<sub>*tick*</sub>. It is parameterized by a value *tick*. Its effect is to insert a call to *tick* in front of every operation in the source expression e. The translation of a function application, for instance, is as follows:

$$\langle\langle e_1\ (e_2) \rangle\rangle_{\mathit{tick}} = \mathit{tick}\ (\langle\langle e_1 \rangle\rangle_{\mathit{tick}})\ (\langle\langle e_2 \rangle\rangle_{\mathit{tick}})$$

For convenience, we assume that *tick* can be passed an arbitrary value v as an argument, and returns v. Because evaluation in HeapLang is call-by-value and happens to be right-to-left<sup>5</sup>, the above definition means that, after evaluating the argument ⟨⟨e₂⟩⟩<sub>*tick*</sub> and the function ⟨⟨e₁⟩⟩<sub>*tick*</sub>, we invoke *tick*, then carry on with the function call. This translation is syntactically well-behaved: it preserves the property of being a value, and commutes with substitution. This holds for every value *tick*.
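As an illustration of the shape of the translation, here is a minimal Python sketch over a toy λ-calculus with only variables, abstractions, and applications. The AST classes `Var`, `Lam`, `App` and the function `translate` are hypothetical names; only the application case is taken from the text, and the cases for variables and abstractions are assumptions chosen so that values remain values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object  # an expression: Var, Lam, or App


@dataclass(frozen=True)
class App:
    fn: object    # an expression
    arg: object   # an expression


def translate(e, tick=Var("tick")):
    """Insert a call to `tick` in front of every application."""
    if isinstance(e, Var):
        return e                                   # assumed: variables are unchanged
    if isinstance(e, Lam):
        return Lam(e.param, translate(e.body, tick))  # assumed: translate under binders
    if isinstance(e, App):
        # <<e1 (e2)>> = tick (<<e1>>) (<<e2>>)
        return App(App(tick, translate(e.fn, tick)), translate(e.arg, tick))
    raise TypeError(f"unknown expression: {e!r}")


source = App(Var("f"), Var("x"))
translated = translate(source)
identity = translate(Lam("x", Var("x")))
```

In this sketch an abstraction is translated to an abstraction, which mirrors the remark that the translation preserves the property of being a value.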

$$\begin{array}{l} \mathit{tick}_c \triangleq \mathtt{rec}\ \mathit{self}(x) = \\ \quad \mathtt{let}\ k = {!c}\ \mathtt{in} \\ \quad \mathtt{if}\ k = 0\ \mathtt{then}\ \mathit{oops}() \\ \quad \mathtt{else\ if}\ \mathtt{CAS}(c, k, k - 1)\ \mathtt{then}\ x\ \mathtt{else}\ \mathit{self}(x) \end{array}$$

Fig. 4. Implementation of *tick*<sub>*c*</sub> in HeapLang

As far as the end user is concerned, *tick* remains abstract (Sect. 2). Yet, in our constructions of Iris\$ and Iris■, we must provide a concrete implementation of it in HeapLang. This implementation, named *tick*<sub>*c*</sub>, appears in Fig. 4. A global integer counter *c* stores the number of computation steps that the program is still allowed to take. The call *tick*<sub>*c*</sub>() decrements the counter *c* if it holds a nonzero value, and otherwise invokes *oops*().

<sup>5</sup> If HeapLang used left-to-right evaluation, the definition of the translation would be slightly different, but the lemmas that we prove would be the same.

At this point, the memory location *c* and the value *oops* are parameters.

We stress that *tick*<sub>*c*</sub> plays a role only in the proofs of soundness of Iris\$ and Iris■. It is never actually executed, nor is it shown to the end user.

Once *tick* is instantiated with *tick*<sub>*c*</sub>, one can prove that the translation is correct in the following sense: the translated code takes the same computation steps as the source code and additionally keeps track of how many steps are taken. More specifically, if the source code can make n computation steps, and if *c* is initialized with a value m that is sufficiently large (that is, m ≥ n), then the translated code can make n computation steps as well, and *c* is decremented from m to m − n in the process.

Lemma 1 (Reduction Preservation). *Assume there is a reduction sequence:*

$$(T_1, \sigma_1) \to_{\mathsf{tp}}^{n} (T_2, \sigma_2)$$

*Assume c is fresh for this reduction sequence. Let* m ≥ n*. Then, there exists a reduction sequence:*

$$(\langle\langle T_1 \rangle\rangle, \langle\langle \sigma_1 \rangle\rangle [c \leftarrow m]) \to_{\mathsf{tp}}^{*} (\langle\langle T_2 \rangle\rangle, \langle\langle \sigma_2 \rangle\rangle [c \leftarrow m - n])$$

In this statement, the metavariable T stands for a thread pool, while σ stands for a heap. The relation →<sub>tp</sub> is HeapLang's "threadpool reduction". For the sake of brevity, we write just ⟨⟨e⟩⟩ for ⟨⟨e⟩⟩<sub>*tick*<sub>*c*</sub></sub>, that is, for the translation of the expression e, where *tick* is instantiated with *tick*<sub>*c*</sub>. This notation is implicitly dependent on the parameters *c* and *oops*.

The above lemma holds for every choice of *oops*. Indeed, because the counter *c* initially holds the value m, and because we have m ≥ n, the counter is never about to fall below zero, so *oops* is never invoked.
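The behavior of the counter can be observed on a direct transcription of Fig. 4. The Python sketch below is an approximation under assumptions: the `Counter` class simulates a memory cell with an atomic compare-and-set by means of a lock, the recursive call is replaced by a loop, and the names `Counter`, `make_tick`, and `crash` are hypothetical.

```python
import threading


class Counter:
    """A shared memory cell with a compare-and-set operation (simulated with a lock)."""

    def __init__(self, value: int):
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def cas(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def make_tick(c: Counter, oops):
    """Build tick_c for a given counter c and a given oops function."""

    def tick(x):
        while True:               # the loop plays the role of the recursive call self(x)
            k = c.load()
            if k == 0:
                return oops()
            if c.cas(k, k - 1):
                return x          # tick returns its argument unchanged

    return tick


def crash():
    raise RuntimeError("out of time budget")


m, n = 10, 7
c = Counter(m)
tick = make_tick(c, crash)
results = [tick(step) for step in range(n)]  # n instrumented steps
remaining = c.load()                         # the counter went from m to m - n
```

With m ≥ n, `crash` is never reached, which matches the remark that the lemma holds for every choice of *oops*.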

The next lemma also holds for every choice of *oops*. It states that if the translated program is safe and if the counter *c* has not yet reached zero then the source program is not just about to crash.

Lemma 2 (Immediate Safety Preservation). *Assume c is fresh for* e*. Let* m > 0*. If the configuration* (⟨⟨e⟩⟩, ⟨⟨σ⟩⟩[*c* ← m]) *is safe, then either* e *is a value or the configuration* (e, σ) *is reducible.*

By combining Lemmas 1 and 2 and by contraposition, we find that safety is preserved backwards, as follows: if, when the counter *c* is initialized with m, the translated program ⟨⟨e⟩⟩ is safe, then the source program e is m-safe.

Lemma 3 (Safety Preservation). *If for every location c the configuration* (⟨⟨T⟩⟩, ⟨⟨σ⟩⟩[*c* ← m]) *is safe, then the configuration* (T, σ) *is* m*-safe.*

### 4 Iris with Time Credits

The authors of Iris [12] have used Coq both to check that Iris is sound and to offer an implementation of Iris that can be used to carry out proofs of programs. The two are tied: if {True} p {True} can be established by applying the proof rules of Iris, then one gets a self-contained Coq proof that the program p is safe.

In this section, we temporarily focus on time credits and explain how we extend Iris with time credits, yielding a new program logic Iris\$. The new logic is defined in Coq and still offers an end-to-end guarantee: if {\$k} p {True} can be established in Coq by applying the proof rules of Iris\$, then one has proved in Coq that p is safe and runs in at most k steps.

To define Iris\$, we compose Iris with the tick translation. We are then able to argue that, because this program transformation is operationally correct (that is, it faithfully accounts for the passing of time), and because Iris is sound (that is, it faithfully approximates the behavior of programs), the result of the composition is a sound program logic that is able to reason about time.

In the following, we view the interface *TCIntf* as explicitly parameterized over \$ and *tick*. Thus, we write "*TCIntf* (\$) *tick*" for the separating conjunction of all items in Fig. 1 except the declarations of \$ and *tick*.

We require the end user, who wishes to perform proofs of programs in Iris\$, to work with Iris\$ triples, which are defined as follows:

Definition 1 (Iris\$ triple). *An* Iris\$ triple {P} e {Φ}\$ *is syntactic sugar for:*

$$\forall (\$ : \mathbb{N} \to \mathit{iProp})\ \ \forall \mathit{tick}\ \ \ \mathit{TCIntf}\ (\$)\ \mathit{tick} \mathrel{-\!\!\ast} \{P\}\ \langle\langle e \rangle\rangle_{\mathit{tick}}\ \{\Phi\}$$

Thus, an Iris\$ triple is in reality an Iris triple about the instrumented expression ⟨⟨e⟩⟩<sub>*tick*</sub>. While proving this Iris triple, the end user is given an abstract view of the predicate \$ and the instruction *tick*. The user does not have access to the concrete definitions of \$ and *tick*, but does have access to the laws that govern them.

We prove that Iris\$ is sound in the following sense:

Theorem 1 (Soundness of Iris\$). *If* {\$n} e {True}\$ *holds, then the machine configuration* (e, ∅)*, where* ∅ *is the empty heap, is safe and terminates in at most* n *steps.*

In other words, a program that is initially granted n time credits cannot run for more than n steps. To establish this theorem, we proceed roughly as follows:


Step 1. Our first step is to provide an implementation of *tick*. As announced earlier (Sect. 3), we use *tick*<sub>*c*</sub> (Fig. 4). We instantiate the parameter *oops* with *crash*, an arbitrary function whose application is unsafe. (That is, *crash* is chosen so that *crash*() reduces to a stuck term.) For the moment, *c* remains a parameter.

With these concrete choices of *tick* and *oops*, the translation transforms an out-of-time-budget condition into a hard crash. Because Iris forbids crashes, Iris\$, which is the composition of the translation with Iris, will forbid out-of-time-budget conditions, as desired.

For technical reasons, we need two more lemmas about the translation, whose proofs rely on the fact that *oops* is instantiated with *crash*. They are slightly modified or strengthened variants of Lemmas 2 and 3. First, if the source code can take one step, then the translated code, supplied with zero budget, crashes. Second, if the translated code, supplied with a runtime budget of m, does *not* crash, then the source code terminates in at most m steps.

Lemma 4 (Credit Exhaustion). *Suppose the configuration* (T, σ) *is reducible. Then, for all c, the configuration* (⟨⟨T⟩⟩, ⟨⟨σ⟩⟩[*c* ← 0]) *is unsafe.*

Lemma 5 (Safety Preservation, Strengthened). *If for every location c the configuration* (⟨⟨T⟩⟩, ⟨⟨σ⟩⟩[*c* ← m]) *is safe, then* (T, σ) *is safe and terminates in at most* m *steps.*

Step 2. Our second step, roughly, is to exhibit a definition of \$ : ℕ → *iProp* such that *TCIntf* (\$) *tick*<sub>*c*</sub> is satisfied. That is, we would like to prove something along the lines of: ∃(\$ : ℕ → *iProp*) *TCIntf* (\$) *tick*<sub>*c*</sub>. However, these informal sentences do not quite make sense. This formula is not an ordinary proposition: it is an Iris assertion, of type *iProp*. Thus, it does not make sense to say that this formula "is true" in an absolute manner. Instead, we prove in Iris that we can *make this assertion true* by performing a view shift, that is, a number of operations that have no runtime effect, such as allocating a ghost location and imposing an invariant that ties this ghost state with the physical state of the counter *c*. This is stated as follows:

Lemma 6 (Time Credit Initialization). *For every c and* n*, the following Iris view shift holds:*

$$(c \mapsto n) \ \Rrightarrow_{\top}\ \exists (\$ : \mathbb{N} \to \mathit{iProp})\ \ (\mathit{TCIntf}\ (\$)\ \mathit{tick}_c \ast \$ n)$$

In this statement, on the left-hand side of the view shift symbol, we find the "points-to" assertion *c* ↦ n, which represents the unique ownership of the memory location *c* and the assumption that its initial value is n. This assertion no longer appears on the right-hand side of the view shift. This reflects the fact that, when the view shift takes place, it becomes impossible to access *c* directly; the only way of accessing it is via the operation *tick*<sub>*c*</sub>.

On the right-hand side of the view shift symbol, beyond the existential quantifier, we find a conjunction of the assertion *TCIntf* (\$) *tick*<sub>*c*</sub>, which means that the laws of time credits are satisfied, and \$n, which means that there are initially n time credits in existence.

In the interest of space, we provide only a brief summary of the proof of Lemma 6; the reader is referred to the extended version of this paper [18, Appendix A] for more details. In short, the assertion \$1 is defined in such a way that it represents an exclusive contribution of one unit to the current value of the global counter *c*. In other words, we install the following invariant: at every time, the current value of *c* is (at least) the sum of all time credits in existence. Thus, the assertion \$1 guarantees that *c* is nonzero, and can be viewed as a permission to decrement *c* by one. This allows us to prove that the specification of *tick* in Fig. 1 is satisfied by our concrete implementation *tick*<sub>*c*</sub>. In particular, *tick*<sub>*c*</sub> cannot cause a crash: indeed, under the precondition \$1, *c* is not in danger of falling below zero, and *crash*() is not executed; it is in fact dead code.
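The invariant described above can be mimicked by explicit bookkeeping. The following Python sketch is a loose model, not the Iris proof: the class `CreditLedger` and its fields are hypothetical, ghost state is represented by an ordinary integer, and the `discard` operation is an assumption added to show why the invariant says "at least".

```python
class CreditLedger:
    """Pairs the physical counter c with ghost bookkeeping of outstanding credits."""

    def __init__(self, n: int):
        self.counter = n   # physical state: the value stored at location c
        self.credits = n   # ghost state: sum of all time credits in existence

    def invariant(self) -> bool:
        """The counter is at least the sum of all credits in existence."""
        return self.counter >= self.credits

    def discard(self, k: int) -> None:
        """Throw away k credits (assumed to be allowed); the counter is untouched."""
        if not 0 <= k <= self.credits:
            raise ValueError("cannot discard credits that do not exist")
        self.credits -= k

    def tick(self) -> None:
        """Model of tick_c under the precondition that one credit is owned."""
        if self.credits < 1:
            raise PermissionError("tick requires one time credit")
        # counter >= credits >= 1, so the branch that calls oops is unreachable here.
        assert self.counter > 0
        self.credits -= 1
        self.counter -= 1


ledger = CreditLedger(3)
ledger.tick()
ledger.discard(1)
ledger.tick()
```

Each operation preserves the invariant, and a call to `tick` is possible only while a credit remains, so the counter never needs to fall below zero.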

Step 3. In the last reasoning step, we complete the proof of Theorem 1. The proof is roughly as follows. Suppose the end user has established {\$n} e {True}\$. By Safety Preservation, Strengthened (Lemma 5), to prove that (e, ∅) is safe and runs in at most n steps, it suffices to show (for an arbitrary location *c*) that the translated expression ⟨⟨e⟩⟩, executed in the initial heap ∅[*c* ← n], is safe. To do so, beginning with this initial heap, we perform Time Credit Initialization, that is, we execute the view shift whose statement appears in Lemma 6. This yields an abstract predicate \$ as well as the assertions *TCIntf* (\$) *tick*<sub>*c*</sub> and \$n. At this point, we unfold the Iris\$ triple {\$n} e {True}\$, yielding an implication (see Definition 1), and apply it to \$, to *tick*<sub>*c*</sub>, and to the hypothesis *TCIntf* (\$) *tick*<sub>*c*</sub>. This yields the Iris triple {\$n} ⟨⟨e⟩⟩ {True}. Because we have \$n at hand and because Iris is sound [12], this implies that ⟨⟨e⟩⟩ is safe. This concludes the proof.

This last step is, we believe, where the modularity of our approach shines. Iris' soundness theorem is re-used as a black box, without change. In fact, any program logic other than Iris could be used as a basis for our construction, as long as it is expressive enough to prove Time Credit Initialization (Lemma 6). The last ingredient, Safety Preservation, Strengthened (Lemma 5), involves only the operational semantics of HeapLang, and is independent of Iris.

This was just an informal account of our proof. For further details, the reader is referred to the online repository [17].

### 5 Iris with Time Receipts

In this section, we extend Iris with time receipts and prove the soundness of the new logic, dubbed Iris■. To do so, we follow the scheme established in the previous section (Sect. 4), and compose Iris with the tick translation.

From here on, let us view the interface of time receipts as parameterized over ■, ⊠, and *tick*. Thus, we write "*TRIntf* (■) (⊠) *tick*" for the separating conjunction of all items in Fig. 3 except the declarations of ■, ⊠, and *tick*.

As in the case of credits, the user is given an abstract view of time receipts:

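Definition 2 (Iris■ triple). *An* Iris■ triple {P} e {Φ}■ *is syntactic sugar for:*

$$\forall (\blacksquare, \boxtimes : \mathbb{N} \to \mathit{iProp})\ \ \forall \mathit{tick}\ \ \ \mathit{TRIntf}\ (\blacksquare)\ (\boxtimes)\ \mathit{tick} \mathrel{-\!\!\ast} \{P\}\ \langle\langle e \rangle\rangle_{\mathit{tick}}\ \{\Phi\}$$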

Theorem 2 (Soundness of Iris■). *If* {True} e {True}■ *holds, then the machine configuration* (e, ∅) *is* (N − 1)*-safe.*

As indicated earlier, we assume that the end user is interested in proving that crashes cannot occur until a very long time has elapsed, which is why we state the theorem in this way.<sup>6</sup> Whereas an Iris triple {True} e {True} guarantees that e is safe, the Iris■ triple {True} e {True}■ guarantees that it takes at least N − 1 steps of computation for e to crash. In this statement, N is the global parameter that appears in the axiom stating that ⊠N leads to False (Fig. 3). Compared with Iris, Iris■ provides a weaker safety guarantee, but offers additional reasoning principles, leading to increased convenience and modularity.

In order to establish Theorem 2, we again proceed in three steps:


Step 1. In this step, we keep our concrete implementation of *tick*, namely *tick*<sub>*c*</sub> (Fig. 4). One difference with the case of time credits, though, is that we plan to initialize *c* with N − 1. Another difference is that, this time, we instantiate the parameter *oops* with *loop*, where *loop*() is an arbitrary divergent term.<sup>7</sup>

Step 2. The next step is to prove that we are able to establish the time receipt interface. We prove the following:

Lemma 7 (Time Receipt Initialization). *For every location c, the following Iris view shift holds:*

$$(c \mapsto N - 1) \ \Rrightarrow_{\top}\ \exists (\blacksquare, \boxtimes : \mathbb{N} \to \mathit{iProp})\ \ \mathit{TRIntf}\ (\blacksquare)\ (\boxtimes)\ \mathit{tick}_c$$

We provide only a brief summary of the proof of Lemma 7; for further details, the reader is referred to the extended version of this paper [18, Appendix B]. Roughly speaking, we install the invariant that *c* holds N − 1 − i, where i is some number that satisfies 0 ≤ i < N. We define ■n as an exclusive contribution of n units to the current value of i, and define ⊠n as an observation that i is at least n. (i grows with time, so such an observation is stable.) As part of the proof of the above lemma, we check that the specification of *tick* holds:

$$\{\boxtimes n\}\ \mathit{tick}(v)\ \{\lambda w.\ w = v \ast \blacksquare 1 \ast \boxtimes (n+1)\}$$

In contrast with the case of time credits, in this case, the precondition ⊠n does *not* guarantee that *c* holds a nonzero value. Thus, it *is* possible for *tick*() to be executed when *c* is zero. This is not a problem, though, because *loop*() is safe to execute in any situation: it satisfies the Hoare triple {True} *loop*() {False}. In other words, when *c* is about to fall below zero and therefore the invariant i < N seems about to be broken, *loop*() saves the day by running away and never allowing execution to continue normally.

<sup>6</sup> If the user instead wishes to establish a lower bound on a program's execution time, this is possible as well.

<sup>7</sup> In fact, it is not essential that *loop*() diverges. What matters is that *loop* satisfy the Iris triple {True} *loop*() {False}. A fatal runtime error that Iris does *not* rule out would work just as well, as it satisfies the same specification.

Step 3. In the last reasoning step, we complete the proof of Theorem 2. Suppose the end user has established {True} e {True}■. By Safety Preservation (Lemma 3), to prove that (e, ∅) is (N − 1)-safe, it suffices to show (for an arbitrary location *c*) that ⟨⟨e⟩⟩, executed in the initial heap ∅[*c* ← N − 1], is safe. To do so, beginning with this initial heap, we perform Time Receipt Initialization, that is, we execute the view shift whose statement appears in Lemma 7. This yields two abstract predicates ■ and ⊠ as well as the assertion *TRIntf* (■) (⊠) *tick*<sub>*c*</sub>. At this point, we unfold {True} e {True}■ (see Definition 2), yielding an implication, and apply this implication, yielding the Iris triple {True} ⟨⟨e⟩⟩ {True}. Because Iris is sound [12], this implies that ⟨⟨e⟩⟩ is safe. This concludes the proof. For further detail, the reader is again referred to our online repository [17].

### 6 Marrying Time Credits and Time Receipts

It seems desirable to combine time credits and time receipts in a single program logic, Iris\$■. We have done so [17]. In short, following the scheme of Sects. 4 and 5, the definition of Iris\$■ involves composing Iris with the tick translation. This time, *tick* serves two purposes: it consumes one time credit *and* produces one exclusive time receipt (and increments a persistent time receipt). Thus, its specification is as follows:

$$\{\$ 1 \ast \boxtimes n\}\ \mathit{tick}(v)\ \{\lambda w.\ w = v \ast \blacksquare 1 \ast \boxtimes (n+1)\}$$

Let us write *TCTRIntf* (\$) (■) (⊠) *tick* for the combined interface of time credits and time receipts. This interface combines all of the axioms of Figs. 1 and 3, but declares a single *tick* function<sup>8</sup> and proposes a single specification for it, which is the one shown above.

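Definition 3 (Iris\$■ triple). *An* Iris\$■ triple {P} e {Φ}\$■ *stands for:*

$$\forall\, (\$)\ (\blacksquare)\ (\boxtimes)\ \mathit{tick}\ \ \ \mathit{TCTRIntf}\ (\$)\ (\blacksquare)\ (\boxtimes)\ \mathit{tick} \mathrel{-\!\!\ast} \{P\}\ \langle\langle e \rangle\rangle_{\mathit{tick}}\ \{\Phi\}$$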

Theorem 3 (Soundness of Iris\$■). *If* {\$n} e {True}\$■ *holds, then the machine configuration* (e, ∅) *is* (N − 1)*-safe. If furthermore* n < N *holds, then this machine configuration terminates in at most* n *steps.*

Iris\$■ allows exploiting time credits to prove time complexity bounds and, at the same time, exploiting time receipts to prove the absence of certain integer overflows. Our verification of Union-Find (Sect. 8) illustrates these two aspects.

Guéneau *et al*. [7] use time credits to reason about asymptotic complexity, that is, about the manner in which a program's complexity grows as the size of its input grows towards infinity. Does such asymptotic reasoning make sense in Iris\$■, where no program is ever executed for N time steps or beyond? It seems to be the case that if a program p satisfies the triple {\$n} p {Φ}\$■, then it also satisfies the stronger triple {\$min(n, N)} p {Φ}\$■, therefore also satisfies {\$N} p {Φ}\$■. Can one therefore conclude that p has "constant time complexity"? We believe not. Provided N is considered a parameter, as opposed to a constant, one *cannot* claim that "N is O(1)", so {\$min(n, N)} p {Φ}\$■ does not imply that "p runs in constant time". In other words, a universal quantification on N should come *after* the existential quantifier that is implicit in the O notation. We have not yet attempted to implement this idea; this remains a topic for further investigation.

<sup>8</sup> Even though the interface provides only one *tick* function, it gets instantiated in the soundness theorem with different implementations depending on whether there are more than *N* time credits or not.

### 7 Application: Thunks in Iris\$

In this section, we illustrate the power of Iris\$ by constructing an implementation of thunks as a library in Iris\$. A *thunk*, also known as a *suspension*, is a very simple data structure that represents a suspended computation. There are two operations on thunks, namely *create*, which constructs a new thunk, and *force*, which demands the result of a thunk. A thunk memoizes its result, so that even if it is forced multiple times, the computation only takes place once.

Okasaki [19] proposes a methodology for reasoning about the amortized time complexity of computations that involve shared thunks. For every thunk, he keeps track of a *debit*, which can be thought of as an amount of credit that one must still pay before one is allowed to force this thunk. A ghost operation, *pay*, changes one's view of a thunk, by reducing the debit associated with this thunk. *force* can be applied only to a zero-debit thunk, and has amortized cost O(1). Indeed, if this thunk has been forced already, then *force* really requires constant time; and if this thunk is being forced for the first time, then the cost of performing the suspended computation must have been paid for in advance, possibly in several installments, via *pay*. This discipline is sound even in the presence of sharing, that is, of multiple pointers to a thunk. Indeed, whereas duplicating a credit is unsound, duplicating a debit leads to an over-approximation of the true cost, hence is sound. Danielsson [6] formulates Okasaki's ideas as a type system, which he proves sound in Agda. Pilkiewicz and Pottier [20] reconstruct this type discipline in the setting of a lower-level type system, equipped with basic notions of time credits, hidden state, and monotonic state. Unfortunately, their type system is presented in an informal manner and does not come with a proof of type soundness.
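The debit discipline can be sketched as follows. This Python model is a simplification under assumptions: the class `Thunk`, its `cost` parameter, and the choice to store the debit inside the thunk are hypothetical (in the methodology described above, the debit is a ghost quantity attached to one's view of the thunk), and credits are represented by plain integers.

```python
class Thunk:
    """A memoizing suspension whose forcing is gated by a debit (illustrative model)."""

    def __init__(self, cost: int, compute):
        self.debit = cost          # credits still owed before forcing is allowed
        self._compute = compute
        self._forced = False
        self._result = None

    def pay(self, credits: int) -> None:
        """Ghost-like operation: reduce the debit, possibly in several installments."""
        if credits < 0:
            raise ValueError("cannot pay a negative amount")
        self.debit = max(0, self.debit - credits)

    def force(self):
        """Return the result; the suspended computation runs at most once."""
        if self.debit != 0:
            raise PermissionError("force requires a zero-debit thunk")
        if not self._forced:
            self._result = self._compute()
            self._forced = True
            self._compute = None   # the suspended computation is no longer needed
        return self._result


calls = []


def expensive():
    calls.append("ran")
    return 42


t = Thunk(cost=3, compute=expensive)
t.pay(1)
t.pay(2)
first = t.force()
second = t.force()   # memoized: expensive() is not run again
```

Once the debit has been paid off, the first `force` performs work that was paid for in advance, and every later `force` only reads the memoized result.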

We reproduce Pilkiewicz and Pottier's construction in the formal setting of Iris\$. Indeed, Iris\$ offers all of the necessary ingredients, namely time credits, hidden state (invariants, in Iris terminology) and monotonic state (a special case of Iris' ghost state). Our reconstruction is carried out inside Coq [17].

#### 7.1 Concurrency and Reentrancy

One new problem that arises here is that Okasaki's analysis, which is valid in a sequential setting, potentially becomes invalid in a concurrent setting. Suppose we wish to allow multiple threads to safely share access to a thunk. A natural, simple-minded approach would be to equip every thunk with a lock and allow competition over this lock. Then, unfortunately, forcing would become a blocking operation: one thread could waste time waiting for another thread to finish forcing. In fact, in the absence of a fairness assumption about the scheduler, an unbounded amount of time could be wasted in this way. This appears to invalidate the property that *force* has amortized cost O(1).

Technically, the manner in which this problem manifests itself in Iris\$ is in the specification of locks. Whereas in Iris a spin lock can be implemented and proved correct with respect to a simple and well-understood specification [2], in Iris\$, it cannot. The *lock*() method contains a potentially infinite loop: therefore, no finite amount of time credits is sufficient to prove that *lock*() is safe. This issue is discussed in greater depth later on (Sect. 9).

A distinct yet related problem is reentrancy. Arguably, an implementation of thunks should guarantee that a suspended computation is evaluated at most once. This guarantee seems particularly useful when the computation has a side effect: the user can then rely on the fact that this side effect occurs at most once. However, this property does not naturally hold: in the presence of heap-allocated mutable state, it is possible to construct an ill-behaved "reentrant" thunk which, when forced, attempts to recursively force itself. Thus, something must be done to dynamically reject or statically prevent reentrancy. In Pilkiewicz and Pottier's code [20], reentrancy is detected at runtime, thanks to a three-color scheme, and causes a fatal runtime failure. In a concurrent system where each thunk is equipped with a lock, reentrancy is also detected at runtime, and turned into deadlock; but we have explained earlier why we wish to avoid locks.
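To make the reentrancy problem concrete, here is a minimal Python sketch of a thunk that detects reentrancy at runtime with three colors. It is only an illustration in the spirit of the dynamic approach described above, not Pilkiewicz and Pottier's code: the names `Color`, `Thunk`, and `ReentrancyError` are hypothetical, and the intermediate color is called grey here by assumption.

```python
from enum import Enum, auto


class Color(Enum):
    WHITE = auto()  # unevaluated
    GREY = auto()   # currently being forced (assumed name for the third color)
    BLACK = auto()  # evaluated


class ReentrancyError(RuntimeError):
    """Raised when a thunk is forced while it is already being forced."""


class Thunk:
    def __init__(self, f):
        self.color = Color.WHITE
        self.payload = f  # the suspended computation, later replaced by its result

    def force(self):
        if self.color is Color.BLACK:
            return self.payload
        if self.color is Color.GREY:
            raise ReentrancyError("thunk forced while already being forced")
        f = self.payload
        self.color = Color.GREY
        v = f()
        self.color = Color.BLACK
        self.payload = v
        return v


# A well-behaved thunk: its side effect occurs at most once.
log = []
t = Thunk(lambda: log.append("ran") or 42)
first, second = t.force(), t.force()

# An ill-behaved thunk that, through mutable state, forces itself.
cell = {}
cell["self"] = Thunk(lambda: cell["self"].force() + 1)
try:
    cell["self"].force()
    reentrancy_detected = False
except ReentrancyError:
    reentrancy_detected = True
```

The static discipline adopted in this paper makes the grey state unnecessary, because the ill-behaved thunk above cannot be verified in the first place.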

Fortunately, Iris provides us with a static mechanism for forbidding both concurrency and reentrancy. We introduce a unique token E, which can be thought of as "permission to use the thunk API", and set things up so that *pay* and *force* require and return E. This forbids concurrency: two operations on thunks cannot take place concurrently. Furthermore, when a user-supplied suspended computation is executed, the token E is *not* transmitted to it. This forbids reentrancy.<sup>9</sup> The implementation of this token relies on Iris' "nonatomic invariants" (Sect. 7.4). With these restrictions, we are able to prove that Okasaki's discipline is sound.

#### 7.2 Implementation of Thunks

A simple implementation of thunks in HeapLang appears in Fig. 5. A thunk can be in one of two states: *White f* and *Black* v. A white thunk is unevaluated: the function *f* represents a suspended computation. A black thunk is evaluated: the value v is the result of the computation that has been performed already. Two colors are sufficient: because our static discipline rules out reentrancy, there is no need for a third color, whose purpose would be to dynamically detect an attempt to force a thunk that is already being forced.

$$\begin{array}{l}
\mathit{create} \stackrel{\Delta}{=} \lambda f.\ \mathsf{ref}\ (\mathit{White}\ f) \\
\mathit{force} \stackrel{\Delta}{=} \lambda t.\ \mathsf{match}\ {!t}\ \mathsf{with} \\
\quad \mathit{White}\ f \Rightarrow \mathsf{let}\ v = f\,()\ \mathsf{in}\ t \leftarrow \mathit{Black}\ v;\ v \\
\quad \mid \mathit{Black}\ v \Rightarrow v
\end{array}$$

Fig. 5. An implementation of thunks

Fig. 6. A simple specification of thunks in Iris\$

<sup>9</sup> Therefore, a suspended computation cannot force *any* thunk. This is admittedly a very severe restriction, which rules out many useful applications of thunks. In fact, we have implemented a more flexible discipline, where thunks can be grouped in multiple "regions" and there is one token per region instead of a single global E token. This discipline allows concurrent or reentrant operations on provably distinct thunks, yet can still be proven sound.

#### 7.3 Specification of Thunks in Iris**\$**

Our specification of thunks appears in Fig. 6. It declares an abstract predicate isThunk *t* n Φ, which asserts that *t* is a valid thunk, that the debt associated with this thunk is n, and that this thunk (once forced) produces a value that satisfies the postcondition Φ. The number n, a *debit*, is the number of credits that remain to be paid before this thunk can be forced. The postcondition Φ is chosen by the user when a thunk is created. It must be duplicable (this is required in the specification of *force*) because *force* can be invoked several times and we must guarantee, every time, that the result v satisfies Φ v.

The second axiom states that isThunk *t* n Φ is a persistent assertion. This means that a valid thunk, once created, remains a valid thunk forever. Among other things, it is permitted to create two pointers to a single thunk and to reason independently about each of these pointers.

The third axiom states that isThunk *t* n Φ is covariant in its parameter n. Overestimating a debt still leads to a correct analysis of a program's worst-case time complexity.

Next, the specification declares an abstract assertion E, and provides the user with one copy of this assertion. We refer to it as "the thunderbolt".

The next item in Fig. 6 is the specification of *create*. It is higher-order: the precondition of *create* contains a specification of the function f that is passed as an argument to *create*. This axiom states that, if f represents a computation of cost n, then *create* (*f*) produces an n-debit thunk. The cost of creation itself is 3 credits. This specification is somewhat simplistic, as it does not allow the function f to have a nontrivial precondition. It is possible to offer a richer specification; we eschew it in favor of simplicity.

Next comes the specification of *force*. Only a 0-debit thunk can be forced. The result is a value v that satisfies Φ. The (amortized) cost of forcing is 11 credits. The thunderbolt appears in the pre- and postcondition of *force*, forbidding any concurrent attempts to force a thunk.

The last axiom in Fig. 6 corresponds to *pay*. It is a view shift, a ghost operation. By paying k credits, one turns an n-debit thunk into an (n − k)-debit thunk. At runtime, nothing happens: it is the same thunk before and after the payment. Yet, after the view shift, we have a new view of the number of debits associated with this thunk. Here, paying requires the thunderbolt. It should be possible to remove this requirement; we have not yet attempted to do so.
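The accounting described by this specification can be simulated at runtime by making the ghost state concrete. The Python sketch below is only an approximation under stated assumptions: `Account`, `DebitThunk`, and `CreditError` are hypothetical names, a single global debit stands in for the per-view debit n of isThunk, and the thunderbolt token is not modelled. The costs 3 and 11 are those given in the text.

```python
class CreditError(Exception):
    """Raised when an operation is attempted without enough credits."""


class Account:
    """A pool of time credits owned by the caller (hypothetical helper)."""

    def __init__(self, credits):
        self.credits = credits

    def spend(self, k):
        if k < 0 or k > self.credits:
            raise CreditError(f"cannot spend {k} of {self.credits} credits")
        self.credits -= k


class DebitThunk:
    CREATE_COST = 3   # cost of create, as stated in the text
    FORCE_COST = 11   # amortized cost of force, as stated in the text

    def __init__(self, account, f, cost):
        account.spend(self.CREATE_COST)
        self.f = f
        self.nc = cost      # cost of running the suspended computation
        self.ac = 0         # credits paid so far
        self.forced = False
        self.value = None

    @property
    def debit(self):
        return max(self.nc - self.ac, 0)

    def pay(self, account, k):
        """The ghost operation pay, made concrete: move k credits into the thunk."""
        account.spend(k)
        self.ac += k

    def force(self, account):
        if self.debit != 0:
            raise CreditError(f"debit is {self.debit}, must be 0")
        account.spend(self.FORCE_COST)
        if not self.forced:
            self.value = self.f()
            self.forced = True
        return self.value


account = Account(3 + 10 + 11 + 11)
t = DebitThunk(account, lambda: sum(range(10)), cost=10)
try:
    t.force(account)
    forced_too_early = True
except CreditError:
    forced_too_early = False
t.pay(account, 4)            # first installment: debit goes from 10 to 6
debit_after_first = t.debit
t.pay(account, 6)            # second installment: debit goes from 6 to 0
results = (t.force(account), t.force(account))
```

In this run the thunk cannot be forced until both installments have been paid, and each call to `force` is charged the same constant cost, whether or not the suspended computation actually runs.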

#### 7.4 Proof of Thunks in Iris**\$**

After implementing thunks in HeapLang (Sect. 7.2) and expressing their specification in Iris\$ (Sect. 7.3), it remains to prove that this specification can be established. We sketch the key ideas of this proof.

Following Pilkiewicz and Pottier [20], when a new thunk is created, we install a new Iris invariant, which describes this thunk. The invariant is as follows:

$$\begin{array}{l}
\text{ThunkInv } t\ \gamma\ nc\ \Phi \stackrel{\Delta}{=} \\
\quad \exists ac.\ \left( \left[\,\bullet\, ac\,\right]^{\gamma} \ast \left( \begin{array}{l} \exists f.\ t \mapsto \mathit{White}\ f \ast \{\$nc\}\ f\,()\ \{\Phi\} \ast \$ac \\ \lor\ \exists v.\ t \mapsto \mathit{Black}\ v \ast \Phi\ v \end{array} \right) \right)
\end{array}$$

γ is a ghost location, which we allocate at the same time as the thunk *t*. It holds elements of the authoritative monoid Auth(N, max) [12]. The variable nc, for "necessary credits", is the cost of the suspended computation: it appears in the precondition of f. The variable ac, for "available credits", is the number of credits that have been paid so far. The disjunction inside the invariant states that either the thunk is white, in which case the ac credits paid so far are at hand, or the thunk is black, in which case its value v is stored and no credits remain at hand.


The predicate isThunk *t* n Φ is then defined as follows:

$$\begin{array}{l}
\text{isThunk } t\ n\ \Phi \stackrel{\Delta}{=} \\
\quad \exists \gamma, nc.\ \left( \left[\,\circ\,(nc - n)\,\right]^{\gamma} \ast \text{NaInv}(\text{ThunkInv } t\ \gamma\ nc\ \Phi) \right)
\end{array}$$

The non-authoritative assertion ◦(nc − n), held at ghost location γ inside isThunk *t* n Φ, confronted with the authoritative assertion •ac at γ that can be obtained by acquiring the invariant, implies the inequality nc − n ≤ ac, therefore nc ≤ ac + n. That is, the credits paid so far (ac) plus the credits that remain to be paid (n) are sufficient to cover for the actual cost of the computation (nc). In particular, in the proof of *force*, we have a 0-debit thunk, so nc ≤ ac holds. In the case where the thunk is white, this means that the ac credits that we have at hand are sufficient to justify the call f (), which requires nc credits.
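The role of the monoid Auth(N, max) in this argument can be illustrated with a small Python model of monotonic state. This is a sketch under assumptions, not Iris code: `AuthMaxCell` and its methods are hypothetical names, the authoritative element is modelled as an integer that can only grow, and a fragment is modelled as a lower bound on it. Subtraction is truncated at zero, as on natural numbers.

```python
class AuthMaxCell:
    """Hypothetical model of a ghost cell over the monoid (N, max)."""

    def __init__(self, initial=0):
        self._auth = initial

    @property
    def auth(self):
        return self._auth

    def grow(self, new_value):
        # The authoritative value is monotonic: it may never decrease.
        if new_value < self._auth:
            raise ValueError("the authoritative value may only grow")
        self._auth = new_value

    def fragment(self, lower_bound):
        # A fragment can be created only if it is below the authoritative value.
        if lower_bound > self._auth:
            raise ValueError("fragment exceeds the authoritative value")
        return lower_bound

    def agrees(self, frag):
        # Confronting a fragment with the authoritative value yields frag <= auth.
        return frag <= self._auth


nc = 10                       # cost of the suspended computation
cell = AuthMaxCell()
views = []
for paid, n in [(0, 10), (4, 6), (10, 0)]:
    cell.grow(paid)           # ac increases as payments are made
    frag = cell.fragment(max(nc - n, 0))
    views.append((n, frag))   # a view "this thunk has debit n"

ac = cell.auth
still_valid = all(cell.agrees(frag) for _, frag in views)
covers_cost = all(nc <= ac + n for n, _ in views)
```

Because the authoritative value only grows, every fragment created earlier remains valid, which is why older views with a larger debit n stay sound after further payments.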

The final aspect that remains to be explained is our use of NaInv(···), an Iris "nonatomic invariant". Indeed, in this proof, we cannot rely on Iris' primitive invariants. A primitive invariant can be acquired only for the duration of an atomic instruction [12]. In our implementation of thunks (Fig. 5), however, we need a "critical section" that encompasses several instructions. That is, we must acquire the invariant before dereferencing *t*, and (in the case where this thunk is white) we cannot release it until we have marked this thunk black. Fortunately, Iris provides a library of "nonatomic invariants" for this very purpose. (This library is used in the RustBelt project [10] to implement Rust's type Cell.) This library offers separate ghost operations for acquiring and releasing an invariant. Acquiring an invariant consumes a unique token, which is recovered when the invariant is released: this guarantees that an invariant cannot be acquired twice, or in other words, that two threads cannot be in a critical section at the same time. The unique token involved in this protocol is the one that we expose to the end user as "the thunderbolt".

#### 8 Application: Union-Find in Iris**\$⧗**

As an illustration of the use of both time credits and time receipts, we formally verify the functional correctness and time complexity of an implementation of the Union-Find data structure. Our proof [17] is based on Charguéraud and Pottier's work [4]. We port their code from OCaml to HeapLang, and port their proof from Separation Logic with Time Credits to Iris\$⧗. At this point, the proof exploits just Iris\$, a subset of Iris\$⧗. The mathematical analysis of Union-Find, which represents a large part of the proof, is unchanged. Our contribution lies in the fact that we modify the data structure to represent ranks as machine integers instead of unbounded integers, and exploit time receipts in Iris\$⧗ to establish the absence of overflow. We equip HeapLang with signed machine integers whose bit width is a parameter w. Under the hypothesis log log N < w − 1, we are able to prove that, even though the code uses limited-width machine integers, no overflow can occur in a feasible time. If for instance N is 2<sup>63</sup>, then this condition boils down to w ≥ 7. Ranks can be stored in just 7 bits without risking overflow.
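The arithmetic behind the figure of 7 bits can be checked directly. The Python sketch below assumes that logarithms are taken in base 2, so that log log N < w − 1 is equivalent to N < 2^(2^(w−1)); the function name `min_rank_width` is hypothetical.

```python
def min_rank_width(N):
    """Smallest bit width w such that log2(log2(N)) < w - 1.

    The condition is checked on integers, as N < 2 ** (2 ** (w - 1)),
    to avoid floating-point rounding.
    """
    w = 1
    while N >= 2 ** (2 ** (w - 1)):
        w += 1
    return w


N = 2 ** 63
w = min_rank_width(N)
max_signed = 2 ** (w - 1) - 1      # largest signed w-bit machine integer
max_rank = N.bit_length() - 1      # log2(N), exact because N is a power of two
```

For N = 2<sup>63</sup>, the largest signed 7-bit integer is 63, which matches the bound log N = 63 on ranks exactly; with 6 bits the condition fails.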

As in Charguéraud and Pottier's work, the Union-Find library advertises an abstract representation predicate isUF D R V, which describes a well-formed, uniquely-owned Union-Find data structure. The parameter D, a set of nodes, is the domain of the data structure. The parameter R, a function, maps a node to the representative element of its equivalence class. The parameter V, also a function, maps a node to a payload value associated with its equivalence class. We do not show the specification of every operation. Instead, we focus on *union*, which merges two equivalence classes. We establish the following Iris\$⧗ triple:

$$\begin{array}{l}
x \in D \;\land\; y \in D \;\Rightarrow \\
\quad \{\, \mathsf{isUF}\ D\ R\ V \ast \$(44\,\alpha(|D|) + 152) \,\} \\
\quad \mathit{union}\ x\ y \\
\quad \{\, \lambda z.\ \mathsf{isUF}\ D\ R'\ V' \ast (z = R(x) \lor z = R(y)) \,\}
\end{array}$$

where the functions R′ and V′ are defined as follows:<sup>10</sup>

$$(R'(w), V'(w)) = \begin{cases} (z,\ V(z)) & \text{if } R(w) = R(x) \text{ or } R(w) = R(y) \\ (R(w),\ V(w)) & \text{otherwise} \end{cases}$$
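The definition of R′ and V′ can be read as a pure function on finite maps. The Python sketch below is only an illustration: it represents R and V as dictionaries over the domain D, and the function name `after_union` is hypothetical.

```python
def after_union(D, R, V, x, y, z):
    """Compute (R', V') from (R, V) after merging the classes of x and y.

    z is the representative chosen by union; it must be R[x] or R[y].
    """
    assert x in D and y in D
    assert z in (R[x], R[y])
    merged = {w for w in D if R[w] in (R[x], R[y])}
    R2 = {w: (z if w in merged else R[w]) for w in D}
    V2 = {w: (V[z] if w in merged else V[w]) for w in D}
    return R2, V2


D = {1, 2, 3, 4}
R = {1: 1, 2: 1, 3: 3, 4: 4}          # classes {1, 2}, {3}, {4}
V = {1: "a", 2: "a", 3: "b", 4: "c"}  # one payload per class
R2, V2 = after_union(D, R, V, x=2, y=3, z=3)
```

Nodes outside the two merged classes, such as node 4 here, keep their representative and payload unchanged.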

The hypotheses x ∈ D and y ∈ D and the conjunct isUF D R V in the precondition require that x and y be two nodes in a valid Union-Find data structure. The postcondition λz. … describes the state of the data structure after the operation and the return value z.

The conjunct \$(44α(|D|) + 152) in the precondition indicates that *union* has time complexity O(α(n)), where α is an inverse of Ackermann's function and n is the number of nodes in the data structure. This is an amortized bound; the predicate isUF also contains a certain number of time credits, known as the potential of the data structure, which are used to justify *union* operations whose actual cost exceeds the advertised cost. The constants 44 and 152 differ from those found in Charguéraud and Pottier's specification [4] because Iris\$ counts every computation step, whereas they count only function calls. Abstracting these constants by using O notation, as proposed by Guéneau *et al*. [7], would be desirable, but we have not attempted to do so yet.

The main novelty, with respect to Charguéraud and Pottier's specification, is the hypothesis log log N < w − 1, which is required to prove that no overflow can occur when the rank of a node is incremented. In our proof, N and w are parameters; once their values are chosen, this hypothesis is easily discharged, once and for all. In the absence of time receipts, we would have to publish the hypothesis log log n < w − 1, where n is the cardinal of D, forcing every (direct and indirect) user of the data structure to keep track of this requirement.

<sup>10</sup> This definition of *R′* and *V′* has free variables *x, y, z*, therefore in reality must appear inside the postcondition. Here, it is presented separately, for greater readability.

For the proof to go through, we store n time receipts in the data structure: that is, we include the conjunct ⧗n, where n stands for |D|, in the definition of the invariant isUF D R V. The operation of creating a new node takes at least one step, therefore produces one new time receipt, which is used to prove that the invariant is preserved by this operation. At any point, then, from the invariant, and from the basic laws of time receipts, we can deduce that n < N holds. Furthermore, it is easy to show that a rank is at most log n. Therefore, a rank is at most log N. In combination with the hypothesis log log N < w − 1, this suffices to prove that a rank is at most 2<sup>w−1</sup> − 1, the largest signed machine integer, and therefore that no overflow can occur in the computation of a rank.
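The claim that a rank is at most log n can be observed on a small example. The following Python sketch is a standard union-by-rank implementation with path compression, written for illustration only; it is not the verified HeapLang code, and the class and method names are hypothetical. The sequence of unions is chosen to make ranks grow as fast as possible.

```python
class UnionFind:
    """Array-based Union-Find with union by rank and path compression."""

    def __init__(self):
        self.parent = []
        self.rank = []

    def make(self):
        x = len(self.parent)
        self.parent.append(x)
        self.rank.append(0)
        return x

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: redirect every node on the path to the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1  # the only place where a rank is incremented
        return rx


uf = UnionFind()
n = 64
for _ in range(n):
    uf.make()

# Tournament-style unions: always merge two trees of equal rank.
step = 1
while step < n:
    for i in range(0, n, 2 * step):
        uf.union(i, i + step)
    step *= 2

max_rank = max(uf.rank)
log2_n = n.bit_length() - 1  # exact because n is a power of two
```

Even under this worst-case schedule, 64 nodes yield a maximum rank of 6, so ranks grow only logarithmically in the number of nodes.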

Clochard *et al*. [5, §2] already present Union-Find as a motivating example among several others. They write that "there is obviously no danger of arithmetic overflow here, since [ranks] are only obtained by successive increments by one". This argument would be formalized in their system by representing ranks as either "one-time" or "peano" integers (in our terminology, clocks or snapclocks). This argument could be expressed in Iris\$⧗, but would lead to requiring log N < w − 1. In contrast, we use a more refined argument: we note that ranks are logarithmic in n, the number of nodes, and that n itself can never overflow. This leads us to the much weaker requirement log log N < w − 1, which means that a rank can be stored in very few bits. We believe that this argument cannot be expressed in Clochard *et al*.'s system.

#### 9 Discussion

One feature of Iris and HeapLang that deserves further discussion is concurrency. Iris is an evolution of Concurrent Separation Logic, and HeapLang has shared-memory concurrency. How does this impact our reasoning about time? At a purely formal level, this does not have any impact: Theorems 1, 2, 3 and their proofs are essentially oblivious to the absence or presence of concurrency in the programming language. At a more informal level, though, this impacts our interpretation of the real-world meaning of these theorems. Whereas in a sequential setting a "number of computation steps" can be equated (up to a constant factor) with "time", in a concurrent setting, a "number of computation steps" is referred to as "work", and is related to "time" only up to a factor of p, the number of processors. In short, our system measures work, not time. The number of available processors should be taken into account when choosing a specific value of N: this value must be so large that N computation steps are infeasible even by p processors. With this in mind, we believe that our system can still be used to prove properties that have physical relevance.

In short, our new program logics, Iris\$, Iris⧗, and Iris\$⧗, tolerate concurrency. Yet, is it fair to say that they have "good support" for reasoning about concurrent programs? We believe not yet, and this is an area for future research. The main open issue is that we do not at this time have good support for reasoning about the time complexity of programs that perform busy-waiting on some resource. The root of the difficulty, already mentioned during the presentation of thunks (Sect. 7.1), is that one thread can fail to make progress, due to interference with another thread. A retry is then necessary, wasting time. In a spin lock, for instance, the "compare-and-set" (CAS) instruction that attempts to acquire the lock can fail. There is no bound on the number of attempts that are required until the lock is eventually acquired. Thus, in Iris\$, we are currently unable to assign *any* specification to the *lock* method of a spin lock.
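The difficulty with spin locks can be illustrated by a sequential Python simulation. This is a sketch under assumptions, not a real concurrent lock: `AtomicFlag.cas` merely imitates a compare-and-set instruction, `lock_with_budget` is a hypothetical variant of *lock* that spends one credit per attempt, and the competing thread is simulated by leaving the flag set.

```python
class AtomicFlag:
    """Simulated lock word with a compare-and-set operation."""

    def __init__(self):
        self.held = False

    def cas(self, expected, new):
        if self.held == expected:
            self.held = new
            return True
        return False


def lock_with_budget(flag, credits):
    """Spin on CAS, spending one credit per attempt.

    Returns (acquired, attempts). Gives up when the credits run out.
    """
    attempts = 0
    while credits > 0:
        credits -= 1
        attempts += 1
        if flag.cas(False, True):
            return True, attempts
    return False, attempts


free = AtomicFlag()
uncontended = lock_with_budget(free, credits=5)

busy = AtomicFlag()
busy.cas(False, True)  # a simulated competing thread holds the lock and never releases it
contended = [lock_with_budget(busy, credits=c) for c in (1, 10, 1000)]
```

Whatever finite budget is supplied, the contended case exhausts it without acquiring the lock, which mirrors the statement that no finite number of time credits suffices for *lock*.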

In the future, we wish to take inspiration from Hoffmann, Marmar and Shao [9], who use time credits in Concurrent Separation Logic to establish the lock-freedom of several concurrent data structures. The key idea is to formalize the informal argument that "failure of a thread to make progress is caused by successful progress in another thread". Hoffmann *et al*. set up a "quantitative compensation scheme", that is, a protocol by which successful progress in one thread (say, a successful CAS operation) must transmit a number of time credits to every thread that has encountered a corresponding failure and therefore must retry. Quite interestingly, this protocol is not hardwired into the reasoning rule for CAS. In fact, CAS itself is not primitive; it is encoded in terms of an atomic {...} construct. The protocol is set up by the user, by exploiting the basic tools of Concurrent Separation Logic, including shared invariants. Thus, it should be possible in Iris\$ to reproduce Hoffmann *et al*.'s reasoning and to assign useful specifications to certain lock-free data structures. Furthermore, we believe that, under a fairness assumption, it should be possible to assign Iris\$ specifications also to coarse-grained data structures, which involve locks. Roughly speaking, under a fair scheduler, the maximum time spent waiting for a lock is the maximum number of threads that may compete for this lock, multiplied by the maximum cost of a critical section protected by this lock. Whether and how this can be formalized is a topic of future research.

The axiom ⧗N ⊢ False comes with a few caveats that should be mentioned. The same caveats apply to Clochard *et al*.'s system [5], and are known to them.

One caveat is that it is possible in theory to use this axiom to write and justify surprising programs. For instance, in Iris⧗, the loop "*for* i = 1 *to* N *do* () *done*" satisfies the specification {True} — {False}: that is, it is possible to prove that this loop "never ends". As a consequence, this loop also satisfies every specification of the form {True} — {Φ}. On the face of it, this loop would appear to be a valid solution to every programming assignment! In practice, it is up to the user to exhibit taste and to refrain from exploiting such a paradox. In reality, the situation is no worse than that in plain Iris, a logic of partial correctness, where the infinite loop "*while true do* () *done*" also satisfies {True} — {False}.

Another important caveat is that the compiler must in principle be instructed to never optimize ticks away. If, for instance, the compiler were allowed to recognize that the loop "*for* i = 1 *to* N *do* () *done*" does nothing, and to replace this loop with a no-op, then this loop, which according to Iris⧗ "never ends", would in reality end immediately. We would thereby be in danger of proving that a source program cannot crash unless it is allowed to run for centuries, whereas in reality the corresponding compiled program does crash in a short time. In practice, this danger can be avoided by actually instrumenting the source code with *tick*() instructions and by presenting *tick* to the compiler as an unknown external function, which cannot be optimized away. However, this seems a pity, as it disables many compiler optimizations.
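The gap between the instrumented program and an optimized one can be shown with a toy Python model. It is a deliberately simplified sketch: `Ticker` is a hypothetical step counter standing in for the *tick*() instrumentation, one tick is charged per loop iteration rather than per computation step, and `optimized_loop` stands for what a compiler might produce if it were allowed to delete the empty loop.

```python
class Ticker:
    """Hypothetical step counter standing in for the tick() instrumentation."""

    def __init__(self):
        self.steps = 0

    def tick(self):
        self.steps += 1


def instrumented_loop(ticker, n):
    for _ in range(n):
        ticker.tick()  # kept even though the loop body is otherwise empty


def optimized_loop(ticker, n):
    # The loop has been removed as dead code, and its ticks with it.
    return None


a, b = Ticker(), Ticker()
instrumented_loop(a, 1000)
optimized_loop(b, 1000)
```

The two counters disagree after the run, so any guarantee derived from counting steps in the source program no longer describes the optimized program.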

We believe that, despite these pitfalls, time receipts can be a useful tool. We hope that, in the future, better ways of avoiding these pitfalls will be discovered.

#### 10 Related Work

Time credits in an affine Separation Logic are not a new concept. Atkey [1] introduces them in the setting of Separation Logic. Pilkiewicz and Pottier [20] exploit them in an informal reconstruction of Danielsson's type discipline for lazy thunks [6], which itself is inspired by Okasaki's work [19]. Several authors subsequently exploit time credits in machine-checked proofs of correctness and time complexity of algorithms and data structures [4,7,22]. Hoffmann, Marmar and Shao [9], whose work was discussed earlier in this paper (Sect. 9), use time credits in Concurrent Separation Logic to prove that several concurrent data structure implementations are lock-free.

At a metatheoretic level, Charguéraud and Pottier [4] provide a machine-checked proof of soundness of a Separation Logic with time credits. Haslbeck and Nipkow [8] compare three program logics that can provide worst-case time complexity guarantees, including Separation Logic with time credits.

To the best of our knowledge, affine (exclusive and persistent) time receipts are new, and the axiom ⧗N ⊢ False is new as well. It is inspired by Clochard *et al*.'s idea that "programs cannot run for centuries" [5], but distills this idea into a simpler form.

Our implementation of thunks and our reconstruction of Okasaki's debits [19] in terms of credits are inspired by earlier work [6,20]. Although Okasaki's analysis assumes a sequential setting, we adapt it to a concurrent setting by explicitly forbidding concurrent operations on thunks; to do so, we rely on Iris nonatomic invariants. In contrast, Danielsson [6] views thunks as a primitive construct in an otherwise pure language. He equips the language with a type discipline, where the type *Thunk*, which is indexed with a debit, forms a monad, and he provides a direct proof of type soundness. The manner in which Danielsson inserts *tick* instructions into programs is a precursor of our tick translation; this idea can in fact be traced at least as far back as Moran and Sands [16]. Pilkiewicz and Pottier [20] sketch an encoding of debits in terms of credits. Because they work in a sequential setting, they are able to install a shared invariant by exploiting the anti-frame rule [21], whereas we use Iris' nonatomic invariants for this purpose. The anti-frame rule does not rule out reentrancy, so they must detect it at runtime, whereas in our case both concurrency and reentrancy are ruled out by our use of nonatomic invariants.

Madhavan *et al*. [15] present an automated system that infers and verifies resource bounds for higher-order functional programs with thunks (and, more generally, with memoization tables). They transform the source program to an instrumented form where the state is explicit and can be described by monotone assertions. For instance, it is possible to assert that a thunk has been forced already (which guarantees that forcing it again has constant cost). This seems analogous in Okasaki's terminology to asserting that a thunk has zero debits, also a monotone assertion. We presently do not know whether Madhavan *et al*.'s system could be encoded into a lower-level program logic such as Iris\$; it would be interesting to find out.

#### 11 Conclusion

We have presented two mechanisms, namely time credits and time receipts, by which Iris, a state-of-the-art concurrent program logic, can be extended with means of reasoning about time. We have established soundness theorems that state precisely what guarantees are offered by the extended program logics Iris\$, Iris⧗, and Iris\$⧗. We have defined these new logics modularly, by composing Iris with a program transformation. The three proofs follow a similar pattern: the soundness theorem of Iris is composed with a simulation lemma about the tick translation. We have illustrated the power of the new logics by reconstructing Okasaki's debit-based analysis of thunks, by reconstructing Clochard *et al*.'s technique for proving the absence of certain integer overflows, and by presenting an analysis of Union-Find that exploits both time credits and time receipts.

One limitation of our work is that all of our metatheoretic results are specific to HeapLang, and would have to be reproduced, following the same pattern, if one wished to instantiate Iris\$ for another programming language. It would be desirable to make our statements and proofs generic. In future work, we would also like to better understand what can be proved about the time complexity of concurrent programs that involve waiting. Can the time spent waiting be bounded? What specification can one give to a lock, or a thunk that is protected by a lock? A fairness hypothesis about the scheduler seems to be required, but it is not clear yet how to state and exploit such a hypothesis. Hoffmann, Marmar and Shao [9] have carried out pioneering work in this area, but have dealt only with lock-free data structures and only with situations where the number of competing threads is fixed. It would be interesting to transpose their work into Iris\$ and to develop it further.

#### References



#### **Meta-F\*: Proof Automation with SMT, Tactics, and Metaprograms**

Guido Martínez<sup>1,2</sup>(✉), Danel Ahman<sup>3</sup>, Victor Dumitrescu<sup>4</sup>, Nick Giannarakis<sup>5</sup>, Chris Hawblitzel<sup>6</sup>, Cătălin Hriţcu<sup>2</sup>, Monal Narasimhamurthy<sup>8</sup>, Zoe Paraskevopoulou<sup>5</sup>, Clément Pit-Claudel<sup>9</sup>, Jonathan Protzenko<sup>6</sup>, Tahina Ramananandro<sup>6</sup>, Aseem Rastogi<sup>7</sup>, and Nikhil Swamy<sup>6</sup>

> <sup>1</sup> CIFASIS-CONICET, Rosario, Argentina
> martinez@cifasis-conicet.gov.ar
> <sup>2</sup> Inria, Paris, France
> <sup>3</sup> University of Ljubljana, Ljubljana, Slovenia
> <sup>4</sup> MSR-Inria Joint Centre, Paris, France
> <sup>5</sup> Princeton University, Princeton, USA
> <sup>6</sup> Microsoft Research, Redmond, USA
> <sup>7</sup> Microsoft Research, Bangalore, India
> <sup>8</sup> University of Colorado Boulder, Boulder, USA
> <sup>9</sup> MIT CSAIL, Cambridge, USA

**Abstract.** We introduce Meta-F\*, a tactics and metaprogramming framework for the F\* program verifier. The main novelty of Meta-F\* is allowing the use of tactics and metaprogramming to discharge assertions not solvable by SMT, or to just simplify them into well-behaved SMT fragments. Plus, Meta-F\* can be used to generate verified code automatically.

Meta-F\* is implemented as an F\* effect, which, given the powerful effect system of F\*, heavily increases code reuse and even enables the lightweight verification of metaprograms. Metaprograms can be either interpreted, or compiled to efficient native code that can be dynamically loaded into the F\* type-checker and can interoperate with interpreted code. Evaluation on realistic case studies shows that Meta-F\* provides substantial gains in proof development, efficiency, and robustness.

**Keywords:** Tactics · Metaprogramming · Program verification · Verification conditions · SMT solvers · Proof assistants

### **1 Introduction**

Scripting proofs using tactics and metaprogramming has a long tradition in interactive theorem provers (ITPs), starting with Milner's Edinburgh LCF [37]. In this lineage, properties of *pure* programs are specified in expressive higher-order (and often dependently typed) logics, and proofs are conducted using various imperative programming languages, starting originally with ML.

Along a different axis, program verifiers like Dafny [47], VCC [23], Why3 [33], and Liquid Haskell [59] target both pure *and effectful* programs, with side-effects ranging from divergence to concurrency, but provide relatively weak logics for specification (e.g., first-order logic with a few selected theories like linear arithmetic). They work primarily by computing verification conditions (VCs) from programs, usually relying on annotations such as pre- and postconditions, and encoding them to automated theorem provers (ATPs) such as satisfiability modulo theories (SMT) solvers, often providing excellent automation.

These two sub-fields have influenced one another, though the situation is somewhat asymmetric. On the one hand, most interactive provers have gained support for exploiting SMT solvers or other ATPs, providing push-button automation for certain kinds of assertions [26,31,43,44,54]. On the other hand, recognizing the importance of interactive proofs, Why3 [33] interfaces with ITPs like Coq. However, working over proof obligations translated from Why3 requires users to be familiar not only with both these systems, but also with the specifics of the translation. And beyond Why3 and the tools based on it [25], no other SMT-based program verifiers have full-fledged support for interactive proving, leading to several downsides:

**Limits to expressiveness.** The expressiveness of program verifiers can be limited by the ATP used. When dealing with theories that are undecidable and difficult to automate (e.g., non-linear arithmetic or separation logic), proofs in ATP-based systems may become impossible or, at best, extremely tedious.

**Boilerplate.** To work around this lack of automation, programmers have to construct detailed proofs by hand, often repeating many tedious yet error-prone steps, so as to provide hints to the underlying solver to discover the proof. In contrast, ITPs with metaprogramming facilities excel at expressing domain-specific automation to complete such tedious proofs.

**Implicit proof context.** In most program verifiers, the logical context of a proof is implicit in the program text and depends on the control flow and the pre- and postconditions of preceding computations. Unlike in interactive proof assistants, programmers have no explicit access, neither visual nor programmatic, to this context, making proof structuring and exploration extremely difficult.

In direct response to these drawbacks, we seek a system that successfully combines the convenience of an automated program verifier for the common case, while seamlessly transitioning to an interactive proving experience for those parts of a proof that are hard to automate. Towards this end, we propose Meta-F\*, a tactics and metaprogramming framework for the F\* [1,58] program verifier.

#### **Highlights and Contributions of Meta-F\***

F* has historically been more deeply rooted as an SMT-based program verifier. Until now, F* discharged VCs exclusively by calling an SMT solver (usually Z3 [28]), providing good automation for many common program verification tasks, but also exhibiting the drawbacks discussed above.

Meta-F* is a framework that allows F* users to manipulate VCs using *tactics*. More generally, it supports *metaprogramming*, allowing programmers to script the construction of programs, by manipulating their syntax and customizing the way they are type-checked. This allows programmers to (1) implement custom procedures for manipulating VCs; (2) eliminate boilerplate in proofs and programs; and (3) inspect the proof state visually and manipulate it programmatically, addressing the drawbacks discussed above. SMT still plays a central role in Meta-F*: a typical usage involves implementing tactics to transform VCs, so as to bring them into theories well-supported by SMT, without needing to (re)implement full decision procedures. Further, the generality of Meta-F* allows implementing non-trivial language extensions (e.g., typeclass resolution) entirely as metaprogramming libraries, without changes to the F* type-checker.

The technical **contributions** of our work include the following:

**"Meta-" is just an effect (Sect. 3.1).** Meta-F* is implemented using F*'s extensible effect system, which keeps programs and metaprograms properly isolated. Being first-class F* programs, metaprograms are typed, call-by-value, direct-style, higher-order functional programs, much like the original ML. Further, metaprograms can be themselves verified (to a degree, see Sect. 3.4) and metaprogrammed.

**Reconciling tactics with VC generation (Sect. 4.2).** In program verifiers the programmer often guides the solver towards the proof by supplying intermediate assertions. Meta-F* retains this style, but additionally allows assertions to be solved by tactics. To this end, a contribution of our work is extracting, from a VC, a proof state encompassing all relevant hypotheses, including those implicit in the program text.

**Executing metaprograms efficiently (Sect. 5).** Metaprograms are executed during type-checking. As a baseline, they can be interpreted using F*'s existing (but slow) abstract machine for term normalization, or a faster normalizer based on normalization by evaluation (NbE) [10,16]. For much faster execution speed, metaprograms can also be run natively. This is achieved by combining the existing extraction mechanism of F* to OCaml with a new framework for safely extending the F* type-checker with such native code.

**Examples (Sect. 2) and evaluation (Sect. 6).** We evaluate Meta-F* on several case studies. First, we present a functional correctness proof for the Poly1305 message authentication code (MAC) [11], using a novel combination of proofs by reflection for dealing with non-linear arithmetic and SMT solving for linear arithmetic. We measure a clear gain in proof robustness: SMT-only proofs succeed only rarely (for reasonable timeouts), whereas our tactic+SMT proof is concise, never fails, and is faster. Next, we demonstrate an improvement in expressiveness, by developing a small library for proofs of heap-manipulating programs in separation logic, which was previously out-of-scope for F*. Finally, we illustrate the ability to automatically construct verified effectful programs, by introducing a library for metaprogramming verified low-level parsers and serializers with applications to network programming, where verification is accelerated by processing the VC with tactics, and by programmatically tweaking the SMT context.

We conclude that tactics and metaprogramming can be prosperously combined with VC generation and SMT solving to build verified programs with better, more scalable, and more robust automation.

The full version of this paper, including appendices, can be found online at https://www.fstar-lang.org/papers/metafstar.

#### **2 Meta-F\* by Example**

F* is a general-purpose programming language aimed at program verification. It puts together the automation of an SMT-backed deductive verification tool with the expressive power of a language with full-spectrum dependent types. Briefly, it is a functional, higher-order, effectful, dependently typed language, with syntax loosely based on OCaml. F* supports refinement types and Hoare-style specifications, computing VCs of computations via a type-level weakest precondition (WP) calculus packed within *Dijkstra monads* [57]. F*'s effect system is also user-extensible [1]. Using it, one can model or embed imperative programming in styles ranging from ML to C [55] and assembly [35]. After verification, F* programs can be extracted to efficient OCaml or F# code. A first-order fragment of F*, called Low*, can also be extracted to C via the KreMLin compiler [55].

This paper introduces Meta-F*, a metaprogramming framework for F* that allows users to safely customize and extend F* in many ways. For instance, Meta-F* can be used to preprocess or solve proof obligations; synthesize F* expressions; generate top-level definitions; and resolve implicit arguments in user-defined ways, enabling non-trivial extensions. This paper primarily discusses the first two features. Technically, none of these features deeply increase the expressive power of F*, since one could manually program in F* terms that can now be metaprogrammed. However, as we will see shortly, manually programming terms and their proofs can be so prohibitively costly as to be practically infeasible.

Meta-F* is similar to other tactic frameworks, such as Coq's [29] or Lean's [30], in presenting a set of goals to the programmer, providing commands to break them down, allowing users to inspect and build abstract syntax, etc. In this paper, we mostly detail the characteristics where Meta-F* *differs* from other engines.

This section presents Meta-F* informally, displaying its usage through case studies. We present any necessary F* background as needed.

#### **2.1 Tactics for Individual Assertions and Partial Canonicalization**

Non-linear arithmetic reasoning is crucially needed for the verification of optimized, low-level cryptographic primitives [18,64], an important use case for F* [13] and other verification frameworks, including those that rely on SMT solving alone (e.g., Dafny [47]) as well as those that rely exclusively on tactic-based proofs (e.g., FiatCrypto [32]). While both styles have demonstrated significant successes, we make a case for a middle ground, leveraging the SMT solver for the parts of a VC where it is effective, and using tactics only where it is not.

We focus on Poly1305 [11], a widely-used cryptographic MAC that computes a series of integer multiplications and additions modulo a large prime number p = 2<sup>130</sup> − 5. Implementations of the Poly1305 multiplication and mod operations are carefully hand-optimized to represent 130-bit numbers in terms of smaller 32-bit or 64-bit registers, using clever tricks; proving their correctness requires reasoning about long sequences of additions and multiplications.

**Previously: Guiding SMT Solvers by Manually Applying Lemmas.** Prior proofs of correctness of Poly1305 and other cryptographic primitives using SMT-based program verifiers, including F* [64] and Dafny [18], use a combination of SMT automation and manual application of lemmas. On the plus side, SMT solvers are excellent at linear arithmetic, so these proofs delegate all associativity-commutativity (AC) reasoning about addition to SMT. Non-linear arithmetic in SMT solvers, even just AC-rewriting and distributivity, is, however, inefficient and unreliable, so much so that the prior efforts above (and other works too [40,41]) simply turn off support for non-linear arithmetic in the solver, in order not to degrade verification performance across the board due to poor interaction of theories. Instead, users need to explicitly invoke lemmas.<sup>1</sup>

For instance, here is a statement and proof of a lemma about Poly1305 in F*. The property and its proof do not really matter; the lines marked "(* argh! *)" do. In this particular proof, working around the solver's inability to effectively reason about non-linear arithmetic, the programmer has spelled out basic facts about distributivity of multiplication and addition, by calling the library lemma distributivity_add_right, in order to guide the solver towards the proof. (Below, p44 and p88 represent 2<sup>44</sup> and 2<sup>88</sup> respectively.)

```
let lemma_carry_limb_unrolled (a0 a1 a2 : nat) : Lemma (ensures (
  a0 % p44 + p44 * ((a1 + a0 / p44) % p44) + p88 * (a2 + ((a1 + a0 / p44) / p44))
  == a0 + p44 * a1 + p88 * a2)) =
  let z = a0 % p44 + p44 * ((a1 + a0 / p44) % p44)
          + p88 * (a2 + ((a1 + a0 / p44) / p44)) in
  distributivity_add_right p88 a2 ((a1 + a0 / p44) / p44); (* argh! *)
  pow2_plus 44 44;
  lemma_div_mod (a1 + a0 / p44) p44;
  distributivity_add_right p44 ((a1 + a0 / p44) % p44)
          (p44 * ((a1 + a0 / p44) / p44)); (* argh! *)
  assert (p44 * ((a1 + a0 / p44) % p44) + p88 * ((a1 + a0 / p44) / p44)
          == p44 * (a1 + a0 / p44) );
  distributivity_add_right p44 a1 (a0 / p44); (* argh! *)
  lemma_div_mod a0 p44
```
<sup>1</sup> Lemma (requires pre) (ensures post) is F* notation for the type of a computation proving pre ⟹ post; we omit pre when it is trivial. In F*'s standard library, math lemmas are proved using SMT with little or no interactions between problematic theory combinations. These lemmas can then be explicitly invoked in larger contexts, and are deleted during extraction.

Even at this relatively small scale, needing to explicitly instantiate the distributivity lemma is verbose and error prone. Even worse, the user is blind while doing so: the program text does not display the current set of available facts nor the final goal. Proofs at this level of abstraction are painfully detailed in some aspects, yet also heavily reliant on the SMT solver to fill in the aspects of the proof that are missing.
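To make the statement of the carry lemma concrete, the following is a minimal Python sketch that evaluates both sides of its equation on concrete naturals. It is a numeric sanity check, not a proof, and not part of the F* development; the names `carry_limb_unrolled_holds` and `random_check` are hypothetical.

```python
import random

P44 = 2 ** 44  # plays the role of p44 in the lemma
P88 = 2 ** 88  # plays the role of p88 in the lemma


def carry_limb_unrolled_holds(a0: int, a1: int, a2: int) -> bool:
    """Evaluate both sides of the lemma's equation for naturals a0, a1, a2."""
    carry0 = a0 // P44   # carry out of the lowest limb
    mid = a1 + carry0    # middle limb plus the incoming carry
    lhs = a0 % P44 + P44 * (mid % P44) + P88 * (a2 + mid // P44)
    rhs = a0 + P44 * a1 + P88 * a2
    return lhs == rhs


def random_check(trials: int = 1000, seed: int = 0) -> bool:
    """Test the equation on pseudo-random 70-bit naturals (assumed sample size)."""
    rng = random.Random(seed)
    return all(
        carry_limb_unrolled_holds(
            rng.getrandbits(70), rng.getrandbits(70), rng.getrandbits(70)
        )
        for _ in range(trials)
    )
```

Such testing only samples the input space; the F* lemma establishes the equation for all naturals.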

Given enough time, the solver can sometimes find a proof without the additional hints, but this is rare and dependent on context, and almost never robust. In this particular example we find by varying Z3's random seed that, in an isolated setting, the lemma is proven automatically about 32% of the time. The numbers are much worse for more complex proofs, and where the context contains many facts, making this style quickly spiral out of control. For example, a proof of one of the main lemmas in Poly1305, poly_multiply, requires 41 steps of rewriting for associativity-commutativity of multiplication, and distributivity of addition and multiplication, making the proof much too long to show here.

**SMT and Tactics in Meta-F\*.** The listing below shows the statement and proof of poly_multiply in Meta-F*, of which the lemma above was previously only a small part. Again, the specific property proven is not particularly relevant to our discussion. But, this time, the proof contains just two steps.

```
let poly_multiply (n p r h r0 r1 h0 h1 h2 s1 d0 d1 d2 h1 h2 hh : int) : Lemma
 (requires p > 0 ∧ r1 ≥ 0 ∧ n > 0 ∧ 4 * (n * n) == p + 5 ∧ r == r1 * n + r0 ∧
           h == h2 * (n * n) + h1 * n + h0 ∧ s1 == r1 + (r1 / 4) ∧ r1 % 4 == 0 ∧
           d0 == h0 * r0 + h1 * s1 ∧ d1 == h0 * r1 + h1 * r0 + h2 * s1 ∧
           d2 == h2 * r0 ∧ hh == d2 * (n * n) + d1 * n + d0)
 (ensures (h * r) % p == hh % p) =
 let r14 = r1 / 4 in
 let h_r_expand = (h2 * (n * n) + h1 * n + h0) * ((r14 * 4) * n + r0) in
 let hh_expand = (h2 * r0) * (n * n) + (h0 * (r14 * 4) + h1 * r0
                        + h2 * (5 * r14)) * n + (h0 * r0 + h1 * (5 * r14)) in
 let b = (h2 * n + h1) * r14 in
 modulo_addition_lemma hh_expand p b;
 assert (h_r_expand == hh_expand + b * (n * n * 4 + (-5)))
     by (canon_semiring int_csr) (* Proof of this step by Meta-F* tactic *)
```
First, we call a single lemma about modular addition from F*'s standard library. Then, we assert an equality annotated with a tactic (assert..by). Instead of being encoded as-is for the SMT solver, the assertion is preprocessed by the canon_semiring tactic. The tactic is presented with the asserted equality as its goal, in an environment containing not only all variables in scope but also hypotheses for the precondition of poly_multiply and the postcondition of the modulo_addition_lemma call (otherwise, the assertion could not be proven). The tactic will then canonicalize the sides of the equality, but notably only "up to" linear arithmetic conversions. Rather than fully canonicalizing the terms, the tactic just rewrites them into a sum-of-products canonical form, leaving all the remaining work to the SMT solver, which can then easily and robustly discharge the goal using linear arithmetic only.

This tactic works over terms in the commutative semiring of integers (int_csr) using proof-by-reflection [12,20,36,38]. Internally, it is composed of a simpler, also proof-by-reflection based tactic canon_monoid that works over monoids, which is then "stacked" on itself to build canon_semiring. The basic idea of proof-by-reflection is to reduce most of the proof burden to mechanical computation, obtaining much more efficient proofs compared to repeatedly applying lemmas. For canon_monoid, we begin with a type for monoids, a small AST representing monoid values, and a denotation for expressions back into the monoid type.

```
type monoid (a:Type) = { unit : a; mult : (a → a → a); (* + monoid laws ... *) }
type exp (a:Type) = | Unit : exp a | Var : a → exp a | Mult : exp a → exp a → exp a
(* Note on syntax: #a below denotes that a is an implicit argument *)
let rec denote (#a:Type) (m:monoid a) (e:exp a) : a =
  match e with
  | Unit → m.unit | Var x → x
  | Mult x y → m.mult (denote m x) (denote m y)
```

To canonicalize an exp, it is first converted to a list of operands (flatten) and then reflected back to the monoid (mldenote). The process is proven correct, in the particular case of equalities, by the monoid_reflect lemma.

```
val flatten : #a:Type → exp a → list a
val mldenote : #a:Type → monoid a → list a → a
let monoid_reflect (#a:Type) (m:monoid a) (e1 e2 : exp a)
             : Lemma (requires (mldenote m (flatten e1) == mldenote m (flatten e2)))
                     (ensures (denote m e1 == denote m e2)) = ...
```
At this stage, if the goal is t1 == t2, we require two monoidal expressions e1 and e2 such that t1 == denote m e1 and t2 == denote m e2. They are constructed by the tactic canon_monoid by inspecting the *syntax* of the goal, using Meta-F*'s reflection capabilities (detailed ahead in Sect. 3.3). We have no way to prove once and for all that the expressions built by canon_monoid correctly denote the terms, but this fact can be proven automatically at each application of the tactic, by simple unification. The tactic then applies the lemma monoid_reflect m e1 e2, and the goal is changed to mldenote m (flatten e1) == mldenote m (flatten e2). Finally, by normalization, each side will be canonicalized by running flatten and mldenote.
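The reflective scheme just described can be sketched as a small executable model. The following Python code is an illustration only, not the F* implementation: the class names mirror the exp constructors, the bodies of `flatten` and `mldenote` are assumptions (the text gives only their types), and `equal_by_reflection` is a hypothetical helper standing in for the premise of the reflection lemma.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Monoid:
    unit: Any
    mult: Callable[[Any, Any], Any]  # assumed associative, with unit as identity


@dataclass(frozen=True)
class Unit:
    pass


@dataclass(frozen=True)
class Var:
    value: Any


@dataclass(frozen=True)
class Mult:
    left: Any
    right: Any


def denote(m: Monoid, e: Any) -> Any:
    """Interpret an expression tree back into the monoid."""
    if isinstance(e, Unit):
        return m.unit
    if isinstance(e, Var):
        return e.value
    return m.mult(denote(m, e.left), denote(m, e.right))


def flatten(e: Any) -> List[Any]:
    """Canonical form: the left-to-right list of operands, with units dropped."""
    if isinstance(e, Unit):
        return []
    if isinstance(e, Var):
        return [e.value]
    return flatten(e.left) + flatten(e.right)


def mldenote(m: Monoid, xs: List[Any]) -> Any:
    """Interpret an operand list as a right-nested product ending in the unit."""
    result = m.unit
    for x in reversed(xs):
        result = m.mult(x, result)
    return result


def equal_by_reflection(m: Monoid, e1: Any, e2: Any) -> bool:
    """Compare canonical forms; by the monoid laws this implies equal denotations."""
    return mldenote(m, flatten(e1)) == mldenote(m, flatten(e2))
```

For string concatenation, `Mult(Mult(Var("a"), Var("b")), Mult(Unit(), Var("c")))` and `Mult(Var("a"), Mult(Var("b"), Var("c")))` flatten to the same operand list, so a single comparison of canonical forms replaces a chain of associativity and identity rewrites.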

The canon_semiring tactic follows a similar approach, and is similar to existing reflective tactics for other proof assistants [9,38], except that it only canonicalizes up to linear arithmetic, as explained above. The full VC for poly_multiply contains many other facts, e.g., that p is non-zero so the division is well-defined and that the postcondition does indeed hold. These obligations remain in a "skeleton" VC that is also easily proven by Z3. This proof is much easier for the programmer to write and much more robust, as detailed ahead in Sect. 6.1. The proof of Poly1305's other main lemma, poly_reduce, is also similarly well automated.
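As a sanity check on the poly_multiply listing, the sketch below evaluates the asserted equality and the lemma's conclusion on concrete integers in Python. It is a numeric test under assumed sampling of inputs, not a proof; the function names are hypothetical, and the inputs are chosen so that the lemma's preconditions hold by construction (p = 4n² − 5 with n ≥ 2, and r1 = 4 · r14 with r14 ≥ 0).

```python
def expansion_identity_holds(n, r14, r0, h0, h1, h2):
    """Check the equality that the listing discharges with canon_semiring."""
    h_r_expand = (h2 * (n * n) + h1 * n + h0) * ((r14 * 4) * n + r0)
    hh_expand = ((h2 * r0) * (n * n)
                 + (h0 * (r14 * 4) + h1 * r0 + h2 * (5 * r14)) * n
                 + (h0 * r0 + h1 * (5 * r14)))
    b = (h2 * n + h1) * r14
    return h_r_expand == hh_expand + b * (n * n * 4 + (-5))


def poly_multiply_holds(n, r14, r0, h0, h1, h2):
    """Check (h * r) % p == hh % p, building the other variables from the
    preconditions of the lemma."""
    p = 4 * (n * n) - 5          # from 4 * (n * n) == p + 5
    r1 = 4 * r14                 # guarantees r1 % 4 == 0
    r = r1 * n + r0
    h = h2 * (n * n) + h1 * n + h0
    s1 = r1 + r1 // 4
    d0 = h0 * r0 + h1 * s1
    d1 = h0 * r1 + h1 * r0 + h2 * s1
    d2 = h2 * r0
    hh = d2 * (n * n) + d1 * n + d0
    # Python's % is non-negative for a positive modulus, matching the
    # mathematical remainder used in the lemma.
    return (h * r) % p == hh % p
```

With n = 2<sup>64</sup> the modulus p equals 2<sup>130</sup> − 5, the Poly1305 prime.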

**Tactic Proofs Without SMT.** Of course, one can verify poly_multiply in Coq, following the same conceptual proof used in Meta-F*, but relying on tactics only. Our proof (included in the appendix) is 27 lines long, two of which involve the use of Coq's ring tactic (similar to our canon_semiring tactic) and omega tactic for solving formulas in Presburger arithmetic. The remaining 25 lines include steps to destruct the propositional structure of terms, rewrite by equalities, and enrich the context to enable automatic modulo rewriting (Coq does not fully automatically recognize equality modulo p as an equivalence relation compatible with arithmetic operators). While a mature proof assistant like Coq has libraries and tools to ease this kind of manipulation, it can still be verbose.

In contrast, in Meta-F* all of these mundane parts of a proof are simply dispatched to the SMT solver, which decides linear arithmetic efficiently, beyond the quantifier-free Presburger fragment supported by tactics like omega, handles congruence closure natively, etc.

#### **2.2 Tactics for Entire VCs and Separation Logic**

A different way to invoke Meta-F* is over an entire VC. While the exact shape of VCs is hard to predict, users with some experience can write tactics that find and solve particular sub-assertions within a VC, or simply massage them into shapes better suited for the SMT solver. We illustrate the idea on proofs for heap-manipulating programs.

One verification method that has eluded F* until now is separation logic, the main reason being that the pervasive "frame rule" requires instantiating existentially quantified heap variables, which is a challenge for SMT solvers, and simply too tedious for users. With Meta-F*, one can do better. We have written a (proof-of-concept) embedding of separation logic and a tactic (sl_auto) that performs heap frame inference automatically.

The approach we follow consists of designing the WP specifications for primitive stateful actions so as to make their footprint syntactically evident. The tactic then descends through VCs until it finds an existential for heaps arising from the frame rule. Then, by solving an equality between heap expressions (which requires canonicalization, for which we use a variant of canon_monoid targeting *commutative* monoids) the tactic finds the frames and instantiates the existentials. Notably, as opposed to other tactic frameworks for separation logic [4,45,49,51], this is *all* our tactic does before dispatching to the SMT solver, which can now be effective over the instantiated VC.

We now provide some detail on the framework. Below, 'emp' represents the empty heap, '•' is the separating conjunction and 'r ↦ v' is the heaplet with the single reference r set to value v.<sup>2</sup> Our development distinguishes between a "heap" and its "memory" for technical reasons, but we will treat the two as equivalent here. Further, defined is a predicate discriminating valid heaps (as in [52]), i.e., those built from separating conjunctions of *actually* disjoint heaps.

We first define the type of WPs and present the WP for the frame rule:

```
let pre = memory → prop (* predicate on initial heaps *)
let post a = a → memory → prop (* predicate on result values and final heaps *)
let wp a = post a → pre (* transformer from postconditions to preconditions *)

let frame_post (#a:Type) (p:post a) (m0:memory) : post a =
  λx m1 → defined (m0 • m1) ∧ p x (m0 • m1)

let frame_wp (#a:Type) (wp:wp a) (post:post a) (m:memory) =
  ∃m0 m1. defined (m0 • m1) ∧ m == (m0 • m1) ∧ wp (frame_post post m1) m0
```

<sup>2</sup> This differs from the usual presentation where these three operators are heap predicates instead of heaps.

Intuitively, frame_post p m0 behaves as the postcondition p "framed" by m0, i.e., frame_post p m0 x m1 holds when the two heaps m0 and m1 are disjoint and p holds over the result value x and the conjoined heaps. Then, frame_wp wp takes a postcondition p and initial heap m, and requires that m can be split into disjoint subheaps m0 (the footprint) and m1 (the frame), such that the postcondition p, when properly framed, holds over the footprint.

In order to provide specifications for primitive actions we start in small-footprint style. For instance, below is the WP for reading a reference:

```
let read_wp (#a:Type) (r:ref a) = λpost m0 → ∃x. m0 == r ↦ x ∧ post x m0
```

We then insert framing wrappers around such small-footprint WPs when exposing the corresponding stateful actions to the programmer, e.g.,

```
val (!) : #a:Type → r:ref a → STATE a (λ p m → frame_wp (read_wp r) p m)
```
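The meaning of the framed read specification can be explored with a small executable model. The Python sketch below makes several assumptions that are not in the paper: heaps are modeled as finite dictionaries from reference names to values, the join is a dictionary union that is undefined on overlapping domains, and the existential over heap splits is decided by brute-force enumeration. The helper names `splits` and `defined_join` are hypothetical.

```python
from itertools import combinations


def splits(m):
    """Yield every way to split heap m into two disjoint heaps (m0, m1)."""
    keys = list(m)
    for k in range(len(keys) + 1):
        for chosen in combinations(keys, k):
            m0 = {r: m[r] for r in chosen}
            m1 = {r: m[r] for r in keys if r not in chosen}
            yield m0, m1


def defined_join(m0, m1):
    """Return the joined heap, or None when the domains overlap."""
    if set(m0) & set(m1):
        return None
    return {**m0, **m1}


def frame_post(post, m0):
    """Postcondition post framed by heap m0."""
    def framed(x, m1):
        joined = defined_join(m0, m1)
        return joined is not None and post(x, joined)
    return framed


def frame_wp(wp, post, m):
    """Some split of m into footprint m0 and frame m1 satisfies wp on m0."""
    return any(wp(frame_post(post, m1))(m0) for m0, m1 in splits(m))


def read_wp(r):
    """Small-footprint WP for reading r: the heap is exactly the heaplet for r."""
    def wp(post):
        def pre(m0):
            return set(m0) == {r} and post(m0[r], m0)
        return pre
    return wp
```

On the heap `{"r1": 10, "r2": 20}`, `frame_wp(read_wp("r2"), post, heap)` succeeds with footprint `{"r2": 20}` and frame `{"r1": 10}`. Enumerating splits is exponential in the heap size, which gives some intuition for why leaving the existential to a solver is costly.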

To verify code written in such style, we annotate the corresponding programs to have their VCs processed by sl_auto. For instance, for the swap function below, the tactic successfully finds the frames for the four occurrences of the frame rule and greatly reduces the solver's work. Even in this simple example, not performing such instantiation would cause the solver to fail.

```
let swap_wp (r1 r2 : ref int) =
  λp m → ∃x y. m == (r1 ↦ x • r2 ↦ y) ∧ p () (r1 ↦ y • r2 ↦ x)
let swap (r1 r2 : ref int) : ST unit (swap_wp r1 r2) by (sl_auto ()) =
  let x = !r1 in
  let y = !r2 in
  r1 := y; r2 := x
```

The sl_auto tactic: (1) uses syntax inspection to unfold and traverse the goal until it reaches a frame_wp, say, the one for !r2; (2) inspects frame_wp's first explicit argument (here read_wp r2) to compute the references the current command requires (here r2); (3) uses unification variables to build a memory expression describing the required framing of input memory (here r2 ↦ ?u1 • ?u2) and instantiates the existentials of frame_wp with these unification variables; (4) builds a goal that equates this memory expression with frame_wp's third argument (here r1 ↦ x • r2 ↦ y); and (5) uses a commutative monoids tactic (similar to Sect. 2.1) with the heap algebra (emp, •) to canonicalize the equality and sort the heaplets. Next, it can solve for the unification variables component-wise, instantiating ?u1 to y and ?u2 to r1 ↦ x, and then proceed to the next frame_wp.
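Steps (3) to (5) above can be sketched as follows. This Python model is an assumption-laden illustration rather than the tactic itself: heap expressions are nested tuples, canonicalization is flattening plus sorting by reference name, and `infer_frame` is a hypothetical name for solving the equation between the required footprint and the available heap.

```python
def emp():
    return ("emp",)


def pts(r, v):
    """Heaplet with the single reference r set to v."""
    return ("pts", r, v)


def star(h1, h2):
    """Separating conjunction of two heap expressions."""
    return ("star", h1, h2)


def canon(h):
    """Flatten a heap expression and sort its heaplets by reference name,
    using that (emp, star) forms a commutative monoid."""
    if h[0] == "emp":
        return []
    if h[0] == "pts":
        return [(h[1], h[2])]
    return sorted(canon(h[1]) + canon(h[2]), key=lambda heaplet: heaplet[0])


def infer_frame(required_ref, heap_expr):
    """Solve  required_ref |-> ?u1 * ?u2 == heap_expr  for ?u1 and ?u2.

    Returns the value stored at required_ref and the remaining heaplets
    (the frame), or raises ValueError when there is no unique solution."""
    heaplets = canon(heap_expr)
    matches = [v for r, v in heaplets if r == required_ref]
    if len(matches) != 1:
        raise ValueError("no unique heaplet for " + repr(required_ref))
    frame = [(r, v) for r, v in heaplets if r != required_ref]
    return matches[0], frame
```

For the read of r2 in swap, `infer_frame("r2", star(pts("r1", "x"), pts("r2", "y")))` yields the value `"y"` and the frame `[("r1", "x")]`, matching the instantiation of the two unification variables described above.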

In general, after frames are instantiated, the SMT solver can efficiently prove the remaining assertions, such as the obligations about heap definedness. Thus, with relatively little effort, Meta-F* brings an (albeit simple version of a) widely used yet previously out-of-scope program logic (i.e., separation logic) into F*. To the best of our knowledge, the ability to *script* separation logic into an SMT-based program verifier, without any primitive support, is unique.

#### **2.3 Metaprogramming Verified Low-Level Parsers and Serializers**

Above, we used Meta-F* to manipulate VCs for user-written code. Here, we focus instead on generating verified code automatically. We loosely refer to the previous setting as using "tactics", and to the current one as "metaprogramming". In most ITPs, tactics and metaprogramming are not distinguished; however in a program verifier like F*, where some proofs are not materialized at all (Sect. 4.1), proving VCs of existing terms is distinct from generating new terms.

Metaprogramming in F* involves programmatically generating a (potentially effectful) term (e.g., by constructing its syntax and instructing F* how to typecheck it) and processing any VCs that arise via tactics. When applicable (e.g., when working in a domain-specific language), metaprogramming verified code can substantially reduce, or even eliminate, the burden of manual proofs.

We illustrate this by automating the generation of parsers and serializers from a type definition. Of course, this is a routine task in many mainstream metaprogramming frameworks (e.g., Template Haskell, camlp4, etc.). The novelty here is that we produce imperative parsers and serializers extracted to C, with proofs that they are memory safe, functionally correct, and mutually inverse. This section is slightly simplified; more detail can be found in the appendix.

We proceed in several stages. First, we program a library of pure, high-level parser and serializer combinators, proven to be (partial) mutual inverses of each other. A parser for a type t is represented as a function possibly returning a t along with the amount of input bytes consumed. The type of a serializer for a given p:parser t contains a refinement<sup>3</sup> stating that p is an inverse of the serializer. A package is a dependent record of a parser and an associated serializer.

```
let parser t = seq byte → option (t ∗ nat)
let serializer #t (p:parser t) = f:(t → seq byte){∀ x. p (f x) == Some (x, length (f x))}
type package t = { p : parser t ; s : serializer p }
```
Basic combinators in the library include constructs for parsing and serializing base values and pairs, such as the following:

```
val p_u8 : parser u8
val s_u8 : serializer p_u8
val p_pair : parser t1 → parser t2 → parser (t1 ∗ t2)
val s_pair : serializer p1 → serializer p2 → serializer (p_pair p1 p2)
```
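To illustrate the shape of these specification-level combinators, here is a minimal Python model. It is a sketch under stated assumptions, not the F* library: parsers are functions from `bytes` to either `None` or a pair of a value and the number of bytes consumed, the inverse property is checked on sample inputs instead of being enforced by a refinement type, and `is_inverse_on` is a hypothetical helper.

```python
def p_u8(data: bytes):
    """Parse one byte; fail (None) on empty input."""
    if len(data) < 1:
        return None
    return data[0], 1


def s_u8(x: int) -> bytes:
    """Serialize one byte value in the range 0..255."""
    return bytes([x])


def p_pair(p1, p2):
    """Run p1, then run p2 on the remaining input."""
    def parse(data: bytes):
        first = p1(data)
        if first is None:
            return None
        x1, n1 = first
        second = p2(data[n1:])
        if second is None:
            return None
        x2, n2 = second
        return (x1, x2), n1 + n2
    return parse


def s_pair(s1, s2):
    """Serialize both components and concatenate the results."""
    return lambda pair: s1(pair[0]) + s2(pair[1])


def is_inverse_on(p, s, samples) -> bool:
    """Check p (s x) == Some (x, length (s x)) on the given samples."""
    return all(p(s(x)) == (x, len(s(x))) for x in samples)
```

For example, `p_pair(p_u8, p_u8)(s_pair(s_u8, s_u8)((1, 2)))` returns `((1, 2), 2)`. In F* the same property is part of the serializer's type and is proven for all values, not only tested on samples.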
Next, we define low-level versions of these combinators, which work over mutable arrays instead of byte sequences. These combinators are coded in the Low* subset of F* (and so can be extracted to C) and are proven to both be memory-safe and respect their high-level variants. The type for low-level parsers, parser_impl (p:parser t), denotes an imperative function that reads from an array of bytes and returns a t, behaving as the specificational parser p. Conversely, a serializer_impl (s:serializer p) writes into an array of bytes, behaving as s.

Given such a library, we would like to build verified, mutually inverse, low-level parsers and serializers for specific data formats. The task is mechanical, yet overwhelmingly tedious by hand, with many auxiliary proof obligations of a predictable structure: a perfect candidate for metaprogramming.

*Deriving Specifications from a Type Definition.* Consider the following F* type, representing lists of exactly 18 pairs of bytes.

<sup>3</sup> F* syntax for refinements is x:t {φ}, denoting the type of all x of type t satisfying φ.

```
type sample = nlist 18 (u8 ∗ u8)
```

The first component of our metaprogram is gen_specs, which generates parser and serializer specifications from a type definition.

```
let ps_sample : package sample = _ by (gen_specs (`sample))
```

The syntax _ by τ is the way to call Meta-F* for code generation. Meta-F* will run the metaprogram τ and, if successful, replace the underscore by the result. In this case, gen_specs (`sample) inspects the syntax of the sample type (Sect. 3.3) and produces the package below (seq_p and seq_s are sequencing combinators):

```
let ps_sample = { p = p_nlist 18 (p_u8 `seq_p` p_u8)
                ; s = s_nlist 18 (s_u8 `seq_s` s_u8) }
```

*Deriving Low-Level Implementations that Match Specifications.* From this pair of specifications, we can automatically generate Low* implementations for them:

```
let p_low : parser_impl ps_sample.p = _ by gen_parser_impl
let s_low : serializer_impl ps_sample.s = _ by gen_serializer_impl
```

which will produce the following low-level implementations:

```
let p_low = parse_nlist_impl 18ul (parse_u8_impl `seq_pi` parse_u8_impl)
let s_low = serialize_nlist_impl 18ul (serialize_u8_impl `seq_si` serialize_u8_impl)
```

For simple types like the one above, the generated code is fairly simple. However, for more complex types, using the combinator library comes with non-trivial proof obligations. For example, even for a simple enumeration, type color = Red | Green, the parser specification is as follows:

```
parse_synth (parse_bounded_u8 2)
  (λ x2 → mk_if_t (x2 = 0uy) (λ _ → Red) (λ _ → Green))
  (λ x → match x with | Green → 1uy | Red → 0uy)
```

We represent Red with 0uy and Green with 1uy. The parser first parses a "bounded" byte, with only two values. The parse_synth combinator then expects functions between the bounded byte and the datatype being parsed (color), which must be proven to be mutual inverses. This proof is conceptually easy, but for large enumerations nested deep within the structure of other types, it is notoriously hard for SMT solvers. Since the proof is inherently computational, a proof that destructs the inductive type into its cases and then normalizes is much more natural. With our metaprogram, we can produce the term and then discharge these proof obligations with a tactic *on the spot*, eliminating them from the final VC. We also explore simply tweaking the SMT context, again via a tactic, with good results. A quantitative evaluation is provided in Sect. 6.2.
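The "destruct into cases and normalize" style of proof for the color example can be mimicked by exhaustive evaluation. The Python sketch below is only an analogy: the names `synth`, `synth_inv`, `mutually_inverse`, and `parse_color` are hypothetical, and running both functions on every case plays the role that case analysis and normalization play in the tactic proof.

```python
from enum import Enum


class Color(Enum):
    RED = "Red"
    GREEN = "Green"


BOUND = 2  # the bounded byte takes only the values 0 and 1


def synth(x2: int) -> Color:
    """Map a bounded byte to a color (0 is Red, 1 is Green)."""
    return Color.RED if x2 == 0 else Color.GREEN


def synth_inv(c: Color) -> int:
    """Map a color back to its byte."""
    return 1 if c is Color.GREEN else 0


def mutually_inverse() -> bool:
    """Check both round trips by evaluating every case."""
    bytes_ok = all(synth_inv(synth(b)) == b for b in range(BOUND))
    colors_ok = all(synth(synth_inv(c)) is c for c in Color)
    return bytes_ok and colors_ok


def parse_color(data: bytes):
    """Parse a bounded byte, then apply synth; fail on out-of-range input."""
    if len(data) < 1 or data[0] >= BOUND:
        return None
    return synth(data[0]), 1
```

The cost of this check grows linearly with the number of constructors, which is the behavior one would hope for from a computational proof.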

#### **3 The Design of Meta-F\***

Having caught a glimpse of the use cases for Meta-F*, we now turn to its design. As usual in proof assistants (such as Coq, Lean and Idris), Meta-F* tactics work over a set of goals and apply primitive actions to transform them, possibly solving some goals and generating new goals in the process. Since this is standard, we focus mostly on describing the aspects where Meta-F* differs from other engines. We first describe how metaprograms are modelled as an effect (Sect. 3.1) and their runtime model (Sect. 3.2). We then detail some of Meta-F*'s syntax inspection and building capabilities (Sect. 3.3). Finally, we show how to perform some (lightweight) verification of metaprograms (Sect. 3.4) within F*.

#### **3.1 An Effect for Metaprogramming**

Meta-F* tactics are, at their core, programs that transform the "proof state", i.e. a set of goals needing to be solved. As in Lean [30] and Idris [22], we define a monad combining exceptions and stateful computations over a proof state, along with actions that can access internal components such as the type-checker. For this we first introduce abstract types for the proof state, goals, terms, environments, etc., together with functions to access them, some of them shown below.


We can now define our metaprogramming monad: tac. It combines F*'s existing effect for potential divergence (Div), with exceptions and stateful computations over a proofstate. The definition of tac, shown below, is straightforward and given in F*'s standard library. Then, we use F*'s effect extension capabilities [1] in order to elevate the tac monad and its actions to an effect, dubbed TAC.

```
type error = exn * proofstate (* error and proofstate at the time of failure *)
type result a = | Success : a → proofstate → result a | Failed : error → result a
let tac a = proofstate → Div (result a)
let t_return #a (x:a) = λps → Success x ps
let t_bind #a #b (m:tac a) (f:a → tac b) : tac b = λps → ... (* omitted, yet simple *)
let get () : tac proofstate = λps → Success ps ps
let raise #a (e:exn) : tac a = λps → Failed (e, ps)
new_effect { TAC with repr = tac ; return = t_return ; bind = t_bind
                                 ; get = get ; raise = raise }
```
The new effect declaration introduces *computation types* of the form TAC t wp, where t is the return type and wp a specification. However, until Sect. 3.4 we shall only use the derived form Tac t, where the specification is trivial. These computation types are distinct from their underlying monadic representation type tac t: users cannot directly access the proof state except via the actions. The simplest actions stem from the tac monad definition: get : unit → Tac proofstate returns the current proof state and raise : exn → Tac α fails with the given exception<sup>4</sup>. Failures can be handled using catch : (unit → Tac α) → Tac (either exn α), which resets the state on failure, including that of unification metavariables.

<sup>4</sup> We use Greek letters α, β, ... to abbreviate universally quantified type variables.

We emphasize two points here. First, there is no "set" action. This is to forbid metaprograms from arbitrarily replacing their proof state, which would be unsound. Second, the argument to catch must be thunked, since in F* impure un-suspended computations are evaluated before they are passed into functions.
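The state-and-exception structure of tac, and the way catch resets the state, can be modeled in a few lines of Python. This sketch rests on assumptions beyond the text: the proof state is a toy tuple of goal names, divergence is not modeled, the body of `t_bind` (elided in the listing) is filled in the standard way, and `dismiss` is a hypothetical primitive that drops the first goal. The name `raise_` avoids Python's `raise` keyword.

```python
from dataclasses import dataclass
from typing import Any, Tuple

ProofState = Tuple[str, ...]  # toy proof state: a tuple of goal names


@dataclass(frozen=True)
class Success:
    value: Any
    state: ProofState


@dataclass(frozen=True)
class Failed:
    error: Exception
    state: ProofState  # proof state at the time of failure


def t_return(x):
    return lambda ps: Success(x, ps)


def t_bind(m, f):
    """Run m; on success feed its value and new state to f, else propagate."""
    def run(ps):
        r = m(ps)
        if isinstance(r, Failed):
            return r
        return f(r.value)(r.state)
    return run


def get():
    return lambda ps: Success(ps, ps)


def raise_(e):
    return lambda ps: Failed(e, ps)


def catch(thunk):
    """Run the thunked tactic; on failure return Inl and restore the state
    that was current when catch was entered."""
    def run(ps):
        r = thunk()(ps)
        if isinstance(r, Failed):
            return Success(("Inl", r.error), ps)
        return Success(("Inr", r.value), r.state)
    return run


def dismiss():
    """Hypothetical primitive: drop the first goal, failing if none is left."""
    def run(ps):
        if not ps:
            return Failed(IndexError("no goals"), ps)
        return Success(None, ps[1:])
    return run
```

Note that the model exposes no way to install an arbitrary proof state: programs can only observe it with `get` or change it through primitives such as `dismiss`, mirroring the absence of a "set" action.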

The only aspect differentiating Tac from other user-defined effects is the existence of effect-specific primitive actions, which give access to the metaprogramming engine proper. We list here but a few:

```
val trivial : unit → Tac unit
val tc : term → Tac term
val dump : string → Tac unit
```
All of these are given an interpretation internally by Meta-F*. For instance, trivial calls into F*'s logical simplifier to check whether the current goal is a trivial proposition and discharges it if so, failing otherwise. The tc primitive queries the type-checker to infer the type of a given term in the current environment (F* types are a kind of terms, hence the codomain of tc is also term). This does not change the proof state; its only purpose is to return useful information to the calling metaprograms. Finally, dump outputs the current proof state to the user in a pretty-printed format, in support of user interaction.

Having introduced the Tac effect and some basic actions, writing metaprograms is as straightforward as writing any other F* code. For instance, here are two metaprogram combinators. The first one repeatedly calls its argument until it fails, returning a list of all the successfully-returned values. The second one behaves similarly, but folds the results with some provided folding function.

```
let rec repeat (τ : unit → Tac α) : Tac (list α) =
  match catch τ with | Inl _ → [] | Inr x → x :: repeat τ
```

```
let repeat_fold f e τ = fold_left f e (repeat τ)
```
These two small combinators illustrate a few key points of Meta-F*. As for all other F* effects, metaprograms are written in applicative style, without explicit return, bind, or lift of computations (which are inserted under the hood). This also works across different effects: repeat_fold can seamlessly combine the pure fold_left from F*'s list library with a metaprogram like repeat. Metaprograms are also type- and effect-inferred: while repeat_fold was not at all annotated, F* infers the polymorphic type (β → α → β) → β → (unit → Tac α) → Tac β for it.
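The behavior of the two combinators can be modeled in Python with the state passing made explicit. This is a sketch under assumptions: a metaprogram is taken to be a function from a proof state to a triple `(ok, payload, new_state)`, `repeat` is written as a loop instead of recursively, and `pop_goal` is a hypothetical action that exists only to exercise the combinators.

```python
from functools import reduce

# A metaprogram maps a proof state to a triple (ok, payload, new_state).


def catch(thunk):
    def run(ps):
        ok, payload, ps_after = thunk()(ps)
        if ok:
            return (True, ("Inr", payload), ps_after)
        return (True, ("Inl", payload), ps)       # roll back the state on failure
    return run


def repeat(tau):
    def run(ps):
        results = []
        while True:                               # loop form of the recursive definition
            _, (tag, payload), ps = catch(tau)(ps)
            if tag == "Inl":
                return (True, results, ps)
            results.append(payload)
    return run


def repeat_fold(f, e, tau):
    def run(ps):
        ok, xs, ps_after = repeat(tau)(ps)
        return (ok, reduce(f, xs, e), ps_after)   # pure left fold over the results
    return run


def pop_goal():                                   # hypothetical action: take the first goal
    def run(ps):
        if not ps:
            return (False, RuntimeError("no goals"), ps)
        return (True, ps[0], ps[1:])
    return run
```

In the model, `repeat_fold` returns the folded accumulator, which matches a result type of Tac β: the element type α of the intermediate list does not appear in the result.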

It should be noted that, if lacking an effect extension feature, one could embed metaprograms simply via the (properly abstracted) tac monad instead of the Tac effect. It is just more convenient to use an effect, given we are working within an effectful program verifier already. In what follows, with the exception of Sect. 3.4 where we describe specifications for metaprograms, there is little reliance on using an effect; so, the same ideas could be applied in other settings.

#### **3.2 Executing Meta-F\* Metaprograms**

Running metaprograms involves three steps. First, they are *reified* [1] into their underlying tac representation, i.e. as state-passing functions. User code cannot reify metaprograms: only F* can do so when about to process a goal.

Second, the reified term is applied to an initial proof state, and then simply evaluated according to F*'s dynamic semantics, for instance using F*'s existing normalizer. For intensive applications, such as proofs by reflection, we provide faster alternatives (Sect. 5). In order to perform this second step, the proof state, which up until this moment exists only internally to F*, must be *embedded* as a term, i.e., as abstract syntax. Here is where its abstraction pays off: since metaprograms cannot interact with a proof state except through a limited interface, it need not be *deeply* embedded as syntax. By simply wrapping the internal proofstate into a new kind of "alien" term, and making the primitives aware of this wrapping, we can readily run the metaprogram that safely carries its alien proof state around. This wrapping of proof states is a constant-time operation.

The third step is interpreting the primitives. They are realized by functions of similar types implemented within the F* type-checker, but over an internal tac monad and the concrete definitions for term, proofstate, etc. Hence, there is a translation involved on every call and return, switching between embedded representations and their concrete variants. Take dump, for example, with type string → Tac unit. Its internal implementation within the F* type-checker has type string → proofstate → Div (result unit). When interpreting a call to it, the interpreter must *unembed* the arguments (which are representations of F* terms) into a concrete string and a concrete proofstate to pass to the internal implementation of dump. The situation is symmetric for the return value of the call, which must be *embedded* as a term.
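The embed/unembed round trip around a primitive call can be sketched as follows, again in Python as a stand-in for the type-checker's internals. Every name here (`Const`, `Alien`, `App`, `dump_internal`, `interpret_dump`) is hypothetical; the sketch only assumes that embedded terms are a small syntax tree and that the proof state is wrapped opaquely.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Const:                  # embedded constant (string literal, unit, ...)
    value: Any


@dataclass(frozen=True)
class Alien:                  # opaque wrapper around an internal object
    payload: Any


@dataclass(frozen=True)
class App:                    # embedded constructor application
    head: str
    args: tuple


def embed_string(s):
    return Const(s)


def unembed_string(t):
    if isinstance(t, Const) and isinstance(t.value, str):
        return t.value
    raise TypeError("not an embedded string")


def embed_proofstate(ps):
    return Alien(ps)          # constant time: the state is wrapped, not translated


def unembed_proofstate(t):
    if isinstance(t, Alien):
        return t.payload
    raise TypeError("not an embedded proof state")


dump_log = []


def dump_internal(msg, ps):   # stands in for the type-checker's own dump
    dump_log.append(f"{msg}: {len(ps)} goal(s)")
    return ("Success", (), ps)


def interpret_dump(msg_term, ps_term):
    msg = unembed_string(msg_term)            # term -> concrete string
    ps = unembed_proofstate(ps_term)          # alien term -> concrete proof state
    tag, value, ps_after = dump_internal(msg, ps)
    return App(tag, (Const(value), embed_proofstate(ps_after)))   # embed the result
```

The proof state that comes back is the very same object that went in, only re-wrapped, which is why the wrapping costs constant time regardless of how large the state is.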

#### **3.3 Syntax Inspection, Generation, and Quotation**

If metaprograms are to be reusable over different kinds of goals, they must be able to reflect on the goals they are invoked to solve. Like any metaprogramming system, Meta-F* offers a way to inspect and construct the syntax of F* terms. Our representation of terms as an inductive type, and the variants of quotations, are inspired by the ones in Idris [22] and Lean [30].

**Inspecting Syntax.** Internally, F* uses a locally-nameless representation [21] with explicit, delayed substitutions. To shield metaprograms from some of this internal bureaucracy, we expose a simplified view [61] of terms. Below we present a few constructors from the term_view type:

```
type term_view =
  | Tv_BVar : v:dbvar → term_view
  | Tv_Var  : v:name → term_view
  | Tv_FVar : v:qname → term_view
  | Tv_Abs  : bv:binder → body:term → term_view
  | Tv_App  : hd:term → arg:term → term_view
  ...
val inspect : term → Tac term_view
val pack : term_view → term
```
The term_view type provides the "one-level-deep" structure of a term: metaprograms must call inspect to reveal the structure of the term, one constructor at a time. The view exposes three kinds of variables: bound variables, Tv_BVar; named local variables, Tv_Var; and top-level fully qualified names, Tv_FVar. Bound variables and local variables are distinguished since the internal abstract syntax is locally nameless. For metaprogramming, it is usually simpler to use a fully-named representation, so we provide inspect and pack functions that open and close binders appropriately to maintain this invariant. Since opening binders requires freshness, inspect has effect Tac.<sup>5</sup> As generating large pieces of syntax via the view easily becomes tedious, we also provide some ways of *quoting* terms:

**Static Quotations.** A static quotation `e is just a shorthand for statically calling the F* parser to convert e into the abstract syntax of F* terms above. For instance, `(f 1 2) is equivalent to the following:

```
pack (Tv_App (pack (Tv_App (pack (Tv_FVar "f"))
                             (pack (Tv_Const (C_Int 1)))))
              (pack (Tv_Const (C_Int 2))))
```
**Dynamic Quotations.** A second form of quotation is dquote : #a:Type → a → Tac term, an effectful operation that is interpreted by F*'s normalizer during metaprogram evaluation. It returns the syntax of its argument at the time dquote e is evaluated. Evaluating dquote e substitutes all the free variables in e with their current values in the execution environment, suspends further evaluation, and returns the abstract syntax of the resulting term. For instance, evaluating (λx → dquote (x + 1)) 16 produces the abstract syntax of 16 + 1.

**Anti-quotations.** Static quotations are useful for building big chunks of syntax concisely, but they are of limited use if we cannot combine them with existing bits of syntax. Subterms of a quotation are allowed to "escape" and be substituted by arbitrary expressions. We use the syntax `#t to denote an antiquoted t, where t must be an expression of type term in order for the quotation to be well-typed. For example, `(1 +`#e) creates syntax for an addition where one operand is the integer constant 1 and the other is the term represented by e.

**Unquotation.** Finally, we provide an effectful operation, unquote : #a:Type → t:term → Tac a, which takes a term representation t and an expected type for it a (usually inferred from the context), and calls the F* type-checker to check and elaborate the term representation into a well-typed term.
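Returning to the one-level view described under Inspecting Syntax, the way inspect opens a binder with a fresh name and pack closes it again can be sketched in Python. The term shapes below are assumptions made for the sketch (they are not F*'s actual definitions), views are modeled as tagged tuples, and the `#`-suffixed names are just one way to obtain freshness.

```python
from dataclasses import dataclass, field
from itertools import count


# Internal, locally-nameless terms (assumed shapes, not F*'s actual definitions).
@dataclass(frozen=True)
class BVar:
    index: int                      # de Bruijn index of a bound variable

@dataclass(frozen=True)
class Var:
    name: str                       # named local variable

@dataclass(frozen=True)
class FVar:
    qname: str                      # top-level qualified name

@dataclass(frozen=True)
class Abs:
    body: object
    hint: str = field(default="x", compare=False)   # printing hint only

@dataclass(frozen=True)
class App:
    hd: object
    arg: object


def open_term(t, name, k=0):
    """Replace bound variable k by the named variable `name`."""
    if isinstance(t, BVar):
        return Var(name) if t.index == k else t
    if isinstance(t, Abs):
        return Abs(open_term(t.body, name, k + 1), t.hint)
    if isinstance(t, App):
        return App(open_term(t.hd, name, k), open_term(t.arg, name, k))
    return t


def close_term(t, name, k=0):
    """Replace the named variable `name` by bound variable k."""
    if isinstance(t, Var):
        return BVar(k) if t.name == name else t
    if isinstance(t, Abs):
        return Abs(close_term(t.body, name, k + 1), t.hint)
    if isinstance(t, App):
        return App(close_term(t.hd, name, k), close_term(t.arg, name, k))
    return t


_fresh = count()                    # freshness needs state, hence the effect


def inspect(t):
    """One-level view: a tag plus immediate subterms, with binders opened."""
    if isinstance(t, Abs):
        name = f"{t.hint}#{next(_fresh)}"
        return ("Tv_Abs", name, open_term(t.body, name))
    if isinstance(t, App):
        return ("Tv_App", t.hd, t.arg)
    if isinstance(t, Var):
        return ("Tv_Var", t.name)
    if isinstance(t, FVar):
        return ("Tv_FVar", t.qname)
    return ("Tv_BVar", t.index)


def pack(view):
    tag = view[0]
    if tag == "Tv_Abs":
        _, name, body = view
        return Abs(close_term(body, name), name.split("#")[0])
    if tag == "Tv_App":
        return App(view[1], view[2])
    if tag == "Tv_Var":
        return Var(view[1])
    if tag == "Tv_FVar":
        return FVar(view[1])
    return BVar(view[1])
```

The global counter is what makes `inspect` stateful in this model, mirroring why the real inspect lives in Tac while pack can stay pure.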

<sup>5</sup> We also provide functions inspect_ln and pack_ln, which stay in a locally-nameless representation and are thus pure, total functions.

#### **3.4 Specifying and Verifying Metaprograms**

Since we model metaprograms as a particular kind of effectful program within F*, which is a program verifier, a natural question to ask is whether F* can specify and verify metaprograms. The answer is "yes, to a degree".

To do so, we must use the WP calculus for the TAC effect: TAC-computations are given computation types of the form TAC a wp, where a is the computation's result type and wp is a weakest-precondition transformer of type tacwp a = proofstate → (result a → prop) → prop. However, since WPs tend not to be very intuitive, we first define two variants of the TAC effect: TacH in "Hoare-style" with pre- and postconditions and Tac (which we have seen before), which only specifies the return type, but uses trivial pre- and postconditions. The requires and ensures keywords below simply aid readability of pre- and postconditions: they are identity functions.

```
effect TacH (a:Type) (pre : proofstate → prop) (post : proofstate → result a → prop) =
        TAC a (λ ps post' → pre ps ∧ (∀ r. post ps r ⟹ post' r))
effect Tac (a:Type) = TacH a (requires (λ _ → ⊤)) (ensures (λ _ _ → ⊤))
```
Previously, we only showed the simple type for the raise primitive, namely exn → Tac α. In fact, in full detail and Hoare style, its type/specification is:

```
val raise : e:exn → TacH α (requires (λ _ → ⊤))
                           (ensures (λ ps r → r == Failed (e, ps)))
```
expressing that the primitive has no precondition, always fails with the provided exception, and does not modify the proof state. From the specifications of the primitives, and the automatically obtained Dijkstra monad, F* can already prove interesting properties about metaprograms. We show a few simple examples.

The following metaprogram is accepted by F* as it can conclude, from the type of raise, that the assertion is unreachable, and hence raise_flow can have a trivial precondition (as Tac unit implies).

```
let raise_flow () : Tac unit = raise SomeExn; assert ⊥
```
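As a worked unfolding of the definitions above, the WPs behind Tac and raise simplify as follows. The only steps used are the TacH abbreviation and propositional simplification.

$$\begin{aligned}
\mathsf{Tac}\ a &= \mathsf{TacH}\ a\ (\lambda \_.\ \top)\ (\lambda \_\ \_.\ \top) \\
 &= \mathsf{TAC}\ a\ (\lambda ps\ post'.\ \top \wedge (\forall r.\ \top \Longrightarrow post'\ r)) \\
 &\equiv \mathsf{TAC}\ a\ (\lambda ps\ post'.\ \forall r.\ post'\ r) \\[4pt]
wp_{\mathsf{raise}\ e} &= \lambda ps\ post'.\ \top \wedge (\forall r.\ r = \mathsf{Failed}\ (e, ps) \Longrightarrow post'\ r) \\
 &\equiv \lambda ps\ post'.\ post'\ (\mathsf{Failed}\ (e, ps))
\end{aligned}$$

Assuming, as the failure-propagating bind of tac suggests, that sequencing after a failed computation never runs the continuation, the WP of the body of raise_flow is that of raise SomeExn, and the assertion contributes no obligation. Checking it against Tac unit then amounts to showing $(\forall r.\ post'\ r) \Longrightarrow post'\ (\mathsf{Failed}\ (\mathsf{SomeExn}, ps))$ for all $ps$ and $post'$, which holds by instantiating $r$.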
For cur_goal_safe below, F* verifies that (given the precondition) the pattern match is exhaustive. The postcondition also asserts that the metaprogram always succeeds without affecting the proof state, returning some unspecified goal. Calls to cur_goal_safe must statically ensure that the goal list is not empty.

```
let cur_goal_safe () : TacH goal (requires (λ ps → ¬(goals_of ps == [])))
                                (ensures (λ ps r → ∃g. r == Success g ps)) =
    match goals_of (get ()) with | g :: _ → g
```
Finally, the divide combinator below "splits" the goals of a proof state in two at a given index n, and focuses a different metaprogram on each. It includes a runtime check that the given n is non-negative, and raises an exception in the TAC effect otherwise. Afterwards, the call to the (pure) List.splitAt function requires that n be statically known to be non-negative, a fact which can be proven from the specification for raise and the effect definition, which defines the control flow.

```
let divide (n:int) (tl : unit → Tac α) (tr : unit → Tac β) : Tac (α * β) =
    if n < 0 then raise NegativeN;
    let gsl, gsr = List.splitAt n (goals ()) in ...
```
This enables a style of "lightweight" verification of metaprograms, where expressive invariants about their state and control-flow can be encoded. The programmer can exploit dynamic checks (n < 0) and exceptions (raise) or static ones (preconditions), or a mixture of them, as needed.

Due to type abstraction, though, the specifications of most primitives cannot provide complete detail about their behavior, and deeper specifications (such as ensuring a tactic will correctly solve a goal) cannot currently be proven, nor even stated: doing so would require, at least, an internalization of the typing judgment of F*. While this is an exciting possibility [3], we have for now only focused on verifying basic safety properties of metaprograms, which helps users detect errors early, and whose proofs the SMT solver can handle well. In principle, though, one can also write tactics to discharge the proof obligations of metaprograms.

### **4 Meta-F\*, Formally**

We now describe the trust assumptions for Meta-F* (Sect. 4.1) and then how we reconcile tactics within a program verifier, where the exact shape of VCs is not given, nor known a priori by the user (Sect. 4.2).

#### **4.1 Correctness and Trusted Computing Base (TCB)**

As in any proof assistant, tactics and metaprogramming would be rather useless if they allowed one to "prove" invalid judgments: care must be taken to ensure soundness. We begin with a taste of the specifics of F*'s static semantics, which influence the trust model for Meta-F*, and then provide more detail on the TCB.

**Proof Irrelevance in F\*.** The following two rules for introducing and eliminating refinement types are key in F*, as they form the basis of its proof irrelevance.

$$\frac{\Gamma \vdash e : t \qquad \Gamma \models \phi[e/x]}{\Gamma \vdash e : x{:}t\{\phi\}}\ \text{T-Refine} \qquad\qquad \frac{\Gamma \vdash e : x{:}t\{\phi\}}{\Gamma \models \phi[e/x]}\ \text{V-Refine}$$

The ⊨ symbol represents F*'s *validity judgment* [1] which, at a high level, defines a proof-irrelevant, classical, higher-order logic. These validity hypotheses are usually collected by the type-checker, and then encoded to the SMT solver in bulk. Crucially, the irrelevance of validity is what permits efficient interaction with SMT solvers, since reconstructing F* terms from SMT proofs is unneeded.

As evidenced in the rules, validity and typing are mutually recursive, and therefore Meta-F* must also construct validity derivations. In the implementation, we model these validity goals as holes with a "squash" type [5,53], where squash φ = _:unit{φ}, i.e., a refinement of unit. Concretely, we model Γ ⊨ φ as Γ ⊢ ?u : squash φ using a unification variable. Meta-F* does not construct deep solutions to squashed goals: if they are proven valid, the variable ?u is simply solved by the unit value '()'. At any point, any such irrelevant goal can be sent to the SMT solver. Relevant goals, on the other hand, cannot be sent to SMT.

**Scripting the Typing Judgment.** A consequence of validity proofs not being materialized is that type-checking is undecidable in F*. For instance: does the unit value () solve the hole Γ ⊢ ?u : squash φ? Well, only if φ holds, a condition which no type-checker can effectively decide. This implies that the type-checker cannot, in general, rely on proof terms to reconstruct a proof. Hence, the primitives are designed to provide access to the typing judgment of F* directly, instead of building syntax for proof terms. One can think of F*'s type-checker as implementing one particular algorithmic heuristic of the typing and validity judgments, a heuristic which happens to work well in practice. For convenience, this default type-checking heuristic is also available to metaprograms: this is in fact precisely what the exact primitive does. Having programmatic access to the typing judgment also provides the flexibility to tweak VC generation as needed, instead of leaving it to the default behavior of F*. For instance, the refine_intro primitive implements T-Refine. When applied, it produces two new goals, including that the refinement actually holds. At that point, a metaprogram can run any arbitrary tactic on it, instead of letting the F* type-checker collect the obligation and send it to the SMT solver in bulk with others.

**Trust.** There are two common approaches for the correctness of tactic engines: (1) the *de Bruijn criterion* [6], which requires constructing full proofs (or proof terms) and checking them at the end, hence reducing trust to an independent proof-checker; and (2) the LCF style, which applies backwards reasoning while constructing validation functions at every step, reducing trust to primitive, forward-style implementations of the system's inference rules.

As we wish to make use of SMT solvers within F*, the first approach is not easy. Reconstructing the proofs SMT solvers produce, if any, back into a proper derivation remains a significant challenge (even despite recent progress, e.g. [17,31]). Further, the logical encoding from F* to SMT, along with the solver itself, are already part of F*'s TCB: shielding Meta-F* from them would not significantly increase safety of the combined system.

Instead, we roughly follow the LCF approach and implement F*'s typing rules as the basic user-facing metaprogramming actions. However, instead of implementing the rules in forward-style and using them to validate (untrusted) backwards-style tactics, we implement them directly in backwards-style. That is, they run by breaking down goals into subgoals, instead of combining proven facts into new proven facts. Using LCF style makes the primitives part of the TCB. However, given the primitives are sound, any combination of them also is, and any user-provided metaprogram must be safe due to the abstraction imposed by the Tac effect, as discussed next.

**Correct Evolutions of the Proof State.** For soundness, it is imperative that tactics do not arbitrarily drop goals from the proof state, and only discharge them when they are solved, or when they can be solved by other goals tracked in the proof state. For a concrete example, consider the following program:

let f : int → int = by (intro (); exact (`42))

Here, Meta-F* will create an initial proof state with a single goal of the form [∅ ⊢ ?u<sub>1</sub> : int → int] and begin executing the metaprogram. When applying the intro primitive, the proof state transitions as shown below.

$$[\emptyset \vdash \mathtt{?u}_1 : \mathsf{int} \rightarrow \mathsf{int}] \leadsto [\mathsf{x} : \mathsf{int} \vdash \mathtt{?u}_2 : \mathsf{int}]$$

Here, a solution to the original goal has not yet been built, since it *depends* on the solution to the goal on the right-hand side. When it is solved with, say, 42, we can solve our original goal with λx → 42. To formalize these dependencies, we say that a proof state φ *correctly evolves (via* f*) to* ψ, denoted φ ⪯<sub>f</sub> ψ, when there is a generic transformation f, called a *validation*, from solutions to all of ψ's goals into correct solutions for φ's goals. When φ has n goals and ψ has m goals, the validation f is a function from term<sup>m</sup> into term<sup>n</sup>. Validations may be composed, providing the transitivity of correct evolution, and if a proof state φ correctly evolves (in any number of steps) into a state with no more goals, then we have fully defined solutions to all of φ's goals. We emphasize that validations are not constructed explicitly during the execution of metaprograms. Instead we exploit unification metavariables to instantiate the solutions automatically.

Note that validations may construct solutions for more than one goal, i.e., their codomain is not a single term. This is required in Meta-F*, where primitive steps may not only decompose goals into subgoals, but actually combine goals as well. Currently, the only primitive providing this behavior is join, which finds a maximal common prefix of the environment of two irrelevant goals, reverts the "extra" binders in both goals and builds their conjunction. Combining goals using join is especially useful for sending multiple goals to the SMT solver in a single call. When there are common obligations within two goals, joining them before calling the SMT solver can result in a significantly faster proof.
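Although validations are never built explicitly in the implementation, it can help to see them as ordinary functions. The Python sketch below models a validation as a function from the list of solutions of the later state to the list of solutions of the earlier one. Terms are represented as tagged tuples and all function names are hypothetical; `join_validation` only illustrates the shape of a validation with two results, not how join itself works.

```python
def compose(f, g):
    """Transitivity: if phi evolves to psi via f and psi to chi via g,
    then phi evolves to chi via the composition of f after g."""
    return lambda sols: f(g(sols))


def intro_validation(sols):
    (body,) = sols                       # one goal after intro: ?u2 : int
    return [("lam", "x", body)]          # solution for ?u1 : int -> int


def exact_validation(term):
    def validation(sols):
        assert sols == []                # exact leaves no goals behind
        return [term]
    return validation


def join_validation(sols):
    (_conj,) = sols                      # one joined, irrelevant goal
    return [("unit",), ("unit",)]        # both original squashed goals are solved by ()
```

Composing the validations for intro and for exact applied to 42 gives a function from the empty list of solutions to the single solution λx → 42, as in the example of f above.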

We check that every primitive action respects the ⪯ preorder. This relies on them modeling F*'s typing rules. For example, and unsurprisingly, the following rule for typing abstractions is what justifies the intro primitive:

$$\frac{\Gamma, x{:}t \vdash e : t'}{\Gamma \vdash \lambda(x{:}t).e : (x{:}t) \rightarrow t'}\ \text{T-Fun}$$

Then, for the proof state evolution above, the validation function f is the (mathematical, meta-level) function taking a term of type int (the solution for ?u<sub>2</sub>) and building syntax for its abstraction over x. Further, the intro primitive respects the correct-evolution preorder, by the very typing rule (T-Fun) from which it is defined. In this manner, every typing rule induces a syntax-building metaprogramming step. Our primitives come from this dual interpretation of typing rules, which ensures that logical consistency is preserved.

Since the ⪯ relation is a preorder, and every metaprogramming primitive we provide the user evolves the proof state according to ⪯, it is trivially the case that the final proof state returned by a (successful) computation is a correct evolution of the initial one. That means that when the metaprogram terminates, one has indeed broken down the proof obligation correctly, and is left with a (hopefully) simpler set of obligations to fulfill. Note that since ⪯ is a preorder, Tac provides an interesting example of monotonic state [2].

#### **4.2 Extracting Individual Assertions**

As discussed, the logical context of a goal processed by a tactic is not always syntactically evident in the program. And, as shown in the List.splitAt call in divide from Sect. 3.4, some obligations crucially depend on the control-flow of the program. Hence, the proof state must include these assumptions if proving the assertion is to succeed. Below, we describe how Meta-F* finds proper contexts in which to prove the assertions, including control-flow information. Notably, this process is defined over logical formulae and does not depend at all on F*'s WP calculus or VC generator: we believe it should be applicable to any VC generator.

As seen in Sect. 2.1, the basic mechanism by which Meta-F* attaches a tactic to a specific sub-goal is assert φ by τ. Our encoding of this expression is built similarly to F*'s existing assert construct, which is simply sugar for a pure function assert of type φ:prop → Lemma (requires φ) (ensures φ), which essentially introduces a cut in the generated VC. That is, the term (assert φ; e) roughly produces the verification condition φ ∧ (φ ⟹ VC<sub>e</sub>), requiring a proof of φ at this point, and assuming φ in the continuation. For Meta-F*, we aim to keep this style while allowing asserted formulae to be decorated with user-provided tactics that are tasked with proving or pre-processing them. We do this in three steps.

First, we define the following "phantom" predicate:

let with_tactic (φ : prop) (τ : unit → Tac unit) = φ

Here φ `with_tactic` τ simply associates the tactic τ with φ, and is equivalent to φ by its definition. Next, we implement the assert_by_tactic lemma, and desugar assert φ by τ into assert_by_tactic φ τ. This lemma is trivially provable by F*.

```
let assert_by_tactic (φ : prop) (τ : unit → Tac unit)
                             : Lemma (requires (φ `with_tactic` τ)) (ensures φ) = ()
```
Given this specification, the term (assert φ by τ; e) roughly produces the verification condition φ `with_tactic` τ ∧ (φ ⟹ VC<sub>e</sub>), with a tagged left sub-goal, and φ as a hypothesis in the right one. Importantly, F* keeps the with_tactic marker uninterpreted until the VC needs to be discharged. At that point, it may contain several annotated subformulae. For example, suppose the VC is VC0 below, where we distinguish an ambient context of variables and hypotheses Δ:

(VC0) Δ ⊨ X ⟹ (∀ (x:t). R `with_tactic` τ<sub>1</sub> ∧ (R ⟹ S))

In order to run the τ<sub>1</sub> tactic on R, it must first be "split out". To do so, all logical information "visible" for τ<sub>1</sub> (i.e. the set of premises of the implications traversed and the binders introduced by quantifiers) must be included. As for any program verifier, these hypotheses include the control flow information, postconditions, and any other logical fact that is known to be valid at the program point where the corresponding assert R by τ<sub>1</sub> was called. All of them are collected into Δ as the term is traversed. In this case, the VC for R is:

(VC1) Δ, _:X, x:t ⊨ R

Afterwards, this obligation is removed from the original VC. This is done by replacing it with ⊤, leaving a "skeleton" VC with all remaining facts.

$$(\mathsf{VC2})\ \Delta \models \mathsf{X} \Longrightarrow (\forall\, (\mathsf{x}{:}\mathsf{t}).\ \top \land (\mathsf{R} \Longrightarrow \mathsf{S}))$$

The validity of VC1 and VC2 implies that of VC0. F* also recursively descends into R and S, in case there are more with_tactic markers in them. Then, tactics are run on the split VCs (e.g., τ<sub>1</sub> on VC1) to break them down (or solve them). All remaining goals, including the skeleton, are sent to the SMT solver.

Note that while the *obligation* to prove R, in VC1, is preprocessed by the tactic τ<sub>1</sub>, the *assumption* R for the continuation of the code, in VC2, is left as-is. This is crucial for tactics such as the canonicalizer from Sect. 2.1: if the skeleton VC2 contained an assumption for the canonicalized equality it would not help the SMT solver show the uncanonicalized postcondition.

However, not all nodes marked with with_tactic are proof obligations. Suppose X in the previous VC was given as (Y `with_tactic` τ<sub>2</sub>). In this case, one certainly does not want to attempt to prove Y, since it is a hypothesis. While it would be *sound* to prove it and replace it by ⊤, it is useless at best, and usually irreparably affects the system. Consider asserting the tautology (⊥ `with_tactic` τ) ⟹ ⊥.

Hence, F* splits such obligations only in strictly-positive positions. On all others, F* simply drops the with_tactic marker, e.g., by just unfolding the definition of with_tactic. For regular uses of the assert..by construct, however, all occurrences are strictly-positive. It is only when (expert) users use the with_tactic marker directly that the above discussion might become relevant.
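The splitting procedure can be sketched over a tiny formula language. The Python model below is an illustration under assumptions: the formula constructors and the functions `split` and `strip` are hypothetical names, contexts are kept relative to the ambient Δ, and nested markers inside a split-out formula are handled by simply recursing first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Top:
    pass

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Implies:
    hyp: object
    concl: object

@dataclass(frozen=True)
class Forall:
    var: str
    ty: str
    body: object

@dataclass(frozen=True)
class WithTactic:
    phi: object
    tactic: str


def strip(f):
    """Unfold every with_tactic marker (used in non-strictly-positive positions)."""
    if isinstance(f, WithTactic):
        return strip(f.phi)
    if isinstance(f, And):
        return And(strip(f.left), strip(f.right))
    if isinstance(f, Implies):
        return Implies(strip(f.hyp), strip(f.concl))
    if isinstance(f, Forall):
        return Forall(f.var, f.ty, strip(f.body))
    return f


def split(f, ctx=()):
    """Return (skeleton, goals); each goal is (context, formula, tactic)."""
    if isinstance(f, WithTactic):
        inner, goals = split(f.phi, ctx)          # nested markers get their own goals
        return Top(), goals + [(ctx, inner, f.tactic)]
    if isinstance(f, And):
        left, g1 = split(f.left, ctx)
        right, g2 = split(f.right, ctx)
        return And(left, right), g1 + g2
    if isinstance(f, Implies):
        hyp = strip(f.hyp)                        # hypotheses are never split out
        concl, goals = split(f.concl, ctx + (("hyp", hyp),))
        return Implies(hyp, concl), goals
    if isinstance(f, Forall):
        body, goals = split(f.body, ctx + (("var", f.var, f.ty),))
        return Forall(f.var, f.ty, body), goals
    return f, []
```

Applied to the shape of VC0, `split` returns the skeleton of VC2 together with a single goal whose context holds the hypothesis X and the binder x:t, as in VC1. A marker on the left of an implication is merely unfolded and produces no goal.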

Formally, the soundness of this whole approach is given by the following metatheorem, which justifies the splitting out of sub-assertions, and by the correctness of evolution detailed in Sect. 4.1. The proof of Theorem 1 is straightforward, and included in the appendix. We expect an analogous property to hold in other verifiers as well (in particular, it holds for first-order logic).

**Theorem 1.** *Let* E *be a context with* Γ ⊢ E : prop ⇒ prop*, and* φ *a squashed proposition such that* Γ ⊢ φ : prop*. Then the following holds:*

$$\frac{\varGamma \models E[\top] \quad \varGamma, \gamma(E) \models \phi}{\varGamma \models E[\phi]}$$

*where* γ(E) *is the set of binders* E *introduces. If* E *is strictly-positive, then the reverse implication holds as well.*

### **5 Executing Metaprograms Efficiently**

F* provides three complementary mechanisms for running metaprograms. The first two, F*'s call-by-name (CBN) interpreter and a (newly implemented) call-by-value (CBV) NbE-based evaluator, support strong reduction; henceforth we refer to these as "normalizers". In addition, we design and implement a new *native plugin* mechanism that allows both normalizers to interface with Meta-F* programs extracted to OCaml, reusing F*'s existing extraction pipeline for this purpose. Below we provide a brief overview of the three mechanisms.

#### **5.1 CBN and CBV Strong Reductions**

As described in Sect. 3.1, metaprograms, once reified, are simply F* terms of type proofstate → Div (result a). As such, they can be reduced using F*'s existing computation machinery, a CBN interpreter for strong reductions based on the Krivine abstract machine (KAM) [24,46]. Although complete and highly configurable, F*'s KAM interpreter is slow, designed primarily for converting types during dependent type-checking and higher-order unification.

Shifting focus to long-running metaprograms, such as tactics for proofs by reflection, we implemented an NbE-based strong-reduction evaluator for F* computations. The evaluator is implemented in F* and extracted to OCaml (as is the rest of F*), thereby inheriting CBV from OCaml. It is similar to Boespflug et al.'s [16] NbE-based strong-reduction for Coq, although we do not implement their low-level, OCaml-specific tag-elimination optimizations. Nevertheless, it is already vastly more efficient than the KAM-based interpreter.
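For readers unfamiliar with normalization by evaluation, the following Python sketch shows the general technique on the untyped lambda calculus. It is not the F* evaluator: terms are assumed to be closed tuples with de Bruijn indices, host-language closures play the role of semantic functions, and the function names are chosen for this sketch. Strong reduction comes from `reify` applying a closure to a fresh neutral variable, and call-by-value is inherited from the host language, much as described above for OCaml.

```python
# Terms use de Bruijn indices: ("var", i) | ("lam", body) | ("app", f, a)
# Values are Python closures (for lambdas) or neutral terms:
#   ("nvar", level) | ("napp", neutral, value)


def evaluate(t, env=()):
    tag = t[0]
    if tag == "var":
        return env[t[1]]                           # env[0] is the innermost binder
    if tag == "lam":
        return lambda v: evaluate(t[1], (v,) + env)
    f = evaluate(t[1], env)
    a = evaluate(t[2], env)                        # eager: call-by-value from the host
    return apply_value(f, a)


def apply_value(f, a):
    # A stuck application (head is a variable) stays neutral.
    return f(a) if callable(f) else ("napp", f, a)


def reify(v, depth=0):
    if callable(v):                                # go under the binder with a fresh variable
        return ("lam", reify(v(("nvar", depth)), depth + 1))
    if v[0] == "nvar":
        return ("var", depth - v[1] - 1)           # de Bruijn level -> index
    return ("app", reify(v[1], depth), reify(v[2], depth))


def normalize(t):
    return reify(evaluate(t), 0)
```

For example, normalizing λx. (λy. y) x reduces the redex under the binder and yields λx. x, which a weak evaluator would leave untouched.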

#### **5.2 Native Plugins and Multi-language Interoperability**

Since Meta-F* programs are just F* programs, they can also be extracted to OCaml and natively compiled. Further, they can be dynamically linked into F* as "plugins". Plugins can be directly called from the type-checker, as is done for the primitives, which is much more efficient than interpreting them. However, compilation has a cost, and it is not convenient to compile every single invocation. Instead, Meta-F* enables users to choose which metaprograms are to be plugins (presumably those expected to be computation-intensive, e.g. canon_semiring). Users can choose their native plugins, while still quickly scripting their higher-level logic in the interpreter.

This requires (for higher-order metaprograms) a form of multi-language interoperability, converting between representations of terms used in the normalizers and in native code. We designed a small multi-language calculus, with ML-style polymorphism, to model the interaction between normalizers and plugins and conversions between terms. See the appendix for details.

Beyond the notable efficiency gains of running compiled code vs. interpreting it, native metaprograms also require fewer embeddings. Once compiled, metaprograms work over the internal, *concrete* types for proofstate, term, etc., instead of over their F* representations (though still treating them abstractly). Hence, compiled metaprograms can call primitives without needing to embed their arguments or unembed their results. Further, they can call each other directly as well. Indeed, there is little operational difference between a primitive and a compiled metaprogram used as a plugin.

Native plugins, however, are not a replacement for the normalizers, for several reasons. First, the overhead in compilation might not be justified by the execution speed-up. Second, extraction to OCaml erases types and proofs. As a result, the F* *interface* of the native plugins can only contain types that can also be expressed in OCaml, thereby excluding full dependent types; internally, however, they can be dependently typed. Third, being OCaml programs, native plugins do not support reducing open terms, which is often required. However, when the programs treat their open arguments parametrically, relying on parametric polymorphism, the normalizers can pass such arguments *as-is*, thereby recovering open reductions in some cases. This allows us to use native data structure implementations (e.g. List), which is much faster than using the normalizers, even for open terms. See the appendix for details.

### **6 Experimental Evaluation**

We now present an experimental evaluation of Meta-F*. First, we provide benchmarks comparing our reflective canonicalizer from Sect. 2.1 to calling the SMT solver directly without any canonicalization. Then, we return to the parsers and serializers from Sect. 2.3 and show how, for VCs that arise, a domain-specific tactic is much more tractable than an SMT-only proof.

#### **6.1 A Reflective Tactic for Partial Canonicalization**

In Sect. 2.1, we have described the canon_semiring tactic that rewrites semiring expressions into sums of products. We find that this tactic significantly improves proof robustness. The table below compares the success rates and times for the poly_multiply lemma from Sect. 2.1. To test the robustness of each alternative, we run the tests 200 times while varying the SMT solver's random seed. The smt*i*x rows represent asking the solver to prove the lemma without any help from tactics, where *i* represents the resource limit (rlimit) multiplier given to the solver. This rlimit is memory-allocation based and independent of the particular system or current load. For the interp and native rows, the canon_semiring tactic is used, running it using F*'s KAM normalizer and as a native plugin respectively, both with an rlimit of 1.

For each setup, we display the success rate of verification, the average (CPU) time taken for the SMT queries (not counting the time for parsing/processing the theory) with its standard deviation, and the average total time (its standard deviation coincides with that of the queries). When applicable, the time for tactic execution (which is independent of the seed) is displayed. The smt rows show very poor success rates: even when upping the rlimit to a whopping 100x, over three quarters of the attempts fail. Note how the (relative) standard deviation increases with the rlimit: this is due to successful runs taking rather random times, and failing ones exhausting their resources in similar times. The setups using the tactic show a clear increase in robustness: canonicalizing the assertion causes this proof to always succeed, even at the default rlimit. We recall that the tactic variants still leave goals for SMT solving, namely, the skeleton for the original VC and the canonicalized equality left by the tactic, easily dischargeable by the SMT solver through much more well-behaved linear reasoning. The last column shows that native compilation speeds up this tactic's execution by about 5x.

### **6.2 Combining SMT and Tactics for the Parser Generator**

In Sect. 2.3, we presented a library of combinators and a metaprogramming approach to automate the construction of verified, mutually inverse, low-level parsers and serializers from type descriptions. Beyond generating the code, tactics are used to process and discharge proof obligations that arise when using the combinators.

We present three strategies for discharging these obligations, including those of bijectivity that arise when constructing parsers and serializers for enumerated types. First, we used F\*'s default strategy to present all of these proofs directly to the SMT solver. Second, we programmed a ∼100 line tactic to discharge these proofs without relying on the SMT solver at all. Finally, we used a hybrid approach where a simple, 5-line tactic is used to prune the context of the proof, removing redundant facts before presenting the resulting goals to the SMT solver.

The table alongside shows the total time in seconds for verifying metaprogrammed low-level parsers and serializers for enumerations of different sizes. In short, the hybrid approach scales the best; the tactic-only approach is somewhat slower; while the SMT-only approach scales poorly and is an order of magnitude slower. Our hybrid approach is very simple. With some more work, a more sophisticated hybrid strategy could be more performant still, relying on tactic-based normalization proofs for fragments of the VC best handled computationally (where the SMT solver spends most of its time), while using SMT only for integer arithmetic, congruence closure etc. However, with Meta-F\*'s ability to manipulate proof contexts programmatically, our simple context-pruning tactic provides a big payoff at a small cost.

# **7 Related Work**

Many SMT-based program verifiers [7,8,19,34,48] rely on user hints, in the form of assertions and lemmas, to complete proofs. This is the predominant style of proving used in tools like Dafny [47], Liquid Haskell [60], Why3 [33], and F\* itself [58]. However, there is a growing trend to augment this style of semiautomated proof with interactive proofs. For example, systems like Why3 [33] allow VCs to be discharged using ITPs such as Coq, Isabelle/HOL, and PVS, but this requires an additional embedding of VCs into the logic of the ITP in question. In recent concurrent work, support for *effectful* reflection proofs was added to Why3 [50], and it would be interesting to investigate if this could also be done in Meta-F\*. Grov and Tumas [39] present Tacny, a tactic framework for Dafny, which is, however, limited in that it only transforms source code, with the program verifier unchanged. In contrast, Meta-F\* combines the benefits of an SMT-based program verifier and those of tactic proofs within a single language.

Moving away from SMT-based verifiers, ITPs have long relied on separate languages for proof scripting, starting with Edinburgh LCF [37] and ML, and continuing with HOL, Isabelle and Coq, which are either extensible via ML, or have dedicated tactic languages [3,29,56,62]. Meta-F\* builds instead on a recent idea in the space of dependently typed ITPs [22,30,42,63] of reusing the object-language as the meta-language. This idea first appeared in Mtac, a Coq-based tactics framework for Coq [42,63], and has many generic benefits including reusing the standard library, IDE support, and type checker of the proof assistant. Mtac can additionally check the partial correctness of tactics, which is also sometimes possible in Meta-F\* but still rather limited (Sect. 3.4). Meta-F\*'s design is instead more closely inspired by the metaprogramming frameworks of Idris [22] and Lean [30], which provide a deep embedding of terms that metaprograms can inspect and construct at will without dependent types getting in the way. However, F\*'s effects, its weakest precondition calculus, and its use of SMT solvers distinguish Meta-F\* from these other frameworks, presenting both challenges and opportunities, as discussed in this paper.

Some SMT solvers also include tactic engines [27], which allow queries to be processed in custom ways. However, using SMT tactics from a program verifier is not very practical. To do so effectively, users must become familiar not only with the solver's language and tactic engine, but also with the translation from the program verifier to the solver. Instead, in Meta-F\*, everything happens within a single language. Also, to our knowledge, these tactics are usually coarsely grained, and we do not expect them to enable developments such as Sect. 2.2. Plus, SMT tactics do not enable metaprogramming.

Finally, ITPs are seeing increasing use of "hammers" such as Sledgehammer [14,15,54] in Isabelle/HOL, and similar tools for HOL Light and HOL4 [43], and Mizar [44], to interface with ATPs. This technique is similar to Meta-F\*, which, given its support for a dependently typed logic, is especially related to a recent hammer for Coq [26]. Unlike these hammers, Meta-F\* does not aim to reconstruct SMT proofs, gaining efficiency at the cost of trusting the SMT solver. Further, whereas hammers run in the background, lightening the load on a user otherwise tasked with completing the entire proof, Meta-F\* relies more heavily on the SMT solver as an end-game tactic in nearly all proofs.

# **8 Conclusions**

A key challenge in program verification is to balance automation and expressiveness. Whereas tactic-based ITPs support highly expressive logics, the tactic author is responsible for all the automation. Conversely, SMT-based program verifiers provide good, scalable automation for comparatively weaker logics, but offer little recourse when verification fails. A design that allows picking the right tool, at the granularity of each verification sub-task, is a worthy area of research. Meta-F\* presents a new point in this space: by using hand-written tactics alongside SMT-automation, we have written proofs that were previously impractical in F\*, and (to the best of our knowledge) in other SMT-based program verifiers.

**Acknowledgements.** We thank Leonardo de Moura and the Project Everest team for many useful discussions. The work of Guido Martínez, Nick Giannarakis, Monal Narasimhamurthy, and Zoe Paraskevopoulou was done, in part, while interning at Microsoft Research. Clément Pit-Claudel's work was in part done during an internship at Inria Paris. The work of Danel Ahman, Victor Dumitrescu, and Cătălin Hriţcu is supported by the MSR-Inria Joint Centre and the European Research Council under ERC Starting Grant SECOMP (1-715753).

### **References**



# **Semi-automated Reasoning About Non-determinism in C Expressions**

Dan Frumin<sup>1</sup>, Léon Gondelman<sup>1</sup>, and Robbert Krebbers<sup>2</sup>

<sup>1</sup> Radboud University, Nijmegen, The Netherlands, {dfrumin,lgg}@cs.ru.nl

<sup>2</sup> Delft University of Technology, Delft, The Netherlands, mail@robbertkrebbers.nl

**Abstract.** Research into C verification often ignores that the C standard leaves the evaluation order of expressions unspecified, and assigns undefined behavior to write-write or read-write conflicts in subexpressions, so-called "sequence point violations". These aspects should be accounted for in verification because C compilers exploit them.

We present a verification condition generator (vcgen) that enables one to semi-automatically prove the absence of undefined behavior in a given C program for *any* evaluation order. The key novelty of our approach is a symbolic execution algorithm that computes a *frame* at the same time as a *postcondition*. The frame is used to automatically determine how resources should be distributed among subexpressions.

We prove correctness of our vcgen with respect to a new monadic definitional semantics of a subset of C. This semantics is modular and gives a concise account of non-determinism in C.

We have implemented our vcgen as a tactic in the Coq interactive theorem prover, and have proved correctness of it using a separation logic for the new monadic definitional semantics of a subset of C.

# **1 Introduction**

The ISO C standard [22]—the official specification of the C language—leaves many parts of the language semantics either *unspecified* (*e.g.,* the order of evaluation of expressions), or *undefined* (*e.g.,* dereferencing a NULL pointer or integer overflow). In case of undefined behavior a program may do literally anything, *e.g.,* it may crash, or it may produce an arbitrary result and side-effects. Therefore, to establish the correctness of a C program, one needs to ensure that the program has no undefined behavior for *all* possible choices of non-determinism due to unspecified behavior.

In this paper we focus on the undefined and unspecified behaviors related to C's expression semantics, which have been ignored by most existing verification tools, but are crucial for establishing the correctness of realistic C programs. The C standard does not require subexpressions to be evaluated in a specific order (*e.g.,* from left to right), but rather allows them to be evaluated in *any* order. Moreover, an expression has undefined behavior when there is a conflicting write-write or read-write access to the same location between two *sequence points* [22, 6.5p2] (a so-called "sequence point violation"). Sequence points occur *e.g.,* at the end of a full expression (;), before and after each function call, and after the first operand of a conditional expression (-?-:-) has been evaluated [22, Annex C]. Let us illustrate this by means of the following example:

```
#include <stdio.h>

int main() {
  int x; int y = (x = 3) + (x = 4);  /* two unsequenced writes to x */
  printf("%d %d\n", x, y);
}
```
Due to the unspecified evaluation order, one would naively expect this program to print either "3 7" or "4 7", depending on which assignment to x was evaluated first. But this program exhibits undefined behavior due to a sequence point violation: there are two conflicting writes to the variable x. Indeed, when compiled with GCC (version 8.2.0), the program in fact prints "4 8", which does not correspond to the expected results of any of the evaluation orders.
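To make the naive expectation concrete, the following sketch spells out the two evaluation orders as separate statements, so that every write to the variable is followed by a sequence point and the behavior is defined. The function names `left_first`, `right_first`, and `print_naive_outcomes` are ours and only serve the illustration; they do not appear in the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Left operand first: x = 3, then x = 4. Each declaration ends a full
   expression, so the two writes to *x are separated by a sequence point. */
int left_first(int *x) {
    int a = (*x = 3);
    int b = (*x = 4);
    return a + b;
}

/* Right operand first: x = 4, then x = 3. */
int right_first(int *x) {
    int b = (*x = 4);
    int a = (*x = 3);
    return a + b;
}

/* Prints the two outcomes one would naively expect from the example. */
void print_naive_outcomes(void) {
    int x;
    int y = left_first(&x);
    assert(y == 7);             /* the sum does not depend on the order */
    printf("%d %d\n", x, y);    /* x holds the value written last: 4 */
    y = right_first(&x);
    assert(y == 7);
    printf("%d %d\n", x, y);    /* x holds the value written last: 3 */
}
```

Calling `print_naive_outcomes()` prints "4 7" followed by "3 7". Neither function can produce the "4 8" observed above for the original, undefined expression.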

One may expect that these programs can be easily ruled out statically using some form of static analysis, but this is not the case. Contrary to the simple program above, one can access the values of arbitrary pointers, making it impossible to statically establish the absence of write-write or read-write conflicts. Besides, one should not merely establish the absence of undefined behavior due to conflicting accesses to the same locations, but one should also establish that there are no other forms of undefined behavior (*e.g.,* that no NULL pointers are dereferenced) for *any evaluation order*.
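A minimal sketch of why pointers defeat a purely syntactic check: in the function below, whether the two writes conflict depends on the run-time values of `p` and `q`, not on the shape of the expression. The names `store_both` and `store_both_checked` are hypothetical, and the run-time check stands in for the side condition that a verifier would have to prove statically.

```c
#include <assert.h>

/* Two unsequenced writes through pointers. The behavior is defined only if
   p and q point to different objects; with p == q this would be a sequence
   point violation. */
int store_both(int *p, int *q) {
    assert(p != q);   /* precondition: the two pointers must not alias */
    return (*p = 3) + (*q = 4);
}

/* Defensive wrapper: refuses aliased arguments instead of running into
   undefined behavior. Returns 1 and stores the sum in *result on success,
   and returns 0 without touching *result otherwise. */
int store_both_checked(int *p, int *q, int *result) {
    if (p == q) return 0;
    *result = store_both(p, q);
    return 1;
}
```

With two distinct variables the call succeeds and yields 7; passing the same address twice is rejected before the problematic expression is evaluated.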

To deal with this issue, Krebbers [29,30] developed a program logic based on Concurrent Separation Logic (CSL) [46] for establishing the absence of undefined behavior in C programs in the presence of non-determinism. To get an impression of how his logic works, let us consider the rule for the addition operator:

$$\frac{\{P_1\}\; e_1\; \{\Psi_1\} \qquad \{P_2\}\; e_2\; \{\Psi_2\} \qquad \forall v_1\, v_2.\ \Psi_1\, v_1 \ast \Psi_2\, v_2 \vdash \Phi\,(v_1 + v_2)}{\{P_1 \ast P_2\}\; e_1 + e_2\; \{\Phi\}}$$

This rule is much like the rule for parallel composition in CSL: the precondition should be separated into two parts P<sub>1</sub> and P<sub>2</sub> describing the resources needed for proving the Hoare triples of both operands. Crucially, since P<sub>1</sub> and P<sub>2</sub> describe disjoint resources as expressed by the *separating conjunction* ∗, it is guaranteed that e<sub>1</sub> and e<sub>2</sub> do not interfere with each other, and hence cannot cause sequence point violations. The purpose of the rule's last premise is to ensure that for all possible return values v<sub>1</sub> and v<sub>2</sub>, the postconditions Ψ<sub>1</sub> and Ψ<sub>2</sub> of both operands can be combined into the postcondition Φ of the whole expression.

Krebbers's logic [29,30] has some limitations that impact its usability:


In this paper we address both of these problems.

We present a new algorithm for symbolic execution in separation logic. Contrary to ordinary symbolic execution in separation logic [5], our symbolic executor takes an expression and a precondition as its input, and computes not only the postcondition, but also simultaneously computes a *frame* that describes the resources that have *not* been used to prove the postcondition. The frame is used to infer the pre- and postconditions of adjacent subexpressions. For example, in e<sub>1</sub> + e<sub>2</sub>, we use the frame of e<sub>1</sub> to symbolically execute e<sub>2</sub>.
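The idea of threading a frame from one operand to the next can be sketched with a toy model. This is not the Coq implementation described in the paper: the names `symexec`, `Res`, and `FULL` are hypothetical, resources are reduced to an integer permission per location, values and assertions are ignored, and the choice to hand half of the available permission to each read is a simplifying assumption of this sketch. With `FULL` set to 256, a location can be split only eight times.

```c
#include <assert.h>

#define NLOC 2        /* number of memory locations in this toy model */
#define FULL 256u     /* integer stand-in for the full permission */

typedef enum { E_CONST, E_LOAD, E_STORE, E_ADD } Kind;

typedef struct Expr {
    Kind kind;
    int loc;                  /* location index for E_LOAD and E_STORE */
    const struct Expr *a;     /* E_STORE: right-hand side; E_ADD: left operand */
    const struct Expr *b;     /* E_ADD: right operand */
} Expr;

typedef struct { unsigned perm[NLOC]; } Res;  /* permission held per location */

/* Symbolically executes e with the resources *in. On success returns 1 and sets
   *post  to the resources e still holds after it has been evaluated, and
   *frame to the resources e did not need. Returns 0 on a conflict. */
int symexec(const Expr *e, const Res *in, Res *post, Res *frame) {
    Res none = {{0}};
    switch (e->kind) {
    case E_CONST:
        *post = none; *frame = *in;
        return 1;
    case E_LOAD:
        assert(e->loc >= 0 && e->loc < NLOC);
        if (in->perm[e->loc] < 2) return 0;          /* nothing left to split */
        *post = none; *frame = *in;
        post->perm[e->loc] = in->perm[e->loc] / 2;   /* half for this read */
        frame->perm[e->loc] -= post->perm[e->loc];   /* rest stays in the frame */
        return 1;
    case E_ADD: {
        Res p1, f1, p2;
        if (!symexec(e->a, in, &p1, &f1)) return 0;
        if (!symexec(e->b, &f1, &p2, frame)) return 0;  /* e2 runs on e1's frame */
        for (int i = 0; i < NLOC; i++) post->perm[i] = p1.perm[i] + p2.perm[i];
        return 1;
    }
    case E_STORE:
        assert(e->loc >= 0 && e->loc < NLOC);
        if (!symexec(e->a, in, post, frame)) return 0;
        /* Once the operand is evaluated, its share can be recombined with the
           frame; a write needs the full permission. */
        if (post->perm[e->loc] + frame->perm[e->loc] != FULL) return 0;
        post->perm[e->loc] = 0;    /* the location is unavailable until */
        frame->perm[e->loc] = 0;   /* the next sequence point */
        return 1;
    }
    return 0;
}
```

In this model, `(x = 1) + (y = 1)`, `x + x`, and `x = x + 1` are accepted, whereas `(x = 1) + (x = 1)` and `x + (x = 1)` are rejected because the second operand finds too little permission for `x` in the frame left by the first.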

In order to enable semi-automated reasoning about C programs, we integrate our symbolic executor into a *verification condition generator (vcgen)*. Our vcgen does not merely turn programs into proof goals, but constructs the proof goals only as long as it can discharge goals automatically using our symbolic executor. When an attempt to use the symbolic executor fails, our vcgen will return a new goal, from which the vcgen can be called back again after the user helped out. This approach is useful when integrated into an interactive theorem prover.

We prove soundness of the symbolic executor and verification condition generator with respect to a refined version of the separation logic by Krebbers [29,30]. Our new logic has been developed on top of the Iris framework [24–26,33], and thereby inherits all advanced features of Iris (like its expressive support for ghost state and invariants), without having to model these explicitly. To make our new logic better suited for proving the correctness of the symbolic executor and verification condition generator, our new logic comes with a weakest precondition connective instead of Hoare triples as in Krebbers's original logic.

To streamline the soundness proof of our new program logic, we give a new *monadic definitional translation* of a subset of C relevant for non-determinism and sequence points into an ML-style functional language with concurrency. Contrary to the direct style operational semantics for a subset of C by Krebbers [29,30], our approach leads to a semantics that is both easier to understand, and easier to extend with additional language features.

We have mechanized our whole development in the Coq interactive theorem prover. The symbolic executor and verification condition generator are defined as computable functions in Coq, and have been integrated into tactics in the Iris Proof Mode/MoSeL framework [32,34]. To obtain end-to-end correctness, we mechanized the proofs of soundness of our symbolic executor and verification condition generator with respect to our new separation logic and new monadic definitional semantics for a subset of C. The Coq development is available at [18].

**Contributions.** We describe an approach to semi-automatically prove the absence of undefined behavior in a given C program for *any* evaluation order. While doing so, we make the following contributions:


# **2 λMC: A Monadic Definitional Semantics of C**

In this section we describe a small C-style language called λMC, which features non-determinism in expressions. We define its semantics by translation into a ML-style functional language with concurrency called HeapLang.

We briefly describe the λMC source language (Sect. 2.1) and the HeapLang target language (Sect. 2.2) of the translation. Then we describe the translation scheme itself (Sect. 2.3). We explain in several steps how to exploit concurrency and monadic programming to give a concise and clear definitional semantics.

# **2.1 The Source Language λMC**

The syntax of our source language called λMC is as follows:

$$\begin{aligned} v \in \mathsf{val} ::={} & z \mid f \mid l \mid \mathtt{NULL} \mid (v_1, v_2) \mid () && (z \in \mathbb{Z},\ l \in \mathsf{Loc}) \\ e \in \mathsf{expr} ::={} & v \mid x \mid (e_1, e_2) \mid e.1 \mid e.2 \mid e_1 \circledcirc e_2 \mid && (\circledcirc \in \{+, -, \dots\}) \\ & x \leftarrow e_1 ; e_2 \mid \mathtt{if}(e_1)\{e_2\}\{e_3\} \mid \mathtt{while}(e_1)\{e_2\} \mid e_1(e_2) \mid \\ & \mathtt{alloc}(e) \mid {\ast}e \mid e_1 = e_2 \mid \mathtt{free}(e) \end{aligned}$$

The values include integers, NULL pointers, concrete locations l, function pointers f, structs with two fields (tuples), and the unit value () (for functions without return value). There is a global list of function definitions, where each definition is of the form f(x){e}. Most of the expression constructs resemble standard C notation, with some exceptions. We do not differentiate between expressions and statements to keep our language uniform. As such, if-then-else and sequencing constructs are not duplicated for both expressions and statements. Moreover, we do not differentiate between *lvalues* and *rvalues* [22, 6.3.2.1]. Hence, there is no address operator &, and, similarly to ML, the load (\*e) and assignment (e<sub>1</sub> = e<sub>2</sub>) operators take a reference as their first argument.

The *sequenced bind* operator x ← e<sub>1</sub> ; e<sub>2</sub> generalizes the normal sequencing operator e<sub>1</sub> ; e<sub>2</sub> of C by binding the result of e<sub>1</sub> to the variable x in e<sub>2</sub>. As such, x ← e<sub>1</sub> ; e<sub>2</sub> can be thought of as the declaration of an immutable local variable x. We omit mutable local variables for now, but these can be easily added as an extension to our method, as shown in Sect. 7. We write e<sub>1</sub> ; e<sub>2</sub> for a sequenced bind \_ ← e<sub>1</sub> ; e<sub>2</sub> in which we do not care about the return value of e<sub>1</sub>.

To focus on the key topics of the paper—non-determinism and the sequence point restriction—we take a minimalistic approach and omit most other features of C. Notably, we omit non-local control (return, break, continue, and goto). Our memory model is simplified; it only supports structs with two fields (tuples), but no arrays, unions, or machine integers. In Sect. 7 we show that some of these features (arrays, pointer arithmetic, and mutable local variables) can be incorporated.

# **2.2 The Target Language HeapLang**

The target language of our definitional semantics of λMC is an ML-style functional language with concurrency primitives and a call-by-value semantics. This language, called HeapLang, is included as part of the Iris Coq development [21]. The syntax is as follows:

$$\begin{aligned} v \in \mathit{Val} ::={} & z \mid \mathtt{true} \mid \mathtt{false} \mid \mathtt{rec}\ f\ x = e \mid \ell \mid () \mid \dots && (z \in \mathbb{Z},\ \ell \in \mathit{Loc}) \\ e \in \mathit{Expr} ::={} & v \mid x \mid e_1\ e_2 \mid \mathtt{ref}(e) \mid\ !_{\mathsf{HL}}\, e \mid e_1 :=_{\mathsf{HL}} e_2 \mid \mathtt{assert}(e) \mid \\ & e_1 \parallel_{\mathsf{HL}} e_2 \mid \mathtt{newmutex} \mid \mathtt{acquire} \mid \mathtt{release} \mid \dots \end{aligned}$$

The language contains some concurrency primitives that we will use to model non-determinism in λMC. Those primitives are (||HL), newmutex, acquire, and release. The first primitive is the parallel composition operator, which executes expressions e<sup>1</sup> and e<sup>2</sup> in parallel, and returns a tuple of their results. The expression newmutex () creates a new mutex. If *lk* is a mutex that was created this way, then acquire *lk* tries to acquire it and blocks until no other thread is using *lk*. An acquired mutex can be released using release *lk*.

# **2.3 The Monadic Definitional Semantics of λMC**

We now give the semantics of λMC by translation into HeapLang. The translation is carried out in several stages, each iteration implementing and illustrating a specific aspect of C. First, we model non-determinism in expressions by concurrency, parallelizing execution of subexpressions (step 1). After that, we add checks for sequence point violations in the translation of the assignment and dereferencing operations (step 2). Finally, we add function calls and demonstrate how the translation can be simplified using a monadic notation (step 3).

**Step 1: Non-determinism via Parallel Composition.** We model the unspecified evaluation order in binary expressions like e<sub>1</sub> + e<sub>2</sub> and e<sub>1</sub> = e<sub>2</sub> by executing the subexpressions in parallel using the (||HL) operator:

$$\begin{aligned} \llbracket e_1 + e_2 \rrbracket \triangleq{} & \mathtt{let}\ (v_1, v_2) = \llbracket e_1 \rrbracket \parallel_{\mathsf{HL}} \llbracket e_2 \rrbracket\ \mathtt{in}\ v_1 +_{\mathsf{HL}} v_2 \\ \llbracket e_1 = e_2 \rrbracket \triangleq{} & \mathtt{let}\ (v_1, v_2) = \llbracket e_1 \rrbracket \parallel_{\mathsf{HL}} \llbracket e_2 \rrbracket\ \mathtt{in} \\ & \mathtt{match}\ v_1\ \mathtt{with} \\ & \mid \mathtt{None} \to \mathtt{assert}(\mathtt{false}) && \text{(* NULL pointer *)} \\ & \mid \mathtt{Some}\ l \to \mathtt{match}\ !_{\mathsf{HL}}\, l\ \mathtt{with} \\ & \qquad \mid \mathtt{None} \to \mathtt{assert}(\mathtt{false}) && \text{(* Use after free *)} \\ & \qquad \mid \mathtt{Some}\ \_ \to l :=_{\mathsf{HL}} \mathtt{Some}\ v_2 ;\ v_2 \end{aligned}$$

Since our memory model is simple, the value interpretation is straightforward:

$$\begin{aligned} \llbracket z \rrbracket_{\mathit{val}} &\triangleq z \quad (\text{if } z \in \mathbb{Z}) & \llbracket (v_1, v_2) \rrbracket_{\mathit{val}} &\triangleq (\llbracket v_1 \rrbracket_{\mathit{val}}, \llbracket v_2 \rrbracket_{\mathit{val}}) \\ \llbracket () \rrbracket_{\mathit{val}} &\triangleq () & \llbracket l \rrbracket_{\mathit{val}} &\triangleq \mathtt{Some}\ l \end{aligned}$$

The only interesting case is the translation of locations. Since there is no concept of a NULL pointer in HeapLang, we use the option type to distinguish NULL pointers from concrete locations (l). The interpretation of assignments thus contains a pattern match to check that no NULL pointers are dereferenced. A similar check is performed in the interpretation of the load operation (\*e). Moreover, each location contains an option to distinguish freed from active locations.
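The two nested pattern matches can be mimicked in C as follows. This is only an analogy under our own assumptions: the types `Cell` and `Status` and the functions `checked_store` and `checked_load` are hypothetical, and where the translation gets stuck on assert(false), the sketch returns an error code instead.

```c
#include <assert.h>
#include <stddef.h>

/* A heap cell: `live` plays the role of the option stored at each location
   (Some v while the cell is allocated, None once it has been freed). */
typedef struct { int live; int value; } Cell;

typedef enum { OK, ERR_NULL, ERR_FREED } Status;

/* Mirrors the two matches in the translation of e1 = e2: the outer one
   rejects NULL pointers, the inner one rejects freed locations. */
Status checked_store(Cell *l, int v) {
    if (l == NULL) return ERR_NULL;     /* v1 = None */
    if (!l->live) return ERR_FREED;     /* the location contains None */
    l->value = v;                       /* l := Some v */
    return OK;
}

/* The load performs the same two checks before reading. */
Status checked_load(const Cell *l, int *out) {
    assert(out != NULL);
    if (l == NULL) return ERR_NULL;
    if (!l->live) return ERR_FREED;
    *out = l->value;
    return OK;
}
```

A store through a live cell succeeds, a store through `NULL` yields `ERR_NULL`, and any access after `live` has been reset yields `ERR_FREED` and leaves the output untouched.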

**Step 2: Sequence Points.** So far we have not accounted for undefined behavior due to sequence point violations. For instance, the program (x = 3) + (x = 4) gets translated into a HeapLang expression that updates the value of the location x non-deterministically to either 3 or 4, and returns 7. However, in C, the behavior of this program is *undefined*, as it exhibits a sequence point violation: there is a write conflict for the location x.

To give a semantics for sequence point violations, we follow the approach by Norrish [44], Ellison and Rosu [17], and Krebbers [29,30]. We keep track of a set of locations that have been written to since the last sequence point. We refer to this set as the *environment* of our translation, and represent it using a global variable *env* of the type mset *Loc*. Because our target language HeapLang is concurrent, all updates to the environment *env* must be executed *atomically*, *i.e.,* inside a critical section, which we enforce by employing a global mutex *lk*. The interpretation of assignments e<sub>1</sub> = e<sub>2</sub> now becomes:

$$\begin{aligned} \llbracket e_1 = e_2 \rrbracket \triangleq{} & \mathtt{let}\ (v_1, v_2) = \llbracket e_1 \rrbracket \parallel_{\mathsf{HL}} \llbracket e_2 \rrbracket\ \mathtt{in} \\ & \mathtt{acquire}\ \mathit{lk}; \\ & \mathtt{match}\ v_1\ \mathtt{with} \\ & \mid \mathtt{None} \to \mathtt{assert}(\mathtt{false}) && \text{(* NULL pointer *)} \\ & \mid \mathtt{Some}\ l \to \\ & \qquad \mathtt{assert}(\neg\, \mathtt{mset\_member}\ l\ \mathit{env}); && \text{(* Seq. point violation *)} \\ & \qquad \mathtt{match}\ !_{\mathsf{HL}}\, l\ \mathtt{with} \\ & \qquad \mid \mathtt{None} \to \mathtt{assert}(\mathtt{false}) && \text{(* Use after free *)} \\ & \qquad \mid \mathtt{Some}\ \_ \to \mathtt{mset\_add}\ l\ \mathit{env};\ l :=_{\mathsf{HL}} \mathtt{Some}\ v_2; \\ & \qquad\qquad \mathtt{release}\ \mathit{lk};\ v_2 \end{aligned}$$

$$\begin{aligned} \mathtt{ret}\ e &\triangleq \lambda\, \_\ \_.\ e \\ e_1 \parallel e_2 &\triangleq \lambda\, \mathit{env}\ \mathit{lk}.\ (e_1\ \mathit{env}\ \mathit{lk}) \parallel_{\mathsf{HL}} (e_2\ \mathit{env}\ \mathit{lk}) \\ x \leftarrow e_1 ; e_2 &\triangleq \lambda\, \mathit{env}\ \mathit{lk}.\ \mathtt{let}\ x = e_1\ \mathit{env}\ \mathit{lk}\ \mathtt{in}\ e_2\ \mathit{env}\ \mathit{lk} \\ \mathtt{atomic\_env}\ e &\triangleq \lambda\, \mathit{env}\ \mathit{lk}.\ \mathtt{acquire}\ \mathit{lk};\ \mathtt{let}\ a = e\ \mathit{env}\ \mathtt{in}\ \mathtt{release}\ \mathit{lk};\ a \\ \mathtt{atomic}\ e &\triangleq \lambda\, \mathit{env}\ \mathit{lk}.\ \mathtt{acquire}\ \mathit{lk};\ \mathtt{let}\ a = e\ \mathit{env}\ (\mathtt{newmutex}\ ())\ \mathtt{in}\ \mathtt{release}\ \mathit{lk};\ a \\ \mathtt{run}(e) &\triangleq e\ (\mathtt{mset\_create}\ ())\ (\mathtt{newmutex}\ ()) \end{aligned}$$

**Fig. 1.** The monadic combinators.

Whenever we assign to (or read from) a location l, we check if the location l is not already present in the environment *env*. If the location l is present, then it was already written to since the last sequence point. Hence, accessing the location constitutes undefined behavior (see the assert in the interpretation of assignments above). In the interpretation of assignments, we furthermore insert the location l into the environment *env*.

In order to make sure that one can access a variable again after a sequence point, we define the *sequenced bind* operator x ← e<sub>1</sub> ; e<sub>2</sub> as follows:

$$\llbracket x \leftarrow e_1 ; e_2 \rrbracket \triangleq \mathtt{let}\ x = \llbracket e_1 \rrbracket\ \mathtt{in}\ \mathtt{acquire}\ \mathit{lk};\ \mathtt{mset\_clear}\ \mathit{env};\ \mathtt{release}\ \mathit{lk};\ \llbracket e_2 \rrbracket$$

After we have finished executing the expression e<sub>1</sub>, we clear the environment *env*, so that all locations are accessible in e<sub>2</sub> again.
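The bookkeeping of Step 2 can be sketched in C as a single-threaded model. The names `State`, `seq_store`, `seq_load`, and `sequence_point` are hypothetical; the environment is represented as a bitset rather than the mset of the translation, the mutex is omitted because nothing runs concurrently here, and a violation is reported by a return value where the translation would reach assert(false).

```c
#include <assert.h>

#define NLOC 8   /* locations are the indices 0..NLOC-1 in this sketch */

typedef struct {
    int heap[NLOC];
    unsigned written;   /* bit i is set if location i was written since the
                           last sequence point (the environment) */
} State;

/* Store with a sequence point check. Returns 0 on a violation and leaves
   the heap unchanged in that case. */
int seq_store(State *s, int l, int v) {
    assert(l >= 0 && l < NLOC);
    if (s->written & (1u << l)) return 0;   /* l is already in the environment */
    s->written |= 1u << l;                  /* add l to the environment */
    s->heap[l] = v;
    return 1;
}

/* Load with a sequence point check: reading a location that was written
   since the last sequence point is a read-write conflict. */
int seq_load(const State *s, int l, int *out) {
    assert(l >= 0 && l < NLOC);
    if (s->written & (1u << l)) return 0;
    *out = s->heap[l];
    return 1;
}

/* A sequence point clears the environment, as the sequenced bind does. */
void sequence_point(State *s) { s->written = 0; }
```

Two stores to the same location without an intervening `sequence_point` make the second call fail, which corresponds to the conflict in (x = 3) + (x = 4); after `sequence_point` the location can be written and read again.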

**Step 3: Non-interleaved Function Calls.** As the final step, we present the correct translation scheme for function calls. Unlike the other expressions, function calls are not interleaved during the execution of subexpressions [22, 6.5.2.2p10]. For instance, in the program f() + g() the possible orders of execution are: either all the instructions in f() followed by all the instructions in g(), or all the instructions in g() followed by all the instructions in f().
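The non-interleaving of function calls can be observed in plain C. In the sketch below, which uses hypothetical helper names (`emit`, `trace`, `calls_not_interleaved`), each function body records two events; the order of the two calls is unspecified, but the events of one call never appear between those of the other.

```c
#include <assert.h>
#include <string.h>

static char trace[16];   /* records the order in which function bodies run */
static size_t len;

static void emit(char c) {
    assert(len + 1 < sizeof trace);
    trace[len++] = c;
    trace[len] = '\0';
}

static int f(void) { emit('f'); emit('F'); return 1; }
static int g(void) { emit('g'); emit('G'); return 2; }

/* Evaluates f() + g() and returns 1 if the recorded trace is one of the two
   non-interleaved orders. Which of the two occurs is unspecified. */
int calls_not_interleaved(void) {
    len = 0;
    trace[0] = '\0';
    int sum = f() + g();
    return sum == 3 &&
           (strcmp(trace, "fFgG") == 0 || strcmp(trace, "gGfF") == 0);
}
```

A trace such as "fgFG" would correspond to interleaved execution, which the standard rules out for function calls.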

```
⟦e1 + e2⟧ ≜ (v1, v2) ← ⟦e1⟧ || ⟦e2⟧ ; ret (v1 +HL v2)
⟦e1 = e2⟧ ≜ (v1, v2) ← ⟦e1⟧ || ⟦e2⟧ ;
            atomic_env (λ env.
              match v1 with
              | None → assert(false)                    (* NULL pointer *)
              | Some l →
                 assert(¬mset_member l env);            (* Seq. point violation *)
                 match !HL l with
                 | None → assert(false)                 (* Use after free *)
                 | Some _ → mset_add l env; l :=HL Some v2; ret v2)
⟦x ← e1 ; e2⟧ ≜ x ← ⟦e1⟧ ; (atomic_env mset_clear); ⟦e2⟧
⟦e1(e2)⟧ ≜ (f, a) ← ⟦e1⟧ || ⟦e2⟧ ; atomic (atomic_env mset_clear; f a)
⟦f(x){e}⟧ ≜ let rec f x = v ← ⟦e⟧ ; (atomic_env mset_clear); ret v
```
**Fig. 2.** Selected clauses from the monadic definitional semantics.

To model this, we execute each function call *atomically*. In the previous step we used a global mutex for guarding the access to the environment. We could use that mutex for function calls too. However, reusing a single mutex for entering each critical section would not work because the body of a function may contain invocations of other functions. To that end, we use multiple mutexes to reflect the hierarchical structure of function calls.

To handle multiple mutexes, each C expression is interpreted as a HeapLang function that receives a mutex and returns its result. That is, each C expression is modeled by a monadic expression in the *reader monad* M(A) ≜ mset *Loc* → mutex → A. For consistency's sake, we now also use the monad to thread through the reference to the environment (mset *Loc*), instead of using a global variable *env* as we did in the previous step.

We use a small set of monadic combinators, shown in Fig. 1, to build the translation in a more abstract way. The return and bind operators are standard for the reader monad. The parallel operator runs two monadic expressions concurrently, propagating the environment and the mutex. The atomic combinator invokes a monadic expression with a fresh mutex. The atomic_env combinator atomically executes its body with the current environment as an argument. The run function executes the monadic computation by instantiating it with a fresh mutex and a new environment. Selected clauses for the translation are presented in Fig. 2. The translation of the binary operations remains virtually unchanged, except for the usage of monadic parallel composition instead of the standard one. The translation for the assignment and the sequenced bind uses the atomic_env combinator for querying and updating the environment. We also have to adapt our translation of values, by wrapping it in ret: ⟦v⟧ ≜ ret ⟦v⟧<sub>*val*</sub>.

A global function definition f(x){e} is translated as a top-level let-binding. A function call is then just an atomically executed function invocation in HeapLang, modulo the fact that the function pointer and the arguments are computed in parallel. In addition, sequence points occur at the beginning of each function call and at the end of each function body [22, Annex C], and we reflect that in our translation by clearing the environment at the appropriate places.

Our semantics by translation can easily be extended to cover other features of C, *e.g.,* a more advanced memory model (see Sect. 7). However, the fragment presented here already illustrates the challenges that non-determinism and sequence point violations pose for verification. In the next section we describe a logic for reasoning about the semantics by translation given in this section.

# **3 Separation Logic with Weakest Preconditions for λMC**

In this section we present a separation logic with weakest precondition propositions for reasoning about λMC programs. The logic tackles the main features of our semantics: non-determinism in expression evaluation and sequence point violations. We will discuss the high-level rules of the logic pertaining to C connectives by going through a series of small examples.

The logic presented here is similar to the separation logic by Krebbers [29], but it is given in a weakest precondition style, and moreover, it is constructed *synthetically* on top of the separation logic framework Iris [24–26,33], whereas the logic by Krebbers [29] is interpreted directly in a bespoke model.

The following grammar defines the formulas of the logic:

$$\begin{aligned} P, Q \in \mathsf{Prop} ::={} & \mathsf{True} \mid \mathsf{False} \mid \forall x.\, P \mid \exists x.\, P \mid \mathbf{v}\_1 = \mathbf{v}\_2 \mid \mathbf{l} \stackrel{q}{\mapsto}\_{\xi} \mathbf{v} \mid {} & (q \in (0, 1]) \\ & P \* Q \mid P \mathrel{-\!\!\*} Q \mid \mathsf{wp}\ \mathbf{e}\ \{\Phi\} \mid \ldots & (\xi \in \{L, U\}) \end{aligned}$$

Most of the connectives are commonplace in separation logic, with the exception of the modified points-to connective, which we describe in this section.

As is common, Hoare triples {P} e {Φ} are syntactic sugar for P ⊢ wp e {Φ}. The weakest precondition connective wp e {Φ} states that the program e is safe (the program has defined behavior), and if e terminates to a value v, then v satisfies the predicate Φ. We write wp e {v. Φ v} for wp e {λv. Φ v}.

Contrary to the paper by Krebbers [29], we use weakest preconditions instead of Hoare triples throughout this paper. There are several reasons for doing so:


A selection of rules is presented in Fig. 3. Each inference rule with premises P<sub>1</sub> ... P<sub>n</sub> and conclusion Q in this paper should be read as the entailment P<sub>1</sub> ∗ ... ∗ P<sub>n</sub> ⊢ Q. We now explain and motivate the rules of our logic.

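$$\begin{array}{c}
\dfrac{\Phi\ \mathbf{v}}{\mathsf{wp}\ \mathbf{v}\ \{\Phi\}}\ \text{(wp-value)} \qquad \dfrac{\mathsf{wp}\ \mathbf{e}\ \{\Phi\} \qquad (\forall \mathbf{v}.\ \Phi\ \mathbf{v} \mathrel{-\!\!\*} \Psi\ \mathbf{v})}{\mathsf{wp}\ \mathbf{e}\ \{\Psi\}}\ \text{(wp-wand)} \qquad \dfrac{\mathsf{wp}\ \mathbf{e}\_1\ \{\mathbf{v}.\ \mathsf{U}\,(\mathsf{wp}\ \mathbf{e}\_2\ \{\Phi\})\}}{\mathsf{wp}\ (\mathbf{e}\_1 \,;\, \mathbf{e}\_2)\ \{\Phi\}}\ \text{(wp-seq)} \\[2ex]
\dfrac{\mathsf{wp}\ \mathbf{e}\_1\ \{\Psi\_1\} \qquad \mathsf{wp}\ \mathbf{e}\_2\ \{\Psi\_2\} \qquad \left(\forall \mathbf{w}\_1 \mathbf{w}\_2.\ \Psi\_1\ \mathbf{w}\_1 \* \Psi\_2\ \mathbf{w}\_2 \mathrel{-\!\!\*} \Phi(\mathbf{w}\_1 \mathbin{[\![\odot]\!]} \mathbf{w}\_2)\right)}{\mathsf{wp}\ (\mathbf{e}\_1 \odot \mathbf{e}\_2)\ \{\Phi\}}\ \text{(wp-bin-op)} \\[2ex]
\dfrac{\mathsf{wp}\ \mathbf{e}\ \{\mathbf{l}.\ \exists \mathbf{w}\, q.\ \mathbf{l} \stackrel{q}{\mapsto}\_{U} \mathbf{w} \* (\mathbf{l} \stackrel{q}{\mapsto}\_{U} \mathbf{w} \mathrel{-\!\!\*} \Phi\ \mathbf{w})\}}{\mathsf{wp}\ (\*\mathbf{e})\ \{\Phi\}}\ \text{(wp-load)} \\[2ex]
\dfrac{\mathsf{wp}\ \mathbf{e}\_1\ \{\Psi\_1\} \qquad \mathsf{wp}\ \mathbf{e}\_2\ \{\Psi\_2\} \qquad \left(\forall \mathbf{l}\ \mathbf{w}.\ \Psi\_1\ \mathbf{l} \* \Psi\_2\ \mathbf{w} \mathrel{-\!\!\*} \exists \mathbf{v}.\ \mathbf{l} \mapsto\_{U} \mathbf{v} \* (\mathbf{l} \mapsto\_{L} \mathbf{w} \mathrel{-\!\!\*} \Phi\ \mathbf{w})\right)}{\mathsf{wp}\ (\mathbf{e}\_1 = \mathbf{e}\_2)\ \{\Phi\}}\ \text{(wp-store)}
\end{array}$$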

$$\begin{array}{c}
\mathbf{l} \stackrel{q\_1}{\mapsto}\_{\xi\_1} \mathbf{v} \* \mathbf{l} \stackrel{q\_2}{\mapsto}\_{\xi\_2} \mathbf{v} \;\dashv\vdash\; \mathbf{l} \stackrel{q\_1 + q\_2}{\mapsto}\_{\xi\_1 \vee \xi\_2} \mathbf{v}\ \ \text{(mapsto-split)} \qquad \dfrac{\mathbf{l} \stackrel{q\_1}{\mapsto}\_{\xi\_1} \mathbf{v}\_1 \qquad \mathbf{l} \stackrel{q\_2}{\mapsto}\_{\xi\_2} \mathbf{v}\_2}{\mathbf{v}\_1 = \mathbf{v}\_2}\ \text{(mapsto-values-agree)} \\[2ex]
\dfrac{\mathbf{l} \stackrel{q}{\mapsto}\_{L} \mathbf{v}}{\mathsf{U}\,(\mathbf{l} \stackrel{q}{\mapsto}\_{U} \mathbf{v})}\ \text{(U-unlock)} \qquad \dfrac{P \mathrel{-\!\!\*} Q}{\mathsf{U} P \mathrel{-\!\!\*} \mathsf{U} Q}\ \text{(U-mono)} \qquad \dfrac{P}{\mathsf{U} P}\ \text{(U-intro)} \qquad \dfrac{\mathsf{U} P \* \mathsf{U} Q}{\mathsf{U}\,(P \* Q)}\ \text{(U-sep)}
\end{array}$$

**Fig. 3.** Selected rules for weakest preconditions.

**Non-determinism.** In the introduction (Sect. 1) we have already shown the rule for addition from Krebbers's logic [29], which was written using Hoare triples. Using weakest preconditions, the corresponding rule (wp-bin-op) is:

$$\frac{\mathsf{wp}\ \mathbf{e}\_1\ \{\Psi\_1\} \qquad \mathsf{wp}\ \mathbf{e}\_2\ \{\Psi\_2\} \qquad \left(\forall \mathbf{w}\_1 \mathbf{w}\_2.\ \Psi\_1\ \mathbf{w}\_1 \* \Psi\_2\ \mathbf{w}\_2 \mathrel{-\!\!\*} \Phi(\mathbf{w}\_1 \mathbin{[\![\odot]\!]} \mathbf{w}\_2)\right)}{\mathsf{wp}\ (\mathbf{e}\_1 \odot \mathbf{e}\_2)\ \{\Phi\}}$$

This rule closely resembles the usual rule for parallel composition in ordinary concurrent separation logic [46]. This should not be surprising, as we have given a definitional semantics to binary operators using the parallel composition operator. It is important to note that the premises of wp-bin-op are combined using the *separating conjunction* ∗. This ensures that the weakest preconditions wp e<sub>1</sub> {Ψ<sub>1</sub>} and wp e<sub>2</sub> {Ψ<sub>2</sub>} for the subexpressions e<sub>1</sub> and e<sub>2</sub> are verified with respect to disjoint resources. As such, they do not interfere with each other, and can be evaluated in parallel without causing sequence point violations.

To see how one can use the rule wp-bin-op, let us verify P ⊢ wp (e<sub>1</sub> + e<sub>2</sub>) {Φ}. That is, we want to show that (e<sub>1</sub> + e<sub>2</sub>) satisfies the postcondition Φ assuming the precondition P. This goal can be proven by separating the precondition P into disjoint parts, P ⊢ P<sub>1</sub> ∗ P<sub>2</sub> ∗ R. Then, using wp-bin-op, the goal can be reduced to proving P<sub>i</sub> ⊢ wp e<sub>i</sub> {Ψ<sub>i</sub>} for i ∈ {1, 2}, and R ∗ Ψ<sub>1</sub> w<sub>1</sub> ∗ Ψ<sub>2</sub> w<sub>2</sub> ⊢ Φ(w<sub>1</sub> + w<sub>2</sub>) for any return values w<sub>i</sub> of the expressions e<sub>i</sub>.

**Fractional Permissions.** Separation logic includes the *points-to connective* l ↦ v, which asserts unique ownership of a location l with value v. This connective is used to specify the behavior of stateful operations, which becomes apparent in the following proposed rule for load:

$$\frac{\mathsf{wp}\ \mathbf{e}\ \{\mathbf{l}.\ \exists \mathbf{w}.\ \mathbf{l} \mapsto \mathbf{w} \* (\mathbf{l} \mapsto \mathbf{w} \mathrel{-\!\!\*} \Phi\ \mathbf{w})\}}{\mathsf{wp}\ (\*\mathbf{e})\ \{\Phi\}}$$

In order to verify \*e we first make sure that e evaluates to a location l, and then we need to provide the points-to connective l ↦ w for some value w stored at the location. This rule, together with wp-value, allows for verification of simple programs like l ↦ v ⊢ wp (\*l) {w. w = v ∗ l ↦ v}.

However, the rule above is too weak. Suppose that we wish to verify the program \*l + \*l from the precondition l ↦ v. According to wp-bin-op, we have to separate the proposition l ↦ v into two disjoint parts, each used to verify one of the load operations. Since l ↦ v asserts unique ownership, it cannot be split in this way. In order to enable sharing of points-to connectives we use *fractional permissions* [7,8]. In separation logic with fractional permissions, each points-to connective is annotated with a fraction q ∈ (0, 1], and the resources can be split in accordance with those fractions:

$$\mathbf{l} \stackrel{q\_1 + q\_2}{\mapsto} \mathbf{v} \;\dashv\vdash\; \mathbf{l} \stackrel{q\_1}{\mapsto} \mathbf{v} \* \mathbf{l} \stackrel{q\_2}{\mapsto} \mathbf{v}.$$

A connective l ↦<sup>1</sup> v provides unique ownership of the location, and we refer to it as a *write permission*. A points-to connective with q ≤ 1 provides shared ownership of the location, referred to as a *read permission*. By convention, we write l ↦ v to denote the write permission l ↦<sup>1</sup> v.

With fractional permissions at hand, we can relax the proposed load rule by allowing a location to be dereferenced even if we only have a read permission:

$$\frac{\mathsf{wp}\ \mathbf{e}\ \{\mathbf{l}.\ \exists \mathbf{w}\, q.\ \mathbf{l} \stackrel{q}{\mapsto} \mathbf{w} \* (\mathbf{l} \stackrel{q}{\mapsto} \mathbf{w} \mathrel{-\!\!\*} \Phi\ \mathbf{w})\}}{\mathsf{wp}\ (\*\mathbf{e})\ \{\Phi\}}$$

This corresponds to the intuition that multiple subexpressions can safely dereference the same location, but not write to it.

Using the rule above we can verify l ↦ 1 ⊢ wp (\*l + \*l) {v. v = 2 ∗ l ↦ 1} by splitting the assumption into l ↦<sup>0.5</sup> 1 ∗ l ↦<sup>0.5</sup> 1 and first applying wp-bin-op with Ψ<sub>1</sub> and Ψ<sub>2</sub> being λv. (v = 1) ∗ l ↦<sup>0.5</sup> 1. Then we apply wp-load on both subgoals. After that, we can use mapsto-split to prove the remaining formula:

$$(\mathbf{v}\_1 = 1) \* \mathbf{l} \stackrel{0.5}{\mapsto} 1 \* (\mathbf{v}\_2 = 1) \* \mathbf{l} \stackrel{0.5}{\mapsto} 1 \ \vdash\ (\mathbf{v}\_1 + \mathbf{v}\_2 = 2) \* \mathbf{l} \mapsto 1.$$
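The bookkeeping behind this example can be mimicked with exact rational arithmetic. The following Python sketch is only an illustration of how fractions are split and recombined; the names `PointsTo`, `split`, `join`, `load`, and `store` are hypothetical and model permissions as runtime data, whereas in the logic they are propositions.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class PointsTo:
    """A points-to connective: location, fraction q in (0, 1], and value."""
    loc: str
    frac: Fraction
    val: int

def split(p, q1):
    """Split a permission with fraction q into parts q1 and q - q1."""
    if not (0 < q1 < p.frac):
        raise ValueError("q1 must lie strictly between 0 and the fraction held")
    return PointsTo(p.loc, q1, p.val), PointsTo(p.loc, p.frac - q1, p.val)

def join(p1, p2):
    """Recombine two permissions for the same location and value."""
    if p1.loc != p2.loc or p1.val != p2.val:
        raise ValueError("permissions must agree on location and value")
    total = p1.frac + p2.frac
    if total > 1:
        raise ValueError("fractions may not sum to more than 1")
    return PointsTo(p1.loc, total, p1.val)

def load(p):
    """Reading only needs some positive fraction (a read permission)."""
    return p.val

def store(p, v):
    """Writing needs the full fraction (the write permission)."""
    if p.frac != 1:
        raise PermissionError("store requires the full permission")
    return PointsTo(p.loc, p.frac, v)

# Usage, following the verification of *l + *l from l |-> 1.
whole = PointsTo("l", Fraction(1), 1)
left, right = split(whole, Fraction(1, 2))
total = load(left) + load(right)   # each operand uses its own half
restored = join(left, right)       # the halves recombine into l |-> 1
```

Using `Fraction` rather than floating point keeps repeated halving and recombination exact.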

**The Assignment Operator.** The second main operation that accesses the heap is the assignment operator e<sub>1</sub> = e<sub>2</sub>. The arguments on both sides of the assignment are evaluated in parallel, and a points-to connective is required to perform an update to the heap. A naive version of the assignment rule can be obtained by combining the binary operation rule and the load rule:

$$\frac{\mathsf{wp}\ \mathbf{e}\_1\ \{\Psi\_1\} \qquad \mathsf{wp}\ \mathbf{e}\_2\ \{\Psi\_2\} \qquad \left(\forall \mathbf{l}\ \mathbf{w}.\ \Psi\_1\ \mathbf{l} \* \Psi\_2\ \mathbf{w} \mathrel{-\!\!\*} \exists \mathbf{v}.\ \mathbf{l} \mapsto \mathbf{v} \* (\mathbf{l} \mapsto \mathbf{w} \mathrel{-\!\!\*} \Phi\ \mathbf{w})\right)}{\mathsf{wp}\ (\mathbf{e}\_1 = \mathbf{e}\_2)\ \{\Phi\}}$$

The write permission l ↦ v can be obtained by combining the resources of both sides of the assignment. This allows us to verify programs like l = \*l + \*l.

However, the rule above is unsound, because it fails to account for sequence point violations. We could use the rule above to prove safety of undefined programs, *e.g.,* the program l = (l = 3).

To account for sequence point violations we decorate the points-to connectives l ↦<sup>q</sup><sub>ξ</sub> v with *access levels* ξ ∈ {L, U}. These have the following semantics: we can read from and write to a location that is unlocked (U), and the location becomes locked (L) once someone writes to it. Proposition l ↦<sup>q</sup><sub>U</sub> v (resp. l ↦<sup>q</sup><sub>L</sub> v) asserts ownership of the unlocked (resp. locked) location l. We refer to such propositions as *lockable points-to connectives*. Using lockable points-to connectives we can formulate the correct assignment rule:

$$\frac{\mathsf{wp}\ \mathbf{e}\_1\ \{\Psi\_1\} \qquad \mathsf{wp}\ \mathbf{e}\_2\ \{\Psi\_2\} \qquad \left(\forall \mathbf{l}\ \mathbf{w}.\ \Psi\_1\ \mathbf{l} \* \Psi\_2\ \mathbf{w} \mathrel{-\!\!\*} \exists \mathbf{v}.\ \mathbf{l} \mapsto\_{U} \mathbf{v} \* (\mathbf{l} \mapsto\_{L} \mathbf{w} \mathrel{-\!\!\*} \Phi\ \mathbf{w})\right)}{\mathsf{wp}\ (\mathbf{e}\_1 = \mathbf{e}\_2)\ \{\Phi\}}$$

The set {L, U} has a lattice structure with L ≤ U, and the levels can be combined with a join operation, see mapsto-split. By convention, l ↦<sup>q</sup> v denotes l ↦<sup>q</sup><sub>U</sub> v.
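To make the role of the access levels concrete, here is a small Python sketch. It is an illustration under assumptions, not part of the logic: the names `Level`, `PointsTo`, `join`, `store`, and `unlock` are hypothetical, and `unlock` stands for what is assumed to happen to a locked permission at a sequence point.

```python
from enum import IntEnum
from fractions import Fraction
from typing import NamedTuple

class Level(IntEnum):
    L = 0  # locked: the location has been written since the last sequence point
    U = 1  # unlocked: the location may be read or written

class PointsTo(NamedTuple):
    loc: str
    level: Level
    frac: Fraction
    val: int

def join(p1, p2):
    """Combine two permissions; levels are joined in the lattice L <= U."""
    assert p1.loc == p2.loc and p1.val == p2.val
    assert p1.frac + p2.frac <= 1
    return PointsTo(p1.loc, max(p1.level, p2.level), p1.frac + p2.frac, p1.val)

def store(p, w):
    """A store needs the full unlocked permission and hands back a locked one."""
    if p.level is not Level.U or p.frac != 1:
        raise PermissionError("store needs a full, unlocked permission")
    return PointsTo(p.loc, Level.L, p.frac, w)

def unlock(p):
    """Assumed effect of a sequence point on a permission."""
    return p._replace(level=Level.U)

# Usage: after one store the permission is locked, so a second store that is
# not separated by a sequence point (as in l = (l = 3)) is rejected.
p = PointsTo("l", Level.U, Fraction(1), 0)
locked = store(p, 3)
```

Calling `store(locked, 4)` raises `PermissionError`, while `store(unlock(locked), 4)` succeeds, which is the runtime analogue of the difference between the naive and the corrected assignment rule.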

**The Unlocking Modality.** As locations become locked after using the assignment rule, we wish to unlock them in order to perform further heap operations. For instance, in the expression l = 4 ; \*l the location l becomes unlocked after the sequence point ";" between the store and the dereferencing operations. To reflect this in the logic, we use the rule wp-seq which features the *unlocking modality* U (which is called the unlocking assertion in [29, Definition 5.6]):

$$\frac{\mathsf{wp}\ \mathbf{e}\_1\ \{\mathbf{v}.\ \mathsf{U}\,(\mathsf{wp}\ \mathbf{e}\_2\ \{\Phi\})\}}{\mathsf{wp}\ (\mathbf{e}\_1 \,;\, \mathbf{e}\_2)\ \{\Phi\}}$$

Intuitively, U P states that P holds after unlocking all locations. The rules of U in Fig. 3 allow one to turn (P<sub>1</sub> ∗ ... ∗ P<sub>m</sub>) ∗ (l<sub>1</sub> ↦<sub>L</sub> v<sub>1</sub> ∗ ... ∗ l<sub>m</sub> ↦<sub>L</sub> v<sub>m</sub>) ⊢ U Q into (P<sub>1</sub> ∗ ... ∗ P<sub>m</sub>) ∗ (l<sub>1</sub> ↦<sub>U</sub> v<sub>1</sub> ∗ ... ∗ l<sub>m</sub> ↦<sub>U</sub> v<sub>m</sub>) ⊢ Q. This is done by applying either U-unlock or U-intro to each premise; then collecting all premises into one formula under U by U-sep; and finally, applying U-mono to the whole sequent.
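As a worked instance of this recipe, consider a single proposition P and a single locked location. The derivation below is a sketch that assumes the rules have the shapes suggested by the description above: U-intro gives P ⊢ U P, U-unlock gives l ↦<sub>L</sub> v ⊢ U (l ↦<sub>U</sub> v), U-sep gives U P ∗ U Q ⊢ U (P ∗ Q), and U-mono lifts an entailment P ⊢ Q to U P ⊢ U Q.

$$\begin{aligned} P \* \mathbf{l} \mapsto\_{L} \mathbf{v} \ &\vdash\ \mathsf{U} P \* \mathsf{U}\,(\mathbf{l} \mapsto\_{U} \mathbf{v}) && \text{(U-intro, U-unlock)} \\ &\vdash\ \mathsf{U}\,(P \* \mathbf{l} \mapsto\_{U} \mathbf{v}) && \text{(U-sep)} \\ &\vdash\ \mathsf{U} Q && \text{(U-mono, given } P \* \mathbf{l} \mapsto\_{U} \mathbf{v} \vdash Q\text{)} \end{aligned}$$

The only remaining obligation is thus P ∗ l ↦<sub>U</sub> v ⊢ Q, in which the location is unlocked again.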

# **4 Soundness of Weakest Preconditions for λMC**

In this section we prove adequacy of the separation logic with weakest preconditions for λMC as presented in Sect. 3. We do this by giving a model using the Iris framework that is structured in a similar way as the translation that we gave in Sect. 2. This translation consisted of three layers: the target HeapLang language, the monadic combinators, and the λMC operations themselves. In the model, each corresponding layer abstracts from the details of the previous layer, in such a way that we never have to break the abstraction of a layer. At the end, putting all of this together, we get the following adequacy statement:

**Theorem 4.1 (Adequacy of Weakest Preconditions).** *If* wp e {Φ} *is derivable, then* e *has no undefined behavior for any evaluation order. In other words,* run(⟦e⟧) *does not assert false.*

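The proof of the adequacy theorem closely follows the layered structure, by combining the correctness of the monadic run combinator with adequacy of HeapLang in Iris [25, Theorem 6]. The rest of this section is organized as follows: Sect. 4.1 recalls weakest preconditions for HeapLang, Sect. 4.2 defines weakest preconditions for monadic expressions, Sect. 4.3 models the heap, and Sect. 4.4 derives the λMC rules.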


#### **4.1 Weakest Preconditions for HeapLang**

We recall the most essential Iris connectives for reasoning about HeapLang programs: wp<sub>HL</sub> e {Φ} and ℓ ↦<sub>HL</sub> v, which are the HeapLang weakest precondition proposition and the HeapLang points-to connective, respectively. Other Iris connectives are described in [6, Section 8.1] or [25,33]. An example rule is the store rule for HeapLang, shown in Fig. 4. The rule requires a points-to connective ℓ ↦<sub>HL</sub> v, and the user receives the updated points-to connective ℓ ↦<sub>HL</sub> w back for proving Φ (). Note that the rule is formulated for a concrete location ℓ and a value w, instead of arbitrary expressions. This does not limit the expressive power; since the evaluation order in HeapLang is deterministic<sup>1</sup>, arbitrary expressions can be handled using the wphl-bind rule. Using this rule, one can bind an expression e in an arbitrary evaluation context K. We can thus use the wphl-bind rule twice to derive a more general store rule for HeapLang:

$$\frac{\mathsf{wp}\_{\mathsf{HL}}\ e\_2\ \{w.\ \mathsf{wp}\_{\mathsf{HL}}\ e\_1\ \{\ell.\ (\exists v.\ \ell \mapsto\_{\mathsf{HL}} v) \* (\ell \mapsto\_{\mathsf{HL}} w \mathrel{-\!\!\*} \Phi\ ())\}\}}{\mathsf{wp}\_{\mathsf{HL}}\ (e\_1 :=\_{\mathsf{HL}} e\_2)\ \{\Phi\}}$$

<sup>1</sup> And right-to-left, although our monadic translation does not rely on that.

$$\begin{array}{c}
(\ell \mapsto\_{\mathsf{HL}} v) \* (\ell \mapsto\_{\mathsf{HL}} v \mathrel{-\!\!\*} \Phi\ v) \ \vdash\ \mathsf{wp}\_{\mathsf{HL}}\ {!}\_{\mathsf{HL}}\, \ell\ \{\Phi\} \\[1ex]
(\ell \mapsto\_{\mathsf{HL}} v) \* (\ell \mapsto\_{\mathsf{HL}} w \mathrel{-\!\!\*} \Phi\ ()) \ \vdash\ \mathsf{wp}\_{\mathsf{HL}}\ \ell :=\_{\mathsf{HL}} w\ \{\Phi\} \\[2ex]
\dfrac{\mathsf{wp}\_{\mathsf{HL}}\ e\ \{v.\ \mathsf{wp}\_{\mathsf{HL}}\ K[\,v\,]\ \{\Phi\}\}}{\mathsf{wp}\_{\mathsf{HL}}\ K[\,e\,]\ \{\Phi\}}\ \text{(wphl-bind)}
\end{array}$$

$$\begin{aligned} R \* (\forall \gamma\ \mathit{lk}.\ \mathsf{is\\_mutex}(\gamma, \mathit{lk}, R) \mathrel{-\!\!\*} \Phi\ \mathit{lk}) \ &\vdash\ \mathsf{wp}\_{\mathsf{HL}}\ \mathsf{newmutex}\ ()\ \{\Phi\} \\ \mathsf{is\\_mutex}(\gamma, \mathit{lk}, R) \* (R \* \mathsf{locked}(\gamma) \mathrel{-\!\!\*} \Phi\ ()) \ &\vdash\ \mathsf{wp}\_{\mathsf{HL}}\ \mathsf{acquire}\ \mathit{lk}\ \{\Phi\} \\ \mathsf{is\\_mutex}(\gamma, \mathit{lk}, R) \* R \* \mathsf{locked}(\gamma) \* \Phi\ () \ &\vdash\ \mathsf{wp}\_{\mathsf{HL}}\ \mathsf{release}\ \mathit{lk}\ \{\Phi\} \\ \mathsf{is\\_mutex}(\gamma, \mathit{lk}, R) \ &\mathrel{-\!\!\*}\ \mathsf{is\\_mutex}(\gamma, \mathit{lk}, R) \* \mathsf{is\\_mutex}(\gamma, \mathit{lk}, R) \qquad \text{(ismutex-dupl)} \end{aligned}$$

To verify the monadic combinators and the translation of λMC operations in the upcoming Sects. 4.2 and 4.4, we need the specifications for all the functions that we use, including those on mutable sets and mutexes. The rules for mutable sets are standard, and thus omitted. They involve the usual abstract predicate is_mset(*s*, X) stating that the reference *s* represents a set with contents X. The rules for mutexes are presented in Fig. 4. When a new mutex is created, a user gets access to a proposition is_mutex(γ, *lk*, R), which states that the value *lk* is a mutex containing the resources R. This proposition can be duplicated freely (ismutex-dupl). A thread can acquire the mutex and receive the resources contained in it. In addition, the thread receives a token locked(γ) meaning that it has entered the critical section. When a thread leaves the critical section and releases the mutex, it has to give up both the token and the resources R.
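The ownership discipline described by these mutex rules has a rough runtime analogue, sketched below in Python. This is only an analogy under assumptions: the class `ResourceMutex` is hypothetical, the resources R are modelled as an ordinary Python object, and the token locked(γ) is modelled as a fresh object that must be presented on release.

```python
import threading

class ResourceMutex:
    """A mutex that owns a resource while it is not held by any thread."""

    def __init__(self, resource):
        # Creating the mutex transfers the resource into it.
        self._lock = threading.Lock()
        self._resource = resource
        self._token = None

    def acquire(self):
        """Block until the mutex is free, then hand out the resource and a token."""
        self._lock.acquire()
        resource, self._resource = self._resource, None
        self._token = object()
        return resource, self._token

    def release(self, resource, token):
        """Give back both the resource and the token obtained from acquire."""
        if self._token is None or token is not self._token:
            raise RuntimeError("release requires the token handed out by acquire")
        self._resource = resource
        self._token = None
        self._lock.release()

# Usage: the resource is only accessible between acquire and release.
m = ResourceMutex({"x": 1})
r, t = m.acquire()
r["x"] += 1
m.release(r, t)
```

The reference `m` itself can be shared freely between threads, in the same way that the proposition describing the mutex can be duplicated.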

#### **4.2 Weakest Preconditions for Monadic Expressions**

As a next step, we define a weakest precondition proposition wpmon e {Φ} for a monadic expression e. The definition is constructed in the ambient logic, and it encapsulates the monadic operations in a separate layer. Due to that, we are able to carry out proofs of high-level specifications without breaking the abstraction (Sect. 4.4). The specifications for selected monadic operations in terms of wpmon are presented in Fig. 5. We define the weakest precondition for a monadic expression e as follows:

$$\mathsf{wp}\_{\mathsf{mon}}\ e\ \{\Phi\} \triangleq \mathsf{wp}\_{\mathsf{HL}}\ e\ \left\{ g.\ \forall \gamma\ \mathit{env}\ \mathit{lk}.\ \mathsf{is\\_mutex}(\gamma, \mathit{lk}, \mathsf{env\\_inv}(\mathit{env})) \mathrel{-\!\!\*} \mathsf{wp}\_{\mathsf{HL}}\ (g\ \mathit{env}\ \mathit{lk})\ \{\Phi\} \right\}$$

The idea is that we first reduce e to a monadic value *g*. To perform this reduction we have the outermost wp<sub>HL</sub> connective in the definition of wp<sub>mon</sub>. This monadic value is then evaluated with an arbitrary environment and an arbitrary mutex. Note that we universally quantify over any mutex *lk* to support nested locking in atomic. This definition is parameterized by an *environment invariant* env_inv(*env*), which describes the resources accessible in the critical sections. We show how to define env_inv in the next subsection.

$$\begin{array}{c}
\dfrac{\mathsf{wp}\_{\mathsf{HL}}\ e\ \{\Phi\}}{\mathsf{wp}\_{\mathsf{mon}}\ (\mathsf{ret}\ e)\ \{\Phi\}}\ \text{(wp-ret)} \qquad \dfrac{\mathsf{wp}\_{\mathsf{mon}}\ e\_1\ \{v.\ \mathsf{wp}\_{\mathsf{mon}}\ e\_2[v/x]\ \{\Phi\}\}}{\mathsf{wp}\_{\mathsf{mon}}\ (x \leftarrow e\_1;\ e\_2)\ \{\Phi\}}\ \text{(wp-bind)} \\[2ex]
\dfrac{\mathsf{wp}\_{\mathsf{mon}}\ e\_1\ \{\Psi\_1\} \qquad \mathsf{wp}\_{\mathsf{mon}}\ e\_2\ \{\Psi\_2\} \qquad \left(\forall w\_1 w\_2.\ \Psi\_1\ w\_1 \* \Psi\_2\ w\_2 \mathrel{-\!\!\*} \Phi\ (w\_1, w\_2)\right)}{\mathsf{wp}\_{\mathsf{mon}}\ (e\_1 \mathbin{|||} e\_2)\ \{\Phi\}}\ \text{(wp-par)} \\[2ex]
\dfrac{\forall \mathit{env}.\ \mathsf{env\\_inv}(\mathit{env}) \mathrel{-\!\!\*} \mathsf{wp}\_{\mathsf{HL}}\ (v\ \mathit{env})\ \{w.\ \mathsf{env\\_inv}(\mathit{env}) \* \Phi\ w\}}{\mathsf{wp}\_{\mathsf{mon}}\ (\mathsf{atomic\\_env}\ v)\ \{\Phi\}}\ \text{(wp-atomic-env)}
\end{array}$$

**Fig. 5.** Selected monadic wpmon rules.

Using this definition we derive the monadic rules in Fig. 5. In a monad, the expression evaluation order is made explicit via the bind operation x ← e<sub>1</sub>; e<sub>2</sub>. Consequently, contrary to HeapLang, we no longer have a rule like wphl-bind, which allows one to bind an expression in a general evaluation context. Instead, we have the rule wp-bind, which reflects that the only evaluation context we have is the monadic bind x ← [ *•* ]; e.

#### **4.3 Modeling the Heap**

The monadic rules in Fig. 5 are expressive enough to derive some of the λMC-level rules, but we are still missing one crucial part: handling of the heap. In order to do that, we need to define lockable points-to connectives l ↦<sup>q</sup><sub>ξ</sub> v in such a way that they are linked to the HeapLang points-to connectives ℓ ↦<sub>HL</sub> v.

The key idea is the following. The environment invariant env_inv of monadic weakest preconditions will track *all* HeapLang points-to connectives ℓ ↦<sub>HL</sub> v that have ever been allocated at the λMC level. Via Iris ghost state, we then connect this knowledge to the lockable points-to connectives l ↦<sup>q</sup><sub>ξ</sub> v. We refer to the construction that allows us to carry this out as the *lockable heap*. Note that the description of the lockable heap is fairly technical and requires an understanding of the ghost state mechanism in Iris.

A lockable heap is a map σ : *Loc* ⇀<sub>fin</sub> {L, U} × *Val* that keeps track of the access levels and values associated with the locations. The connective full_heap(σ) asserts the ownership of all the locations present in the domain of σ. Specifically, it asserts ℓ ↦<sub>HL</sub> v for each {ℓ ← (ξ, v)} ∈ σ. The connective ℓ ↦<sup>q</sup><sub>ξ</sub> v then states that {ℓ ← (ξ, v)} is part of the global lockable heap, and it asserts this with the fractional permission q. We treat the lockable heap as an opaque abstraction, whose exact implementation via Iris ghost state is described in the Coq formalization [18]. The main interface of the lockable heap consists of the rules in Fig. 6. The rule heap-alloc states that we can turn a HeapLang points-to connective ℓ ↦<sub>HL</sub> v into ℓ ↦<sub>ξ</sub> v by changing the lockable heap σ accordingly. The rule heap-upd states that given ℓ ↦<sub>ξ</sub> v, we can temporarily get a HeapLang points-to connective ℓ ↦<sub>HL</sub> v out of the lockable heap and update its value.

$$\frac{\ell \mapsto\_{\mathsf{HL}} v \qquad \mathsf{full\\_heap}(\sigma)}{{\mid\!\Rrightarrow}\ \ell \mapsto\_{U} v \* \mathsf{full\\_heap}(\sigma[\ell \leftarrow (U, v)])}\ \text{(heap-alloc)}$$

$$\frac{\ell \mapsto\_{U} v \qquad \mathsf{full\\_heap}(\sigma)}{\sigma(\ell) = (U, v) \* \ell \mapsto\_{\mathsf{HL}} v \* \left(\forall v'\, \xi'.\ \ell \mapsto\_{\mathsf{HL}} v' \mathrel{\equiv\!\!-\!\!\*} \ell \mapsto\_{\xi'} v' \* \mathsf{full\\_heap}(\sigma[\ell \leftarrow (\xi', v')])\right)}\ \text{(heap-upd)}$$

The environment invariant env_inv(*env*) in the definition of wp<sub>mon</sub> ties the contents of the lockable heap to the contents of the environment *env*:

$$\mathsf{env\\_inv}(\mathit{env}) \triangleq \exists \sigma\, X.\ \mathsf{is\\_mset}(\mathit{env}, X) \* \mathsf{full\\_heap}(\sigma) \* \left(\forall \ell \in X.\ \exists v.\ \sigma(\ell) = (L, v)\right)$$

The first conjunct states that X : ℘fin(*Loc*) is a set of locked locations, according to the environment *env*. The second conjunct asserts ownership of the global lockable heap σ. Finally, the last conjunct states that the contents of *env* agrees with the lockable heap: every location that is in X is locked according to σ.
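The pure part of this invariant, the agreement between the environment and the lockable heap, can be phrased as a simple check. The Python sketch below is an illustration only: it assumes the environment is a `set` of locations and the lockable heap is a `dict` from locations to `(level, value)` pairs, and it ignores the ownership content of the other two conjuncts.

```python
def env_agrees(env, sigma):
    """Check that every location recorded in env is locked according to sigma."""
    return all(loc in sigma and sigma[loc][0] == "L" for loc in env)

# Usage: l1 was written since the last sequence point, l2 was not.
sigma = {"l1": ("L", 4), "l2": ("U", 0)}
ok = env_agrees({"l1"}, sigma)          # l1 is locked in sigma
bad = env_agrees({"l1", "l2"}, sigma)   # l2 is recorded but unlocked
```

Note that the condition is one-directional: a location may be locked in the heap without being forced to appear in the environment by this conjunct.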

**The Unlocking Modality.** The unlocking modality is defined in the logic as:

$$\mathsf{U} P \triangleq \exists S.\ \left(\mathop{\Large\ast}\limits\_{(\mathbf{l},\mathbf{v},q)\in S} \mathbf{l} \stackrel{q}{\mapsto}\_{L} \mathbf{v}\right) \* \left(\left(\mathop{\Large\ast}\limits\_{(\mathbf{l},\mathbf{v},q)\in S} \mathbf{l} \stackrel{q}{\mapsto}\_{U} \mathbf{v}\right) \mathrel{-\!\!\*} P\right)$$
 

Here S is a finite multiset of tuples containing locations, values, and fractions. The unlocking modality accumulates the locked locations, waiting for them to be unlocked at a sequence point.

#### **4.4 Deriving the λMC Rules**

To model weakest preconditions for λMC (Fig. 3) we compose the construction we have just defined with the translation of Sect. 2: wp e {Φ} ≜ wp<sub>mon</sub> ⟦e⟧ {Φ′}. Here, Φ′ is the obvious lifting of Φ from λMC values to HeapLang values. Using the rules from Figs. 5 and 6 we derive the high-level λMC rules without unfolding the definition of the monadic wp<sub>mon</sub>.

*Example 4.2.* Consider the rule wp-store for assignments e<sub>1</sub> = e<sub>2</sub>. Using wp-bind and wp-par, the soundness of wp-store can be reduced to verifying the assignment with e<sub>1</sub> being l and e<sub>2</sub> being v′, under the assumption l ↦<sub>U</sub> v. We use wp-atomic-env to turn our goal into a HeapLang weakest precondition proposition and to gain access to an environment *env* and to the proposition env_inv(*env*), from which we extract the lockable heap σ. We then use heap-upd to get access to the underlying HeapLang location and obtain that l is not locked according to σ. Due to the environment invariant, we obtain that l is not in *env*, which allows us to prove the assert for sequence point violation in the interpretation of the assignment. Finally, we perform the physical update of the location.
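The assertion discharged in this example can be pictured with a small executable model. The Python sketch below is not the translation itself: it assumes that the environment is a set of locations written since the last sequence point, that both stores and loads assert the location is absent from it, and that a sequence point clears it. The names `store`, `load`, `sequence_point`, and `SequencePointViolation` are hypothetical.

```python
class SequencePointViolation(AssertionError):
    """Raised where the model would assert false."""

def store(heap, env, loc, val):
    """Write val to loc; a second access before a sequence point is an error."""
    if loc in env:
        raise SequencePointViolation(loc)
    env.add(loc)        # record that loc has been written
    heap[loc] = val     # the physical update of the location
    return val

def load(heap, env, loc):
    """Read loc, which must not have been written since the last sequence point."""
    if loc in env:
        raise SequencePointViolation(loc)
    return heap[loc]

def sequence_point(env):
    """Clear the environment, making all locations accessible again."""
    env.clear()

# Usage: the program l = 4 ; *l is fine because of the sequence point.
heap, env = {"l": 0}, set()
store(heap, env, "l", 4)
sequence_point(env)
result = load(heap, env, "l")
```

In this model the undefined program l = (l = 3) corresponds to `store(heap, env, "l", store(heap, env, "l", 3))`, whose outer store raises `SequencePointViolation`.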

# **5 A Symbolic Executor for λMC**

In order to turn our program logic into an automated procedure, it is important to have rules for weakest preconditions that have an algorithmic form. However, the rules for binary operators in our separation logic for λMC do not have such a form. Take for example the rule wp-bin-op for binary operators e<sub>1</sub> ⊙ e<sub>2</sub>. This rule cannot be applied in an algorithmic manner. To use the rule one has to supply the postconditions for e<sub>1</sub> and e<sub>2</sub>, and separate the resources from the context into two disjoint parts. This is generally impossible to do automatically.

To address this problem, we first describe how the rules for binary operators can be transformed into algorithmic rules by exploiting the notion of *symbolic execution* [5] (Sect. 5.1). We then show how to implement these algorithmic rules as part of an automated symbolic execution procedure (Sect. 5.2).

#### **5.1 Rules for Symbolic Execution**

We say that we can *symbolically execute* an expression e using a *precondition* P, if we can find a *symbolic execution tuple* (w, Q, R) consisting of a *return value* w, a *postcondition* Q, and a *frame* R satisfying:

$$P \vdash \mathbf{w} \mathbf{p} \; \mathbf{e} \; \{\mathbf{v} . \mathbf{v} = \mathbf{w} \* Q\} \* R$$

This specification is much like that of ordinary symbolic execution in separation logic [5], but there is an important difference. Apart from computing the postcondition Q and the return value w, there is also the frame R, which describes the resources that are *not used* for proving e. For instance, if the precondition P is P′ ∗ l ↦<sup>q</sup> w and e is a load operation \*l, then we can symbolically execute e with the postcondition Q being l ↦<sup>q/2</sup> w, and the frame R being P′ ∗ l ↦<sup>q/2</sup> w. Clearly, P′ is not needed for proving the load, so it can be moved into the frame. More interestingly, since loading the contents of l requires only a read permission l ↦<sup>p</sup> w with p ∈ (0, 1], we can split the hypothesis l ↦<sup>q</sup> w into two halves and move one into the frame. Below we will see why that matters.

If we can symbolically execute one of the operands of a binary expression e<sub>1</sub> ⊙ e<sub>2</sub>, say e<sub>1</sub> in P, and find a symbolic execution tuple (w<sub>1</sub>, Q, R), then we can use the following admissible rule:

$$\frac{R \vdash \mathsf{wp}\ \mathbf{e}\_2\ \{\mathbf{w}\_2.\ Q \mathrel{-\!\!\*} \Phi(\mathbf{w}\_1 \mathbin{[\![\odot]\!]} \mathbf{w}\_2)\}}{P \vdash \mathsf{wp}\ (\mathbf{e}\_1 \odot \mathbf{e}\_2)\ \{\Phi\}}$$

This rule has a much more algorithmic flavor than the rule wp-bin-op. Applying the above rule now boils down to finding such a tuple (w, Q, R), instead of having to infer postconditions for both operands, as we need to do to apply wp-bin-op.

For instance, given an expression (\*l) ⊙ e<sub>2</sub> and a precondition P′ ∗ l ↦<sup>q</sup> v, we can derive the following rule:

$$\frac{P' \* \mathbf{l} \stackrel{q/2}{\mapsto} \mathbf{v} \vdash \mathsf{wp}\ \mathbf{e}\_2\ \{\mathbf{w}\_2.\ \mathbf{l} \stackrel{q/2}{\mapsto} \mathbf{v} \mathrel{-\!\!\*} \Phi(\mathbf{v} \mathbin{[\![\odot]\!]} \mathbf{w}\_2)\}}{P' \* \mathbf{l} \stackrel{q}{\mapsto} \mathbf{v} \vdash \mathsf{wp}\ (\*\mathbf{l} \odot \mathbf{e}\_2)\ \{\Phi\}}$$

This rule matches the intuition that only a fraction of the permission l ↦<sup>q</sup> v is needed to prove a load \*l, so that the remaining half of the permission can be used to prove the correctness of e<sub>2</sub> (which may contain other loads of l).
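The computation of a symbolic execution tuple for a load can be sketched as follows. This Python fragment is a simplified illustration under assumptions: the precondition is represented as a `dict` from locations to `(fraction, value)` pairs, access levels are ignored, and the function name `sym_exec_load` is hypothetical.

```python
from fractions import Fraction

def sym_exec_load(pre, loc):
    """Symbolically execute *loc in precondition pre.

    Returns (w, post, frame): the value read, the part of the permission
    consumed by the load, and the resources left over for other operands.
    """
    q, w = pre[loc]
    half = q / 2
    post = {loc: (half, w)}      # Q: the half used to justify the load
    frame = dict(pre)            # R: everything else, untouched ...
    frame[loc] = (half, w)       # ... plus the other half of the permission
    return w, post, frame

# Usage: after executing the left operand of *l + *l, the frame still
# contains a read permission for l, so the right operand can load too.
pre = {"l": (Fraction(1), 7), "k": (Fraction(1), 0)}
w1, post1, frame1 = sym_exec_load(pre, "l")
w2, post2, frame2 = sym_exec_load(frame1, "l")
```

Each load halves whatever fraction is still available, so any finite number of loads of the same location can be served from a single initial permission.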

#### **5.2 An Algorithm for Symbolic Execution**

For an arbitrary expression e and a proposition P, it is unlikely that one can find such a symbolic execution tuple (w, Q, R) automatically. However, for a certain class of C expressions that appear in actual programs we can compute a choice of such a tuple. To illustrate our approach, we will define such an algorithm for a small subset expr of C expressions described by the following grammar:

$$
\bar{\mathbf{e}} \in \overline{\mathtt{expr}} ::= \mathbf{v} \mid \*\bar{\mathbf{e}} \mid \bar{\mathbf{e}}\_1 = \bar{\mathbf{e}}\_2 \mid \bar{\mathbf{e}}\_1 \odot \bar{\mathbf{e}}\_2.
$$

We keep this subset small to ease presentation. In Sect. 7 we explain how to extend the algorithm to cover the sequenced bind operator x ← ē<sub>1</sub> ; ē<sub>2</sub>.

Moreover, to implement symbolic execution, we cannot manipulate arbitrary separation logic propositions. We thus restrict to *symbolic heaps* (m ∈ sheap), which are defined as finite partial functions Loc ⇀<sub>fin</sub> ({L, U} × (0, 1] × val) representing a collection of points-to propositions:

$$\begin{array}{rcl} \lbrack m \rbrack & \triangleq & \mathsf{PK} \begin{array}{c} 1 \stackrel{q}{\underset{\xi}{\rightleftharpoons}} \mathsf{v}. \\ \mathsf{I} \in \mathsf{dom}(m) \\ m(\mathsf{1}) = (\xi, \mathsf{q}, \mathsf{v}) \end{array} \end{array}$$

We use the following operations on symbolic heaps:


$$(m\_1 \sqcup m\_2)(\mathtt{l}) = \begin{cases} m\_i(\mathtt{l}) & \text{if } \mathtt{l} \in \mathsf{dom}(m\_i) \text{ and } \mathtt{l} \notin \mathsf{dom}(m\_j) \\\\ (\xi \vee \xi', q + q', \mathtt{v}) & \text{if } m\_1(\mathtt{l}) = (\xi, q, \mathtt{v}) \text{ and } m\_2(\mathtt{l}) = (\xi', q', \mathtt{v}). \end{cases}$$
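The merge operation can be made concrete with a small sketch. The C code below is only an illustration under stated assumptions, not the representation used in the Coq development: locations are the indices `0..NLOC-1`, values are plain integers, fractions are `double`s, the join ξ ∨ ξ′ is assumed to let `L` absorb `U`, and the merge is treated as undefined when the two heaps disagree on the value or the fractions sum to more than 1. All names (`SHeap`, `Cell`, `sheap_merge`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NLOC 4  /* assumption: locations are the indices 0..NLOC-1 */

typedef enum { FLAG_U, FLAG_L } Flag;   /* unlocked, locked */
typedef struct { bool present; Flag flag; double frac; int val; } Cell;
typedef struct { Cell at[NLOC]; } SHeap; /* finite partial map from locations to cells */

/* Computes m1 ⊔ m2 into *out. Returns false where this sketch treats
   the merge as undefined (assumption: differing values or a fraction above 1). */
bool sheap_merge(const SHeap *m1, const SHeap *m2, SHeap *out) {
    assert(m1 && m2 && out);
    for (int l = 0; l < NLOC; l++) {
        Cell a = m1->at[l], b = m2->at[l];   /* copies, so out may alias an input */
        if (a.present && b.present) {
            if (a.val != b.val || a.frac + b.frac > 1.0) return false;
            out->at[l].present = true;
            /* assumption: the join of two flags is L as soon as one side is L */
            out->at[l].flag = (a.flag == FLAG_L || b.flag == FLAG_L) ? FLAG_L : FLAG_U;
            out->at[l].frac = a.frac + b.frac;
            out->at[l].val = a.val;
        } else if (a.present) {
            out->at[l] = a;                   /* l only in dom(m1) */
        } else if (b.present) {
            out->at[l] = b;                   /* l only in dom(m2) */
        } else {
            out->at[l] = (Cell){ false, FLAG_U, 0.0, 0 };
        }
    }
    return true;
}
```

Merging two half permissions on the same location with the same value yields the full permission, which is the situation that arises when the two halves handed out for two loads are recombined.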

With this representation of propositions, we define the symbolic execution algorithm as a partial function forward : (sheap × expr) → (val × sheap × sheap), which satisfies the specification stated in Sect. 5.1, *i.e.,* for which the following holds:

**Theorem 5.1.** *Given an expression* e *and a symbolic heap* m*, if* forward(m, e) *returns a tuple* (w, m<sup>o</sup><sub>1</sub>, m<sub>1</sub>)*, then* ⟦m⟧ ⊢ wp e {v. v = w ∗ ⟦m<sup>o</sup><sub>1</sub>⟧} ∗ ⟦m<sub>1</sub>⟧.

The definition of the algorithm is shown in Fig. 7. Given a tuple (m, e), a call to forward(m, e) either returns a tuple (v, m<sup>o</sup>, m′) or fails, which happens either when e ∉ expr or when one of the intermediate steps of the computation fails. In both cases, we write forward(m, e) = ⊥.
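As a small worked instance (our own illustration, not taken from the paper), take the symbolic heap m = {l ↦ (U, 1, 5)} and the load \*l. Following the clauses for values and loads in Fig. 7, the intermediate results are:

$$\begin{aligned} \mathsf{forward}(m, \mathtt{l}) &= (\mathtt{l}, \emptyset, m) \\ \mathsf{delete\_frac\_2}(\mathtt{l}, m, \emptyset) &= (\{\mathtt{l} \mapsto (U, 1/2, 5)\}, \emptyset, 1/2, 5) \\ \mathsf{forward}(m, \ast\mathtt{l}) &= (5, \{\mathtt{l} \mapsto (U, 1/2, 5)\}, \{\mathtt{l} \mapsto (U, 1/2, 5)\}) \end{aligned}$$

Instantiating Theorem 5.1 with this result gives ⟦m⟧ ⊢ wp \*l {v. v = 5 ∗ l ↦<sup>1/2</sup><sub>U</sub> 5} ∗ l ↦<sup>1/2</sup><sub>U</sub> 5: half of the permission is returned in the postcondition, and the other half remains available as the frame.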

The algorithm proceeds by case analysis on the expression e. In each case, the expected output is described by the equation forward(m, e) = (v, m<sup>o</sup>, m′). The results of the intermediate computations appear on separate lines under the clause "**where** ...". If one of the corresponding equations does not hold, *e.g.,* a recursive call fails, then the failure is propagated. Let us now explain the case for the assignment operator.

If e is an assignment operator e<sub>1</sub> = e<sub>2</sub>, we first evaluate e<sub>1</sub> and then e<sub>2</sub>. Fixing the order of symbolic execution from left to right does not compromise the non-determinism underlying the C semantics of binary operators. Indeed, when forward(m, e<sub>1</sub>) = (v<sub>1</sub>, m<sup>o</sup><sub>1</sub>, m<sub>1</sub>), we evaluate the expression e<sub>2</sub> using the frame m<sub>1</sub>, *i.e.,* only the resources of m that remain after the execution of e<sub>1</sub>. When forward(m, e<sub>1</sub>) = (l, m<sup>o</sup><sub>1</sub>, m<sub>1</sub>), with l ∈ Loc, and forward(m<sub>1</sub>, e<sub>2</sub>) = (v<sub>2</sub>, m<sup>o</sup><sub>2</sub>, m<sub>2</sub>), the function delete\_full\_2(l, m<sub>2</sub>, m<sup>o</sup><sub>1</sub> ⊔ m<sup>o</sup><sub>2</sub>) checks whether (m<sub>2</sub> ⊔ m<sup>o</sup><sub>1</sub> ⊔ m<sup>o</sup><sub>2</sub>)(l)

$$\begin{array}{l} \mathsf{forward}(m, \mathtt{v}) \triangleq (\mathtt{v}, \emptyset, m) \\ \mathsf{forward}(m, \mathtt{e}\_1 \odot \mathtt{e}\_2) \triangleq (\mathtt{v}\_1 \mathbin{[\![\odot]\!]} \mathtt{v}\_2,\ m\_1^o \sqcup m\_2^o,\ m\_2) \\ \quad \textbf{where } (\mathtt{v}\_1, m\_1^o, m\_1) = \mathsf{forward}(m, \mathtt{e}\_1) \\ \quad \phantom{\textbf{where }} (\mathtt{v}\_2, m\_2^o, m\_2) = \mathsf{forward}(m\_1, \mathtt{e}\_2) \\ \mathsf{forward}(m, \ast\mathtt{e}\_1) \triangleq (\mathtt{w},\ m\_2^o \sqcup \{\mathtt{l} \mapsto (U, q, \mathtt{w})\},\ m\_2) \\ \quad \textbf{where } (\mathtt{l}, m\_1^o, m\_1) = \mathsf{forward}(m, \mathtt{e}\_1) \quad \text{provided } \mathtt{l} \in \mathsf{Loc} \\ \quad \phantom{\textbf{where }} (m\_2, m\_2^o, q, \mathtt{w}) = \mathsf{delete\_frac\_2}(\mathtt{l}, m\_1, m\_1^o) \\ \mathsf{forward}(m, \mathtt{e}\_1 = \mathtt{e}\_2) \triangleq (\mathtt{v}\_2,\ m\_3^o \sqcup \{\mathtt{l} \mapsto (L, 1, \mathtt{v}\_2)\},\ m\_3) \\ \quad \textbf{where } (\mathtt{l}, m\_1^o, m\_1) = \mathsf{forward}(m, \mathtt{e}\_1) \quad \text{provided } \mathtt{l} \in \mathsf{Loc} \\ \quad \phantom{\textbf{where }} (\mathtt{v}\_2, m\_2^o, m\_2) = \mathsf{forward}(m\_1, \mathtt{e}\_2) \\ \quad \phantom{\textbf{where }} (m\_3, m\_3^o) = \mathsf{delete\_full\_2}(\mathtt{l}, m\_2, m\_1^o \sqcup m\_2^o) \\ \mathsf{forward}(m, \mathtt{e}) \triangleq \bot \quad \text{if } \mathtt{e} \notin \overline{\mathtt{expr}} \end{array}$$

Auxiliary functions:

$$\begin{aligned} \mathsf{delete\_frac\_2}(\mathtt{l}, m\_1, m\_2) &\triangleq \begin{cases} (m\_1\{\mathtt{l} \mapsto (U, q/2, \mathtt{v})\},\ m\_2,\ q/2,\ \mathtt{v}) & \text{if } m\_1(\mathtt{l}) = (U, q, \mathtt{v}) \\\\ (m\_1,\ m\_2\{\mathtt{l} \mapsto (U, q/2, \mathtt{v})\},\ q/2,\ \mathtt{v}) & \text{if } m\_1(\mathtt{l}) \neq (U, -, -) \text{ and } m\_2(\mathtt{l}) = (U, q, \mathtt{v}) \\\\ \bot & \text{otherwise} \end{cases} \\ \mathsf{delete\_full\_2}(\mathtt{l}, m\_1, m\_2) &\triangleq (m\_1 \setminus \{\mathtt{l} \mapsto -\},\ m\_2 \setminus \{\mathtt{l} \mapsto -\}) \quad \textbf{where } (U, 1, -) = (m\_1 \sqcup m\_2)(\mathtt{l}) \end{aligned}$$

**Fig. 7.** The definition of the symbolic executor.

contains the write permission l ↦<sub>U</sub> −. If this holds, it removes the location l, so that the write permission is now consumed. Finally, we merge {l ↦ (L, 1, v<sub>2</sub>)} with the output heap m<sup>o</sup><sub>3</sub>, so that after assignment, the write permission l ↦<sub>L</sub> v<sub>2</sub> is given back in a locked state.
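To make the case analysis tangible, here is a minimal C sketch of a symbolic executor in the spirit of Fig. 7. It is an illustration under several assumptions, not the Coq implementation: locations are the indices `0..NLOC-1`, the only binary operator is integer addition, fractions are `double`s (exact here because only halving and re-adding occurs), and the join of lock flags lets `L` absorb `U`. The names `forward`, `merge`, `Result` and the AST encoding are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLOC 4  /* assumption: locations are the indices 0..NLOC-1 */

typedef enum { FLAG_U, FLAG_L } Flag;                        /* unlocked, locked */
typedef struct { bool is_loc; int n; } Val;                  /* integer or location */
typedef struct { bool present; Flag flag; double frac; Val val; } Cell;
typedef struct { Cell at[NLOC]; } SHeap;                     /* symbolic heap */
typedef enum { E_VAL, E_LOAD, E_ASSIGN, E_ADD } Kind;
typedef struct Expr { Kind kind; Val v; const struct Expr *e1, *e2; } Expr;
typedef struct { Val w; SHeap out, frame; } Result;          /* (w, m^o, m') */

/* dst := dst ⊔ src; assumption: L absorbs U, disagreeing values fail. */
static bool merge(SHeap *dst, const SHeap *src) {
    for (int l = 0; l < NLOC; l++) {
        Cell s = src->at[l], *d = &dst->at[l];
        if (!s.present) continue;
        if (!d->present) { *d = s; continue; }
        if (d->val.is_loc != s.val.is_loc || d->val.n != s.val.n) return false;
        d->frac += s.frac;
        if (s.flag == FLAG_L) d->flag = FLAG_L;
    }
    return true;
}

static bool is_unlocked(const Cell *c) { return c->present && c->flag == FLAG_U; }

/* Returns false for the failure case (bottom); otherwise fills *r. */
bool forward(const SHeap *m, const Expr *e, Result *r) {
    assert(m && e && r);
    Result r1, r2;
    switch (e->kind) {
    case E_VAL:                                   /* (v, empty heap, m) */
        r->w = e->v; r->out = (SHeap){0}; r->frame = *m;
        return true;
    case E_ADD:                                   /* left operand, then right in its frame */
        if (!forward(m, e->e1, &r1) || !forward(&r1.frame, e->e2, &r2)) return false;
        if (r1.w.is_loc || r2.w.is_loc || !merge(&r1.out, &r2.out)) return false;
        r->w = (Val){ false, r1.w.n + r2.w.n }; r->out = r1.out; r->frame = r2.frame;
        return true;
    case E_LOAD: {
        if (!forward(m, e->e1, &r1) || !r1.w.is_loc) return false;
        int l = r1.w.n;
        if (l < 0 || l >= NLOC) return false;
        Cell *c = &r1.frame.at[l];                /* delete_frac_2: try the frame first */
        if (!is_unlocked(c)) c = &r1.out.at[l];   /* then the output heap */
        if (!is_unlocked(c)) return false;
        c->frac /= 2;                             /* keep q/2 where it was found */
        SHeap half = {0};
        half.at[l] = *c;                          /* {l -> (U, q/2, w)} */
        r->w = c->val;
        if (!merge(&r1.out, &half)) return false;
        r->out = r1.out; r->frame = r1.frame;
        return true;
    }
    case E_ASSIGN: {
        if (!forward(m, e->e1, &r1) || !r1.w.is_loc) return false;
        if (!forward(&r1.frame, e->e2, &r2) || !merge(&r1.out, &r2.out)) return false;
        int l = r1.w.n;
        if (l < 0 || l >= NLOC) return false;
        Cell *f = &r2.frame.at[l], *o = &r1.out.at[l];
        double total = (f->present ? f->frac : 0.0) + (o->present ? o->frac : 0.0);
        bool locked = (f->present && f->flag == FLAG_L) || (o->present && o->flag == FLAG_L);
        if (locked || total != 1.0) return false; /* delete_full_2 needs (U, 1, _) */
        f->present = false;                       /* remove l from the frame */
        *o = (Cell){ true, FLAG_L, 1.0, r2.w };   /* output gets l back, locked */
        r->w = r2.w; r->out = r1.out; r->frame = r2.frame;
        return true;
    }
    }
    return false;
}
```

On the heap {l ↦ (U, 1, 5)}, the expression `*l + *l` succeeds with value 10, leaving three quarters of the permission in the output heap and one quarter in the frame, whereas `(l = 1) + (l = 1)` fails: after the first assignment no permission for l remains in the frame, which is how the sketch rejects a sequence point violation.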

# **6 A Verification Condition Generator for λMC**

To establish correctness of programs, we need to prove goals P ⊢ wp e {Φ}. To prove such a goal, one has to repeatedly apply the rules for weakest preconditions, intertwined with logical reasoning. In this section we will automate this process for λMC by means of a *verification condition generator* (vcgen).

As a first attempt to define a vcgen, one could try to recurse over the expression e and apply the rules in Fig. 3 eagerly. This would turn the goal into a separation logic proposition that subsequently should be solved. However, as we pointed out in Sect. 5.1, the resulting separation logic proposition will be very difficult to prove—either interactively or automatically—due to the existentially quantified postconditions that appear because of uses of the rules for binary operators (*e.g.,* wp-bin-op). We then proposed alternative rules that avoid the need for existential quantifiers. These rules look like:

$$\frac{P \vdash \mathsf{wp}\ \mathtt{e}\_1\ \{\mathtt{v}.\ \mathtt{v} = \mathtt{v}\_1 \ast Q\} \ast R \qquad R \vdash \mathsf{wp}\ \mathtt{e}\_2\ \{\mathtt{v}\_2.\ Q \mathrel{-\!\!\ast} \Phi\,(\mathtt{v}\_1 \mathbin{[\![\odot]\!]} \mathtt{v}\_2)\}}{P \vdash \mathsf{wp}\ (\mathtt{e}\_1 \odot \mathtt{e}\_2)\ \{\Phi\}}$$

To use this rule, the crux is to symbolically execute e<sub>1</sub> with precondition P into a symbolic execution triple (v<sub>1</sub>, Q, R), which, as we alluded to, can be computed automatically by means of the symbolic executor if e<sub>1</sub> ∈ expr (Sect. 5.2).

We can only use the symbolic executor if P is of the shape ⟦m⟧ for a symbolic heap m. However, in actual program verification, the precondition P is hardly ever of that shape. In addition to a series of points-to connectives (as described by a symbolic heap), we may have arbitrary propositions of separation logic, such as pure facts, abstract predicates, nested Hoare triples, Iris ghost state, *etc.* These propositions may be needed to prove intermediate verification conditions, *e.g.,* for function calls. As such, to effectively apply the above rule, we need to separate our precondition P into two parts: a symbolic heap ⟦m⟧ and a remainder P′. Assuming forward(m, e<sub>1</sub>) = (v<sub>1</sub>, m<sup>o</sup><sub>1</sub>, m<sub>1</sub>), we may then use the following rule:

$$\frac{P' \ast [\![m\_1]\!] \vdash \mathsf{wp}\ \mathtt{e}\_2\ \{\mathtt{v}\_2.\ [\![m\_1^o]\!] \mathrel{-\!\!\ast} \Phi\,(\mathtt{v}\_1 \mathbin{[\![\odot]\!]} \mathtt{v}\_2)\}}{P' \ast [\![m]\!] \vdash \mathsf{wp}\ (\mathtt{e}\_1 \odot \mathtt{e}\_2)\ \{\Phi\}}$$

It is important to notice that by applying this rule, the remainder P′ remains in our precondition as is, but the symbolic heap is changed from ⟦m⟧ into ⟦m<sub>1</sub>⟧, *i.e.,* into the frame that we obtained by symbolically executing e<sub>1</sub>.

It should come as no surprise that we can automate this process, by applying rules, such as the one we have given above, recursively, and threading through symbolic heaps. Formally, we do this by defining the vcgen as a total function: vcg : (sheap × expr × (sheap → val → Prop)) → Prop where Prop is the type of propositions of our logic. The definition of vcg is given in Fig. 8. Before explaining the details, let us state its correctness theorem:

**Theorem 6.1.** *Given an expression* e*, a symbolic heap* m*, and a postcondition* Φ*, the following statement holds:*

$$\frac{P' \vdash \mathsf{vcg}(m, \mathtt{e}, \lambda m'\,\mathtt{v}.\ [\![m']\!] \mathrel{-\!\!\ast} \Phi\,\mathtt{v})}{P' \ast [\![m]\!] \vdash \mathsf{wp}\ \mathtt{e}\ \{\Phi\}}$$

This theorem reflects the general shape of the rules we previously described. We start off with a goal P′ ∗ ⟦m⟧ ⊢ wp e {Φ}, and after using the vcgen, we should prove that the generated goal follows from P′. It is important to note that the continuation in the vcgen is not only parameterized by the return value, but also by a symbolic heap corresponding to the resources that remain. To get these resources back, the vcgen is initiated with the continuation λm′ v. ⟦m′⟧ −∗ Φ v.

Most clauses of the definition of the vcgen (Fig. 8) follow the approach we described so far. For unary expressions like load we generate a condition that corresponds to the weakest precondition rule. For binary expressions, we symbolically execute either operand, and proceed recursively in the other. There are a number of important bells and whistles that we will discuss now.

**Sequencing.** In the case of sequenced binds x ← e<sub>1</sub> ; e<sub>2</sub>, we recursively compute the verification condition for e<sub>1</sub> with the continuation:

$$\lambda m'\,\mathtt{v}.\ \mathbb{U}\,\big(\mathsf{vcg}(\mathsf{unlock}(m'), \mathtt{e}\_2[\mathtt{v}/\mathtt{x}], K)\big).$$

Due to a sequence point, all locations modified by e<sub>1</sub> will be in the unlocked state after it is finished executing. Therefore, in the recursive call to e<sub>2</sub> we unlock all locations in the symbolic heap (*cf.* unlock(m′)), and we include a U modality in the continuation. The U modality is crucial so that the resources that are not given to the vcgen (the remainder P′ in Theorem 6.1) can also be unlocked.
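The effect of unlock on a symbolic heap is simple enough to sketch directly. The C fragment below is an illustration only, with the same hypothetical array-based heap representation as an assumption (locations are the indices `0..NLOC-1`, values are integers); `sheap_unlock` is not a name from the Coq development.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLOC 4  /* assumption: locations are the indices 0..NLOC-1 */

typedef enum { FLAG_U, FLAG_L } Flag;   /* unlocked, locked */
typedef struct { bool present; Flag flag; double frac; int val; } Cell;
typedef struct { Cell at[NLOC]; } SHeap;

/* unlock(m): same domain, fractions and values; every flag becomes U. */
void sheap_unlock(const SHeap *m, SHeap *out) {
    assert(m != NULL && out != NULL);
    *out = *m;
    for (int l = 0; l < NLOC; l++)
        if (out->at[l].present) out->at[l].flag = FLAG_U;
}
```

For example, after symbolically executing an assignment to l the output heap holds l in the locked state; applying `sheap_unlock` at the sequence point makes the location readable and writable again in the continuation.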

**Handling Failure.** In the case of binary operators e<sub>1</sub> ⊙ e<sub>2</sub>, it could be that the symbolic executor fails on both e<sub>1</sub> and e<sub>2</sub>, because neither of the arguments was of the right shape (*i.e.,* not an element of expr), or the required resources were not present in the symbolic heap. In this case the vcgen generates a goal of the form ⟦m⟧ −∗ wp (e<sub>1</sub> ⊙ e<sub>2</sub>) {K<sub>ret</sub>} where K<sub>ret</sub> ≜ λw. ∃m′. ⟦m′⟧ ∗ K m′ w. Here the current symbolic heap ⟦m⟧ is given back to the user, who can use it to prove the weakest precondition of e<sub>1</sub> ⊙ e<sub>2</sub> by hand. Through the postcondition ∃m′. ⟦m′⟧ ∗ K m′ w the user can resume the vcgen, by choosing a new symbolic heap m′ and invoking the continuation K m′ w.

For assignments e<sub>1</sub> = e<sub>2</sub> we have a similar situation. Symbolic execution of both e<sub>1</sub> and e<sub>2</sub> may fail, and then we generate a goal similar to the one for binary operators. If the location l that we wish to assign to is not in the symbolic heap, we use the continuation ⟦m⟧ −∗ ∃w. l ↦<sub>U</sub> w ∗ (l ↦<sub>L</sub> v −∗ K<sub>ret</sub> v). As before, the user gets back the current symbolic heap ⟦m⟧, and can resume the vcgen through the postcondition K<sub>ret</sub> v by picking a new symbolic heap.

$$\begin{array}{l} \mathsf{vcg}(m, \mathtt{v}, K) \triangleq K\ m\ \mathtt{v} \\ \mathsf{vcg}(m, \mathtt{e}\_1 \odot \mathtt{e}\_2, K) \triangleq \begin{cases} \mathsf{vcg}(m\_2, \mathtt{e}\_2, \lambda m'\,\mathtt{v}\_2.\ K\ (m' \sqcup m^o)\ (\mathtt{v}\_1 \mathbin{[\![\odot]\!]} \mathtt{v}\_2)) & \text{if } \mathsf{forward}(m, \mathtt{e}\_1) = (\mathtt{v}\_1, m^o, m\_2) \\\\ \mathsf{vcg}(m\_1, \mathtt{e}\_1, \lambda m'\,\mathtt{v}\_1.\ K\ (m' \sqcup m^o)\ (\mathtt{v}\_1 \mathbin{[\![\odot]\!]} \mathtt{v}\_2)) & \text{if } \mathsf{forward}(m, \mathtt{e}\_1) = \bot \text{ and } \mathsf{forward}(m, \mathtt{e}\_2) = (\mathtt{v}\_2, m^o, m\_1) \\\\ [\![m]\!] \mathrel{-\!\!\ast} \mathsf{wp}\ (\mathtt{e}\_1 \odot \mathtt{e}\_2)\ \{K\_{\mathsf{ret}}\} & \text{otherwise} \end{cases} \\ \mathsf{vcg}(m, \ast\mathtt{e}, K) \triangleq \mathsf{vcg}(m, \mathtt{e}, K') \\ \quad \textbf{with } K' \triangleq \lambda m\,\mathtt{l}.\ \begin{cases} K\ m\ \mathtt{w} & \text{if } \mathtt{l} \in \mathsf{Loc} \text{ and } m(\mathtt{l}) = (U, q, \mathtt{w}) \\\\ [\![m]\!] \mathrel{-\!\!\ast} \exists \mathtt{w}\,q.\ \mathtt{l} \overset{q}{\mapsto}\_U \mathtt{w} \ast (\mathtt{l} \overset{q}{\mapsto}\_U \mathtt{w} \mathrel{-\!\!\ast} K\_{\mathsf{ret}}\ \mathtt{w}) & \text{otherwise} \end{cases} \\ \mathsf{vcg}(m, \mathtt{e}\_1 = \mathtt{e}\_2, K) \triangleq \begin{cases} \mathsf{vcg}(m\_2, \mathtt{e}\_2, \lambda m'\,\mathtt{v}.\ K'\ (m' \sqcup m^o)\ (\mathtt{l}, \mathtt{v})) & \text{if } \mathsf{forward}(m, \mathtt{e}\_1) = (\mathtt{l}, m^o, m\_2) \\\\ \mathsf{vcg}(m\_1, \mathtt{e}\_1, \lambda m'\,\mathtt{l}.\ K'\ (m' \sqcup m^o)\ (\mathtt{l}, \mathtt{v})) & \text{if } \mathsf{forward}(m, \mathtt{e}\_1) = \bot \text{ and } \mathsf{forward}(m, \mathtt{e}\_2) = (\mathtt{v}, m^o, m\_1) \\\\ [\![m]\!] \mathrel{-\!\!\ast} \mathsf{wp}\ (\mathtt{e}\_1 = \mathtt{e}\_2)\ \{K\_{\mathsf{ret}}\} & \text{otherwise} \end{cases} \\ \quad \textbf{with } K' \triangleq \lambda m\,(\mathtt{l}, \mathtt{v}).\ \begin{cases} K\ (m' \sqcup \{\mathtt{l} \mapsto (L, 1, \mathtt{v})\})\ \mathtt{v} & \text{if } \mathtt{l} \in \mathsf{Loc} \text{ and } \mathsf{delete\_full}(\mathtt{l}, m) = m' \\\\ [\![m]\!] \mathrel{-\!\!\ast} \exists \mathtt{w}.\ \mathtt{l} \mapsto\_U \mathtt{w} \ast (\mathtt{l} \mapsto\_L \mathtt{v} \mathrel{-\!\!\ast} K\_{\mathsf{ret}}\ \mathtt{v}) & \text{otherwise} \end{cases} \\ \mathsf{vcg}(m, \mathtt{x} \leftarrow \mathtt{e}\_1 ; \mathtt{e}\_2, K) \triangleq \mathsf{vcg}(m, \mathtt{e}\_1, \lambda m'\,\mathtt{v}.\ \mathbb{U}\,(\mathsf{vcg}(\mathsf{unlock}(m'), \mathtt{e}\_2[\mathtt{v}/\mathtt{x}], K))) \end{array}$$

Auxiliary functions:

$$\begin{aligned} K\_{\mathsf{ret}} : \mathsf{val} \to \mathsf{Prop} &\triangleq \lambda \mathtt{w}.\ (\exists m'.\ [\![m']\!] \ast K\ m'\ \mathtt{w}) \\ \mathsf{unlock}(m) &\triangleq \bigsqcup\_{\substack{\mathtt{l} \in \mathsf{dom}(m) \\ m(\mathtt{l}) = (\xi, q, \mathtt{v})}} \{\mathtt{l} \mapsto (U, q, \mathtt{v})\} \end{aligned}$$

**Fig. 8.** Selected cases of the verification condition generator.

#### **7 Discussion**

**Extensions of the Language.** The memory model that we have presented in this paper was purposely oversimplified. In Coq, the memory model for λMC additionally supports mutable local variables, arrays, and pointer arithmetic. Adding support for these features was relatively easy and required only local changes to the definitional semantics and the separation logic.

For implementing mutable local variables, we tag each location with a Boolean that keeps track of whether it is an allocated or a local variable. That way, we can forbid deallocating local variables using the free(−) operator.

Our extended memory model is block/offset-based like CompCert's memory model [38]. Pointers are not simply represented as locations, but as pairs (ℓ, i), where ℓ is a HeapLang reference to a memory block containing a list of values, and i is an offset into that block. The points-to connectives of our separation logic then correspondingly range over block/offset-based pointers.
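A block/offset pointer can be pictured with the following C sketch. It is a hedged illustration rather than the paper's definitional semantics: the types `Block` and `Ptr` and the functions `ptr_add` and `ptr_load` are hypothetical, and the choice to allow one-past-the-end pointers for arithmetic (but not for loads) is an assumption borrowed from ordinary C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int *cells; size_t len; } Block;        /* a memory block: a list of values */
typedef struct { const Block *block; size_t off; } Ptr;  /* a pointer is a (block, offset) pair */

/* p + n stays within the same block; assumption: one past the end is allowed. */
bool ptr_add(Ptr p, size_t n, Ptr *out) {
    assert(p.block != NULL && out != NULL && p.off <= p.block->len);
    if (n > p.block->len - p.off) return false;
    *out = (Ptr){ p.block, p.off + n };
    return true;
}

/* *p is only defined for offsets strictly inside the block. */
bool ptr_load(Ptr p, int *out) {
    assert(p.block != NULL && out != NULL);
    if (p.off >= p.block->len) return false;
    *out = p.block->cells[p.off];
    return true;
}
```

With a three-element block, advancing the base pointer by 2 and loading succeeds, advancing by 3 yields a pointer that can be formed but not dereferenced, and advancing by 4 is rejected.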

**Symbolic Execution of Sequence Points.** We adapt our forward algorithm to handle sequenced bind operators x ← e<sub>1</sub> ; e<sub>2</sub>. The subtlety lies in supporting nested sequenced binds. For example, in an expression (x ← e<sub>1</sub> ; e<sub>2</sub>) + e<sub>3</sub> the postcondition of e<sub>1</sub> can be used (along with the frame) for the symbolic execution of e<sub>2</sub>, but it cannot be used for the symbolic execution of e<sub>3</sub>. In order to solve this, our forward algorithm takes a *stack* of symbolic heaps as an input, and returns a *stack* of symbolic heaps (of the same length) as a frame. All the cases shown in Fig. 7 are easily adapted w.r.t. this modification, and the following definition captures the case for the sequence point bind:

$$\begin{aligned} \mathsf{forward}(\vec{m}, \mathtt{x} \leftarrow \mathtt{e}\_1 ; \mathtt{e}\_2) & \triangleq (\mathtt{v}\_2, m\_2^o \sqcup m', \vec{m}\_2) \\ \textbf{where} \quad (\mathtt{v}\_1, m\_1^o, \vec{m}\_1) &= \mathsf{forward}(\vec{m}, \mathtt{e}\_1) \\ (\mathtt{v}\_2, m\_2^o, m' :: \vec{m}\_2) &= \mathsf{forward}(\mathsf{unlock}(m\_1^o) :: \vec{m}\_1, \mathtt{e}\_2[\mathtt{v}\_1/\mathtt{x}]) \end{aligned}$$

**Shared Resource Invariants.** As in Krebbers's logic [29], the rules for binary operators in Fig. 3 require the resources to be separated into disjoint parts for the subexpressions. If both sides of a binary operator are function calls, then they can only share read permissions, even though both function calls are executed atomically. Following Krebbers, we address this limitation by adding a shared resource invariant R to our weakest preconditions and add the following rules:

$$\frac{R\_1 \qquad \mathsf{wp}\_{R\_1 \ast R\_2}\ \mathtt{e}\ \{\mathtt{v}.\ R\_1 \mathrel{-\!\!\ast} \Phi\,\mathtt{v}\}}{\mathsf{wp}\_{R\_2}\ \mathtt{e}\ \{\Phi\}} \qquad\quad \frac{R \mathrel{-\!\!\ast} \mathbb{U}\,(\mathsf{wp}\_{\mathsf{True}}\ \mathtt{e}[\mathtt{x}/\mathtt{v}]\ \{\mathtt{w}.\ R \ast \Phi\,\mathtt{w}\})}{\mathsf{wp}\_{R}\ \mathtt{f}(\mathtt{v})\ \{\Phi\}}$$

To temporarily transfer resources into the invariant, one can use the first rule. Because function calls are not interleaved, one can use the last rule to gain access to the shared resource invariant for the duration of the function call.

Our handling of shared resource invariants generalizes the treatment by Krebbers: using custom ghost state in Iris we can endow the resource invariant with a protocol. This allows us to verify examples that were previously impossible [29]:

```
int f(int *p, int y) { return (*p = y); }
int main() { int x; f(&x, 3) + f(&x, 4); return x; }
```

Krebbers could only prove that main returns 0, 3 or 4, whereas we can prove it returns 3 or 4 by combining resource invariants with Iris's ghost state.
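The two possible outcomes can be observed by enumerating the evaluation orders by hand. The C sketch below is an illustration, not part of the paper's development: since function calls in C are not interleaved, the two calls to `f` run one after the other in an unspecified order, and the hypothetical helper `run_in_order` fixes that order explicitly. Initializing `x` to 0 is an assumption made so that the sketch has no uninitialized read.

```c
#include <assert.h>

static int f(int *p, int y) { return (*p = y); }

/* Runs the two calls of main in a chosen order and returns the final x. */
int run_in_order(int left_first) {
    int x = 0;   /* assumption: initialized here, unlike in the original main */
    int a, b;
    if (left_first) { a = f(&x, 3); b = f(&x, 4); }
    else            { b = f(&x, 4); a = f(&x, 3); }
    assert(a == 3 && b == 4);   /* each call returns the value it stored */
    (void)(a + b);              /* the sum itself is discarded, as in main */
    return x;
}
```

Calling `run_in_order(1)` returns 4 and `run_in_order(0)` returns 3: the final value of `x` is whichever store happened last, and 0 is never observed.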

**Implementation in Coq**. In the Coq development [18] we have:


This last point allowed us to leverage the existing machinery for separation logic proofs in Coq. Firstly, we get basic building blocks for implementing the vcgen tactic for free. Secondly, when the vcgen is unable to solve the goal, one can use the Iris Proof Mode/MoSeL tactics to help out in a convenient manner.

To implement the symbolic executor and vcgen, we had to reify the terms and values of λMC. To see why reification is needed, consider the data type for symbolic heaps, which uses locations as keys. In proofs, those locations appear as universally quantified variables. To compute using these, we need to reify them into some symbolic representation. We have implemented the reification mechanism using type classes, following Spitters and van der Weegen [47].

With all the mechanics in place, our vcgen is able to significantly aid us. Consider the following program that copies the contents of one array into another:

```
void arraycopy(int *p, int *q, int n) {
 int *pend = p + n;  /* one past the last element to be written */
 while (p < pend) { *(p++) = *(q++); }
}
```
We proved {p ↦ x ∗ q ↦ y ∗ (|x| = |y| = n)} arraycopy(p, q, n) {p ↦ y ∗ q ↦ y} in 11 lines of Coq code. The vcgen can automatically process the program up until the while loop. At that point, the user has to manually perform an induction on the array, providing a suitable induction hypothesis. The vcgen is then able to discharge the base case automatically. In the inductive case, it will automatically process the program until the next iteration of the while loop, where the user has to apply the induction hypothesis.
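The specification can also be checked on a concrete input, which of course is only a test and not a proof. The harness below is a sketch: it repeats the function with a pointer-typed `pend` and a `void` return type (adjustments assumed here so that it compiles without warnings), and the helper `check_arraycopy` is hypothetical.

```c
#include <assert.h>

void arraycopy(int *p, int *q, int n) {
    int *pend = p + n;
    while (p < pend) { *(p++) = *(q++); }
}

/* Checks the postcondition on one input: afterwards both arrays hold y. */
void check_arraycopy(void) {
    int x[4] = {1, 2, 3, 4};
    int y[4] = {9, 8, 7, 6};
    const int expect[4] = {9, 8, 7, 6};
    arraycopy(x, y, 4);
    for (int i = 0; i < 4; i++) {
        assert(x[i] == expect[i]);   /* p now points to the contents y */
        assert(y[i] == expect[i]);   /* q still points to y */
    }
}
```

Both arrays have length n = 4 here, matching the precondition |x| = |y| = n; with n = 0 the loop body never runs and both arrays are left untouched.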

#### **8 Related Work**

**C Semantics.** There has been a considerable body of work on formal semantics for the C language, including several large projects that aimed to formalize substantial subsets of C [17,20,30,37,41,44], and projects that focused on specific aspects like its memory model [10,13,27,28,31,38,40,41], weak memory concurrency [4,36,43], non-local control flow [35], verified compilation [37,48], *etc.*

The focus of this paper—non-determinism in C expressions—has been treated formally a number of times, notably by Norrish [44], Ellison and Rosu [17], Krebbers [31], and Memarian *et al.* [41]. The first three have in common that they model the sequence point restriction by keeping track of the locations that have been written to. The treatment of sequence points in our definitional semantics is closely inspired by the work of Ellison and Rosu [17], which resembles closely what is in the C standard. Krebbers [31] used a more restrictive version of the semantics by Ellison and Rosu—he assigned undefined behavior in some corner cases to ease the soundness theorem of his logic. We directly proved soundness of the logic w.r.t. the more faithful model by Ellison and Rosu.

Memarian *et al.* [41] give a semantics to C by elaboration into a language they call Core. Unspecified evaluation order in Core is modeled using an unseq operation, which is similar to our ||HL operation. Compared to our translation, Core is much closer to C (it has function calls, memory operations, *etc.* as primitives, while we model them with monadic combinators), and supports concurrency.

**Reasoning Tools and Program Logics for C**. Apart from formalizing the semantics of C, there have been many efforts to create reasoning tools for the C language in one way or another. There are standalone tools, like VeriFast [23], VCC [12], and the Jessie plugin of Frama-C [42], and there are tools built on top of general purpose proof assistants like VST [1,10] in Coq, or AutoCorres [19] in Isabelle/HOL. Although, admittedly, all of these tools cover larger subsets of C than we do, as far as we know, they all ignore non-determinism in expressions.

There are a few exceptions. Norrish proved confluence for a certain class of C expressions [45]. Such a confluence result may be used to justify proofs in a tool that does not have an underlying non-deterministic semantics.

Another exception is the separation logic for non-determinism in C by Krebbers [29]. Our work is inspired by his, but there are several notable differences:


To handle missing features of C as part of our vcgen, we plan to explore approaches by other verification projects in proof assistants. A notable example of such a project is VST, which supports machine arithmetic [16] and data types like structs and unions [10] as part of its tactics for symbolic execution.

**Separation Logic and Symbolic Execution.** In their seminal work, Berdine *et al.* [5] demonstrate the application of symbolic execution to automated reasoning in separation logic. In their setting, frame inference is used to perform symbolic execution of function calls. The frame has to be computed when the call site has more resources than needed to invoke a function. In our setting we compute frames for subexpressions, which, unlike functions, do not have predefined specifications. Due to that, we have to perform frame inference simultaneously with symbolic execution. The symbolic execution algorithm of Berdine *et al.* can handle inductive predicates, and can be extended with shape analysis [15]. We do not support such features, and leave them to future work.

Caper [14] is a tool for automated reasoning in concurrent separation logic, and it also deals with non-determinism, although the nature of non-determinism in Caper is different. Non-determinism in Caper arises due to branching on unknown conditionals and due to multiple possible ways to apply ghost state related rules (rules pertaining to abstract regions and guards). The former cause is tackled by considering sets of symbolic execution traces, and the latter is resolved by employing heuristics based on bi-abduction [9]. Applications of abductive reasoning to our approach to symbolic execution are left for future work.

Recently, Bannister *et al.* [2,3] proposed a new separation logic connective for performing forwards reasoning whilst avoiding frame inference. This approach, however, is aimed at sequential deterministic programs, focusing on a notion of partial correctness that allows for failed executions. Another approach to verification of sequential stateful programs is based on characteristic formulae [11]. A stateful program is transformed into a higher-order logic predicate, implicitly encoding the frame rule. The resulting formula is then proved by a user in Coq.

When implementing a vcgen in a proof assistant (see *e.g.,* [10,39]) it is common to let the vcgen return a new goal when it gets stuck, from which the user can help out and call back the vcgen. The novelty of our work is that this approach is applied to operations that are called in parallel.

**Acknowledgments.** We are grateful to Gregory Malecha and the anonymous reviewers for their comments and suggestions. This work was supported by the Netherlands Organisation for Scientific Research (NWO), project numbers STW.14319 (first and second author) and 016.Veni.192.259 (third author).

#### **References**



# Safe Deferred Memory Reclamation with Types

Ismail Kuru and Colin S. Gordon

Drexel University, Philadelphia, USA {ik335,csgordon}@drexel.edu

Abstract. Memory management in lock-free data structures remains a major challenge in concurrent programming. Design techniques including read-copy-update (RCU) and hazard pointers provide workable solutions, and are widely used to great effect. These techniques rely on the concept of a grace period: nodes that should be freed are not deallocated immediately, and all threads obey a protocol to ensure that the deallocating thread can detect when all possible readers have completed their use of the object. This provides an approach to safe deallocation, but only when these subtle protocols are implemented correctly.

We present a static type system to ensure correct use of RCU memory management: that nodes removed from a data structure are always scheduled for subsequent deallocation, and that nodes are scheduled for deallocation at most once. As part of our soundness proof, we give an abstract semantics for RCU memory management primitives which captures the fundamental properties of RCU. Our type system allows us to give the first proofs of memory safety for RCU linked list and binary search tree implementations without requiring full verification.

### 1 Introduction

For many workloads, lock-based synchronization – even fine-grained locking – has unsatisfactory performance. Often lock-free algorithms yield better performance, at the cost of more complex implementation and additional difficulty reasoning about the code. Much of this complexity is due to memory management: developers must reason about not only other threads violating local assumptions, but whether other threads are *finished accessing* nodes to deallocate. At the time a node is unlinked from a data structure, an unknown number of additional threads may have already been using the node, having read a pointer to it before it was unlinked in the heap.

A key insight for manageable solutions to this challenge is to recognize that just as in traditional garbage collection, the unlinked nodes need not be reclaimed immediately, but can instead be reclaimed later after some protocol finishes running. Hazard pointers [29] are the classic example: all threads actively collaborate on bookkeeping data structures to track who is using a certain reference. For structures with read-biased workloads, Read-Copy-Update (RCU) [23] provides an appealing alternative. The programming style resembles a combination of reader-writer locks and lock-free programming. Multiple concurrent readers perform minimal bookkeeping – often nothing they wouldn't already do. A single writer at a time runs in parallel with readers, performing additional work to track which readers may have observed a node they wish to deallocate. There are now RCU implementations of many common tree data structures [3,5,8,19,24,33], and RCU plays a key role in Linux kernel memory management [27].

However, RCU primitives remain non-trivial to use correctly: developers must ensure they release each node exactly once, from exactly one thread, *after* ensuring other threads are finished with the node in question. Model checking can be used to validate correctness of implementations for a mock client [1,7,17,21], but this does not guarantee correctness of arbitrary client code. Sophisticated verification logics can prove correctness of the RCU primitives and clients [12,15,22,32]. But these techniques require significant verification expertise to apply, and are specialized to individual data structures or implementations. One important reason for the sophistication in these logics stems from the complexity of the underlying memory reclamation model. However, Meyer and Wolff [28] show that a suitable abstraction enables separating verifying *correctness* of concurrent data structures from their underlying reclamation model under the assumption of *memory safety*, and study proofs of correctness assuming memory safety.

We propose a type system to ensure that RCU client code uses the RCU primitives safely, ensuring memory safety for concurrent data structures using RCU memory management. We do this in a general way, not assuming the client implements any specific data structure, only one satisfying some basic properties common to RCU data structures (such as having a *tree* memory footprint). In order to do this, we must also give a formal operational model of the RCU primitives that abstracts many implementations, without assuming a particular implementation of the RCU primitives. We describe our RCU semantics and type system, prove our type system sound against the model (which ensures memory is reclaimed correctly), and show the type system in action on two important RCU data structures.

Our contributions include:


#### 2 Background and Motivation

In this section, we recall the general concepts of read-copy-update concurrency. We use the RCU linked-list-based bag [25] from Fig. 1 as a running example. It includes annotations for our type system, which will be explained in Sect. 4.2.

Fig. 1. RCU client: singly linked list based bag implementation.

As with concrete RCU implementations, we assume threads operating on a structure are either performing read-only traversals of the structure—*reader threads*—or are performing an update—*writer threads*—similar to the use of many-reader single-writer reader-writer locks.<sup>1</sup> It differs, however, in that readers may execute concurrently with the (single) writer.

This distinction, and some runtime bookkeeping associated with the read- and write-side critical sections, allow this model to determine at modest cost when a node unlinked by the writer can safely be reclaimed.

Figure 1 gives the code for adding and removing nodes from a bag. Type checking for all code, including membership queries for bag, can be found in our technical report [20]. Algorithmically, this code is nearly the same as any sequential implementation. There are only two differences. First, the read-side critical section in member is indicated by the use of ReadBegin and ReadEnd; the write-side critical section is between WriteBegin and WriteEnd. Second, rather than immediately reclaiming the memory for the unlinked node, remove calls SyncStart to begin a *grace period*: a wait for reader threads that may still hold references to unlinked nodes to finish their critical sections. SyncStop blocks execution of the writer thread until these readers exit their read critical section (via ReadEnd). These are the essential primitives for the implementation of an RCU data structure.

<sup>1</sup> RCU implementations supporting multiple concurrent writers exist [3], but are the minority.

These six primitives together track a critical piece of information: which reader threads' critical sections overlapped the writer's. Implementing them efficiently is challenging [8], but possible. The Linux kernel, for example, finds ways to reuse existing task switch mechanisms for this tracking, so readers incur no additional overhead. The reader primitives are semantically straightforward – they atomically record the start, or completion, of a read-side critical section.

The more interesting primitives are the write-side primitives and memory reclamation. WriteBegin performs a (semantically) standard mutual exclusion with regard to other writers, so only one writer thread may modify the structure *or the writer structures used for grace periods*.

SyncStart and SyncStop implement *grace periods* [31]: a mechanism to wait for readers to finish with any nodes the writer may have unlinked. A grace period begins when a writer requests one, and finishes when all reader threads active *at the start of the grace period* have finished their current critical section. Any nodes a writer unlinks before a grace period are physically unlinked, but not logically unlinked until after one grace period.

An attentive reader might already realize that our usage of logical/physical unlinking is different from the one used in the data-structures literature, where typically a *logical deletion* (e.g., marking) is followed by a *physical deletion* (unlinking). Because all threads are forbidden from holding an interior reference into the data structure after leaving their critical sections, waiting for active readers to finish their critical sections ensures they are no longer using any nodes the writer unlinked prior to the grace period. This makes actually freeing an unlinked node after a grace period safe.

SyncStart conceptually takes a snapshot of all readers active when it is run. SyncStop then blocks until all those threads in the snapshot have finished at least one critical section. SyncStop does not wait for *all* readers to finish, and does not wait for all overlapping readers to simultaneously be out of critical sections.

To date, every description of RCU semantics, most centered around the notion of a grace period, has been given algorithmically, as a specific (efficient) implementation. While the implementation aspects are essential to real use, the lack of an abstract characterization makes judging the correctness of these implementations – or clients – difficult in general. In Sect. 3 we give formal *abstract*, *operational* semantics for RCU implementations – inefficient if implemented directly, but correct from a memory-safety and programming model perspective, and not tied to specific low-level RCU implementation details. To use these semantics or a concrete implementation correctly, client code must ensure:


In practice, RCU data structures typically ensure additional invariants to simplify the above, e.g.:


and our type system in Sect. 4 guarantees these invariants.

# 3 Semantics

In this section, we outline the details of an abstract semantics for RCU implementations. It captures the core client-visible semantics of most RCU primitives, but not the implementation details required for efficiency [27]. In our semantics, shown in Fig. 2, an abstract machine state, MState, contains:


The lock l enforces mutual exclusion between write-side critical sections. The root location rt is the root of an RCU data structure. We model only a single global RCU data structure; the generalization to multiple structures is straightforward but complicates formal development later in the paper. The reader set R tracks the thread IDs (TIDs) of all threads currently executing a read block. The blocking set B tracks which threads the writer is *actively* waiting for during a grace period; it is empty if the writer is not waiting.

Figure 2 gives operational semantics for *atomic* actions; conditionals, loops, and sequencing all have standard semantics, and parallel composition uses sequentially-consistent interleaving semantics.

The first few atomic actions, for writing and reading fields, assigning among local variables, and allocating new objects, are typical of formal semantics for heaps and mutable local variables. Free is similarly standard. A writer thread's critical section is bounded by WriteBegin and WriteEnd, which acquire and release the lock that enforces mutual exclusion between writers. WriteBegin only reduces (acquires) if the lock is unlocked.

Standard RCU APIs include a primitive synchronize\_rcu() to wait for a grace period for the current readers. We decompose this here into two actions, SyncStart and SyncStop. SyncStart initializes the blocking set to the current set of readers: the threads that may have already observed any nodes the writer has unlinked. SyncStop blocks until the blocking set is emptied by completing reader threads. However, it does not wait for *all* readers to finish, and does not wait for all overlapping readers to simultaneously be out of critical sections. If two reader threads A and B overlap some SyncStart-SyncStop's critical section, it is possible that A may exit and re-enter a read-side critical section before B exits, and vice versa. Implementations must distinguish subsequent read-side critical sections from earlier ones that overlapped the writer's initial request to wait: since SyncStart is used *after* a node is physically removed from the data structure and readers may not retain RCU references across critical sections, A re-entering a fresh read-side critical section will not permit it to re-observe the node to be freed.

Fig. 2. Operational semantics for RCU.

Reader thread critical sections are bounded by ReadBegin and ReadEnd. ReadBegin simply records the current thread's presence as an active reader. ReadEnd removes the current thread from the set of active readers, and also removes it (if present) from the blocking set—if a writer was waiting for a certain reader to finish its critical section, this ensures the writer no longer waits once that reader has finished its current read-side critical section.

Grace periods are implemented by the combination of ReadBegin, ReadEnd, SyncStart, and SyncStop. ReadBegin ensures the set of active readers is known. When a grace period is required, SyncStart;SyncStop; will store (in B) the active readers (which may have observed nodes before they were unlinked), and wait for reader threads to record when they have completed their critical section (and implicitly, dropped any references to nodes the writer wants to free) via ReadEnd.

These semantics do permit a reader in the blocking set to finish its read-side critical section and enter a *new* read-side critical section before the writer wakes. In this case, *the writer waits only for the first critical section of that reader to complete*, since entering the new critical section adds the thread's ID back to R, but not B.
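The transitions described in this section can be made concrete with a small executable model. The following Python sketch is an illustration only, not the formal semantics of Fig. 2: it keeps just the lock, the reader set R, and the blocking set B, omits the heap, the stack, and the root, and models a blocked step by returning False. The class and function names, and the assertions that SyncStart and SyncStop are issued by the lock holder, are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MState:
    """Reduced abstract state: writer lock, reader set R, blocking set B."""
    lock: Optional[int] = None                  # TID holding the write lock, if any
    R: Set[int] = field(default_factory=set)    # threads inside a read block
    B: Set[int] = field(default_factory=set)    # readers the writer still waits for


def read_begin(s: MState, tid: int) -> None:
    s.R.add(tid)          # record the thread as an active reader


def read_end(s: MState, tid: int) -> None:
    s.R.discard(tid)      # no longer an active reader
    s.B.discard(tid)      # a waiting writer no longer waits for this reader


def write_begin(s: MState, tid: int) -> bool:
    """Return True if the step can fire (the lock was free), False if blocked."""
    if s.lock is not None:
        return False
    s.lock = tid
    return True


def write_end(s: MState, tid: int) -> None:
    assert s.lock == tid  # assumption of this sketch: only the holder releases
    s.lock = None


def sync_start(s: MState, tid: int) -> None:
    assert s.lock == tid  # assumption of this sketch: issued by the writer
    s.B = set(s.R)        # snapshot of the readers active right now


def sync_stop(s: MState, tid: int) -> bool:
    """Return True if the step can fire, i.e. the blocking set has drained."""
    assert s.lock == tid
    return not s.B
```

For example, with readers 1 and 2 active, `sync_start` by writer 0 sets B to {1, 2}. If reader 1 then calls `read_end` followed by `read_begin`, it is back in R but not in B, so `sync_stop` can fire as soon as reader 2 calls `read_end`.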

# 4 Type System and Programming Language

In this section, we present a simple imperative programming language with two block constructs for modeling RCU, and a type system that ensures proper (memory-safe) use of the language. The type system ensures memory safety by enforcing these sufficient conditions:


We also demonstrate that the type system is not only sound, but useful: we show how it types Fig. 1's list-based bag implementation [25]. We also give type checked fragments of a binary search tree to motivate advanced features of the type system; the full typing derivation can be found in our technical report [20] Appendix B. The BST requires type narrowing operations that refine a type based on dynamic checks (e.g., determining which of several fields links to a node). In our system, we presume all objects contain all fields, but the number of fields is finite (and in our examples, small). This avoids additional overhead from tracking well-established aspects of the type system (class and field types and presence, for example) and lets us focus on checking correct use of RCU primitives. Essentially, we assume the code our type system applies to is already type-correct for a system like C or Java's type system.

#### 4.1 RCU Type System for Write Critical Section

Section 4.1 introduces RCU types and the need for subtyping. Section 4.2 shows how types describe program states, through code for Fig. 1's list-based bag example. Section 4.3 introduces the type system itself.

RCU Types. There are six types used in write critical sections:

$$\tau ::= \mathsf{rcuItr}\ \rho\ \mathcal{N} \mid \mathsf{rcuFresh}\ \mathcal{N} \mid \mathsf{unlinked} \mid \mathsf{undef} \mid \mathsf{freeable} \mid \mathsf{rcuRoot}$$

*rcuItr* is the type given to references pointing into a shared RCU data structure. An rcuItr type can be used in either a write region or a read region (without the additional components). It indicates both that the reference points into the shared RCU data structure and that the heap location referenced by the rcuItr reference is reachable by following the path ρ from the root. The component N is a set of field mappings taking field names to local variable names. Field maps are extended when the referent's fields are read. The field map and path components track reachability from the root, and local reachability between nodes. These are used to ensure the structure remains acyclic, and for the type system to recognize exactly when unlinking can occur.

Read-side critical sections use rcuItr without path or field map components. These components are both unnecessary for readers (who perform no updates) and would be invalidated by writer threads anyway. Under the assumption that reader threads do not hold references across critical sections, the read-side rules essentially only ensure the reader performs no writes, so we omit the reader critical section type rules. They can be found in our technical report [20] Appendix E.

*unlinked* is the type given to references to unlinked heap locations—objects previously part of the structure, but now unreachable via the heap. A heap location referenced by an unlinked reference may still be accessed by reader threads, which may have acquired their own references before the node became unreachable. Newly-arrived readers, however, will be unable to gain access to these referents.

*freeable* is the type given to references to an unlinked heap location that is safe to reclaim because it is known that no concurrent readers hold references to it. Unlinked references become freeable after a writer has waited for a full grace period.

*undef* is the type given to references where the content of the referenced location is inaccessible. A local variable of type freeable becomes undef after reclaiming that variable's referent.

*rcuFresh* is the type given to references to freshly allocated heap locations. Like the rcuItr type, it carries a set of field mappings N. When we replace the heap node referenced by an rcuItr reference with the node referenced by an rcuFresh reference, we set the field mappings of the rcuFresh reference to be the same as those of the rcuItr reference, which makes the replacement memory safe.

*rcuRoot* is the type given to the fixed reference to the root of the RCU data structure. It may not be overwritten.

Subtyping. It is sometimes necessary to use imprecise types—mostly for control flow joins. Our type system performs these abstractions via subtyping on individual types and full contexts, as in Fig. 3.

Figure 3 includes four judgments for subtyping. The first two, N ≺: N′ and ρ ≺: ρ′, describe relaxations of field maps and paths respectively. N ≺: N′ is read as "the field map N is more precise than N′" and similarly for paths. The third judgment T ≺: T′ uses path and field map subtyping to give subtyping among rcuItr types (one rcuItr is a subtype of another if its paths and field maps are similarly more precise) and to allow rcuItr references to be "forgotten"; this is occasionally needed to satisfy non-interference checks in the type rules. The final judgment Γ ≺: Γ′ extends subtyping to all assumptions in a type context.

$$\mathcal{N} = \{ f_0 | \ldots | f_n \rightharpoonup \{y\} \mid f_i \in \mathsf{FName} \land 0 \le i \le n \land (y \in \mathsf{Vars} \lor y \in \{\mathsf{null}\}) \} \qquad \mathcal{N}_{f,\emptyset} = \mathcal{N} \setminus \{f \rightharpoonup \_\}$$

$$\mathcal{N}_{\emptyset} = \{\} \qquad \mathcal{N}(\cup_{f \rightharpoonup y}) = \mathcal{N} \cup \{f \rightharpoonup y\} \qquad \mathcal{N}(\setminus_{f \rightharpoonup y}) = \mathcal{N} \setminus \{f \rightharpoonup y\}$$

$$\mathcal{N}([f \rightharpoonup y]) = \mathcal{N} \text{ where } f \rightharpoonup y \in \mathcal{N} \qquad \mathcal{N}(f \rightharpoonup x \setminus y) = \mathcal{N} \setminus \{f \rightharpoonup x\} \cup \{f \rightharpoonup y\}$$

Fig. 3. Subtyping rules.

Fig. 4. Type rules for control-flow.

It is often necessary to abstract the contents of field maps or paths, without simply forgetting the contents entirely. In a binary search tree, for example, it may be the case that one node is a child of another, but *which* parent field points to the child depends on which branch was followed in an earlier conditional (consider the lookup in a BST, which alternates between following left and right children). In Fig. 5, we see that cur aliases different fields of par – either Left or Right – in different branches of the conditional. The types after the conditional must overapproximate this, here as Left|Right → cur in par's field map, and a similar path disjunction in cur's path. This is reflected in Fig. 3's T-NSub1-5 and T-PSub1-2 – within each branch, each type is coerced to a supertype to validate the control flow join.
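The over-approximation at a control-flow join can be illustrated with a small Python sketch. It is not a transcription of the subtyping rules of Fig. 3: the representation of a field map as a dictionary from a set of alternative field names to a target variable, and the function name `join_field_maps`, are assumptions made here for illustration, and entries known in only one branch are simply forgotten.

```python
from typing import Dict, FrozenSet

# A field map entry {Left, Right} -> "cur" is read as Left|Right -> cur.
FieldMap = Dict[FrozenSet[str], str]


def join_field_maps(then_map: FieldMap, else_map: FieldMap) -> FieldMap:
    """Coarsen two branch-local field maps into one map valid after the join.

    Entries that point at the same variable are merged into one disjunctive
    entry; entries without a counterpart in the other branch are dropped.
    """
    joined: FieldMap = {}
    for fields_a, target_a in then_map.items():
        for fields_b, target_b in else_map.items():
            if target_a == target_b:
                joined[fields_a | fields_b] = target_a
    return joined
```

Joining `{Left -> cur}` from one branch with `{Right -> cur}` from the other yields the single entry `Left|Right -> cur`, matching the shape of the type described for par after the conditional.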

Another type of control flow join is handling loop invariants – where paths entering the loop meet the back-edge from the end of a loop back to the start for repetition. Because our types include paths describing how they are reachable from the root, some abstraction is required to give loop invariants that work for any number of iterations – in a loop traversing a linked list, the iterator pointer would naïvely have different paths from the root on each iteration, so the exact path is not loop invariant. However, the paths explored by a loop are regular, so we can abstract the paths by permitting (implicitly) existentially quantified indexes on path fragments, which express the existence of *some* path, without saying *which* path. The use of an explicit abstract repetition allows the type system to preserve the fact that different references have common path prefixes, even after a loop.

Assertions for the add function in lines 19 and 20 of Fig. 1 show the *loop*'s effects on paths of iterator references used inside the loop, cur and par. On line 20, par's path contains (Next)<sup>k</sup>. The k in (Next)<sup>k</sup> abstracts the number of loop iterations run, implicitly assumed to be non-negative. The trailing Next in cur's path on line 19, (Next)<sup>k</sup>.Next, expresses the relationship between cur and par: par is reachable from the root by following Next k times, and cur is reachable via one additional Next. The types of lines 19 and 20, however, are not the same as those of lines 23 and 24, so an additional adjustment is needed for the types to become loop-invariant. *Reindexing* (T-ReIndex in Fig. 4) effectively increments an abstract loop counter, contracting (Next)<sup>k</sup>.Next to (Next)<sup>k</sup> everywhere in a type environment. This expresses the same relationship between par and cur as before the loop, but the choice of k to make these paths accurate after each iteration would be one larger than the choice before. Reindexing the type environment of lines 23–24 yields the type environment of lines 19–20, making the types loop invariant. The reindexing essentially chooses a new value for the abstract k. This is sound, because the uses of framing in the heap mutation related rules of the type system ensure uses of any indexing variable are never separated – either all are reindexed, or none are.
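The contraction performed by reindexing can be sketched in a few lines of Python. This is a hypothetical encoding, not the rule T-ReIndex itself: paths are tuples whose segments are either field names or a `Rep` marker standing for an indexed repetition, and the names `Rep`, `reindex_path`, and `reindex_env` are introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Rep:
    """Indexed repetition of one field, e.g. Rep("Next", "k") for (Next)^k."""
    field: str
    index: str


Segment = Union[str, Rep]
Path = Tuple[Segment, ...]


def reindex_path(path: Path, index: str) -> Path:
    """Contract every occurrence of (f)^index . f into (f)^index."""
    out = []
    i = 0
    while i < len(path):
        seg = path[i]
        out.append(seg)
        follows_own_field = (
            isinstance(seg, Rep)
            and seg.index == index
            and i + 1 < len(path)
            and path[i + 1] == seg.field
        )
        if follows_own_field:
            i += 1  # absorb the trailing step into the repetition
        i += 1
    return tuple(out)


def reindex_env(env: Dict[str, Path], index: str) -> Dict[str, Path]:
    """Reindex all paths together, so uses of the index are never separated."""
    return {var: reindex_path(path, index) for var, path in env.items()}
```

Applied to an environment in which par has path (Next)<sup>k</sup>.Next and cur has path (Next)<sup>k</sup>.Next.Next, `reindex_env` returns par at (Next)<sup>k</sup> and cur at (Next)<sup>k</sup>.Next, which is the shape of the loop-head environment described above.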

While abstraction is required to deal with control flow joins, reasoning about whether and which nodes are unlinked or replaced, and whether cycles are created, requires precision. Thus the type system also includes means (Fig. 4) to refine imprecise paths and field maps. In Fig. 5, we see a conditional with the condition par.Left == cur. The type system matches this condition to the imprecise types in line 1's typing assertion, and refines the initial type assumptions in each branch accordingly (lines 2 and 7) based on whether execution reflects the truth or falsity of that check. Similarly, it is sometimes required to check – and later remember – whether a field is null, and the type system supports this.
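Refinement is the inverse of the join: a dynamic test splits a disjunctive field map entry into a precise entry per branch. The Python sketch below is an assumption-laden illustration rather than the rules of Fig. 4. Field maps are dictionaries from sets of alternative field names to a target variable, and `refine_on_field_test` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, Tuple

# A field map entry {Left, Right} -> "cur" is read as Left|Right -> cur.
FieldMap = Dict[FrozenSet[str], str]


def refine_on_field_test(
    field_map: FieldMap, fld: str, var: str
) -> Tuple[FieldMap, FieldMap]:
    """Refine x's field map for the test `x.fld == var`.

    Returns (map for the true branch, map for the false branch).
    """
    true_map: FieldMap = {}
    false_map: FieldMap = {}
    for fields, target in field_map.items():
        if target == var and fld in fields:
            # True branch: the alias is known to go through `fld` exactly.
            true_map[frozenset({fld})] = target
            # False branch: `fld` is ruled out, the other alternatives remain.
            remaining = fields - {fld}
            if remaining:
                false_map[remaining] = target
        else:
            true_map[fields] = target
            false_map[fields] = target
    return true_map, false_map
```

For par's entry `Left|Right -> cur` and the test `par.Left == cur`, the true branch obtains `Left -> cur` and the false branch obtains `Right -> cur`.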

Fig. 5. Choosing fields to read.

#### 4.2 Types in Action

The system has three forms of typing judgement: Γ ⊢ C for standard typing outside RCU critical sections; Γ ⊢<sub>R</sub> C ⊣ Γ′ for reader critical sections, and Γ ⊢<sub>M</sub> C ⊣ Γ′ for writer critical sections. The first two are straightforward, essentially preventing mutation of the data structure, and preventing nesting of a writer critical section inside a reader critical section. The last, for writer critical sections, is flow sensitive: the types of variables may differ before and after program statements. This is required in order to reason about local assumptions at different points in the program, such as recognizing that a certain action may unlink a node. Our presentation here focuses exclusively on the judgment for the write-side critical sections.

Below, we explain our types through the list-based bag implementation [25] from Fig. 1, highlighting how the type rules handle different parts of the code. Figure 1 is annotated with "assertions" – local type environments – in the style of a Hoare logic proof outline. As with Hoare proof outlines, these annotations can be used to construct a proper typing derivation.

Reading a Global RCU Root. All RCU data structures have fixed roots, which we characterize with the rcuRoot type. Each operation in Fig. 1 begins by reading the root into a new rcuItr reference used to begin traversing the structure. After each initial read (line 12 of add and line 4 of remove), the path of the cur reference is the empty path (ε) and the field map is empty ({}), because it is an alias to the root, and none of its field contents are known yet.

Reading an Object Field and a Variable. As expected, we explore the heap of the data structure via reading the objects' fields. Consider line 6 of remove and its corresponding pre- and post- type environments. Initially par's field map is empty. After the field read, its field map is updated to reflect that its Next field is aliased in the local variable cur. Likewise, after the update, cur's path is Next (= ε · Next), extending the par node's path by the field read. This introduces field aliasing information that can subsequently be used to reason about unlinking.
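The bookkeeping for root and field reads can be sketched as operations on a typing environment. The Python below is a simplified illustration under stated assumptions: it omits the free-variable side conditions of the real rules, the encoding of types as dataclasses and strings is chosen for this sketch, and the helper names `read_root` and `read_field` and the root variable name `head` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple, Union

Path = Tuple[str, ...]                # () stands for the empty path
FieldMap = Dict[str, Optional[str]]   # field name -> local variable (None for null)


@dataclass
class RcuItr:
    path: Path = ()
    fields: FieldMap = field(default_factory=dict)


@dataclass
class RcuFresh:
    fields: FieldMap = field(default_factory=dict)


# The component-free types are plain strings in this sketch:
# "unlinked", "undef", "freeable", "rcuRoot".
Type = Union[RcuItr, RcuFresh, str]
Env = Dict[str, Type]


def read_root(env: Env, y: str, root: str) -> None:
    """y = root: y becomes an iterator with empty path and empty field map."""
    if env.get(root) != "rcuRoot":
        raise TypeError(f"{root} is not the RCU root")
    env[y] = RcuItr()


def read_field(env: Env, z: str, x: str, f: str) -> None:
    """z = x.f: record the alias in x's field map and extend x's path by f."""
    x_type = env.get(x)
    if not isinstance(x_type, RcuItr):
        raise TypeError(f"{x} is not an rcuItr reference")
    x_type.fields[f] = z
    env[z] = RcuItr(path=x_type.path + (f,))
```

Starting from `{"head": "rcuRoot"}`, calling `read_root(env, "par", "head")` and then `read_field(env, "cur", "par", "Next")` leaves par with field map `{Next -> cur}` and cur with path `Next` and an empty field map.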

Unlinking Nodes. Line 24 of remove in Fig. 1 unlinks a node. The type annotations show that before that line cur is in the structure (rcuItr), while afterwards its type is unlinked. The type system checks that this unlink disconnects only one node: note how the types of par, cur, and curl just before line 24 completely describe a section of the list.

Grace and Reclamation. After the referent of cur is unlinked, concurrent readers traversing the list may still hold references. So it is not safe to actually reclaim the memory until after a grace period. Lines 28–29 of remove initiate a grace period and wait for its completion. At the type level, this is reflected by the change of cur's type from unlinked to freeable, reflecting the fact that the grace period extends until any reader critical sections that might have observed the node in the structure have completed. This matches the precondition required by our rules for calling Free, which further changes the type of cur to undef, reflecting that cur is no longer a valid reference. The type system also ensures no local (writer) aliases exist to the freed node. This enforcement is twofold. First, the type system requires that only unlinked heap nodes can be freed. Second, framing relations in rules related to the heap mutation ensure no local aliases still consider the node linked.
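The type-level effect of a grace period followed by reclamation can be summarized as two environment transformers. The following Python sketch is illustrative only: types are reduced to string tags, and the function names `sync` and `free` are assumptions of this sketch rather than names from the type rules.

```python
from typing import Dict

# Variable name -> type tag such as "rcuItr", "unlinked", "freeable", "undef".
Env = Dict[str, str]


def sync(env: Env) -> Env:
    """After SyncStart;SyncStop every unlinked reference becomes freeable."""
    return {v: "freeable" if t == "unlinked" else t for v, t in env.items()}


def free(env: Env, x: str) -> Env:
    """Free(x) requires x : freeable and leaves x : undef."""
    if env.get(x) != "freeable":
        raise TypeError(f"cannot free {x}: type is {env.get(x)!r}, not 'freeable'")
    out = dict(env)
    out[x] = "undef"
    return out
```

With `cur : unlinked`, a call to `free` is rejected until `sync` has been applied; afterwards `free` succeeds once and leaves `cur : undef`, so a second `free` on the same variable is rejected as well.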

Fresh Nodes. Some code must also allocate new nodes, and the type system must reason about how they are incorporated into the shared data structure. Line 8 of the add method allocates a new node nw, and lines 10 and 29 initialize its fields. The type system gives it a fresh type while tracking its field contents, until line 32 inserts it into the data structure. The type system checks that nodes previously reachable from cur remain reachable: note the field maps of cur and nw in lines 30–31 are equal (trivially, though in general the field need not be null).

#### 4.3 Type Rules

Figure 6 gives the primary type rules used in checking write-side critical section code as in Fig. 1.

T-Root reads a root pointer into an rcuItr reference, and T-ReadS copies a local variable into another. In both cases, the free variable condition ensures that updating the modified variable does not invalidate field maps of other variables in Γ. These free variable conditions recur throughout the type system, and we will not comment on them further. T-Alloc and T-Free allocate and reclaim objects. These rules are relatively straightforward. T-ReadH reads a field into a local variable. As suggested earlier, this rule updates the post-environment to reflect that the overwritten variable z holds the same value as x.f. T-WriteFH updates a field of a *fresh* (thread-local) object, similarly tracking the update in the fresh object's field map at the type level. The remaining rules are a bit more involved, and form the heart of the type system.

Grace Periods. T-Sync gives pre- and post-environments to the compound statement SyncStart;SyncStop implementing grace periods. As mentioned earlier, this updates the environment afterwards to reflect that any nodes unlinked before the wait become freeable afterwards.

Fig. 6. Type rules for write side critical section.

Unlinking. T-UnlinkH type checks heap updates that remove a node from the data structure. The rule assumes three objects x, z, and r, whose identities we will conflate with the local variable names in the type rule. The rule checks the case where x.f<sub>1</sub> == z and z.f<sub>2</sub> == r initially (reflected in the path and field map components), and a write x.f<sub>1</sub> = r removes z from the data structure (we assume, and ensure, the structure is a tree).

The rule must also avoid unlinking multiple nodes: this is the purpose of the first (smaller) implication: it ensures that beyond the reference from z to r, all fields of z are null.
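The local premises of the unlinking check described so far can be sketched as follows. This Python fragment covers only the shape conditions on x, z, and r and the single-node condition; it deliberately ignores the aliasing premises on the rest of Γ, and it quantifies only over the fields recorded in z's field map. The function name `check_unlink` and the dictionary encoding of field maps are assumptions of this sketch.

```python
from typing import Dict, Optional

FieldMap = Dict[str, Optional[str]]  # field -> local variable, or None for null


def check_unlink(
    x_fields: FieldMap, z_fields: FieldMap, z: str, r: str, f1: str, f2: str
) -> FieldMap:
    """Check the local premises for the write x.f1 = r and return x's new map."""
    if x_fields.get(f1) != z:
        raise TypeError(f"x.{f1} is not known to alias {z}")
    if z_fields.get(f2) != r:
        raise TypeError(f"{z}.{f2} is not known to alias {r}")
    # Single-node condition: every other recorded field of z must be null.
    non_null = [f for f, target in z_fields.items() if f != f2 and target is not None]
    if non_null:
        raise TypeError(f"unlinking {z} would also disconnect fields {non_null}")
    updated = dict(x_fields)
    updated[f1] = r
    return updated
```

For a list node with the single field Next the check succeeds whenever the two aliasing facts are known. For a tree node it additionally rejects the write if the sibling field of z is recorded as non-null.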

Finally, the rule must ensure that no types in Γ are invalidated. This could happen one of two ways: either a field map in Γ for an alias of x duplicates the assumption that x.f<sub>1</sub> == z (which is changed by this write), or Γ contains a descendant of r, whose path from the root will change when its ancestor is modified. The final assumption of T-UnlinkH (the implication) checks that for every rcuItr reference n in Γ, it is not a path alias of x, z, or r; no entry of its field map (m) refers to r or z (which would imply n aliased x or z initially); and its path is not an extension of r (i.e., it is not a descendant). MayAlias is a predicate on two paths (or a path and set of paths) which is true if it is possible that any concrete paths the arguments may abstract (e.g., via adding non-determinism through | or abstracting iteration with indexing) *could* be the same. The negation of a MayAlias use is true only when the paths are guaranteed to refer to different locations in the heap.

Fig. 7. Replacing *existing* heap nodes with *fresh* ones. Type rule T-Replace.
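To make the intent of MayAlias concrete, the following Python sketch enumerates the concrete paths that two abstract paths denote and tests whether any coincide. It is a bounded illustration, not the predicate used by the type system: index values are explored only up to a fixed bound, so a False answer is not a proof of non-aliasing, and the sketch assumes that an index name shared by both paths denotes the same value in both. The names `Rep`, `concretize`, and `may_alias_bounded` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, NamedTuple, Set, Tuple, Union


class Rep(NamedTuple):
    """Indexed repetition such as (l|r)^k: one of `choices`, `index` times."""
    choices: FrozenSet[str]
    index: str


# A segment is a single field, an alternation of fields, or a repetition.
Segment = Union[str, FrozenSet[str], Rep]
AbstractPath = Tuple[Segment, ...]


def concretize(path: AbstractPath, idx: Dict[str, int]) -> Set[Tuple[str, ...]]:
    """All concrete field sequences denoted by `path` for fixed index values."""
    results: Set[Tuple[str, ...]] = {()}
    for seg in path:
        if isinstance(seg, Rep):
            steps = list(product(sorted(seg.choices), repeat=idx[seg.index]))
        elif isinstance(seg, frozenset):
            steps = [(f,) for f in seg]
        else:
            steps = [(seg,)]
        results = {prefix + step for prefix in results for step in steps}
    return results


def may_alias_bounded(p: AbstractPath, q: AbstractPath, bound: int = 3) -> bool:
    """True if some index assignment up to `bound` gives a common concrete path."""
    indexes = sorted({seg.index for seg in p + q if isinstance(seg, Rep)})
    for values in product(range(bound + 1), repeat=len(indexes)):
        idx = dict(zip(indexes, values))
        if concretize(p, idx) & concretize(q, idx):
            return True
    return False
```

Under the shared-index assumption, (Next)<sup>k</sup> and (Next)<sup>k</sup>.Next never coincide, whereas l|r and l may, and so may (l|r)<sup>k</sup> and (l|r)<sup>m</sup>.l with independent indexes.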

Replacing with a Fresh Node. Replacing with an rcuFresh reference faces the same aliasing complications as direct unlinking. We illustrate these challenges in Figs. 7a and 7b. Our technical report [20] also includes Figures 32a and 32b in Appendix D to illustrate complexities in unlinking. The square R nodes are root nodes, and H nodes are general heap nodes. All resources in thick straight lines and dotted lines form the memory footprint of a node replacement. The hollow thick circular nodes, pr and cr, point to the nodes involved in replacing H<sub>1</sub> (referenced by cr) with H<sub>f</sub> (referenced by cf) in the structure. We may have a<sub>0</sub> and a<sub>1</sub>, which are aliases of pr and cr respectively. They are *path-aliases* as they share the same path from the root to the node that they reference. Edge labels l and r are abbreviations for the Left and Right fields of a binary search tree. The thick dotted H<sub>f</sub> denotes the freshly allocated heap node referenced by the thick dotted cf. The thick dotted field l is set to point to the referent of cl and the thick dotted field r is set to point to the heap node referenced by lm.

H<sub>f</sub> initially (Fig. 7a) is not part of the shared structure. If it were, it would violate the tree shape requirement imposed by the type system. This is why we highlight it separately in thick dots; its static type would be rcuFresh. Note that we cannot duplicate an rcuFresh variable, nor read a field of an object it points to. This restriction localizes our reasoning about the effects of replacing with a fresh node to just one fresh reference and the object it points to. Otherwise another mechanism would be required to ensure that once a fresh reference was linked into the heap, there were no aliases still typed as fresh, since that would have risked linking the same reference into the heap in two locations.

The transition from Fig. 7a to Fig. 7b illustrates the effects of the heap mutation (replacing with a fresh node). The reasoning in the type system for replacing with a fresh node is nearly the same as for unlinking an existing node, with one exception. In replacing with a fresh node, there is no need to consider the paths of nodes deeper in the tree than the point of mutation. In the unlinking case, those nodes' static paths would become invalid. In the case of replacing with a fresh node, those descendants' paths are preserved. Our type rule for ensuring safe replacement (T-Replace) prevents path aliasing (the nonexistence of a<sub>0</sub> and a<sub>1</sub> is represented via dashed lines and circles) by negating a MayAlias query, and prevents field mapping aliasing (the nonexistence of any object field from any other context pointing to cr) by asserting (y ≠ o). It is important to note that the objects (H<sub>4</sub>, H<sub>2</sub>) in the field mappings of cr, whose referent is to be unlinked, are captured by the field mappings of the heap node referenced by cf, which has type rcuFresh. This is part of enforcing locality on the heap mutation and is captured by the assertion N = N′ in the type rule (T-Replace).

Inserting a Fresh Node. T-Insert type checks heap updates that link a fresh node into a linked data structure. Inserting a rcuFresh reference also faces some of the aliasing complications that we have already discussed for direct unlinking and replacing a node. Unlike the replacement case, the path to the last heap node (the referent of o) from the root is *extended* by f, which risks falsifying the paths for aliases and descendants of o. The final assumption (the implication) of T-Insert checks for this inconsistency.

There is also another rule, T-LinkF-Null, not shown in Fig. 6, which handles the case where the fields of the fresh node are not object references, but instead all contain null (e.g., for appending to the end of a linked list or inserting a leaf node in a tree).

Critical Sections (*Referencing inside RCU Blocks*). We introduce the *syntactic sugar* RCUWrite x.f as y in {C} for write-side critical sections; the analogous syntactic sugar for read-side critical sections can be found in Appendix E of the technical report [20].

The type system ensures unlinked and freeable references are handled linearly, as they cannot be dropped – coerced to undef. The top-level rule ToRCUWrite in Fig. 6 ensures unlinked references have been freed by forbidding them in the critical section's post-type environment. Our technical report [20] also includes the analogous rule ToRCURead for the read critical section in Figure 33 of Appendix E.

Preventing the reuse of rcuItr references across critical sections is subtler: the non-critical section system is not flow-sensitive, and does not include rcuItr. Therefore, the initial environment lacks rcuItr references, and trailing rcuItr references may not escape.
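The conditions at the end of a write-side critical section can be sketched as a check on the final type environment. This Python fragment is an illustration under assumptions, not the rule ToRCUWrite: types are reduced to string tags, the helper name `leave_write_section` is hypothetical, and modelling the non-escape of rcuItr references by filtering them out of the returned environment is a simplification chosen for this sketch.

```python
from typing import Dict

# Variable name -> type tag such as "rcuItr", "unlinked", "freeable", "undef".
Env = Dict[str, str]


def leave_write_section(env: Env) -> Env:
    """Check a write section's post-environment and return what the outside sees."""
    # Linearity: every unlinked node must have been waited for and freed.
    pending = sorted(v for v, t in env.items() if t in ("unlinked", "freeable"))
    if pending:
        raise TypeError(f"references not reclaimed before WriteEnd: {pending}")
    # Iterator references do not escape the critical section.
    return {v: t for v, t in env.items() if t != "rcuItr"}
```

An environment that still contains `cur : unlinked` or `cur : freeable` is rejected, while one containing only `par : rcuItr` and `cur : undef` is accepted and exposes only `cur : undef` to the surrounding code.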

#### 5 Evaluation

We have used our type system to check correct use of RCU primitives in two RCU data structures representative of the broader space.

Figure 1 gives the type-annotated code for add and remove operations on a linked list implementation of a bag data structure, following McKenney's example [25]. Our technical report [20] contains code for membership checking.

We have also type checked the most challenging part of an RCU binary search tree, the deletion (which also contains the code for a lookup). Our implementation is a slightly simplified version of the Citrus BST [3]: their code supports fine-grained locking for multiple writers, while ours supports only one writer by virtue of using our single-writer primitives. For lack of space the annotated code is only in Appendix B of the technical report [20], but here we emphasise the important aspects of our type system by showing how it types the BST delete method, which also includes looking up the node to be deleted.

In Fig. 8, we show the steps for deleting the heap node H<sub>1</sub>. To locate the node H<sub>1</sub>, as shown in Fig. 8a, we first traverse the subtree T<sub>0</sub> with references pr and cr, where pr is the parent of cr during traversal:

$$\{pr : \mathsf{rcuItr}\ (l|r)^k\ \{l|r \to cr\},\ cr : \mathsf{rcuItr}\ (l|r)^k.(l|r)\ \{\}\}$$

Traversal of T<sub>0</sub> is summarized as (l|r)<sup>k</sup>. The most subtle aspect of the deletion is the final step in the case where the node H<sub>1</sub> to remove has both children; as shown in Fig. 8b, the code must traverse the subtree T<sub>4</sub> to locate the next element in collection order: the node H<sub>s</sub>, the left-most node of H<sub>1</sub>'s right child (sc) and its parent (lp):

$$\{lp : (l|r)^k. (l|r). r. (l|r)^m. \{l|r \to sc\}, \; sc : (l|r)^k. (l|r). r. l. (l)^m. l\{\}\}$$

where the traversal of T<sub>4</sub> is summarized as (l|r)<sup>m</sup>.

Then H<sub>s</sub> is copied into a new *freshly-allocated* node as shown in Fig. 8b, which is then used to *replace* node H<sub>1</sub> as shown in Fig. 8c: the replacement's fields exactly match H<sub>1</sub>'s except for the data (T-Replace via N<sub>1</sub> = N<sub>2</sub>) as shown in Fig. 8b, and the parent is updated to reference the replacement, unlinking H<sub>1</sub>.

At this point, as shown in Figs. 8c and d, there are two nodes with the same value in the tree (the *weak* BST property of the Citrus BST [3]): the replacement node, and what was the left-most node under H<sub>1</sub>'s right child. This latter (original) node H<sub>s</sub> must be unlinked as shown in Fig. 8e, which is simpler because, being left-most, its left child is null, avoiding another round of replacement (T-UnlinkH via ∀f ∈ dom(N<sub>1</sub>). f ≠ f<sub>2</sub> ⟹ N<sub>1</sub>(f) = null).

Traversing T<sub>4</sub> to find the successor complicates the reasoning in an interesting way. After the successor node H<sub>s</sub> is found in Fig. 8b, there are *two* local unlinking operations as shown in Figs. 8c and e, at different depths of the tree. This is why the type system must keep separate abstract iteration counts, e.g., k in (l|r)<sup>k</sup> or m in (l|r)<sup>m</sup>, for traversals in loops: these indices act like multiple cursors into the data structure, and allow the types to carry enough information to keep those changes separate and ensure neither introduces a cycle.
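The deletion steps described above can be traced in a sequential sketch. It is only an illustration under simplifying assumptions: the names `N`, `delete_two_children`, and `inorder` are hypothetical, there are no readers or grace periods, and the `log` list merely records which nodes become unlinked and in what order.

```python
class N:
    def __init__(self, key, l=None, r=None):
        self.key, self.l, self.r = key, l, r


def delete_two_children(parent, side, log):
    """Delete the node at parent.<side>, which is assumed to have two children."""
    h1 = getattr(parent, side)
    # Locate the successor sc (left-most node of h1's right subtree) and its parent lp.
    lp, sc = h1, h1.r
    while sc.l is not None:
        lp, sc = sc, sc.l
    # Copy the successor's data into a fresh node that shares h1's children.
    fresh = N(sc.key, h1.l, h1.r)
    setattr(parent, side, fresh)  # replace: h1 becomes unlinked
    log.append(("unlinked", h1.key))
    # Two nodes with sc.key are now reachable (the weak BST property).
    # Unlink the original successor; being left-most, its left child is null.
    if lp is h1:
        fresh.r = sc.r            # the successor was h1's right child
    else:
        lp.l = sc.r
    log.append(("unlinked", sc.key))
    return fresh


def inorder(n):
    return [] if n is None else inorder(n.l) + [n.key] + inorder(n.r)
```

The two `log.append` calls correspond to the two local unlinking operations at different depths, which is what the separate iteration counts in the types keep apart.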

Fig. 8. Delete of a heap node with two children in BST [3].

To the best of our knowledge, we are the first to check such code for memory-safe use of RCU primitives modularly, without appeal to the specific implementation of RCU primitives.

### 6 Soundness

This section outlines the proof of type soundness; our full proof appears in the accompanying technical report [20]. We prove type soundness by embedding the type system into an abstract concurrent separation logic called the Views Framework [9], which when given certain information about proofs for a specific language (primitives and primitive typing) gives back a full program logic including choice and iteration. As with other work taking this approach [13,14], this consists of several key steps explained in the following subsections, but a high-level informal soundness argument is twofold. First, because the parameters given to the Views framework ensure the Views logic's Hoare triples {−}C{−} are sound, this proves soundness of the type rules with respect to type denotations. Second, as our denotation of types encodes the property that the post-environment of any type rule accurately characterizes which memory is linked vs. unlinked, etc., and the global invariants ensure all allocated heap memory is reachable from the root or from some thread's stack, this entails that our type system prevents memory leaks.

#### 6.1 Proof

This section provides more details on how the Views Framework [9] is used to prove soundness, giving the major parameters to the framework and outlining global invariants and key lemmas.

Logical State. Section 3 defined what Views calls *atomic actions* (the primitive operations) and their semantics on runtime *machine states*. The Views Framework uses a separate notion of instrumented (logical) state over which the logic is built, related by a concretization function ⌊−⌋ taking an instrumented state to the machine states of Sect. 3. Most often (including in our proof) the logical state adds useful auxiliary state to the machine state, and the concretization is simply projection. Thus we define our logical states LState as:

$$\mathsf{LState} \stackrel{def}{=} \mathsf{MState} \times O \times U \times T \times F$$

The thread ID set T includes the thread ID of all running threads. The free map F tracks which reader threads may hold references to each location. It is not required for execution of code, and for validating an implementation could be ignored, but we use it later with our type system to help prove that memory deallocation is safe. The (per-thread) variables in the undefined variable map U are those that should not be accessed (e.g., dangling pointers).

The remaining component, the observation map O, requires some further explanation. Each memory allocation/object can be *observed* in one of the following states by a variety of threads, depending on how it was used.

```
obs := iterator tid | unlinked | fresh | freeable | root
```
An object can be observed as part of the structure (iterator), removed but possibly accessible to other threads, freshly allocated, safe to deallocate, or the root of the structure.
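The observation states and the order in which the writer moves an object through them can be sketched as a small state machine. This is a simplification under stated assumptions: the names `Obs`, `WRITER_STEPS`, and `step` are hypothetical, the thread ID carried by `iterator tid` is dropped, and only the writer's view of a single object is modelled.

```python
from enum import Enum


class Obs(Enum):
    ITERATOR = "iterator"
    UNLINKED = "unlinked"
    FRESH = "fresh"
    FREEABLE = "freeable"
    ROOT = "root"


# Assumed writer-side lifecycle: fresh -> iterator -> unlinked -> freeable.
WRITER_STEPS = {
    (Obs.FRESH, "link"): Obs.ITERATOR,
    (Obs.ITERATOR, "unlink"): Obs.UNLINKED,
    (Obs.UNLINKED, "grace_period"): Obs.FREEABLE,
}


def step(obs, action):
    """Return the next observation, or raise if the action is not permitted."""
    try:
        return WRITER_STEPS[(obs, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not permitted from {obs.value}") from None
```

In this sketch an object cannot, for example, go from `fresh` straight to `unlinked`, since it must be linked into the structure first.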

Invariants of RCU Views and Denotations of Types. Next, we aim to convey the intuition behind the predicate WellFormed which enforces global invariants on logical states, and how it interacts with the denotations of types (Fig. 9) in key ways.

WellFormed is the conjunction of a number of more specific invariants, which we outline here. For full details, see Appendix A.2 of the technical report [20].

*The Invariant for Read Traversal.* Reader threads access valid heap locations even during the grace period. The validity of their heap accesses is ensured by the observations they make over the heap locations, which can only be iterator as they can only use local rcuItr references. To this end, a Readers-Iterators-Only invariant asserts that reader threads can only observe a heap location as iterator.

*Invariants on Grace-Period.* Our logical state includes a "free list" auxiliary state tracking which readers are still accessing *each* unlinked node during grace periods. This must be consistent with the bounding thread set B in the machine state, and this consistency is asserted by the Readers-In-Free-List invariant. This is essentially tracking which readers are being "shown grace" for each location. The Iterators-Free-List invariant complements this by asserting all readers with such observations on unlinked nodes are in the bounding thread set.

The writer thread can refer to a heap location in the free list with a local reference either in type freeable or unlinked. Once the writer unlinks a heap node, it first observes the heap node as unlinked then freeable. The denotation of freeable is only valid following a grace period: it asserts no readers hold aliases of the freeable reference. The denotation of unlinked permits either the same (perhaps no readers overlapped) or that it is in the to-free list.
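A toy version of the free-list bookkeeping may help fix the idea. The class `FreeMap` and its method names are assumptions for illustration; the sketch only tracks, per unlinked location, the set of reader thread IDs that may still hold a reference, and treats a location as freeable once that set is empty.

```python
class FreeMap:
    """Maps each unlinked location to the readers that may still reference it."""

    def __init__(self):
        self.readers = {}

    def unlink(self, loc, active_readers):
        # Readers active at unlink time are the ones being 'shown grace'.
        self.readers[loc] = set(active_readers)

    def reader_exit(self, tid):
        # A reader leaving its critical section drops out of every entry.
        for tids in self.readers.values():
            tids.discard(tid)

    def is_freeable(self, loc):
        # Freeable only if the location was unlinked and no reader remains.
        return loc in self.readers and not self.readers[loc]
```

A location that was never unlinked is not freeable in this sketch, matching the requirement that the writer observes a node as unlinked before it observes it as freeable.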

*Invariants on Safe Traversal Against Unlinking.* The write-side critical section must guarantee that no updates to the heap cause invalid memory accesses. The Writer-Unlink invariant asserts that a heap location observed as iterator by the writer thread cannot be observed differently by other threads. The denotation of the writer thread's rcuItr reference, ⟦x : rcuItr ρ N⟧<sub>tid</sub>, asserts that following a path from the root compatible with ρ reaches the referent, and all are observed as iterator.

The denotation of a reader thread's rcuItr reference, ⟦x : rcuItr⟧<sub>tid</sub>, and the invariants Readers-Iterator-Only, Iterators-Free-List and Readers-In-Free-List all together assert that a reader thread (which can also be a bounding thread) can view an unlinked heap location (which can be in the free list) only as iterator. At the same time, it is essential that reader threads arriving after a node is unlinked cannot access it. The invariants Unlinked-Reachability and Free-List-Reachability ensure that any unlinked nodes are reachable only from other unlinked nodes, and never from the root.

Fig. 9. Type environments

*Invariants on Safe Traversal Against Inserting/Replacing.* A writer replacing an existing node with a fresh one or inserting a single fresh node assumes the fresh (before insertion) node is unreachable to readers before it is published/linked. The Fresh-Writes invariant asserts that a fresh heap location can only be allocated and referenced by the writer thread. The relation between a freshly allocated heap node and the rest of the heap is established by the Fresh-Reachable invariant, which requires that there exists no heap node pointing to the freshly allocated one. This invariant supports the preservation of the tree structure. The Fresh-Not-Reader invariant supports the safe traversal of the reader threads by asserting that they cannot observe a heap location as fresh. Moreover, the denotation of the rcuFresh type, ⟦x : rcuFresh N⟧<sub>tid</sub>, enforces that fields in N point to valid heap locations (observed as iterator by the writer thread).

*Invariants on Tree Structure.* Our invariants enforce *tree*-structured heap layouts for data structures. The Unique-Reachable invariant asserts that every heap location reachable from the root can only be reached by following a unique path. To preserve the tree structure, Unique-Root enforces unreachability of the root from any heap location that is reachable from the root itself.
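Both tree-structure properties can be checked on a finite heap with a single traversal, as in the sketch below. The heap representation (a dictionary from node names to field maps) and the function name `unique_paths` are assumptions for illustration, not the formal invariants of the technical report.

```python
def unique_paths(root, heap):
    """Check that every node reachable from root has exactly one path to it.

    heap maps a node to a dict of its fields; each field holds a node or None.
    Seeding `seen` with the root also rejects any edge leading back to the root.
    """
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in heap.get(node, {}).values():
            if child is None:
                continue
            if child in seen:
                return False  # a second path to child, or a cycle
            seen.add(child)
            stack.append(child)
    return True
```

A shared child (a DAG) and an edge back to the root are both rejected, while an ordinary tree passes.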

Type Environments. Assertions in the Views logic are (almost) sets of the logical states that satisfy a validity predicate WellFormed, outlined above:

$$\mathcal{M} \stackrel{def}{=} \{ m \in (\mathsf{MState} \times O \times U \times T \times F) \mid \mathsf{WellFormed}(m) \}$$

Every type environment represents a set of possible views (WellFormed logical states) consistent with the types in the environment. We make this precise with a denotation function

$$\llbracket - \rrbracket_{\_} : \mathsf{TypeEnv} \to \mathsf{TID} \to \mathcal{P}(\mathcal{M})$$

that yields the set of states corresponding to a given type environment. This is defined as the intersection of individual variables' types as in Fig. 9.

Individual variables' denotations are extended to context denotations slightly differently depending on whether the environment is a reader or writer thread context: writer threads own the global lock, while readers do not:


Composition and Interference. To support framing (weakening), the Views Framework requires that views form a partial commutative monoid under an operation • : M → M → M, provided as a parameter to the framework. The framework also requires an interference relation R ⊆ M × M between views to reason about local updates to one view preserving validity of adjacent views (akin to the small-footprint property of separation logic). Figure 10 defines our composition operator and the core interference relation R<sub>0</sub>; the actual interference between views (between threads, or between a local action and framed-away state) is the reflexive transitive closure of R<sub>0</sub>. Composition is mostly straightforward point-wise union (threads' views may overlap) of each component. Interference bounds the interference writers and readers may inflict on each other. Notably, if a view contains the writer thread, other threads may not modify the shared portion of the heap, or release the writer lock. Other aspects of interference are natural restrictions, such as that threads may not modify each other's local variables. WellFormed states are closed under both composition (with another WellFormed state) and interference (R relates WellFormed states only to other WellFormed states).

$$\begin{array}{l} \downarrow \mathsf{if}\ (x.f == y)\ C_1\ C_2 \downarrow_{tid} \;\stackrel{def}{=}\; z = x.f;\ ((\mathsf{assume}(z = y); C_1) + (\mathsf{assume}(z \neq y); C_2)) \\ \downarrow \mathsf{while}\ (e)\ C \downarrow \;\stackrel{def}{=}\; \langle \mathsf{assume}(e); C \rangle^{*};\ \langle \mathsf{assume}(\neg e) \rangle \\ \llbracket \mathsf{assume}(\mathcal{S}) \rrbracket (s) \;\stackrel{def}{=}\; \begin{cases} \{s\} & \text{if } s \in \mathcal{S} \\ \emptyset & \text{otherwise} \end{cases} \\ \{P\} \cap \{\lceil \mathcal{S} \rceil\} \subseteq \{Q\} \quad \text{where} \quad \lceil \mathcal{S} \rceil = \{m \mid \lfloor m \rfloor \cap \mathcal{S} \neq \emptyset\} \end{array}$$

Fig. 11. Encoding branch conditions with assume(b)

Stable Environment and Views Shift. The framing/weakening type rule will be translated to a use of the frame rule in the Views Framework's logic. There, separating conjunction is simply the existence of two composable instrumented states:

$$m \in P \ast Q \stackrel{def}{=} \exists m'. \exists m''. m' \in P \land m'' \in Q \land m \in m' \bullet m''$$

In order to validate the frame rule in the Views Framework's logic, the assertions in its logic (sets of well-formed instrumented states) must be restricted to sets of logical states that are *stable* with respect to expected interference from other threads or contexts, and interference must be compatible in some way with separating conjunction. Thus the Views, the actual base assertions in the Views logic, are:

$$\mathsf{View}_{\mathcal{M}} \stackrel{def}{=} \{ M \in \mathcal{P}(\mathcal{M}) \mid \mathcal{R}(M) \subseteq M \}$$

Additionally, interference must distribute over composition:

$$\forall m_1, m_2, m.\ (m_1 \bullet m_2)\,\mathcal{R}\,m \implies \exists m_1', m_2'.\ m_1\,\mathcal{R}\,m_1' \land m_2\,\mathcal{R}\,m_2' \land m \in m_1' \bullet m_2'$$
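The stability condition R(M) ⊆ M can be made concrete on a finite toy model. In the sketch below, states are arbitrary hashable values and the core interference relation is a set of pairs; the function names `image` and `is_stable` are assumptions for illustration, and nothing here models the actual RCU interference relation of Fig. 10.

```python
def image(r0, states):
    """States reachable from `states` via the reflexive transitive closure of r0."""
    reached, frontier = set(states), list(states)
    while frontier:
        s = frontier.pop()
        for a, b in r0:
            if a == s and b not in reached:
                reached.add(b)
                frontier.append(b)
    return reached


def is_stable(r0, states):
    """A set of states is a view if interference cannot lead outside of it."""
    return image(r0, states) <= set(states)
```

With interference steps 1 → 2 and 2 → 3, the set {2, 3} is stable, while {1} is not, because interference can move state 1 to states outside the set.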

Because we use this induced Views logic to prove soundness of our type system by translation, we must ensure any type environment denotes a valid view:

Lemma 1 (Stable Environment Denotation-M). *For any* closed *environment* Γ *(i.e.,* ∀x ∈ dom(Γ). FV(Γ(x)) ⊆ dom(Γ)*):* R(⟦Γ⟧<sub>M,tid</sub>) ⊆ ⟦Γ⟧<sub>M,tid</sub>*. Alternatively, we say that environment denotation is* stable *(closed under* R*).*

*Proof.* In Appendix A.1 Lemma 7 of the technical report [20].

We elide the statement of the analogous result for the read-side critical section, available in Appendix A.1 of the technical report.

With this setup done, we can state the connection between the Views Framework logic induced by earlier parameters, and the type system from Sect. 4. The induced Views logic has a familiar notion of Hoare triple, {p}C{q} where p and q are elements of View<sub>M</sub>, with the usual rules for non-deterministic choice, non-deterministic iteration, sequential composition, and parallel composition, sound given the proof obligations just described above. It is parameterized by a rule for atomic commands that requires a specification of the triples for primitive operations, and their soundness (an obligation we must prove). This can then be used to prove that every typing derivation embeds to a valid derivation in the Views Logic, roughly ∀Γ, C, Γ′, tid. Γ ⊢ C ⊣ Γ′ ⇒ {⟦Γ⟧<sub>tid</sub>} ⟦C⟧<sub>tid</sub> {⟦Γ′⟧<sub>tid</sub>}, once for the writer type system, once for the readers.

There are two remaining subtleties to address. First, commands C also require translation: the Views Framework has only non-deterministic branches and loops, so the standard versions from our core language must be encoded. The encoding, shown in Fig. 11, relies on assume(b), a standard idea in verification semantics [4,30], which "does nothing" (freezes) if the condition b is false, so its postcondition in the Views logic can reflect the truth of b. assume in Fig. 11 adapts this for the Views Framework as in other Views-based proofs [13,14], specifying sets of machine states as a predicate. We write boolean expressions as shorthand for the set of machine states making that expression true. With this setup done, the top-level soundness claim then requires proving (once for the reader type system, once for the writer type system) that every valid source typing derivation corresponds to a valid derivation in the Views logic: ∀Γ, C, Γ′. Γ ⊢<sub>M</sub> C ⊣ Γ′ ⇒ {⟦Γ⟧} ↓C↓ {⟦Γ′⟧}.
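The encoding of branches and loops via assume can be sketched executably by treating a command as a function from a state to the set of states it may reach. The combinator names below (`assume`, `seq`, `choice`, `star`, `if_`, `while_`) are assumptions for illustration; the sketch works on plain hashable states and omits the initial field read z = x.f that appears in the conditional of Fig. 11.

```python
def assume(pred):
    """Pass the state through if pred holds; otherwise yield no successor."""
    return lambda s: {s} if pred(s) else set()


def seq(c1, c2):
    return lambda s: {s2 for s1 in c1(s) for s2 in c2(s1)}


def choice(c1, c2):
    return lambda s: c1(s) | c2(s)


def star(c):
    """Non-deterministic iteration: zero or more runs of c."""
    def run(s):
        reached, frontier = {s}, [s]
        while frontier:
            cur = frontier.pop()
            for nxt in c(cur):
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
        return reached
    return run


def if_(pred, c1, c2):
    return choice(seq(assume(pred), c1),
                  seq(assume(lambda s: not pred(s)), c2))


def while_(pred, body):
    return seq(star(seq(assume(pred), body)),
               assume(lambda s: not pred(s)))
```

For instance, with an increment command on integers, `while_(lambda s: s < 3, inc)` started from 0 explores the intermediate states 0, 1, 2, 3 non-deterministically, and the trailing assume keeps only the state in which the loop condition is false.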

Second, we have not addressed a way to encode subtyping. One might hope this corresponds to a kind of implication, and therefore subtyping corresponds to consequence. Indeed, this is how we (and prior work [13,14]) address subtyping in a Views-based proof. Views defines the notion of *view shift*<sup>2</sup> (⊑) as a way to reinterpret a set of instrumented states as a new (compatible) set of instrumented states, offering a kind of logical consequence, used in a rule of consequence in the Views logic:

$$p \sqsubseteq q \stackrel{def}{=} \forall m \in \mathcal{M}.\ \lfloor p * \{m\} \rfloor \subseteq \lfloor q * \mathcal{R}(\{m\}) \rfloor.$$

We are now finally ready to prove the key lemmas of the soundness proof, relating subtyping to view shifts, proving soundness of the primitive actions, and finally for the full type system. These proofs occur once for the writer type system, and once for the reader; we show here only the (more complex) writer obligations:

Lemma 2 (Axiom of Soundness for Atomic Commands). *For each axiom,* Γ<sub>1</sub> ⊢<sub>M</sub> α ⊣ Γ<sub>2</sub>*, we show* ∀m. ⟦α⟧(⌊⟦Γ<sub>1</sub>⟧<sub>tid</sub> ∗ {m}⌋) ⊆ ⌊⟦Γ<sub>2</sub>⟧<sub>tid</sub> ∗ R({m})⌋.

*Proof.* By case analysis on α. Details in Appendix A.1 of the technical report [20].

Lemma 3 (Context-SubTyping-M). Γ ≺: Γ′ ⟹ ⟦Γ⟧<sub>M,tid</sub> ⊑ ⟦Γ′⟧<sub>M,tid</sub>

*Proof.* Induction on the subtyping derivation, then inducting on the single-type subtype relation for the first variable in the non-empty context case.

Lemma 4 (Views Embedding for Write-Side). ∀Γ, C, Γ′, t. Γ ⊢<sub>M</sub> C ⊣ Γ′ ⇒ {⟦Γ⟧<sub>t</sub> ∩ ⟦M⟧<sub>t</sub>} ⟦C⟧<sub>t</sub> {⟦Γ′⟧<sub>t</sub> ∩ ⟦M⟧<sub>t</sub>}

<sup>2</sup> This is the same notion present in later program logics like Iris [18], though more recent variants are more powerful.

*Proof.* By induction on the typing derivation, appealing to Lemma 2 for primitives, Lemma 3 and consequence for subtyping, and otherwise appealing to structural rules of the Views logic and inductive hypotheses. Full details in Appendix A.1 of the technical report [20].

The corresponding obligations and proofs for the read-side critical section type system are similar in statement and proof approach, just for the read-side type judgments and environment denotations.

# 7 Discussion and Related Work

Our type system builds on a great deal of related work on RCU implementations and models, and on general concurrent program verification. Due to space limits, this section covers only program logics, modeling RCU, and memory models, but our technical report [20] includes detailed discussions of model checking [8,17,21], language-oriented approaches [6,16,16], and the realization of our semantics in an implementation as well.

Modeling RCU and Memory Models. Alglave et al. [2] propose a memory model to be assumed by the platform-independent parts of the Linux kernel, regardless of the underlying hardware's memory model. As part of this, they give the first formalization of what it means for an RCU implementation to be correct (previously this was difficult to state, as the guarantees in principle could vary by underlying CPU architecture). Essentially, reader critical sections must not span grace periods. They prove by hand that the Linux kernel RCU implementation [1] satisfies this property. McKenney has defined fundamental requirements of RCU implementations [26]; our model in Sect. 3 is a valid RCU implementation according to those requirements (assuming sequential consistency) aside from one performance optimization, *Read-to-Write Upgrade*, which is important in practice but not memory-safety centric – see the technical report [20] for detailed discussion on satisfying RCU requirements. To the best of our knowledge, ours is the first abstract *operational* model for a Linux kernel-style RCU implementation – others are implementation-specific [22] or axiomatic like Alglave et al.'s.

Tassarotti et al. model a well-known way of implementing RCU synchronization without hurting readers' performance, Quiescent State Based Reclamation (QSBR) [8], where synchronization between the writer thread and reader threads occurs via per-thread counters. Tassarotti et al. [32] use a protocol-based program logic based on separation and ghost variables called GPS [34] to verify a user-level implementation of RCU with a singly linked list client under *release-acquire* semantics, which is a weaker memory model than sequential consistency. Despite the weaker model, the protocol that they enforce on their RCU primitives is nearly the same as what our type system requires. The reads and writes to per-thread QSBR structures are similar to our more abstract updates to reader and bounding sets. Therefore, we anticipate it would be possible to extend our type system in the future for similar weak memory models.

Program Logics. Fu et al. [12] extend Rely-Guarantee and Separation-Logic [10,11,35] with the *past-tense* temporal operator to eliminate the need for a history variable and lift the standard separating conjunction to assert over execution histories. Gotsman et al. [15] bring assertions from temporal logic to separation logic [35] to capture the essence of epoch-based memory reclamation algorithms and obtain a simpler proof than that of Fu et al. [12] for Michael's non-blocking stack [29] implementation under a sequentially consistent memory model.

Tassarotti et al. [32] use *abstract-predicates* (e.g. WriterSafe) that are specialized to the singly-linked structure in their evaluation. This means reusing their ideas for another structure, such as a binary search tree, would require revising many of their invariants. By contrast, our types carry similar information (our denotations are similar to their definitions), but are reusable across at least singly-linked and tree data structures (Sect. 5). Their proofs of a linked list also require managing assertions about RCU implementation resources, while these are effectively hidden in the type denotations in our system. On the other hand, their proofs ensure full functional correctness. Meyer and Wolff [28] make a compelling argument that separating memory safety from correctness is profitable, and we provide such a decoupled memory safety argument.

### 8 Conclusions

We presented the first type system that ensures code uses RCU memory management safely, and which is significantly simpler than full-blown verification logics. To this end, we gave the first general operational model for RCU-based memory management. Based on suitable abstractions for RCU in the operational semantics, we are the first to show that decoupling the *memory-safety* proofs of RCU clients from the underlying reclamation model is possible. Meyer and Wolff [28] took a similar approach to decouple the *correctness* verification of data structures from the underlying reclamation model, under the assumption of *memory-safety* for the data structures. We demonstrated the applicability and reusability of our types on two examples: a linked-list based bag [25] and a binary search tree [3]. To the best of our knowledge, we are the first to present a *memory-safety* proof for a tree client of RCU. We proved type soundness by embedding the type system into an abstract concurrent separation logic called the Views Framework [9], encoding many RCU properties as either type denotations or global invariants over abstract RCU state. By doing this, we discharged these invariants once as part of the soundness proof and did not need to prove them for each different client.

Acknowledgements. We are grateful to Matthew Parkinson for guidance and productive discussions in the early phase of this project. We also thank Nik Sultana and Klaus V. Gleissenthall for their helpful comments and suggestions for improving the paper.

# References



Language Design

# **Codata in Action**

Paul Downen<sup>1</sup>, Zachary Sullivan<sup>1(✉)</sup>, Zena M. Ariola<sup>1</sup>, and Simon Peyton Jones<sup>2</sup>

<sup>1</sup> University of Oregon, Eugene, USA, {pdownen,zsulliva,ariola}@cs.uoregon.edu

<sup>2</sup> Microsoft Research, Cambridge, UK, simonpj@microsoft.com

**Abstract.** Computer scientists are well-versed in dealing with data structures. The same cannot be said about their dual: codata. Even though codata is pervasive in category theory, universal algebra, and logic, the use of codata for programming has been mainly relegated to representing infinite objects and processes. Our goal is to demonstrate the benefits of codata as a general-purpose programming abstraction independent of any specific language: eager or lazy, statically or dynamically typed, and functional or object-oriented. While codata is not featured in many programming languages today, we show how codata can be easily adopted and implemented by offering simple intercompilation techniques between data and codata. We believe codata is a common ground between the functional and object-oriented paradigms; ultimately, we hope to utilize the Curry-Howard isomorphism to further bridge the gap.

**Keywords:** Codata · Lambda-calculi · Encodings · Curry-Howard · Functional programming · Object-oriented programming

# **1 Introduction**

Functional programming enjoys a beautiful connection to logic, known as the Curry-Howard correspondence, or proofs as programs principle [22]; results and notions about a language are translated to those about proofs, and vice-versa [17]. In addition to expressing computation as proof transformations, this connection is also fruitful for education: everybody would understand that the assumption "an x is zero" does not mean "every x is zero," which in turn explains the subtle typing rules for polymorphism in programs. The typing rules for modules are even more cryptic, but knowing that they correspond exactly to the rules for existential quantification certainly gives us more confidence that they are correct! While not everything useful must have a Curry-Howard correspondence, we believe finding these delightful coincidences where the same idea is rediscovered many times in both logic and programming can only be beneficial [42].

P. Downen and Z. M. Ariola—This work is supported by the National Science Foundation under grants CCF-1423617 and CCF-1719158.

One such instance involves *codata*. In contrast with the mystique it has as a programming construct, codata is pervasive in mathematics and logic, where it arises through the lens of duality. The most visual way to view the duality is in the categorical diagrams of sums versus products—the defining arrows go *into* a sum and come *out of* a product—and in algebras versus coalgebras [25]. In proof theory, codata has had an impact on theorem proving [5] and on the foundation of computation via *polarity* [29,45]. Polarity recognizes which of two dialogic actors speaks first: the proponent (who seeks to verify or prove a fact) or the opponent (who seeks to refute the fact).

The two-sided, interactive view appears all over the study of programming languages, where data is concerned about how values are constructed and codata is concerned about how they are used [15]. Sometimes, this perspective is readily apparent, like with session types [7] which distinguish internal choice (a provider's decision) versus external choice (a client's decision). But other occurrences are more obscure, like in the semantics of PCF (*i.e.* the call-by-name λ-calculus with numbers and general recursion). In PCF, the result of evaluating a program must be of a ground type in order to respect the laws of functions (namely η) [32]. This is not due to differences between ground types versus "higher types," but to the fact that data types are *directly observable*, whereas codata types are only *indirectly observable* via their interface.

Clearly codata has merit in theoretical pursuits; we think it has merit in practical ones as well. The main application of codata so far has been for representing infinite objects and coinductive proofs in proof assistants [1,39]. However, we believe that codata also makes for an important general-purpose programming feature. Codata is a bridge between the functional and object-oriented paradigms; a common denominator between the two very different approaches to programming. On one hand, functional languages are typically rich in data types (as many as the programmer wants to define via data declarations) but have a paucity of codata types (usually just function types). On the other hand, object-oriented languages are rich in codata types (programmer-defined in terms of classes or interfaces) but have a paucity of data types (usually just primitives like booleans and numbers). We illustrate this point with a collection of example applications that arise in both styles of programming, including common encodings, demand-driven programming, abstraction, and Hoare-style reasoning.

While codata types can be seen in the shadows behind many examples of programming (often hand-compiled away by the programmer), not many functional languages have native support for them. To this end, we demonstrate a pair of simple compilation techniques between a typical core functional language (with data types) and one with codata. One direction, based on the well-known visitor pattern from object-oriented programming, simultaneously shows how to extend an object-oriented language with data types (as is done by Scala) and how to compile core functional programs to a more object-oriented setting (*e.g.* targeting a backend like JavaScript or the JVM). The other shows how to add native codata types to functional languages by reducing them to commonly-supported data types and how to compile a "pure" object-oriented style of programming to a functional setting. Both of these techniques are macro-expansions that are not specific to any particular language, as they work with both statically and dynamically typed disciplines, and they preserve the well-typed status of programs without increasing the complexity of the types involved.
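To give a flavour of the visitor-pattern direction, the sketch below encodes a list data type using only objects with methods. It is a hand-written illustration rather than the compilation scheme of this paper: the class names `Nil`, `Cons`, and `Length` and the method name `accept` are assumptions, with each constructor becoming an object whose `accept` method calls the matching method of a visitor.

```python
class Nil:
    """Constructor with no fields: dispatches to the visitor's nil case."""

    def accept(self, visitor):
        return visitor.nil()


class Cons:
    """Constructor with two fields: dispatches to the visitor's cons case."""

    def __init__(self, head, tail):
        self.head, self.tail = head, tail

    def accept(self, visitor):
        return visitor.cons(self.head, self.tail)


class Length:
    """A visitor is an object with one method per constructor (a case analysis)."""

    def nil(self):
        return 0

    def cons(self, head, tail):
        return 1 + tail.accept(self)
```

Pattern matching on the list becomes a call such as `Cons(1, Cons(2, Nil())).accept(Length())`, where the visitor plays the role of the branches of a case expression.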

Our claim is that codata is a universal programming feature that has been thus-far missing or diminished in today's functional programming languages. This is too bad, since codata is not just a feature invented for the convenience of programmers, but a persistent idea that has sprung up over and over from the study of mathematics, logic, and computation. We aim to demystify codata, and en route, bridge the wide gulf between the functional and object-oriented paradigms. Fortunately, it is easy for most mainstream languages to add or bring out codata today without a radical change to their implementation. But ultimately, we believe that the languages of the future should incorporate *both* data and codata outright. To that end, our contributions are to:


# **2 The Many Faces of Codata**

Codata can be used to solve other problems in programming besides representing infinite objects and processes like streams and servers [1,39]. We start by presenting codata as a merger between theory and practice, whereby *encodings* of data types in an object-oriented style turn out to be a useful intermediate step in the usual encodings of data in the λ-calculus. *Demand-driven programming* is considered a virtue of lazy languages, but codata is a language-independent tool for capturing this programming idiom. Codata exactly captures the essence of *procedural abstraction*, as achieved with λ-abstractions and objects, with a logically founded formalism [16]. Specifying *pre- and post-conditions* of protocols, which is available in some object systems [14], is straightforward with indexed, recursive codata types, *i.e.* objects with guarded methods [40].

#### **2.1 Church Encodings and Object-Oriented Programming**

Crucial information structures, like booleans, numbers, and lists can be encoded in the untyped λ-calculus (*a.k.a.* Church encodings) or in the typed polymorphic λ-calculus (*a.k.a.* Böhm-Berarducci [9] encodings). It is quite remarkable that data structures can be simulated with just first-class, higher-order functions. The downside is that these encodings can be obtuse at first blush, and have the effect of obscuring the original program when *everything* is written with just λs and application. For example, the λ-representation of the boolean value True, the first projection out of a pair, and the constant function K are all expressed as λx.λy.x, which is not that immediately evocative of its multi-purpose nature.

Object-oriented programmers have also been representing data structures in terms of objects. This is especially visible in the Smalltalk lineage of languages like Scala, wherein an objective is that everything that can be an object is. As it turns out, the object-oriented features needed to perform this representation technique are *exactly* those of codata. That is because Church-style encodings and object-oriented representations of data all involve *switching focus from the way values are built (i.e. introduced) to the way they are used (i.e. eliminated)*.

Consider the representation of Boolean values as an algebraic data type. There may be many ways to use a Boolean value. However, it turns out that there is a *most-general* eliminator of Booleans: the expression if b then x else y. This basic construct can be used to define all the other uses for Bools. Instead of focusing on the constructors True and False let's then focus on this most-general form of Bool elimination; this is the essence of the encodings of booleans in terms of objects. In other words, booleans can be thought of as objects that implement a single method: If. So that the expression if b then x else y would instead be written as (b.If x y). We then define the true and false values in terms of their reaction to If:

$$\texttt{true} \triangleq \{\texttt{If} \; \texttt{x} \; \texttt{y} \rightarrow \texttt{x}\} \qquad\qquad\qquad \texttt{false} \triangleq \{\texttt{If} \; \texttt{x} \; \texttt{y} \rightarrow \texttt{y}\}$$

Or alternatively, we can write the same definition using copatterns, popularized for use in the functional paradigm by Abel *et al.* [1] by generalizing the usual pattern-based definition of functions by multiple clauses, as:

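$$\texttt{true.If} \; \texttt{x} \; \texttt{y} = \texttt{x} \qquad\qquad\qquad \texttt{false.If} \; \texttt{x} \; \texttt{y} = \texttt{y}$$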

This works just like equational definitions by pattern-matching in functional languages: the expression to the left of the equals sign is the same as the expression to the right (for any binding of x and y). Either way, the net result is that (true.If "yes" "no") is "yes", whereas (false.If "yes" "no") is "no".

This covers the object-based presentation of booleans in a dynamically typed language, but how do static types come into play? In order to give a type description of the above boolean objects, we can use the following interface, analogous to a Java interface:

codata Bool where If : Bool → (forall a. a → a → a)

This declaration is dual to a data declaration in a functional language: data declarations define the types of constructors (which produce values of the data type) and codata declarations define the types of destructors (which consume values of the codata type) like If. The If observation introduces its own polymorphic type a because an if-then-else might return any type of result (as long as both branches agree on the type). That way, both of the objects true and false above are values of the codata type Bool.
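As a rough illustration outside the paper's own notation, the Bool interface can be approximated in Python by a protocol with a single method. The names `Bool`, `TrueObj`, `FalseObj`, `if_`, and `negate` are hypothetical, chosen only for this sketch, and because Python is strict both branches are evaluated before `if_` is called.

```python
from typing import Protocol, TypeVar

A = TypeVar("A")


class Bool(Protocol):
    """One observation, mirroring `If : Bool → (forall a. a → a → a)`."""

    def if_(self, x: A, y: A) -> A: ...


class TrueObj:
    def if_(self, x: A, y: A) -> A:
        return x  # true.If x y = x


class FalseObj:
    def if_(self, x: A, y: A) -> A:
        return y  # false.If x y = y


true: Bool = TrueObj()
false: Bool = FalseObj()


def negate(b: Bool) -> Bool:
    # Other uses of booleans are defined through the single observation.
    return b.if_(false, true)
```

Here `true.if_("yes", "no")` selects its first argument and `false.if_("yes", "no")` its second, matching the two equations above.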

At this point, the representation of booleans as codata looks remarkably close to the encodings of booleans in the λ-calculus! Indeed, the only difference is that in the λ-calculus we "anonymize" booleans. Since they reply to only one request, that request name can be dropped. We then arrive at the familiar encodings in the polymorphic λ-calculus:

```
Bool = ∀a.a → a → a
true = Λa.λx:a.λy:a.x
false = Λa.λx:a.λy:a.y
```
In addition, the invocation of the If method just becomes ordinary function application; b.If x y of type a is written as b a x y. Otherwise, the definition and behavior of booleans as either codata types or as polymorphic functions are the same.
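The anonymized form can be sketched in Python as curried functions. Python has no type abstraction, so the Λa and the type argument are simply erased here; `to_object` is a hypothetical helper that gives the request its name back.

```python
def true(x):
    """Church-style true: select the first of two curried arguments."""
    return lambda y: x


def false(x):
    """Church-style false: select the second of two curried arguments."""
    return lambda y: y


def to_object(b):
    """Wrap an anonymous boolean in an object with a named `if_` method."""

    class _Bool:
        def if_(self, x, y):
            return b(x)(y)  # method invocation becomes function application

    return _Bool()
```

The wrapper makes the correspondence visible: calling `if_` on the object does nothing more than apply the underlying function to the two branches.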

This style of inverting the definition of data types (either into specific codata types or into polymorphic functions) is also related to another concept in object-oriented programming. First, consider how a functional programmer would represent a binary Tree (with integer-labeled leaves) and a walk function that traverses a tree by converting the labels on all leaves and combining the results of sub-trees:

```
data Tree where Leaf : Int → Tree
                Branch : Tree → Tree → Tree
walk : (Int → a) → (a → a → a) → Tree → a
walk b f (Leaf x) = b x
walk b f (Branch l r) = f (walk b f l) (walk b f r)
```
The above code relies on pattern-matching on values of the Tree data type and higher-order functions b and f for accumulating the result. Now, how might an object-oriented programmer tackle the problem of traversing a tree-like structure? The *visitor pattern*! With this pattern, the programmer specifies a "visitor" object which contains knowledge of what to do at every node of the tree, and tree objects must be able to accept a visitor with a method that will recursively walk down each subcomponent of the tree. In a pure style (which returns an accumulated result directly instead of using mutable state as a side channel for results), the visitor pattern for a simple binary tree interface will look like:

```
codata TreeVisitor a where
  VisitLeaf : TreeVisitor a → (Int → a )
  VisitBranch : TreeVisitor a → ( a → a → a )
codata Tree where
 Walk : Tree → (forall a. TreeVisitor a → a )
leaf : Int → Tree
leaf x = {Walk v → v.VisitLeaf x}
branch : Tree → Tree → Tree
branch l r = {Walk v → v.VisitBranch (l.Walk v) (r.Walk v)}
```
And again, we can write this same code more elegantly, without the need to break apart the two arguments across the equal sign with a manual abstraction, using copatterns as:

$$\begin{aligned} &\texttt{(leaf x).Walk v = v.VisitLeaf x} \\ &\texttt{(branch l r).Walk v = v.VisitBranch (l.Walk v) (r.Walk v)} \end{aligned}$$

Notice how the above code is just an object-oriented presentation of the following encoding of binary trees into the polymorphic λ-calculus:

$$\begin{aligned}
\mathit{Tree} &= \forall a.\ \mathit{TreeVisitor}\ a \to a \\
\mathit{TreeVisitor}\ a &= (\mathit{Int} \to a) \times (a \to a \to a) \\
\mathit{leaf} &: \mathit{Int} \to \mathit{Tree} \\
\mathit{leaf}\ (x{:}\mathit{Int}) &= \Lambda a.\ \lambda v{:}\mathit{TreeVisitor}\ a.\ (\mathit{fst}\ v)\ x \\
\mathit{branch} &: \forall a.\ \mathit{Tree} \to \mathit{Tree} \to \mathit{Tree} \\
\mathit{branch}\ (l{:}\mathit{Tree})\ (r{:}\mathit{Tree}) &= \Lambda a.\ \lambda v{:}\mathit{TreeVisitor}\ a.\ (\mathit{snd}\ v)\ (l\ a\ v)\ (r\ a\ v)
\end{aligned}$$

The only essential difference between this λ-encoding of trees versus the λ-encoding of booleans above is currying: the representation of the data type *Tree* takes a single product *TreeVisitor* a of the necessary arguments, whereas the data type *Bool* takes the two necessary arguments separately. Besides this easily-converted difference of currying, the usual Böhm-Berarducci encodings shown here correspond to a pure version of the visitor pattern.
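For readers more used to mainstream object-oriented code, the pure visitor pattern described above might be sketched in Python as follows. The class and method names (`TreeVisitor`, `Leaf`, `Branch`, `walk`) are illustrative renderings of the paper's pseudocode, not an API from any library.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class TreeVisitor(Generic[A]):
    """Plays the role of the product (Int → a) × (a → a → a)."""

    visit_leaf: Callable[[int], A]
    visit_branch: Callable[[A, A], A]


class Leaf:
    def __init__(self, x: int) -> None:
        self._x = x

    def walk(self, v: TreeVisitor[A]) -> A:
        return v.visit_leaf(self._x)


class Branch:
    def __init__(self, left, right) -> None:
        self._left, self._right = left, right

    def walk(self, v: TreeVisitor[A]) -> A:
        # Walk both sub-trees and combine the results; no mutable state is used.
        return v.visit_branch(self._left.walk(v), self._right.walk(v))


tree = Branch(Leaf(1), Branch(Leaf(2), Leaf(3)))
total = tree.walk(TreeVisitor(visit_leaf=lambda n: n, visit_branch=lambda l, r: l + r))
depth = tree.walk(TreeVisitor(visit_leaf=lambda _: 1, visit_branch=lambda l, r: 1 + max(l, r)))
```

The same tree object answers both visitors: the first sums the leaf labels and the second computes the depth, so the tree itself never needs to know which traversal is being performed.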

#### **2.2 Demand-Driven Programming**

In "Why functional programming matters" [23], Hughes motivates the utility of practical functional programming through its excellence in compositionality. When designing programs, one of the goals is to decompose a large problem into several manageable sub-problems, solve each sub-problem in isolation, and then compose the individual parts together into a complete solution. Unfortunately, Hughes identifies some examples of programs which resist this kind of approach.

In particular, numeric algorithms (for computing square roots, derivatives, and integrals) rely on an infinite sequence of approximations which converge on the true answer only in the limit of the sequence. For these numeric algorithms, the decision on when a particular approximation in the sequence is "close enough" to the real answer lies solely in the eyes of the beholder: only the observer of the answer can say when to stop improving the approximation. As such, standard imperative implementations of these numeric algorithms are expressed as a single, complex loop, which interleaves the concern of producing better approximations with the decision on when to stop. Even more complex is the branching structure of the classic minimax algorithm from artificial intelligence for searching for reasonable moves in two-player games like chess, which can have an unreasonably large (if not infinite) search space. Here, too, there is difficulty separating generation from selection, and worse there is the intermediate step of pruning out uninteresting sub-trees of the search space (known as alpha-beta pruning). As a result, a standard imperative implementation of minimax is a single, recursive function that combines all the tasks (generation, pruning, estimation, and selection) at once.

Hughes shows how both instances of failed decomposition can be addressed in functional languages through the technique of *demand-driven programming*. In each case, the main obstacle is that the control of how to drive the next step of the algorithm—whether to continue or not—lies with the consumer. The producer of potential approximations and game states, in contrast, should only take over when demanded by the consumer. By giving primary control to the consumer, each of these problems can be decomposed into sensible sub-tasks, and recomposed back together. Hughes uses lazy evaluation, as found in languages like Miranda and Haskell, in order to implement the demand-driven algorithms. However, the downside of relying on lazy evaluation is that it is a whole-language decision: a language is either lazy by default, like Haskell, or not, like OCaml. When working in a strict language, expressing these demand-driven algorithms with manual laziness loses much of their original elegance [33].

In contrast, a language should directly support the capability of yielding control to the consumer independently of the language being strict or lazy, analogously to what happens with λ-abstractions. A λ-abstraction computes on demand, so why is this property relegated to this predefined type only? In fact, the concept of *codata* also has this property. As such, it allows us to describe demand-driven programs in an agnostic way which works just as well in Haskell as in OCaml without any additional modification. For example, we can implement Hughes' demand-driven AI game in terms of codata instead of laziness. To represent the current game state, and all of its potential developments, we can use an arbitrarily-branching tree codata type.

```
codata Tree a where
 Node : Tree a → a
 Children : Tree a → List (Tree a)
```
The task of generating all potential future boards from the current board state produces one of these tree objects, described as follows (where moves of type Board → List Board generates a list of possible moves):

```
gameTree : Board → Tree Board
(gameTree b).Node = b
(gameTree b).Children = map gameTree (moves b)
```
Notice that the tree might be finite, such as in the game of Tic-Tac-Toe. However, it would still be inappropriate to waste resources fully generating all moves before determining which are even worth considering. Fortunately, the fact that the responses of a codata object are only computed when demanded means that the consumer is in full control over how much of the tree is generated, just as in Hughes' algorithm. This fact lets us write the following simplistic prune function which cuts off sub-trees at a fixed depth.

```
prune : Int → Tree Board → Tree Board
(prune x t).Node = t.Node
(prune 0 t).Children = []
(prune x t).Children = map (prune (x - 1)) t.Children
```
The more complex alpha-beta pruning algorithm can be written as its own pass, similar to prune above. Just like Hughes' original presentation, the evaluation of the best move for the opponent is the composition of a few smaller functions:

eval = maximize . maptree score . prune 5 . gameTree

What is the difference between this codata version of minimax and the one presented by Hughes that makes use of laziness? They both compute on-demand which makes the game efficient. However, demand-driven code written with codata can be easily ported between strict and lazy languages with only syntactic changes. In other words, codata is a general, portable, programming feature which is the key for compositionality in program design.<sup>1</sup>
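To make the portability claim concrete, here is a minimal sketch of the same pipeline in Python, a strict language. The toy `moves` and `score` functions are assumptions invented for this example (the game they describe never ends), and `maximize`/`minimize` follow the usual alternating minimax scheme rather than any code from the paper. The list `generated` records which boards were actually expanded.

```python
from typing import Callable, List

generated: List[int] = []  # boards whose moves were actually generated


def moves(board: int) -> List[int]:
    """Hypothetical move generator for a toy game that never ends."""
    generated.append(board)
    return [2 * board, 2 * board + 1]


def score(board: int) -> int:
    """Hypothetical static evaluation of a board."""
    return board % 3


class Tree:
    """Codata-style tree: each observation is computed only when called."""

    def __init__(self, node: Callable[[], int], children: Callable[[], List["Tree"]]) -> None:
        self.node = node
        self.children = children


def game_tree(board: int) -> Tree:
    return Tree(lambda: board, lambda: [game_tree(m) for m in moves(board)])


def prune(depth: int, t: Tree) -> Tree:
    return Tree(t.node, lambda: [] if depth == 0 else [prune(depth - 1, c) for c in t.children()])


def map_tree(f: Callable[[int], int], t: Tree) -> Tree:
    return Tree(lambda: f(t.node()), lambda: [map_tree(f, c) for c in t.children()])


def maximize(t: Tree) -> int:
    kids = t.children()
    return t.node() if not kids else max(minimize(c) for c in kids)


def minimize(t: Tree) -> int:
    kids = t.children()
    return t.node() if not kids else min(maximize(c) for c in kids)


def evaluate(board: int, depth: int) -> int:
    # eval = maximize . maptree score . prune depth . gameTree
    return maximize(map_tree(score, prune(depth, game_tree(board))))


best = evaluate(1, depth=2)
```

Although the underlying game tree is infinite, only the boards 1, 2, and 3 are ever expanded for a depth of 2, because the consumer never asks the pruned leaves for their children.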

#### **2.3 Abstraction Mechanism**

In the pursuit of scalable and maintainable program design, the typical followup to composability is abstraction. The basic purpose of abstraction is to hide certain implementation details so that different parts of the code base need not be concerned with them. For example, a large program will usually be organized into several different parts or "modules," some of which may hold general-purpose "library" code and others may be application-specific "clients" of those libraries. Successful abstractions will leverage tools of the programming language in question so that there is a clear interface between libraries and their clients, codifying which details are exposed to the client and which are kept hidden inside the library. A common such detail to hide is the concrete representation of some data type, like strings and collections. Clear abstraction barriers give freedom to both the library implementor (to change hidden details without disrupting any clients) as well as the client (to ignore details not exposed by the interface).

Reynolds [35] identified, and Cook [12] later elaborated on, two different mechanisms to achieve this abstraction: abstract data types and procedural abstraction. Abstract data types are crisply expressed by the Standard ML module system, based on existential types, which serves as a concrete practical touchstone for the notion. Procedural abstraction is pervasively used in object-oriented languages. However, due to the inherent differences among the many languages and the way they express procedural abstraction, it may not be completely clear what the "essence" is, the way existential types are the essence of modules. *What is the language-agnostic representation of procedural abstraction? Codata!* The combination of observation-based interfaces, message-passing, and dynamic dispatch is exactly the set of tools needed for procedural abstraction. Other common object-oriented features (like inheritance, subtyping, encapsulation, and mutable state) are orthogonal to this particular abstraction goal. While they may be useful extensions to codata for accomplishing programming tasks, only pure codata itself is needed to represent abstraction.

<sup>1</sup> To see the full code for all the examples of [24] implemented in terms of codata, visit https://github.com/zachsully/codata_examples.

Specifying a codata type is giving an interface—between an implementation and a client—so that instances of the type (implementations) can respond to requests (clients). In fact, method calls are the only way to interact with our objects. As usual, there is no way to "open up" a higher-order function—one example of a codata type—and inspect the way it was implemented. The same intuition applies to all other codata types. For example, Cook's [12] procedural "set" interface can be expressed as a codata type with the following observations:

```
codata Set where
  IsEmpty : Set → Bool
  Contains : Set → Int → Bool
  Insert : Set → Int → Set
  Union : Set → Set → Set
```
Every single object of type Set will respond to these observations, which is the only way to interact with it. This abstraction barrier gives us the freedom of defining several different instances of Set objects that can all be freely composed with one another. One such instance of Set uses a list to keep track of a hidden state of the contained elements (where elemOf : List Int → Int → Bool checks if a particular number is an element of the given list, and the operation fold : (a → b → b) → b → List a → b is the standard functional fold):

```
finiteSet : List Int → Set
(finiteSet xs).IsEmpty = xs == []
(finiteSet xs).Contains y = elemOf xs y
(finiteSet xs).Insert y = finiteSet (y:xs)
(finiteSet xs).Union s = fold (λx t → t.Insert x) s xs
```

```
emptySet = finiteSet []
```
But of course, many other instances of Set can also be given. For example, this codata type interface also makes it possible to represent infinite sets like the set evens of all even numbers, which is defined in terms of the more general evensUnion that unions all even numbers with some other set (where the function isEven : Int → Bool checks if a number is even):

```
evens = evensUnion emptySet
evensUnion : Set → Set
(evensUnion s).IsEmpty = False
(evensUnion s).Contains y = isEven y || s.Contains y
(evensUnion s).Insert y = evensUnion (s.Insert y)
(evensUnion s).Union t = evensUnion (s.Union t)
```
Because of the natural abstraction mechanism provided by codata, different Set implementations can interact with each other. For example, we can union a finite set and evens together because both definitions of Union know nothing of the internal structure of the other Set. Therefore, all we can do is apply the observations provided by the Set codata type.
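A rough Python counterpart of the two Set instances is sketched below; the class names `FiniteSet` and `EvensUnion` and the snake-case method names are renderings chosen for this example. Neither class inspects the representation of the other, so each can only use the four shared observations.

```python
from typing import List


class FiniteSet:
    def __init__(self, xs: List[int]) -> None:
        self._xs = list(xs)  # hidden representation

    def is_empty(self) -> bool:
        return not self._xs

    def contains(self, y: int) -> bool:
        return y in self._xs

    def insert(self, y: int):
        return FiniteSet([y] + self._xs)

    def union(self, s):
        # Only the public `insert` observation of `s` is used.
        for x in self._xs:
            s = s.insert(x)
        return s


class EvensUnion:
    """All even numbers together with the elements of another set."""

    def __init__(self, s) -> None:
        self._s = s

    def is_empty(self) -> bool:
        return False

    def contains(self, y: int) -> bool:
        return y % 2 == 0 or self._s.contains(y)

    def insert(self, y: int):
        return EvensUnion(self._s.insert(y))

    def union(self, t):
        return EvensUnion(self._s.union(t))


empty_set = FiniteSet([])
evens = EvensUnion(empty_set)
mixed = FiniteSet([1, 3]).union(evens)  # a finite set unioned with an infinite one
```

The resulting `mixed` object contains 1, 3, and every even number, even though the finite implementation never learns that its argument is infinite.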

While sets of numbers are fairly simplistic, there are many more practical real-world instances of the procedural abstraction provided by codata to be found in object-oriented languages. For example, databases are a good use of abstraction, where basic database queries can be represented as the observations on table objects. A simplified interface to a database table (containing rows of type a) with selection, deletion, and insertion, is given as follows:

```
codata Database a where
  Select : Database a → (a → Bool) → List a
  Delete : Database a → (a → Bool) → Database a
  Insert : Database a → a → Database a
```
On one hand, specific implementations can be given for connecting to and communicating with a variety of different databases—like Postgres, MySQL, or just a simple file system—which are hidden behind this interface. On the other hand, clients can write generic operations independently of any specific database, such as copying rows from one table to another or inserting a row into a list of compatible tables:

```
copy : Database a → Database a → Database a
copy from to = let rows = from.Select(λ_ → True)
               in foldr (λrow db → db.Insert row) to rows
insertAll : List (Database a) → a → List (Database a)
insertAll dbs row = map (λdb → db.Insert row) dbs
```
In addition to abstracting away the details of specific databases, both copy and insertAll can communicate between completely different databases by just passing in the appropriate object instances, which all have the same generic type. Another use of this generality is for testing. Besides the normal instances of Database a which perform permanent operations on actual tables, one can also implement a fictitious *simulation* which records changes only in temporary memory. That way, client code can be seamlessly tested by running and checking the results of simulated database operations that have no external side effects by just passing pure codata objects.
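The simulated database mentioned above could look roughly like this in Python. `InMemoryDatabase` is a hypothetical stand-in that keeps rows in a list; `copy` and `insert_all` are written only against the three observations, so they would work unchanged with any other object offering the same methods. The parameters are named `source` and `target` because `from` is a reserved word in Python.

```python
from typing import Callable, Generic, List, Optional, TypeVar

A = TypeVar("A")


class InMemoryDatabase(Generic[A]):
    """A simulated table that records changes only in temporary memory."""

    def __init__(self, rows: Optional[List[A]] = None) -> None:
        self._rows: List[A] = list(rows or [])

    def select(self, pred: Callable[[A], bool]) -> List[A]:
        return [r for r in self._rows if pred(r)]

    def delete(self, pred: Callable[[A], bool]) -> "InMemoryDatabase[A]":
        return InMemoryDatabase([r for r in self._rows if not pred(r)])

    def insert(self, row: A) -> "InMemoryDatabase[A]":
        return InMemoryDatabase(self._rows + [row])


def copy(source, target):
    """Copy every row of `source` into `target`, using only the interface."""
    rows = source.select(lambda _: True)
    for row in reversed(rows):  # a right fold inserts the last row first
        target = target.insert(row)
    return target


def insert_all(dbs, row):
    return [db.insert(row) for db in dbs]


orders = InMemoryDatabase([1, 2, 3])
archive = copy(orders, InMemoryDatabase([0]))
```

Because every operation returns a new object, the original `orders` table is left untouched, which is what makes this kind of instance convenient for tests.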

#### **2.4 Representing Pre- and Post-Conditions**

The extension of data types with indexes (*a.k.a.* generalized algebraic data types) has proven useful to statically verify a data structure's invariant, like for red-black trees [43]. With indexed data types, the programmer can inform the static type system that a particular value of a data type satisfies some additional conditions by constraining the way in which it was constructed. Unsurprisingly, indexed codata types are dual and allow the creator of an object to constrain the way it is going to be used, thereby adding pre- and post-conditions to the observations of the object. In other words, in a language with type indexes, codata enables the programmer to express more information in its interface.

This additional expressiveness simplifies applications that rely on a type index to guard observations. Thibodeau *et al.* [40] give examples of such programs, including an automaton specification where its transitions correspond to an observation that changes a pre- and post-condition in its index, and a fair resource scheduler where the observation of several resources is controlled by an index tracking the number of times they have been accessed. For concreteness, let's use an indexed codata type to specify safe protocols as in the following example from an object-oriented language with guarded methods:

```
index Raw , Bound , Live
codata Socket i where
 Bind : Socket Raw → String → Socket Bound
 Connect : Socket Bound → Socket Live
 Send : Socket Live → String → ()
 Receive : Socket Live → String
 Close : Socket Live → ()
```
This example comes from DeLine and Fähndrich [14], where they present an extension to C constraining the pre- and post-conditions for method calls. If we have an instance of this Socket i interface, then observing it through the above methods can return new socket objects with a different index. The index thereby governs the order in which clients are allowed to apply these methods. A socket will start with the index Raw. The only way to use a Socket Raw is to Bind it, and the only way to use a Socket Bound is to Connect it. This forces us to follow a protocol when initializing a Socket.
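Python has no indexed types, but the protocol can be approximated by giving each index its own class, so that each state exposes only the observations allowed in that state. This is a sketch under that assumption: the classes `RawSocket`, `BoundSocket`, and `LiveSocket` are hypothetical, no real network is involved, and the calls are merely recorded in a log. A static checker would reject an out-of-order call, while at run time it fails with `AttributeError`.

```python
from typing import List


class LiveSocket:
    def __init__(self, log: List[str]) -> None:
        self._log = log

    def send(self, msg: str) -> None:
        self._log.append(f"send {msg}")

    def receive(self) -> str:
        self._log.append("receive")
        return "pong"  # placeholder reply for the simulation

    def close(self) -> None:
        self._log.append("close")


class BoundSocket:
    def __init__(self, log: List[str]) -> None:
        self._log = log

    def connect(self) -> LiveSocket:
        self._log.append("connect")
        return LiveSocket(self._log)


class RawSocket:
    def __init__(self, log: List[str]) -> None:
        self._log = log

    def bind(self, address: str) -> BoundSocket:
        self._log.append(f"bind {address}")
        return BoundSocket(self._log)


log: List[str] = []
live = RawSocket(log).bind("localhost:8080").connect()
live.send("ping")
reply = live.receive()
live.close()
```

Since a `RawSocket` has no `send` method at all, the only way to reach `send` is to go through `bind` and `connect` first.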

**Intermezzo 1.** This declaration puts one aspect in the hands of the programmer, though. A client can open a socket and never close it, hogging the resource. We can remedy this problem with linear types, which force us to address any loose ends before finishing the program. With linear types, it would be a type error to have a lingering Live socket lying around at the end of the program, and a call to Close would use it up. Furthermore, linear types would ensure that outdated copies of Socket objects cannot be used again, which is especially appropriate for actions like Bind, which is meant to *transform* a Raw socket into a Bound one, and likewise for Connect, which transforms a Bound socket into a Live one. Even better, enhancing linear types with a more sophisticated notion of ownership (like in the Rust programming language, which differentiates a *permanent* transfer of ownership from *temporarily* borrowing it) makes this resource-sensitive interface especially pleasant. Observations like Bind, Connect, and Close, which are meant to fully consume the observed object, would transfer full ownership of the object itself to the method call and effectively replace the old object with the returned one. In contrast, observations like Send and Receive, which are meant to be repeated on the same object, would merely borrow the object for the duration of the action so that it could be used again.

# **3 Inter-compilation of Core Calculi**

We previously saw examples of using codata types to replicate well-known encodings of data types into the λ-calculus. Now, let's dive in and show how data and codata types formally relate to one another. In order to demonstrate the relationship, we will consider two small languages that extend the common polymorphic λ-calculus: λdata extends λ with user-defined algebraic data types, and λcodata extends λ with user-defined codata types. In the end, we will find that both of these foundational languages can be inter-compiled into one another. Data can be represented by codata via the visitor pattern (V). Codata can be represented by data by tabulating the possible answers of objects (T).

In essence, this demonstrates how to compile programs between the functional and object-oriented paradigms. The T direction shows how to extend existing functional languages (like OCaml, Haskell, or Racket) with codata objects without changing their underlying representation. Dually, the V direction shows how to compile functional programs with data types into an object-oriented target language (like JavaScript).
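As an informal picture of the T direction (not the formal translation, which is developed later), an object of a codata type with two observations can be tabulated as a data record with one field per observation. In this Python sketch the fields are thunks, an assumption made here so that answers are still computed only on demand in a strict host language; `TreeTable`, `observe_node`, `observe_children`, and `count_up` are hypothetical names.

```python
from typing import Callable, List, NamedTuple


class TreeTable(NamedTuple):
    """Data representation of a codata tree: one delayed answer per observation."""

    node: Callable[[], int]
    children: Callable[[], List["TreeTable"]]


def observe_node(t: TreeTable) -> int:
    node, _children = t  # take the table apart, then force the chosen answer
    return node()


def observe_children(t: TreeTable) -> List[TreeTable]:
    _node, children = t
    return children()


def count_up(n: int) -> TreeTable:
    # An object with infinitely many descendants, tabulated lazily.
    return TreeTable(lambda: n, lambda: [count_up(n + 1)])


first_child = observe_children(count_up(0))[0]
```

Each observation becomes a projection out of the table, so a method call on the original object corresponds to matching on the record and selecting one field.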

Each of the encodings is a macro expansion, in the sense that it leaves the underlying base λ-calculus constructs of functions, applications, and variables unchanged (as opposed to, for example, continuation-passing style translations). They are defined to operate on untyped terms, but they also preserve typability when given well-typed terms. The naïve encodings preserve the operational semantics of the original term, according to a call-by-name semantics. We also illustrate how the encodings can be modified slightly to correctly simulate the call-by-value operational semantics of the source program. To conclude, we show how the languages and encodings can be generalized to more expressive type systems, which include features like existential types and indexed types (*a.k.a.* generalized algebraic data types and guarded methods).

**Notation.** We use both an overline $\overline{t}$ and dots $t_1 \ldots$ to indicate a *sequence* of terms $t$ (and likewise for types, variables, *etc.*). The arrow type $\overline{\tau} \to T$ means $\tau_1 \to \cdots \to \tau_n \to T$; when $n$ is 0, it is not a function type, *i.e.* just the codomain $T$. The application $K\ \overline{t}$ means $(((K\ t_1) \ldots)\ t_n)$; when $n$ is 0, it is not a function application, but the constant $K$. We write a single step of an operational semantics with the arrow $\mapsto$, and many steps (*i.e.* its reflexive-transitive closure) as $\twoheadrightarrow$. Operational steps may occur within an evaluation context $E$, *i.e.* $t \mapsto t'$ implies that $E[t] \mapsto E[t']$.

#### **3.1 Syntax and Semantics**

We present the syntax and semantics of the base language and the two extensions λdata and λcodata. For the sake of simplicity, we keep the languages as minimal as possible to illustrate the main inter-compilations. Therefore, λdata and λcodata do not contain recursion, nested (co)patterns, or indexed types. The extension with recursion is standard, and an explanation of compiling (co)patterns can be found in [11,38,39]. Indexed types are later discussed informally in Sect. 3.6.

Syntax:

$$\begin{array}{lrcl} \text{Type} & \tau, \rho & ::= & a \mid \tau \to \rho \mid \forall a.\, \tau \\ \text{Term} & t, u, e & ::= & x \mid t\ u \mid \lambda x.\, e \end{array}$$

Operational Semantics:

$$\begin{array}{ll}
\text{Call-by-name} & \text{Call-by-value} \\
V ::= x \mid \lambda x. e \qquad E ::= \Box \mid E \; u & V ::= x \mid \lambda x. e \qquad E ::= \Box \mid E \; u \mid V \; E \\
(\lambda x. e) \; u \mapsto e[u/x] & (\lambda x. e) \; V \mapsto e[V/x]
\end{array}$$

Type System (where S = t for call-by-name and S = V for call-by-value):

$$\begin{array}{c} \dfrac{x:\tau\in\Gamma}{\Gamma\vdash x:\tau} \quad \quad \dfrac{\Gamma\vdash t:\tau\to\rho \quad \Gamma\vdash u:\tau}{\Gamma\vdash t\ u:\rho} \quad \quad \dfrac{\Gamma, x:\tau\vdash e:\rho}{\Gamma\vdash \lambda x.e:\tau\to\rho} \\[2ex] \dfrac{\Gamma, a\vdash S:\tau}{\Gamma\vdash S:\forall a.\tau} \quad \quad \dfrac{\Gamma\vdash t:\forall a.\tau \quad \Gamma\vdash \rho}{\Gamma\vdash t:\tau[\rho/a]} \end{array}$$

**The Base Language.** We will base both our core languages of interest on a common starting point: the polymorphic λ-calculus as shown in Fig. 1. <sup>2</sup> This is the standard simply typed λ-calculus extended with impredicative polymorphism (*a.k.a.* generics). There are only three forms of terms (variables x, applications t u, and function abstractions λx.e) and three forms of types (type variables a, function types τ → ρ, and polymorphic types ∀a.τ ). We keep the type abstraction and instantiation implicit in programs—as opposed to explicit as in System F—for two reasons. First, this more accurately resembles the functional languages in which types are inferred, as opposed to mandatory annotations explicit within the syntax of programs. Second, it more clearly shows how the translations that follow do not rely on first knowing the type of terms, but apply to any untyped term. In other words, the compilation techniques are also appropriate for dynamically typed languages like Scheme and Racket.

Figure 1 reviews both the standard call-by-name and call-by-value operational semantics for the λ-calculus. As usual, the difference between the two is that in call-by-value, the argument of a function call is evaluated prior to substitution, whereas in call-by-name the argument is substituted first. This is implied by the different set of evaluation contexts (E) and the fact that the operational rule uses a more restricted notion of value (V) for substitutable arguments in call-by-value. Note that there is an interplay between evaluation and typing. In a more general setting where effects are allowed, the typing rule for introducing polymorphism (*i.e.* the rule with S : ∀a.τ in the conclusion) is only safe for substitutable terms, which imposes the well-known *value restriction* for call-by-value (limiting S to values), but requires no such restriction in call-by-name where every term is a substitutable value (letting S be any term).
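The difference between the two strategies can be observed with a small stepper for the untyped base terms. This is a minimal Python sketch, not part of the paper: the constructors `Var`, `Lam`, `App` and the functions `subst`, `step`, and `trace` are names invented here, and substitution is deliberately naive, assuming bound variable names never clash with the free variables of the substituted term. Note that `trace` does not terminate on diverging terms.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def subst(e: Term, x: str, u: Term) -> Term:
    """e[u/x], assuming bound names never clash with free variables of u."""
    if isinstance(e, Var):
        return u if e.name == x else e
    if isinstance(e, Lam):
        return e if e.param == x else Lam(e.param, subst(e.body, x, u))
    return App(subst(e.fun, x, u), subst(e.arg, x, u))


def is_value(t: Term) -> bool:
    return isinstance(t, (Var, Lam))  # V ::= x | λx.e


def step(t: Term, by_value: bool) -> Optional[Term]:
    """One operational step of t, or None if t is a value or stuck."""
    if not isinstance(t, App):
        return None
    if not is_value(t.fun):  # evaluation context E u
        fun = step(t.fun, by_value)
        return None if fun is None else App(fun, t.arg)
    if by_value and not is_value(t.arg):  # evaluation context V E
        arg = step(t.arg, by_value)
        return None if arg is None else App(t.fun, arg)
    if isinstance(t.fun, Lam):  # the β step
        return subst(t.fun.body, t.fun.param, t.arg)
    return None


def trace(t: Term, by_value: bool) -> List[Term]:
    steps = [t]
    while (nxt := step(steps[-1], by_value)) is not None:
        steps.append(nxt)
    return steps


# (λx. λy. y) ((λz. z) w)
example = App(Lam("x", Lam("y", Var("y"))), App(Lam("z", Var("z")), Var("w")))
by_name = trace(example, by_value=False)
by_val = trace(example, by_value=True)
```

On this term, call-by-name discards the unevaluated argument in a single step, whereas call-by-value first reduces the argument to `w` and only then performs the outer β step; both end at λy.y.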

<sup>2</sup> The judgement Γ ⊢ ρ should be read as: all free type variables in ρ occur in Γ. As usual, Γ, a means that a does not occur free in Γ.

**Fig. 2.** λ*data*: Extending polymorphic λ-calculus with data types

**A Language with Data.** The first extension of the λ-calculus is with user-defined data types, as shown in Fig. 2; it corresponds to a standard core language for statically typed functional languages. Data declarations introduce a new type constructor (T) as well as some number of associated constructors (K) that build values of that data type. For simplicity, the list of branches in a case expression is considered unordered and non-overlapping (*i.e.* no two branches for the same constructor within a single case expression). The types of constructors are given alongside free variables in Γ, and the typing rule for constructors requires they be fully applied. We also assume an additional side condition to the typing rule for case expressions that the branches are exhaustive (*i.e.* every constructor of the data type in question is covered as a premise).

Figure 2 presents the extension to the operational semantics from Fig. 1, which is also standard. The new evaluation rule for data types reduces a case expression matched with an applied constructor. Note that since the branches are unordered, the one matching the constructor is chosen out of the possibilities and the parameters of the constructor are substituted in the branch's pattern. There is also an additional form of constructed values: in call-by-name any constructor application is a value, whereas in call-by-value only a constructor applied to values is itself a value. As such, call-by-value goes on to evaluate constructor parameters in advance, as shown by the extra evaluation context. In both evaluation strategies, there is a new form of evaluation context that points out the discriminant of a case expression, since it is mandatory to determine which constructor was used before deciding the appropriate branch to take.

$$\begin{array}{l}
\textbf{Syntax:}\\
\quad \text{Declaration}\quad d ::= \mathbf{codata}\ \mathsf{U}\ a\ \mathbf{where}\ \mathsf{H} : \mathsf{U}\ a \to \tau\ \ldots\\
\quad \text{Type}\quad \tau, \rho ::= a \mid \tau \to \rho \mid \forall a.\,\tau \mid \mathsf{U}\ \rho\\
\quad \text{Term}\quad t, u, e ::= x \mid t\ u \mid \lambda x.\,e \mid t.\mathsf{H} \mid \{\mathsf{H} \to e, \ldots\}\\[6pt]
\textbf{Operational Semantics:}\\
\quad \text{Call-by-name}\quad V ::= \cdots \mid \{\mathsf{H} \to e, \ldots\} \qquad E ::= \cdots \mid E.\mathsf{H} \qquad \{\mathsf{H} \to e, \ldots\}.\mathsf{H} \mapsto e\\
\quad \text{Call-by-value}\quad V ::= \cdots \mid \{\mathsf{H} \to e, \ldots\} \qquad E ::= \cdots \mid E.\mathsf{H} \qquad \{\mathsf{H} \to e, \ldots\}.\mathsf{H} \mapsto e\\[6pt]
\textbf{Type System:}\\
\quad \dfrac{\mathsf{H} : \forall a.\,\mathsf{U}\ a \to \tau \in \Gamma \qquad \Gamma \vdash t : \mathsf{U}\ \rho}{\Gamma \vdash t.\mathsf{H} : \tau[\rho/a]}
\qquad
\dfrac{\Gamma \vdash \mathsf{H}_1 : \mathsf{U}\ \rho \to \tau_1 \qquad \Gamma \vdash e_1 : \tau_1 \qquad \ldots}{\Gamma \vdash \{\mathsf{H}_1 \to e_1, \ldots\} : \mathsf{U}\ \rho}
\end{array}$$

**Fig. 3.** λ*codata*: Extending polymorphic λ-calculus with codata types

**A Language with Codata.** The second extension of the λ-calculus is with user-defined codata types, as shown in Fig. 3. Codata declarations in λcodata define a new type constructor (U) along with some number of associated destructors (H) for projecting responses out of values of a codata type. The type level of λcodata corresponds directly to λdata. However, at the term level, we have codata observations of the form t.H using "dot notation", which can be thought of as sending the message H to the object t or as a method invocation from object-oriented languages. Values of codata types are introduced in the form {H<sub>1</sub> → e<sub>1</sub>, ..., H<sub>n</sub> → e<sub>n</sub>}, which lists the response this value gives to each possible destructor of the type. As with case expressions, we take the branches to be unordered and non-overlapping for simplicity.

Interestingly, the extension of the operational semantics with codata (the values, evaluation contexts, and reduction rules) is identical for both call-by-name and call-by-value evaluation. In either evaluation strategy, a codata object {H → e, ...} is considered a value and the codata observation t.H *must* evaluate t no matter what to continue, leading to the same form of evaluation context E.H. The additional evaluation rule selects and invokes the matching branch of a codata object and is the same regardless of the evaluation strategy.

Note that values of codata types are the same in any evaluation strategy because the branches of the object are only ever evaluated on demand, *i.e.* when they are observed by a destructor, just as the body of a function is only ever evaluated when the function is called. This is the semantic difference that separates codata types from records found in many programming languages. Records typically map a collection of labels to a collection of values, which are evaluated in advance in a call-by-value language, similar to the constructed values of data types. With codata objects, in contrast, labels map to *behavior*, which is only invoked when observed.

$$\begin{array}{l}
\mathcal{V}\left[\!\!\left[\begin{array}{l}
\mathbf{data}\ \mathsf{T}\ a\ \mathbf{where}\\
\quad \mathsf{K}_1 : \tau_1 \to \mathsf{T}\ a\\
\quad \vdots\\
\quad \mathsf{K}_n : \tau_n \to \mathsf{T}\ a
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{codata}\ \mathsf{T}_{visit}\ a\ b\ \mathbf{where}\\
\quad \mathsf{K}_1 : \mathsf{T}_{visit}\ a\ b \to \tau_1 \to b\\
\quad \vdots\\
\quad \mathsf{K}_n : \mathsf{T}_{visit}\ a\ b \to \tau_n \to b\\
\mathbf{codata}\ \mathsf{T}\ a\ \mathbf{where}\\
\quad \mathsf{Case}_{\mathsf{T}} : \mathsf{T}\ a \to \forall b.\ \mathsf{T}_{visit}\ a\ b \to b
\end{array}\\[8pt]
\mathcal{V}[\![\mathsf{K}_i\ t]\!] = \{\mathsf{Case}_{\mathsf{T}} \to \lambda v.\ (v.\mathsf{K}_i)\ \mathcal{V}[\![t]\!]\}\\
\mathcal{V}[\![\mathbf{case}\ t\ \{\mathsf{K}_1\ x_1 \to e_1, \ldots\}]\!] = (\mathcal{V}[\![t]\!].\mathsf{Case}_{\mathsf{T}})\ \{\mathsf{K}_1 \to \lambda x_1.\ \mathcal{V}[\![e_1]\!], \ldots\}
\end{array}$$

**Fig. 4.** V : λ*data* → λ*codata*, mapping data to codata via the visitor pattern

The additional typing rules for λcodata are also given in Fig. 3. The rule for typing t.H is analogous to a combination of type instantiation and application, when viewing H as a function of the given type. The rule for typing a codata object, in contrast, is similar to the rule for typing a case expression of a data type. However, in this comparison, the rule for objects is partially "upside down" in the sense that the primary type in question (U ρ) appears in the conclusion rather than as a premise. This is the reason why there is one fewer premise for typing codata objects than there is for typing data case expressions. As with that rule, we assume that the branches are exhaustive, so that every destructor of the codata type appears in the premise.

#### **3.2 Compiling Data to Codata: The Visitor Pattern**

In Sect. 2.1, we illustrated how to convert a data type representing trees into a codata type. This encoding corresponds to a rephrasing of the object-oriented visitor pattern to avoid unnecessary side-effects. Now let's look more generally at the pattern, to see how any algebraic data type in λdata can be encoded in terms of codata in λcodata.

The visitor pattern has the net effect of inverting the orientation of a data declaration (wherein construction comes first) into codata declarations (wherein destruction comes first). This reorientation can be used for compiling user-defined data types in λdata to codata types in λcodata as shown in Fig. 4. As with all of the translations we will consider, this is a macro expansion since the syntactic forms from the base λ-calculus are treated homomorphically (*i.e.* V[[λx. e]] = λx. V[[e]], V[[t u]] = V[[t]] V[[u]], and V[[x]] = x). Furthermore, this translation also perfectly preserves types, since the types of terms are exactly the same after translation (*i.e.* V[[τ]] = τ).

Notice how each data type (T a) gets represented by *two* codata types: the "visitor" (T*visit* a b) which says what to do with values made with each constructor, and the type itself (T a) which has one method which accepts a visitor and returns a value of type b. An object of the codata type, then, must be capable of accepting *any* visitor, no matter what type of result it returns. Also notice that we include no other methods in the codata type representation of T a.

At the level of terms, first consider how the case expression of the data type is encoded. The branches of the case (contained within the curly braces) are represented as a first-class object of the visitor type: each constructor is mapped to the corresponding destructor of the same name and the variables bound in the pattern are mapped to parameters of the function returned by the object in each case. The whole case expression itself is then implemented by calling the sole method (CaseT) of the codata object and passing the branches of the case as the corresponding visitor object. Shifting focus to the constructors, we can now see that they are compiled as objects that invoke the corresponding destructor on any given visitor, and the terms which were parameters to the constructor are now parameters to a given visitor's destructor. Of course, other uses of the visitor pattern might involve a codata type (T) with more methods implementing additional functionality besides case analysis. However, we only need the one method to represent data types in λdata because case expressions are *the* primitive destructor for values of data types in the language.

For example, consider applying the above visitor pattern to a binary tree data type as follows:

$$\begin{array}{l}
\mathcal{V}\left[\!\!\left[\begin{array}{l}
\mathbf{data}\ \mathsf{Tree}\ \mathbf{where}\\
\quad \mathsf{Leaf} : \mathsf{Int} \to \mathsf{Tree}\\
\quad \mathsf{Branch} : \mathsf{Tree} \to \mathsf{Tree} \to \mathsf{Tree}
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{codata}\ \mathsf{Tree}_{visit}\ b\ \mathbf{where}\\
\quad \mathsf{Leaf} : \mathsf{Tree}_{visit}\ b \to \mathsf{Int} \to b\\
\quad \mathsf{Branch} : \mathsf{Tree}_{visit}\ b \to \mathsf{Tree} \to \mathsf{Tree} \to b\\
\mathbf{codata}\ \mathsf{Tree}\ \mathbf{where}\\
\quad \mathsf{Case}_{\mathsf{Tree}} : \mathsf{Tree} \to \forall b.\ \mathsf{Tree}_{visit}\ b \to b
\end{array}\\[8pt]
\mathcal{V}[\![\mathsf{Leaf}\ n]\!] = \{\mathsf{Case}_{\mathsf{Tree}} \to \lambda v.\ v.\mathsf{Leaf}\ n\}\\
\mathcal{V}[\![\mathsf{Branch}\ l\ r]\!] = \{\mathsf{Case}_{\mathsf{Tree}} \to \lambda v.\ v.\mathsf{Branch}\ l\ r\}
\end{array}$$

Note how this encoding differs from the one that was given in Sect. 2.1 since the CaseTree method is non-recursive whereas the WalkTree method was recursive, in order to model a depth-first search traversal of the tree.
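As an illustration only, the visitor encoding of trees can be sketched in plain Haskell. Haskell has no native codata declarations, so this sketch assumes that a record of functions stands in for a codata type and that field selection stands in for an observation; the names `TreeVisit`, `caseTree`, `leaf`, `branch`, and `sumTree` are hypothetical and do not come from the implementations described later.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Visitor codata type, modelled (as an assumption of this sketch) by a
-- record with one field per constructor of the original data type.
data TreeVisit b = TreeVisit
  { visitLeaf   :: Int -> b
  , visitBranch :: Tree -> Tree -> b
  }

-- The encoded tree has a single destructor that accepts any visitor,
-- whatever result type b the visitor returns.
newtype Tree = Tree { caseTree :: forall b. TreeVisit b -> b }

-- Counterpart of V[[Leaf n]]: invoke the Leaf destructor of the visitor.
leaf :: Int -> Tree
leaf n = Tree (\v -> visitLeaf v n)

-- Counterpart of V[[Branch l r]]: invoke the Branch destructor of the visitor.
branch :: Tree -> Tree -> Tree
branch l r = Tree (\v -> visitBranch v l r)

-- Counterpart of a translated case expression: the branches become a
-- first-class visitor object passed to the sole method caseTree.
sumTree :: Tree -> Int
sumTree t = caseTree t (TreeVisit
  { visitLeaf   = \n -> n
  , visitBranch = \l r -> sumTree l + sumTree r
  })
```

Here recursion lives in `sumTree` itself, not in the encoded tree, which mirrors the observation that the case-analysis method is non-recursive.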

Of course, other operations, like the walk function, could be written in terms of case expressions and recursion as usual, using the above encoding of case expressions as method calls. However, it is possible to go one step further and include other primitive destructors (like recursors or iterators in the style of Gödel's system T) by embedding them as other methods of the encoded codata type. For example, we can represent walk as a primitive destructor as it was in Sect. 2.1 *in addition* to non-recursive case analysis by adding an alternative visitor Tree*walk* and one more destructor to the generated Tree codata type like so:

```
codata Treewalk b where
  Leaf   : Int → b
  Branch : b → b → b

codata Tree where
  CaseTree : Tree → ∀b. Treevisit b → b
  WalkTree : Tree → ∀b. Treewalk b → b

V[[Leaf n]] = { CaseTree → λv. v.Leaf n
              , WalkTree → λw. w.Leaf n }
```
For codata types with n destructors, where n ≥ 1:

$$\begin{array}{l}
\mathcal{T}\left[\!\!\left[\begin{array}{l}
\mathbf{codata}\ \mathsf{U}\ a\ \mathbf{where}\\
\quad \mathsf{H}_1 : \mathsf{U}\ a \to \tau_1\\
\quad \vdots\\
\quad \mathsf{H}_n : \mathsf{U}\ a \to \tau_n
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{data}\ \mathsf{U}\ a\ \mathbf{where}\\
\quad \mathsf{Table}_{\mathsf{U}} : \tau_1 \to \cdots \to \tau_n \to \mathsf{U}\ a
\end{array}\\[8pt]
\mathcal{T}[\![t.\mathsf{H}_i]\!] = \mathbf{case}\ \mathcal{T}[\![t]\!]\ \{\mathsf{Table}_{\mathsf{U}}\ y_1 \ldots y_n \to y_i\}\\
\mathcal{T}[\![\{\mathsf{H}_1 \to e_1, \ldots, \mathsf{H}_n \to e_n\}]\!] = \mathsf{Table}_{\mathsf{U}}\ \mathcal{T}[\![e_1]\!] \ldots \mathcal{T}[\![e_n]\!]
\end{array}$$

For codata types with 0 destructors (where Unit is the same for every such U):

$$\begin{array}{l}
\mathcal{T}[\![\mathbf{codata}\ \mathsf{U}\ a\ \mathbf{where}\ \textit{-- no destructors}]\!] = \mathbf{data}\ \mathsf{Unit}\ \mathbf{where}\ \mathsf{unit} : \mathsf{Unit}\\
\mathcal{T}[\![\{\}]\!] = \mathsf{unit}
\end{array}$$

**Fig. 5.** T : λ*codata* → λ*data*, tabulating codata responses with data tuples

$$\mathcal{V}[\![\mathsf{Branch}\ l\ r]\!] = \left\{\begin{array}{l} \mathsf{Case}_{\mathsf{Tree}} \to \lambda v.\ v.\mathsf{Branch}\ l\ r \\ \mathsf{Walk}_{\mathsf{Tree}} \to \lambda w.\ w.\mathsf{Branch}\ (l.\mathsf{Walk}_{\mathsf{Tree}}\ w)\ (r.\mathsf{Walk}_{\mathsf{Tree}}\ w) \end{array}\right\}$$

where the definition of Tree*visit* and the encoding of case expressions are the same as before. In other words, this compilation technique can generalize to as many primitive observations and recursion schemes as desired.
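The two-destructor encoding can be sketched in Haskell under the same assumption as before, namely that records of functions stand in for codata types. All names below (`TreeWalk`, `walkTree`, `sumWalk`, `depthWalk`) are hypothetical, and the recursive calls pass the walker `w` down to both subtrees so that the arguments of the walker's Branch response have the result type `b`.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Non-recursive visitor: subtrees are handed over unevaluated.
data TreeVisit b = TreeVisit
  { visitLeaf   :: Int -> b
  , visitBranch :: Tree -> Tree -> b
  }

-- Recursive walker: subtrees are replaced by their already-walked results.
data TreeWalk b = TreeWalk
  { walkLeaf   :: Int -> b
  , walkBranch :: b -> b -> b
  }

-- The encoded tree now offers two primitive observations.
data Tree = Tree
  { caseTree :: forall b. TreeVisit b -> b
  , walkTree :: forall b. TreeWalk b -> b
  }

leaf :: Int -> Tree
leaf n = Tree (\v -> visitLeaf v n) (\w -> walkLeaf w n)

branch :: Tree -> Tree -> Tree
branch l r = Tree
  (\v -> visitBranch v l r)
  (\w -> walkBranch w (walkTree l w) (walkTree r w))

-- Two clients of the walk method; neither needs explicit recursion.
sumWalk :: Tree -> Int
sumWalk t = walkTree t (TreeWalk id (+))

depthWalk :: Tree -> Int
depthWalk t = walkTree t (TreeWalk (const 1) (\a b -> 1 + max a b))
```

Clients such as `sumWalk` supply only the per-constructor behavior, since the traversal is built into the object.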

#### **3.3 Compiling Codata to Data: Tabulation**

Having seen how to compile data to codata, how can we go the other way? The reverse compilation would be useful for extending functional languages with user-defined codata types, since many functional languages are compiled to a core representation based on the λ-calculus with data types.

Intuitively, the declared data types in λdata can be thought of as "sums of products." In contrast, the declared codata types in λcodata can be thought of as "products of functions." Since both core languages are based on the λ-calculus, which has higher-order functions, the main challenge is to relate the two notions of "products." The codata sense of products is based on projections out of abstract objects, where the different parts are viewed individually and only when demanded. The data sense of products, instead, is based on tuples, in which all components are laid out in advance in a single concrete structure.

One way to convert codata to data is to *tabulate* an object's potential answers ahead of time into a data structure. This is analogous to the fact that a function of type Bool → String can be alternatively represented by a tuple of type String * String, where the first and second components are the responses of the original function to true and false, respectively. This idea can be applied to λcodata in general as shown in the compilation in Fig. 5.
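The Bool → String analogy can be made concrete with a small Haskell sketch. The function names `tabulate`, `lookupTable`, and `describe` are hypothetical and chosen only for this illustration.

```haskell
-- Tabulate a function on Bool: the first component is the response to
-- True, the second the response to False.
tabulate :: (Bool -> String) -> (String, String)
tabulate f = (f True, f False)

-- Recover the function from its table by projecting a component.
lookupTable :: (String, String) -> Bool -> String
lookupTable (t, _) True  = t
lookupTable (_, f) False = f

-- An example function to tabulate.
describe :: Bool -> String
describe b = if b then "yes" else "no"
```

For every `b`, `lookupTable (tabulate describe) b` agrees with `describe b`, so the tuple carries the same information as the function.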

A codata declaration of U becomes a data declaration with a single constructor (Table<sub>U</sub>) representing a tuple containing the response for each of the original destructors of U. At the term level, a codata abstraction is compiled by concretely tabulating each of its responses into a tuple using the Table<sub>U</sub> constructor. A destructor application returns the specific component of the constructed tuple which corresponds to that projection. Note that, since we assume that each object is exhaustive, the tabulation transformation is relatively straightforward; filling in "missing" method definitions with some error value that can be stored in the tuple at the appropriate index would be done in advance as a separate pre-processing step.

Also notice that there is a special case for non-observable "empty" codata types, which are all collapsed into a single pre-designated Unit data type. The reason for this collapse is to ensure that this compilation preserves typability: if applied to a well-typed term, the result is also well-typed. The complication arises from the fact that when faced with an empty object {}, we have no idea which constructor to use without being given further typing information. So rather than force type checking or annotation in advance for this one degenerate case, we instead collapse them all into a single data type so that there is no need to differentiate based on the type. In contrast, the translation of non-empty objects is straightforward, since we can use the name of any one of the destructors to determine the codata type it is associated with, which then informs us of the correct constructor to use.
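A minimal Haskell sketch of the tabulation, assuming a hypothetical codata type `Counter` with two destructors, `Value` and `Next`, which is not taken from the paper. Haskell's lazy constructor fields play the role of the call-by-name semantics here, so the infinite object `from n` is unproblematic. Constructor names must be capitalized in Haskell, so `MkUnit` stands in for the `unit` constructor of Fig. 5.

```haskell
-- Hypothetical source declaration, written in the paper's notation:
--   codata Counter where
--     Value : Counter -> Int
--     Next  : Counter -> Counter
-- Its tabulation has one constructor with one field per destructor.
data Counter = TableCounter Int Counter

-- Counterparts of T[[t.Value]] and T[[t.Next]]: project one component.
value :: Counter -> Int
value t = case t of TableCounter y1 _ -> y1

next :: Counter -> Counter
next t = case t of TableCounter _ y2 -> y2

-- Counterpart of T[[{Value -> n, Next -> from (n + 1)}]].
from :: Int -> Counter
from n = TableCounter n (from (n + 1))

-- Every codata type without destructors collapses to the same unit type.
data Unit = MkUnit

emptyObject :: Unit
emptyObject = MkUnit
```

For example, `value (next (next (from 5)))` observes the counter twice and then reads its value, without ever building the rest of the infinite structure.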

#### **3.4 Correctness**

For the inter-compilations between λcodata and λdata to be useful in practice, they should preserve the semantics of programs. For now, we focus only on the call-by-name semantics for each of the languages. For the static aspect of the semantics, this means they should preserve the typing of terms.

**Proposition 1 (Type Preservation).** *For each of the* V *and* T *translations: if* Γ ⊢ t : τ *then* [[Γ]] ⊢ [[t]] : [[τ]] *(in the call-by-name type system).*

*Proof (Sketch).* By induction on the typing derivation of Γ ⊢ t : τ.

With the dynamic aspect of the semantics, the translations should preserve the outcome of evaluation (either converging to some value, diverging into an infinite loop, or getting stuck) for both typed and untyped terms. This works because each translation preserves the reduction steps, values, and evaluation contexts of the source calculus' call-by-name operational semantics.

**Proposition 2 (Evaluation Preservation).** *For each of the* V *and* T *translations:* t ↦→ V *if and only if* [[t]] ↦→ [[V]] *(in the call-by-name semantics).*

*Proof (Sketch).* The forward ("only if") implication is a result of the following facts that hold for each translation in the call-by-name semantics:


The reverse ("if") implication then follows from the fact that the call-by-name operational semantics of both source and target languages is deterministic.

#### **3.5 Call-by-Value: Correcting the Evaluation Order**

The presented inter-compilation techniques are correct for the call-by-name semantics of the calculi. But what about the call-by-value semantics? It turns out that the simple translations seen so far do not correctly preserve the call-by-value semantics of programs, but they can be easily fixed by being more careful about how they treat the values of the source and target calculi. In other words, we need to make sure that values are translated to values, and evaluation contexts to evaluation contexts. For instance, the following translation (up to renaming) does not preserve the call-by-value semantics of the source program:

$$\mathcal{T}[\![\{\mathsf{Fst} \to \mathit{error}, \mathsf{Snd} \to \mathsf{True}\}]\!] = \mathsf{Pair}\ \mathit{error}\ \mathsf{True}$$

The object {Fst → *error*, Snd → True} is a value in call-by-value, and the erroneous response to Fst will only be evaluated when observed. However, the structure Pair *error* True is not a value in call-by-value, because the field *error* must be evaluated in advance, which causes an error immediately. In the other direction, we could also have

$$\mathcal{V}[\![\mathsf{Pair}\ \mathit{error}\ \mathsf{True}]\!] = \{\mathsf{Case} \to \lambda v.\ v.\mathsf{Pair}\ \mathit{error}\ \mathsf{True}\}$$

Here, the immediate error in Pair *error* True has become incorrectly delayed inside the value {Case → λv. v.Pair *error* True}.

The solution to this problem is straightforward: we must manually delay computations that are lifted out of (object or λ) abstractions, and manually force computations before their results are hidden underneath abstractions. For the visitor pattern, the correction is to only introduce the codata object on constructed values. We can handle other constructed terms by naming their non-value components in the style of administrative-normalization like so:

$$\begin{aligned} \mathcal{V}[\![\mathsf{K}_i\ \overline{V}]\!] &= \{\mathsf{Case}_{\mathsf{T}} \to \lambda v.\ v.\mathsf{K}_i\ \overline{V}\} \\ \mathcal{V}[\![\mathsf{K}_i\ \overline{V}\ u\ \overline{t}]\!] &= \mathbf{let}\ x = u\ \mathbf{in}\ \mathcal{V}[\![\mathsf{K}_i\ \overline{V}\ x\ \overline{t}]\!] \qquad \text{if } u \text{ is not a value} \end{aligned}$$

Conversely, the tabulating translation T will cause the on-demand observations of the object to be converted to preemptive components of a tuple structure. To counter this change in evaluation order, a thunking technique can be employed as follows:

$$\begin{aligned} \mathcal{T}[\![t.\mathsf{H}_i]\!] &= \mathbf{case}\ \mathcal{T}[\![t]\!]\ \{\mathsf{Table}_{\mathsf{U}}\ y_1 \ldots y_n \to \mathbf{force}\ y_i\} \\ \mathcal{T}[\![\{\mathsf{H}_1 \to e_1, \ldots, \mathsf{H}_n \to e_n\}]\!] &= \mathsf{Table}_{\mathsf{U}}\ (\mathbf{delay}\ \mathcal{T}[\![e_1]\!]) \ldots (\mathbf{delay}\ \mathcal{T}[\![e_n]\!]) \end{aligned}$$

The two operations can be implemented as **delay** t = λz. t and **force** t = t unit as usual, but can also be implemented as more efficient memoizing operations. With all these corrections, Propositions 1 and 2 also hold for the call-by-value type system and operational semantics.
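The thunking correction can be illustrated in Haskell by using strict constructor fields to imitate call-by-value constructors. This is a sketch under that assumption only; the names `PairStrict`, `PairTable`, `Thunk`, `obj`, and `sndOf` are hypothetical, and `delay` and `force` follow the non-memoizing definitions given above.

```haskell
-- Strict fields (marked with !) are evaluated when the constructor is
-- evaluated, which imitates a call-by-value constructor.
data PairStrict = PairStrict !Int !Bool

-- Naive tabulation of {Fst -> error, Snd -> True}: evaluating this value
-- evaluates the first field and therefore raises the error immediately.
naive :: PairStrict
naive = PairStrict (error "boom") True

-- A thunk is a function from unit, as in delay t = \z -> t.
type Thunk a = () -> a

delay :: a -> Thunk a
delay t = \_ -> t

force :: Thunk a -> a
force th = th ()

-- Corrected tabulation: the strict fields now hold thunks, which are
-- already values, so building the table does not run any response.
data PairTable = PairTable !(Thunk Int) !(Thunk Bool)

obj :: PairTable
obj = PairTable (delay (error "boom")) (delay True)

-- Counterpart of T[[t.Snd]]: select the component, then force it.
sndOf :: PairTable -> Bool
sndOf t = case t of PairTable _ y2 -> force y2
```

Observing `sndOf obj` succeeds because the erroneous first response stays inside its thunk and is never forced.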

#### **3.6 Indexed Data and Codata Types: Type Equalities**

In the world of types, we have so far only formally addressed inter-compilation between languages with simple and polymorphic types. What about the compilation of indexed data and codata types? It turns out some of the compilation techniques we have discussed so far extend to type indexes without further effort, whereas others need some extra help. In particular, the visitor-pattern-based translation V can just be applied straightforwardly to indexed data types:

$$\mathcal{V}\left[\!\!\left[\begin{array}{l}
\mathbf{data}\ \mathsf{T}\ a\ \mathbf{where}\\
\quad \mathsf{K}_1 : \tau_1 \to \mathsf{T}\ \rho_1\\
\quad \vdots\\
\quad \mathsf{K}_n : \tau_n \to \mathsf{T}\ \rho_n
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{codata}\ \mathsf{T}_{visit}\ a\ b\ \mathbf{where}\\
\quad \mathsf{K}_1 : \mathsf{T}_{visit}\ \rho_1\ b \to \tau_1 \to b\\
\quad \vdots\\
\quad \mathsf{K}_n : \mathsf{T}_{visit}\ \rho_n\ b \to \tau_n \to b\\
\mathbf{codata}\ \mathsf{T}\ a\ \mathbf{where}\\
\quad \mathsf{Case}_{\mathsf{T}} : \mathsf{T}\ a \to \forall b.\ \mathsf{T}_{visit}\ a\ b \to b
\end{array}$$

In this case, the notion of an indexed visitor codata type exactly corresponds to the mechanics of case expressions for GADTs. In contrast, the tabulation translation T does not correctly capture the semantics of indexed codata types, if applied naïvely.

Thankfully, there is a straightforward way of "simplifying" indexed data types to more conventional data types using some built-in support for *type equalities*. The idea is that a constructor with a more specific return type can be replaced with a conventional constructor that is parameterized by type equalities that *prove* that the normal return type must be the more specific one. The same idea can be applied to indexed codata types as well. A destructor that can only act on a more specific instance of the codata type can instead be replaced by one which works on any instance, but then immediately asks for *proof* that the object's type is the more specific one before completing the observation. These two translations, of replacing type indexes with type equalities, are defined as:

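$$\begin{array}{l}
\mathit{Eq}\left[\!\!\left[\begin{array}{l}
\mathbf{data}\ \mathsf{T}\ a\ \mathbf{where}\\
\quad \mathsf{K}_1 : \tau_1 \to \mathsf{T}\ \rho_1\\
\quad \vdots\\
\quad \mathsf{K}_n : \tau_n \to \mathsf{T}\ \rho_n
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{data}\ \mathsf{T}\ a\ \mathbf{where}\\
\quad \mathsf{K}_1 : a \equiv \rho_1 \to \tau_1 \to \mathsf{T}\ a\\
\quad \vdots\\
\quad \mathsf{K}_n : a \equiv \rho_n \to \tau_n \to \mathsf{T}\ a
\end{array}\\[8pt]
\mathit{Eq}\left[\!\!\left[\begin{array}{l}
\mathbf{codata}\ \mathsf{U}\ a\ \mathbf{where}\\
\quad \mathsf{H}_1 : \mathsf{U}\ \rho_1 \to \tau_1\\
\quad \vdots\\
\quad \mathsf{H}_n : \mathsf{U}\ \rho_n \to \tau_n
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{codata}\ \mathsf{U}\ a\ \mathbf{where}\\
\quad \mathsf{H}_1 : \mathsf{U}\ a \to a \equiv \rho_1 \to \tau_1\\
\quad \vdots\\
\quad \mathsf{H}_n : \mathsf{U}\ a \to a \equiv \rho_n \to \tau_n
\end{array}
\end{array}$$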

This formalizes the intuition that indexed data types can be thought of as *enriching* constructors to carry around additional constraints that were available at their time of construction, whereas indexed codata types can be thought of as *guarding* methods with additional constraints that must be satisfied before an observation can be made. Two of the most basic examples of this simplification are for the type declarations which capture the notion of type equality as an indexed data or indexed codata type, which are defined and simplified like so:

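$$\begin{array}{l}
\mathit{Eq}\left[\!\!\left[\begin{array}{l}
\mathbf{data}\ \mathsf{Eq}\ a\ b\ \mathbf{where}\\
\quad \mathsf{Refl} : \mathsf{Eq}\ a\ a
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{data}\ \mathsf{Eq}\ a\ b\ \mathbf{where}\\
\quad \mathsf{Refl} : a \equiv b \to \mathsf{Eq}\ a\ b
\end{array}\\[8pt]
\mathit{Eq}\left[\!\!\left[\begin{array}{l}
\mathbf{codata}\ \mathsf{IfEq}\ a\ b\ c\ \mathbf{where}\\
\quad \mathsf{AssumeEq} : \mathsf{IfEq}\ a\ a\ c \to c
\end{array}\right]\!\!\right]
=
\begin{array}{l}
\mathbf{codata}\ \mathsf{IfEq}\ a\ b\ c\ \mathbf{where}\\
\quad \mathsf{AssumeEq} : \mathsf{IfEq}\ a\ b\ c \to a \equiv b \to c
\end{array}
\end{array}$$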
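The data half of this simplification can be sketched in Haskell. As an assumption of this sketch, the library type `a :~: b` from `Data.Type.Equality` stands in for the built-in equality a ≡ b; the names `EqI`, `EqS`, `toS`, `toI`, and `coerceS` are hypothetical.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

import Data.Type.Equality ((:~:)(Refl))

-- Indexed version: the constructor has a more specific return type.
data EqI a b where
  ReflI :: EqI a a

-- Simplified version: a conventional return type, with the constructor
-- carrying an explicit proof that the two indexes are equal.
data EqS a b = ReflS (a :~: b)

-- The two versions are interconvertible.
toS :: EqI a b -> EqS a b
toS ReflI = ReflS Refl

toI :: EqS a b -> EqI a b
toI (ReflS Refl) = ReflI

-- A client of the simplified version: matching on the stored proof
-- recovers the type equality that the index used to provide.
coerceS :: EqS a b -> a -> b
coerceS (ReflS Refl) x = x
```

The pattern match on `Refl` inside `coerceS` is what makes the constraint carried by the constructor available to the client.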

With the above ability to simplify away type indexes, *all* of the presented compilation techniques are easily generalized to indexed data and codata types by composing them with Eq. For a practical programming example, consider the following safe stack codata type indexed by its number of elements.

$$\begin{array}{l}
\mathbf{codata}\ \mathsf{Stack}\ a\ \mathbf{where}\\
\quad \mathsf{Pop} : \mathsf{Stack}\ (\mathsf{Succ}\ a) \to (\mathbb{Z}, \mathsf{Stack}\ a)\\
\quad \mathsf{Push} : \mathsf{Stack}\ a \to \mathbb{Z} \to \mathsf{Stack}\ (\mathsf{Succ}\ a)
\end{array}$$

This stack type is safe in the sense that the Pop operation can only be applied to non-empty Stacks. We cannot compile this to a data type via T directly, because that translation does not apply to indexed codata types. However, if we first simplify the Stack type via Eq, we learn that we can replace the type of the Pop destructor with Pop : Stack a → ∀b. a ≡ Succ b → (ℤ, Stack b), whereas the Push destructor is already simple, so it can be left alone. That way, for any object s : Stack Zero, even though a client can initiate the observation s.Pop, it will never be completed since there is no way to choose a b and prove that Zero equals Succ b. Therefore, the net result of the combined T ∘ Eq translation turns Stack into the following data type, after some further simplification:

$$\begin{array}{l}
\mathbf{data}\ \mathsf{Stack}\ a\ \mathbf{where}\\
\quad \mathsf{MkS} : (\forall b.\ a \equiv \mathsf{Succ}\ b \to (\mathbb{Z}, \mathsf{Stack}\ b)) \to (\mathbb{Z} \to \mathsf{Stack}\ (\mathsf{Succ}\ a)) \to \mathsf{Stack}\ a
\end{array}$$

Notice how the constructor of this type has two fields: one for Pop and one for Push, respectively. However, the Pop operation is guarded by a proof obligation: the client can only receive the top integer and remaining stack by proving that the original stack contains a non-zero number of elements.
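The resulting data type can be written down in Haskell as a sketch, again assuming that `a :~: b` from `Data.Type.Equality` stands in for a ≡ b and that `Integer` stands in for the integer type. The index types `Zero` and `Succ` and the helpers `empty`, `cons`, `pop`, and `push` are hypothetical names introduced for the illustration.

```haskell
{-# LANGUAGE GADTs, RankNTypes, EmptyCase, TypeOperators #-}

import Data.Type.Equality ((:~:)(Refl))

-- Type-level naturals used only as indexes.
data Zero
data Succ n

-- One constructor with one field per destructor; the Pop field is
-- guarded by a proof that the index is a successor.
data Stack a = MkS
  (forall b. a :~: Succ b -> (Integer, Stack b))  -- response to Pop
  (Integer -> Stack (Succ a))                     -- response to Push

-- Observations: the client of pop supplies the proof Refl.
pop :: Stack (Succ n) -> (Integer, Stack n)
pop (MkS p _) = p Refl

push :: Stack n -> Integer -> Stack (Succ n)
push (MkS _ q) = q

-- The empty stack: its Pop response can never be completed, because no
-- proof of Zero :~: Succ b exists, so the empty case has no branches.
empty :: Stack Zero
empty = MkS (\eq -> case eq of {}) (\x -> cons x empty)

-- A non-empty stack with top element x above the stack s.
cons :: Integer -> Stack n -> Stack (Succ n)
cons x s = MkS (\Refl -> (x, s)) (\y -> cons y (cons x s))
```

With these definitions, an expression such as `pop empty` is rejected by the type checker, since `Stack Zero` does not match the argument type `Stack (Succ n)`.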

# **4 Compilation in Practice**

We have shown how data and codata are related through the use of two different core calculi. To explore how these ideas manifest in practice, we have implemented codata in a couple of settings. First, we extended Haskell with codata in order to compare the lazy and codata approaches to demand-driven programming described in Sect. 2.2.<sup>3</sup> Second, we have created a prototype language with indexed (co)data types to further explore the interaction between the compilation and target languages. The prototype language does not commit to a particular evaluation strategy, typing discipline, or paradigm; instead this decision is made when compiling a program to one of several backends. The supported backends include functional ones, namely Haskell (call-by-need, static types), OCaml (call-by-value, static types), and Racket (call-by-value, dynamic types), as well as the object-oriented JavaScript.<sup>4</sup> The following issues of complex copattern matching and sharing apply to both implementations; the performance results on the efficiency of memoized codata objects are tested with the Haskell extension, for comparison with conventional Haskell code.

**Table 1.** Fibonacci scaling tests for the GHC implementation

**Complex Copattern Matching.** Our implementations support nested copatterns so that objects can respond to chains of multiple observations, even though λcodata only provides flat copatterns. This extension does not enhance the expressivity of the language but allows more succinct programs [2]. A flattening step is needed to compile nested copatterns down to a core calculus, which has been explored in previous work by Setzer *et al.* [37] and Thibodeau [39] and implemented in OCaml by Regis-Gianas and Laforgue [33]. Their flattening algorithm requires copatterns to completely cover the object's possible observations because the coverage information is used to drive flattening. This approach was refined and incorporated in a dependently typed setting by Cockx and Abel [11]. With our goal of supporting codata independently of typing discipline and coverage analysis, we have implemented the purely syntax-driven approach to flattening found in [38]. For example, the prune function from Sect. 2.2 expands to:

```
prune = λx → λt →
  { Node → t.Node ,
    Children → case x of
                  0 → []
                  _ → map (prune (x - 1)) t.Children }
```
<sup>3</sup> The GHC fork is at https://github.com/zachsully/ghc/tree/codata-macro.

<sup>4</sup> The prototype compiler is at https://github.com/zachsully/dl/tree/esop2019.

$$\begin{array}{l}
\textbf{Syntax:}\\
\quad \text{Values}\quad V ::= \cdots \mid \{\mathsf{H} \to V, \ldots\}\\
\quad \text{Terms}\quad t, u, e ::= \cdots \mid t.\mathsf{H} \mid \{\mathsf{H} \to V, \ldots\} \mid \mathbf{let}_{need}\ x = t\ \mathbf{in}\ e\\[6pt]
\textbf{Transformation:}\\
\quad \mathcal{A}[\![t.\mathsf{H}]\!] = \mathcal{A}[\![t]\!].\mathsf{H}\\
\quad \mathcal{A}[\![\{\mathsf{H} \to t, \ldots\}]\!] = \mathbf{let}_{need}\ x = \mathcal{A}[\![t]\!], \ldots\ \mathbf{in}\ \{\mathsf{H} \to x, \ldots\}
\end{array}$$

**Fig. 6.** Memoization of λ*codata*

**Sharing.** If codata is to be used instead of laziness for demand-driven programming, then it must have the same performance characteristics, which relies on sharing the results of computations [6]. To test this, we compare the performance of calculating streams of Fibonacci numbers (the poster child for sharing) implemented with both lazy list data types and a stream codata type in Haskell extended with codata. These tests, presented in Table 1, show that the codata version is always slower in run time and heavier in allocations than the lazy list version, but the difference is small and the two versions scale at the same rate. These performance tests are evidence that codata shares the same information when compiled to a call-by-need language; this we get for free because call-by-need data constructors, which codata is compiled into via T, memoize their fields. In an eager setting, it is enough to use memoized versions of **delay** and **force**, which are introduced by the call-by-value compilation described in Sect. 3.5. This sharing is confirmed by the OCaml and Racket backends of the prototype language, which find the 100th Fibonacci number in less than a second (a task that takes hours without sharing).
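The kind of sharing at stake can be sketched in ordinary Haskell by writing the stream in its tabulated form, that is, as a data constructor with lazy fields. This is not the codata syntax of the GHC extension and not the benchmark code; the names `Stream`, `zipWithS`, `fibs`, and `nth` are hypothetical.

```haskell
-- Tabulated stream: one constructor whose lazy fields hold the
-- responses to the head and tail observations.
data Stream a = Stream { headS :: a, tailS :: Stream a }

-- Combine two streams pointwise.
zipWithS :: (a -> b -> c) -> Stream a -> Stream b -> Stream c
zipWithS f s t =
  Stream (f (headS s) (headS t)) (zipWithS f (tailS s) (tailS t))

-- The Fibonacci stream refers to itself; because constructor fields
-- are memoized under call-by-need, each element is computed only once.
fibs :: Stream Integer
fibs = Stream 0 (Stream 1 (zipWithS (+) fibs (tailS fibs)))

-- Observe the tail n times, then the head.
nth :: Int -> Stream a -> a
nth n s = if n <= 0 then headS s else nth (n - 1) (tailS s)
```

Because `fibs` is a single shared structure, `nth` walks memoized cells and the cost of reaching the n-th element grows linearly in n, not exponentially.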

As the object-oriented representative, the JavaScript backend is a compilation from data to codata using the visitor pattern presented in Sect. 3.2. Because codata remains codata (*i.e.* JavaScript objects), an optimization must be performed to ensure the same amount of sharing of codata as the other backends. The solution is to lift out the branches of a codata object, as shown in Fig. 6, where the call-by-need let-bindings can be implemented by **delay** and **force** in strict languages as usual. It turns out that this transformation is also needed in an alternative compilation technique presented by Regis-Gianas and Laforgue [33] where codata is compiled to functions, *i.e.* another form of codata.

### **5 Related Work**

Our work follows in the spirit of Amin *et al.*'s [3] desire to provide a minimal theory that can model type parameterization, modules, objects and classes. Another approach to combine type parameterization and modules is also offered by 1ML [36], which is mapped to System F. Amin *et al.*'s work goes one step further by translating System F to a calculus that directly supports objects and classes. Our approach differs in methodology: instead of searching for a logical foundation of a pre-determined notion of objects, we let the logic guide us while exploring what objects are. Even though there is no unanimous consensus that functional and object-oriented paradigms should be combined, there have been several hybrid languages for combining both styles of programming, including Scala, the Common Lisp Object System [8], Objective ML [34], and a proposed but unimplemented object system for Haskell [30].

Arising out of the correspondence between programming languages, category theory, and universal algebras, Hagino [20] first proposed codata as an extension to ML to remedy the asymmetry created by data types. In the same way that data types represent initial F-algebras, codata types represent final F-coalgebras. These structures were implemented in the categorical programming language Charity [10]. On the logical side of the correspondence, codata arises naturally in the sequent calculus [15,28,44] since it provides the right setting to talk about construction of either the provider (*i.e.* the term) or the client (*i.e.* the context) side of a computation, and has roots in classical [13,41] and linear logic [18,19].

In session-typed languages, which also have a foundation in linear logic, external choice can be seen as a codata (product) type dual to the way internal choice corresponds to a data (sum) type. It is interesting that similar problems arise in both settings. Balzer and Pfenning [7] discuss an issue that shows up in choosing between internal and external choice; this corresponds to choosing between data and codata, known as the *expression problem*. They [7] also suggest using the visitor pattern to remedy having external choice (codata) without internal choice (data) as we do in Sect. 3.2. Of course, session types go beyond codata by adding a notion of temporality (via linearity) and multiple processes that communicate over channels.

To explore programming with coinductive types, Ancona and Zucca [4] and Jeannin *et al.* [26] extended Java and OCaml with regular cyclic structures; these have a finite representation that can be eagerly evaluated and fully stored in memory. A less restricted method of programming these structures was introduced by Abel *et al.* [1,2] who popularized the idea of programming by observations, *i.e.* using copatterns. This line of work further developed the functionality of codata types in dependently typed languages by adding indexed codata types [40] and dependent copattern matching [11], which enabled the specification of bisimulation proofs and encodings of productive infinite objects in Agda. We build on these foundations by developing codata in practical languages.

Focusing on implementation, Regis-Gianas and Laforgue [33] added codata with a macro transformation in OCaml. As it turns out, this macro definition corresponds to one of the popular encodings of objects in the λ-calculus [27], where codata/objects are compiled to functions from tagged messages to method bodies. This compilation scheme requires the use of GADTs for static type checking, and is therefore only applicable to dynamically typed languages or the few statically typed languages with expressive enough type systems like Haskell, OCaml, and dependently typed languages. Another popular technique for encoding codata/objects, presented in [31], corresponds to a class-based organization of dynamic dispatch [21] and is the one used in this paper. This technique compiles codata/objects to products of methods, which has the advantage of being applicable in a simply-typed setting.

### **6 Conclusion**

We have shown here how codata can be put to use to capture several practical programming idioms and applications, besides just modeling infinite structures. In order to help incorporate codata in today's programming languages, we have shown how to compile between two core languages: one based on the familiar notion of data types from functional languages such as Haskell and OCaml, and the other based on the notion of a structure defined by reactions to observations [1]. This paper works toward the goal of providing common ground between the functional and object-oriented paradigms; as future work, we would like to extend the core with other features of full-fledged functional and object-oriented languages. A better understanding of codata clarifies both the theory and practice of programming languages. Indeed, this work is guiding us in the use of fully-extensional functions for the compilation of Haskell programs. The design is motivated by the desire to improve optimizations, in particular the ones relying on the "arity" of functions, to be more compositional and work between higher-order abstractions. It is interesting that the deepening of our understanding of objects is helping us in better compiling functional languages!

# **References**



# Composing Bidirectional Programs Monadically

Li-yao Xia<sup>1</sup>, Dominic Orchard<sup>2</sup>, and Meng Wang<sup>3</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia, USA, xialiyao@seas.upenn.edu

<sup>2</sup> University of Kent, Canterbury, UK

<sup>3</sup> University of Bristol, Bristol, UK

Abstract. Software frequently converts data from one representation to another and vice versa. Naïvely specifying both conversion directions separately is error prone and introduces conceptual duplication. Instead, *bidirectional programming* techniques allow programs to be written which can be interpreted in both directions. However, these techniques often employ unfamiliar programming idioms via restricted, specialised combinator libraries. Instead, we introduce a framework for composing bidirectional programs monadically, enabling bidirectional programming with familiar abstractions in functional languages such as Haskell. We demonstrate the generality of our approach applied to parsers/printers, lenses, and generators/predicates. We show how to leverage compositionality and equational reasoning for the verification of *round-tripping properties* for such monadic bidirectional programs.

# 1 Introduction

A *bidirectional transformation* (BX) is a pair of mutually related mappings between source and target data objects. A well-known example solves the *view-update problem* [2] from relational database design. A *view* is a derived database table, computed from concrete *source* tables by a query. The problem is to map an update of the view back to a corresponding update on the source tables. This is captured by a bidirectional transformation. The bidirectional pattern is found in a broad range of applications, including parsing [17,30], refactoring [31], code generation [21,27], model transformation [32], and XML transformation [25].

When programming a bidirectional transformation, one can separately construct the forwards and backwards functions. However, this approach duplicates effort, is prone to error, and causes subsequent maintenance issues. These problems can be avoided by using specialised programming languages that generate both directions from a single definition [13,16,33], a discipline known as *bidirectional programming*.

The most well-known language family for BX programming is *lenses* [13]. A lens captures transformations between sources S and views V via a pair of functions get : S → V and put : V → S → S. The get function extracts a view from a source and put takes an updated view and a source as inputs to produce an updated source. The asymmetrical nature of get and put makes it possible for put to recover some of the source data that is not present in the view. In other words, get does not have to be injective to have a corresponding put.

Bidirectional transformations typically respect *round-tripping* laws, capturing the extent to which the transformations preserve information between the two data representations. For example, *well-behaved lenses* [5,13] should satisfy:

$$\text{put } (\text{get } s) \; s = s \qquad\qquad \text{get } (\text{put } v \; s) = v$$

Lens languages are typically designed to enforce these properties. This focus on unconditional correctness inevitably leads to reduced practicality in programming: lens combinators are often stylised and disconnected from established programming idioms. In this paper, we instead focus on expressing bidirectional programs directly, using monads as an interface for sequential composition.
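The two well-behavedness laws above can be checked on a small example. The following is a minimal sketch, not taken from any lens library: the record type `Lens`, the example `fstLens`, and the law-checking helpers `lawGetPut` and `lawPutGet` are names assumed here for illustration.

```haskell
-- A lens packaged as a record of the two functions described above.
data Lens s v = Lens
  { get :: s -> v
  , put :: v -> s -> s
  }

-- Hypothetical example lens: the view is the first component of a pair.
-- get is not injective (it forgets the second component), but put can
-- recover that component from the old source.
fstLens :: Lens (a, b) a
fstLens = Lens { get = fst, put = \v (_, b) -> (v, b) }

-- put (get s) s = s
lawGetPut :: Eq s => Lens s v -> s -> Bool
lawGetPut l s = put l (get l s) s == s

-- get (put v s) = v
lawPutGet :: Eq v => Lens s v -> v -> s -> Bool
lawPutGet l v s = get l (put l v s) == v
```

Both checks hold for `fstLens` on any pair, since `put` only replaces the first component and leaves the rest of the source untouched.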

Monads are a popular pattern [35] (especially in Haskell) which combinator libraries in other domains routinely exploit. Introducing monadic composition to BX programming significantly expands the expressiveness of BX languages and opens up a route for programmers to explore the connection between BX programming and mainstream uni-directional programming. Moreover, it appears that many applications of bidirectional transformations (e.g., parsers and printers [17]) do not share the lens *get*/*put* pattern, and as a result have not been sufficiently explored. However, monadic composition is known to be an effective way to construct at least one direction of such transformations (e.g., parsers).

*Contributions.* In this paper, we deliberately avoid the well-tried approach of specialised lens languages, instead exploring a novel point in the BX design space based on monadic programming, naturally reusing host language constructs. We revisit lenses, and two more bidirectional patterns, demonstrating how they can be subject to monadic programming. By being uncompromising about the monad interface, we expose the essential ideas behind our framework whilst maximising its utility. The trade-off with our approach is that we can no longer enforce correctness in the same way as conventional lenses: our interface does not rule out all non-round-tripping BXs. We tackle this issue by proposing a new compositional reasoning framework that is flexible enough to characterise a variety of round-tripping properties, and simplifies the necessary reasoning.

Specifically, we make the following contributions:


– We present a scalable reasoning framework, capturing notions of *compositionality* for bidirectional properties (Sect. 4). We define classes of round-tripping properties inherent to bidirectionalism, which can be verified by following simple criteria. These principles are demonstrated with our three examples. We include some proofs for illustration in the paper. The supplementary material [12] contains machine-checked Coq proofs for the main theorems.

An extended version of this manuscript [36] includes additional definitions, proofs, and comparisons in its appendices.

– We have implemented these ideas as Haskell libraries [12], with two wrappers around attoparsec for parsers and printers, and QuickCheck for generators and predicates, showing the viability of our approach for real programs.

We use Haskell for concrete examples, but the programming patterns can be easily expressed in many functional languages. We use the Haskell notation of assigning type signatures to expressions via an infix double colon "::".

#### 1.1 Further Examples of BX

We introduced lenses briefly above. We now introduce the other two examples used in this paper: *parsers/printers* and *generators/predicates*.

*Parsing and printing.* Programming language tools (such as interpreters, compilers, and refactoring tools) typically require two intimately linked components: *parsers* and *printers*, respectively mapping from source code to ASTs and back. A simple implementation of these two functions can be given with types:

```
parser :: String → AST
printer :: AST → String
```
Parsers and printers are rarely actual inverses to each other, but instead typically exhibit a variant of round-tripping such as:

$$\text{parser} \circ \text{printer} \circ \text{parser} \equiv \text{parser} \qquad\qquad \text{printer} \circ \text{parser} \circ \text{printer} \equiv \text{printer}$$

The left equation describes the common situation that parsing discards information about source code, such as whitespace, so that printing the resulting AST does not recover the original source. However, printing retains enough information such that parsing the printed output yields an AST which is equivalent to the AST from parsing the original source. The right equation describes the dual: printing may map different ASTs to the same string. For example, printed code 1+2+3 might be produced by left- and right-associated syntax trees.
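As a small illustration of the left equation, consider the following sketch. It assumes a toy "AST" that is just a list of integers written as whitespace-separated numerals; the type synonym and both functions are hypothetical stand-ins, not definitions from this paper.

```haskell
-- Toy stand-in for an AST: a list of integers.
type AST = [Int]

-- Parsing discards the exact amount of whitespace between numerals.
parser :: String -> AST
parser = map read . words

-- Printing always uses a single space as separator.
printer :: AST -> String
printer = unwords . map show

-- parser . printer . parser == parser, checked on one input
roundTrips :: String -> Bool
roundTrips s = (parser . printer . parser) s == parser s
```

For the input `"1   2  3"`, printing the parsed list gives `"1 2 3"`, which differs from the original source, yet parsing it again yields the same list `[1, 2, 3]`.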

For particular AST subsets, printing and parsing may actually be left- or right- inverses to each other. Other characterisations are also possible, e.g., with equivalence classes of ASTs (accounting for reassociations). Alternatively, parsers and printers may satisfy properties about the interaction of partially-parsed inputs with the printer and parser, e.g., if parser :: String → (AST, String):

```
(let (x, s') = parser s in parser ((printer x) ++ s')) ≡ parser s
```
Thus, parsing and printing follows a pattern of inverse-like functions which does not fit the lens paradigm. The pattern resembles lenses between a source (source code) and view (ASTs), but with a compositional notion for the source and partial "gets" which consume some of the source, leaving a remainder.

Writing parsers and printers by hand is often tedious due to the redundancy implied by their inverse-like relation. Thus, various approaches have been proposed for reducing the effort of writing parsers/printers by generating both from a common definition [17,19,30].

*Generating and checking.* Property-based testing (e.g., QuickCheck) [10] expresses program properties as executable predicates. For instance, the following property checks that an insertion function insert, given a sorted list (as checked by the predicate isSorted :: [Int] → Bool), produces another sorted list. The combinator ==> represents implication for properties.

To test this property, a testing framework generates random inputs for val and list. The implementation of ==> applied here first checks whether list is sorted, and if it is, checks that insert val list is sorted as well. This process is repeated with further random inputs until either a counterexample is found or a predetermined number of test cases pass.
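The checking step just described can be sketched without a testing library. In the sketch below, the `Maybe Bool` result is an assumption standing in for QuickCheck's `Property`: `Nothing` models a discarded test case (precondition not met), and `Just b` the outcome of the actual check. `insert` is the one from `Data.List`.

```haskell
import Data.List (insert)

-- A list is sorted if every element is <= its successor.
isSorted :: [Int] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Library-free model of "isSorted list ==> isSorted (insert val list)".
propInsert :: Int -> [Int] -> Maybe Bool
propInsert val list
  | isSorted list = Just (isSorted (insert val list))
  | otherwise     = Nothing -- discarded: tells us nothing about insert
```

With this encoding, an input such as `[3, 1]` is discarded rather than counted as a passing or failing test.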

However, this naïve method is inefficient: many properties such as propInsert have preconditions which are satisfied by an extremely small fraction of inputs. In this case, the ratio of sorted lists among lists of length n is inversely proportional to n!, so most generated inputs will be discarded for not satisfying the isSorted precondition. Such tests give no information about the validity of the predicate being tested and thus are prohibitively inefficient.

When too many inputs are being discarded, the user must instead supply the framework with *custom generators* of values satisfying the precondition: genSorted :: Gen [Int].

One can expect two complementary properties of such a generator. A generator is *sound* with respect to the predicate isSorted if it generates only values satisfying isSorted; soundness means that no tests are discarded, hence the tested property is better exercised. A generator is *complete* with respect to isSorted if it can generate all satisfying values; completeness ensures the correctness of testing a property with isSorted as a precondition, in the sense that if there is a counterexample, it will be eventually generated. In this setting of testing, completeness, which affects the potential adequacy of testing, is arguably more important than soundness, which affects only efficiency.
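Soundness and completeness can be made concrete with a deterministic sketch. Here the random choices a generator would draw are modelled as an explicit input list, which is an assumption made to keep the example free of dependencies; `fromChoices` and `toChoices` are hypothetical names, and integer overflow is ignored.

```haskell
isSorted :: [Int] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- "Generator": the first choice is the head of the list, the remaining
-- choices are turned into non-negative steps between neighbours.
fromChoices :: [Int] -> [Int]
fromChoices []       = []
fromChoices (x : ds) = scanl (+) x (map abs ds)

-- Witness for completeness: choices that reproduce a given sorted list.
toChoices :: [Int] -> [Int]
toChoices []            = []
toChoices xs@(x : rest) = x : zipWith (-) rest xs
```

Soundness corresponds to `isSorted (fromChoices cs)` holding for every `cs`; completeness corresponds to `fromChoices (toChoices xs) == xs` for every sorted `xs`, so each sorted list is reachable from some sequence of choices.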

It is clear that generators and predicates are closely related, forming a pattern similar to that of bidirectional transformations. Given that good generators are usually difficult to construct, the ability to extract both from a common specification with bidirectional programming is a very attractive alternative.

*Roadmap.* We begin by outlining a concrete example of our monadic approach via parsers and printers (Sect. 2), before explaining the general approach of using *monadic profunctors* to structure bidirectional programs (Sect. 3). Section 4 then presents a compositional reasoning framework for monadic bidirectional programs, with varying degrees of strength adapted to different round-tripping properties. We then replay the developments of the earlier sections to define lenses as well as generators and predicates in Sects. 5 and 6.

# 2 Monadic Bidirectional Programming

A bidirectional parser, or *biparser*, combines both a parsing direction and printing direction. Our first novelty here is to express biparsers monadically.

In code samples, we use the Haskell pun of naming variables after their types, e.g., a variable of some abstract type v will also be called v. Similarly, for some type constructor m, a variable of type m v will be called mv. A function u → m v (a Kleisli arrow for a monad m) will be called kv.

*Monadic parsers.* The following data type provides the standard way to describe parsers of values of type v which may consume only part of the input string:

```
data Parser v = Parser { parse :: String → (v, String) }
```
It is well-known that such parsers are monadic [35], i.e., they have a notion of monadic sequential composition embodied by the interface:

```
instance Monad Parser where
  (>>=) :: Parser v → (v → Parser w) → Parser w
  return :: v → Parser v
```
The sequential composition operator (>>=), called *bind*, describes the scheme of constructing a parser by sequentially composing two sub-parsers where the second depends on the output of the first; a parser of w values is made up of a parser of v and a parser of w that depends on the previously parsed v. Indeed, this is the implementation given to the monadic interface:

```
pv >>= kw = Parser (λs → let (v, s') = parse pv s in parse (kw v) s')
return v = Parser (λs → (v, s))
```
Bind first runs the parser pv on an input string s, resulting in a value v which is used to create the parser kw v, which is in turn run on the remaining input s' to produce parsed values of type w. The return operation creates a trivial parser for any value v which does not consume any input but simply produces v.
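For reference, the fragments above can be assembled into a compilable module. This is a sketch: the `Functor` and `Applicative` instances are required by current GHC versions but are not shown in the text, and the primitive `item` and the example `twoChars` are hypothetical additions.

```haskell
data Parser v = Parser { parse :: String -> (v, String) }

instance Functor Parser where
  fmap f pv = Parser (\s -> let (v, s') = parse pv s in (f v, s'))

instance Applicative Parser where
  pure v = Parser (\s -> (v, s))
  pf <*> pv = Parser (\s ->
    let (f, s')  = parse pf s
        (v, s'') = parse pv s'
    in (f v, s''))

instance Monad Parser where
  pv >>= kw = Parser (\s -> let (v, s') = parse pv s in parse (kw v) s')

-- Hypothetical primitive: consume one character (fails on empty input).
item :: Parser Char
item = Parser step
  where
    step (c : cs) = (c, cs)
    step []       = error "item: empty input"

-- Sequential composition: the second parser runs on the leftover input.
twoChars :: Parser (Char, Char)
twoChars = item >>= \a -> item >>= \b -> return (a, b)
```

Running `parse twoChars "abc"` yields `(('a','b'),"c")`: each `item` consumes one character and the remainder is threaded through by bind.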

In practice, parsers composed with (>>=) often have a relationship between the output types of the two operands: usually that the former "contains" the latter in some sense. For example, we might parse an expression and compose this with a parser for statements, where statements contain expressions. This relationship will be useful later when we consider printers.

As a shorthand, we can discard the remaining unparsed string of a parser using projection, giving a helper function parser :: Parser v → (String → v).

*Monadic printers.* Our goal is to augment parsers with their inverse printer, such that we have a monadic type Biparser which provides two complementary (bi-directional) transformations:

parser :: Biparser v → (String → v)

printer :: Biparser v → (v → String)

However, this type of printer v → String (shown also in Sect. 1.1) cannot form a monad because it is *contravariant* in its type parameter v. Concretely, we cannot implement the bind (>>=) operator for values with types of this form:

We are stuck trying to fill the hole (??) as there is no way to get a value of type v to pass as an argument to pv (first printer) and kw (second printer which depends on a v). Subsequently, we cannot construct a monadic biparser by simply taking a product of the parser monad and v → String and leveraging the result that the product of two monads is a monad.

But what if the type variables of bind were related by containment, such that v is contained within w and thus we have a projection w → v? We could use this projection to fill the hole in the failed attempt above, defining a bind-like operator:

bind' :: (w → v) → (v → String) → (v → (w → String)) → (w → String)

bind' from pv kw = λw → **let** v = from w **in** pv v ++ kw v w

This is closer to the monadic form, where from :: w → v resolves the difficulty of contravariance by "contextualizing" the printers. Thus, the first printer is no longer just "a printer of v", but "a printer of v extracted from w". In the context of constructing a bidirectional parser, having such a function to hand is not an unrealistic expectation: recall that when we compose two parsers, typically the values of the first parser for v are contained within the values returned by the second parser for w, thus a notion of projection can be defined and used here to recover a v in order to build the corresponding printer compositionally.
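A small usage sketch of the bind-like operator may help. The definition of `bind'` is the one described above; `printInt` and `printPair` are hypothetical examples in which the contained value is the first component of a pair.

```haskell
bind' :: (w -> v) -> (v -> String) -> (v -> (w -> String)) -> (w -> String)
bind' from pv kw = \w -> let v = from w in pv v ++ kw v w

-- Hypothetical first printer: an integer followed by a space.
printInt :: Int -> String
printInt n = show n ++ " "

-- The projection fst supplies the Int that printInt needs; the second
-- printer receives the whole pair and prints its Bool component.
printPair :: (Int, Bool) -> String
printPair = bind' fst printInt (\_ (_, b) -> show b)
```

Here `printPair (3, True)` evaluates to `"3 True"`: the projection extracts `3` for the first printer, and the outputs of both printers are concatenated.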

Of course, this is still not a monad. However, it suggests a way to generate a monadic form by putting the printer and the contextualizing projection together, (w → v, v → String) and fusing them into (w → (v, String)). This has the advantage of removing the contravariant occurrence of v, yielding a data type:

**data** Printer w v = Printer { print :: w → (v, String) }

If we fix the first parameter type w, then the type Printer w of printers for w values is indeed monadic, combining a *reader monad* (for some global read-only parameter of type w) and a *writer monad* (for strings), with implementation:

The printer return v ignores its input and prints nothing. For bind, an input w is shared by both printers and the resulting strings are concatenated.
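The reader/writer behaviour just described can be sketched as follows. This is one possible rendering rather than the paper's exact listing: the `Functor` and `Applicative` instances are added because GHC requires them, Prelude's `print` is hidden to free the field name, and `showInt` and `pairP` are hypothetical examples. The `comap` helper is the pre-composition operation on the first parameter.

```haskell
import Prelude hiding (print)

data Printer w v = Printer { print :: w -> (v, String) }

instance Functor (Printer w) where
  fmap f pv = Printer (\w -> let (v, s) = print pv w in (f v, s))

instance Applicative (Printer w) where
  pure v = Printer (\_ -> (v, "")) -- ignores the input, prints nothing
  pf <*> pv = Printer (\w ->
    let (f, s)  = print pf w
        (v, s') = print pv w
    in (f v, s ++ s'))

instance Monad (Printer w) where
  -- The input w is shared (reader); the outputs are concatenated (writer).
  pv >>= kt = Printer (\w ->
    let (v, s)  = print pv w
        (t, s') = print (kt v) w
    in (t, s ++ s'))

comap :: (w -> w') -> Printer w' v -> Printer w v
comap from (Printer f) = Printer (f . from)

-- Hypothetical primitive printer.
showInt :: Printer Int Int
showInt = Printer (\n -> (n, show n))

pairP :: Printer (Int, Int) (Int, Int)
pairP = do
  a <- comap fst showInt
  b <- comap snd showInt
  return (a, b)
```

With these definitions, `print pairP (1, 2)` gives `((1,2),"12")`: both sub-printers read the same pair, and their strings are appended in order.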

We can adapt the contextualisation of a printer by the following operation which amounts to pre-composition, witnessing the fact that Printer is a contravariant functor in its first parameter:

```
comap :: (w → w') → Printer w' v → Printer w v
comap from (Printer f) = Printer (f ◦ from)
```
#### 2.1 Monadic Biparsers

So far so good: we now have a monadic notion of printers. However, our goal is to combine parsers and printers in a single type. Since we have two monads, we use the standard result that a product of monads is a monad, defining *biparsers*:

By pairing parsers and printers we have to unify their covariant parameters. When both the type parameters of Biparser are the same it is easy to interpret this type: a biparser Biparser v v is a parser from strings to v values and printer from v values to strings. We refer to biparsers of this type as *aligned* biparsers. What about when the type parameters differ? A biparser of type Biparser u v provides a parser from strings to v values and a printer from u values to strings, but where the printers can compute v values from u values, i.e., u is some common broader representation which contains relevant v-typed subcomponents. A biparser Biparser u v can be thought of as printing a certain subtree v from the broader representation of a syntax tree u.

The corresponding monad for Biparser is the product of the previous two monad definitions for Parser and Printer, allowing both to be composed sequentially at the same time. To avoid duplication we elide the definition here which is shown in full in Appendix A of the extended version [36].
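As a rough guide to what the elided product instance looks like, here is a sketch under stated assumptions: the record shape of `Biparser` (a `parse` field and a `print` field) is inferred from the constructor pattern used in the text, and the instances below are a reconstruction, not the listing from the extended version. The helpers `char` and `twoChars` are hypothetical.

```haskell
import Prelude hiding (print)

-- Assumed shape: a parser and a printer side by side.
data Biparser u v = Biparser
  { parse :: String -> (v, String)
  , print :: u -> (v, String)
  }

instance Functor (Biparser u) where
  fmap f (Biparser pa pr) = Biparser
    (\s -> let (v, s') = pa s in (f v, s'))
    (\u -> let (v, out) = pr u in (f v, out))

instance Applicative (Biparser u) where
  pure v = Biparser (\s -> (v, s)) (\_ -> (v, ""))
  pf <*> pv = pf >>= \f -> fmap f pv

instance Monad (Biparser u) where
  -- Component-wise: parser bind on the left, printer bind on the right.
  pv >>= kw = Biparser
    (\s -> let (v, s') = parse pv s in parse (kw v) s')
    (\u ->
      let (v, out)  = print pv u
          (w, out') = print (kw v) u
      in (w, out ++ out'))

comap :: (u -> u') -> Biparser u' v -> Biparser u v
comap f (Biparser pa pr) = Biparser pa (pr . f)

char :: Biparser Char Char
char = Biparser pa (\c -> (c, [c]))
  where
    pa (c : cs) = (c, cs)
    pa []       = error "char: empty input"

twoChars :: Biparser (Char, Char) (Char, Char)
twoChars = do
  a <- comap fst char
  b <- comap snd char
  return (a, b)
```

A single `do` block now defines both directions: `parse twoChars "xyz"` yields `(('x','y'),"z")`, while `print twoChars ('x','y')` yields `(('x','y'),"xy")`.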

We can also lift the previous notion of comap from printers to biparsers, which gives us a way to contextualize a printer:

```
comap :: (u → u') → Biparser u' v → Biparser u v
comap f (Biparser parse print) = Biparser parse (print ◦ f)
upon :: Biparser u' v → (u → u') → Biparser u v
upon = flip comap
```
In the rest of this section, we use the alias "upon" for comap with flipped parameters where we read p 'upon' subpart as applying the printer of p :: Biparser u' v on a subpart of an input of type u calculated by subpart :: u → u', thus yielding a biparser of type Biparser u v.

*An example biparser.* Let us write a biparser, string :: Biparser String String, for strings which are prefixed by their length and a space. For example, the following unit tests should be true:


We start by defining a primitive biparser of single characters as:

A character is parsed by deconstructing the source string into its head and tail. For brevity, we do not handle the failure associated with an empty string. A character c is printed as its single-letter string (a singleton list) paired with c.

Next, we define a biparser int for an integer followed by a single space. An auxiliary biparser digits (on the right) parses an integer one digit at a time into a string. Note that in Haskell, the **do**-notation statement desugars to "char 'upon' head >>= λ d → . . . " which uses (>>=) and a function binding d in the scope of the rest of the desugared block.

On the right, digits extracts a String consisting of digits followed by a single space. As a parser, it parses a character (char 'upon' head); if it is a digit then it continues parsing recursively (digits 'upon' tail) appending the first character to the result (d : igits). Otherwise, if the parsed character is a space the parser returns the string " ". As a printer, digits expects a non-empty string of the same format; 'upon' head extracts the first character of the input, then char prints it and returns it back as d; if it is a digit, then 'upon' tail extracts the rest of the input to print recursively. If the character is a space, the printer returns a space and terminates; otherwise (not digit or space) the printer throws an error.

On the left, the biparser int uses read to convert an input string of digits (parsed by digits) into an integer, and printedInt to convert an integer to an output string printed by digits. A safer implementation could return the Maybe type when parsing but we keep things simple here for now.

After parsing an integer n, we can parse the string following it by iterating n times the biparser char. This is captured by the replicateBiparser combinator below, defined recursively like digits but with the termination condition given by an external parameter. To iterate n times a biparser pv: if n == 0, there is nothing to do and we return the empty list; otherwise for n>0, we run pv once to get the head v, and recursively iterate n-1 times to get the tail vs.

Note that although not reflected in its type, replicateBiparser n pv expects, as a printer, a list l of length n: if n == 0, there is nothing to print; if n>0, 'upon' head extracts the head of l to print it with pv, and 'upon' tail extracts its tail, of length n-1, to print it recursively.

The replicateBiparser combinator is akin to replicateM from Haskell's standard library. We can now fulfil our task:

```
string :: Biparser String String
string = int `upon` length >>= λn → replicateBiparser n char
```
Interestingly, if we erase applications of upon, i.e., we substitute every expression of the form py 'upon' f with py and ignore the second parameter of the types, we obtain what is essentially the definition of a parser in an idiomatic style for monadic parsing. This is because 'upon' f is the identity on the parser component of Biparser. Thus the biparser code closely resembles standard, idiomatic monadic parser code but with "annotations" via upon expressing how to apply the backwards direction of printing to subparts of the parsed string.

Despite its simplicity, the syntax of length-prefixed strings is notably context-sensitive. Thus the example makes crucial use of the monadic interface for bidirectional programming: a value (the length) must first be extracted to dynamically delimit the string that is parsed next. Context-sensitivity is standard for parser combinators in contrast with parser generators, e.g., Yacc, and applicative parsers, which are mostly restricted to context-free languages. By our monadic BX approach, we can now bring this power to bear on *bidirectional* parsing.
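The whole length-prefixed string example can be assembled into one runnable module. The sketch below follows the prose descriptions of char, digits, int and replicateBiparser in this section; the exact listings are reconstructions, the record shape of `Biparser` is assumed, `Functor` and `Applicative` are derived from the monad via `liftM` and `ap`, and error handling is as minimal as in the text.

```haskell
import Prelude hiding (print)
import Control.Monad (ap, liftM)
import Data.Char (isDigit)

data Biparser u v = Biparser
  { parse :: String -> (v, String)
  , print :: u -> (v, String)
  }

instance Functor (Biparser u) where
  fmap = liftM

instance Applicative (Biparser u) where
  pure v = Biparser (\s -> (v, s)) (\_ -> (v, ""))
  (<*>) = ap

instance Monad (Biparser u) where
  pv >>= kw = Biparser
    (\s -> let (v, s') = parse pv s in parse (kw v) s')
    (\u ->
      let (v, out)  = print pv u
          (w, out') = print (kw v) u
      in (w, out ++ out'))

upon :: Biparser u' v -> (u -> u') -> Biparser u v
upon (Biparser pa pr) f = Biparser pa (pr . f)

-- Single characters; parsing an empty string is not handled.
char :: Biparser Char Char
char = Biparser pa (\c -> (c, [c]))
  where
    pa (c : cs) = (c, cs)
    pa []       = error "char: empty input"

-- A run of digits terminated by a single space.
digits :: Biparser String String
digits = do
  d <- char `upon` head
  if isDigit d
    then do
      igits <- digits `upon` tail
      return (d : igits)
    else if d == ' '
      then return " "
      else error "digits: expected a digit or a space"

int :: Biparser Int Int
int = do
  ds <- digits `upon` printedInt
  return (read ds)
  where
    printedInt n = show n ++ " "

replicateBiparser :: Int -> Biparser u v -> Biparser [u] [v]
replicateBiparser 0 _  = return []
replicateBiparser n pv = do
  v  <- pv `upon` head
  vs <- replicateBiparser (n - 1) pv `upon` tail
  return (v : vs)

string :: Biparser String String
string = int `upon` length >>= \n -> replicateBiparser n char
```

With these definitions, `parse string "6 lambda calculus"` yields `("lambda"," calculus")` and `print string "SKI"` yields `("SKI","3 SKI")`. The parsed length is what determines how many characters the second stage consumes, which is the context-sensitivity the monadic bind provides.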

# 3 A Unifying Structure: Monadic Profunctors

The biparser examples of the previous section were enabled by both the monadic structure of Biparser and the comap operation (also called upon, with flipped arguments). We describe a type as being a *monadic profunctor* when it has both a monadic structure and a comap operation (subject to some equations). The notion of a monadic profunctor is general, but it characterises a key class of structures for bidirectional programs, which we explain here. Furthermore, we show a construction of monadic profunctors from pairs of monads which elicits the necessary structure for monadic bidirectional programming in the style of the previous section.

*Profunctors.* In Sect. 2.1, biparsers were defined by a data type with two type parameters (Biparser u v) which is functorial and monadic in the second parameter and *contravariantly* functorial in the first parameter (provided by the comap operation). In standard terminology, a two-parameter type p which is functorial in both its type parameters is called a *bifunctor*. In Haskell, the term *profunctor* has come to mean any bifunctor which is contravariant in the first type parameter and covariant in the second.<sup>1</sup> This differs slightly from the standard category theory terminology where a profunctor is a bifunctor $F : \mathcal{D}^{op} \times \mathcal{C} \to \mathbf{Set}$. This corresponds to the Haskell community's use of the term "profunctor" if we treat Haskell in an idealised way as the category of sets.

<sup>1</sup> http://hackage.haskell.org//profunctors/docs/Data-Profunctor.html.

We adopt this programming-oriented terminology, capturing the comap operation via a class Profunctor. In the preceding section, some uses of comap involved a partial function, e.g., comap head. We make the possibility of partiality explicit via the Maybe type, yielding the following definition.

Definition 1. A binary data type is a profunctor if it is a contravariant functor in its first parameter and covariant functor in its second, with the operation:

**class** ForallF Functor p ⇒ Profunctor p **where**

comap :: (u → Maybe u') → p u' v → p u v

which should obey two laws:

comap Just = id

comap (f >=> g) = comap f ◦ comap g

where (>=>) :: (a → Maybe b) → (b → Maybe c) → (a → Maybe c) composes partial functions (left-to-right), captured by Kleisli arrows of the Maybe monad.

The constraint ForallF Functor p captures a universally quantified constraint [6]: for all types u, p u has an instance of the Functor class.<sup>2</sup>

The requirement for comap to take partial functions is in response to the frequent need to restrict the domain of bidirectional transformations. In combinator-based approaches, combinators typically constrain bidirectional programs to be bijections, enforcing domain restrictions by construction. Our more flexible approach requires a way to include such restrictions explicitly, hence comap.

Since the contravariant part of the bifunctor applies to functions of type u → Maybe u', the categorical analogy here is more precisely a profunctor $F : \mathcal{C}_T^{op} \times \mathcal{C} \to \mathbf{Set}$ where $\mathcal{C}_T$ is the Kleisli category of the partiality (Maybe) monad.
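To make Definition 1 and its two laws concrete, the following sketch defines a simplified version of the class and one instance. It is an illustration under assumptions: the `ForallF Functor p` superclass is dropped to avoid depending on the constraints package, and the instance type `PFun` together with `safeHead`, `safeTail` and `showP` are hypothetical names.

```haskell
import Control.Monad ((>=>))

-- Simplified class: comap takes a partial function, as in Definition 1.
class Profunctor p where
  comap :: (u -> Maybe u') -> p u' v -> p u v

-- Hypothetical instance: partial functions from u to v.
newtype PFun u v = PFun { runPFun :: u -> Maybe v }

instance Profunctor PFun where
  comap f (PFun g) = PFun (f >=> g)

safeHead :: [a] -> Maybe a
safeHead (x : _) = Just x
safeHead []      = Nothing

safeTail :: [a] -> Maybe [a]
safeTail (_ : xs) = Just xs
safeTail []       = Nothing

showP :: PFun Int String
showP = PFun (Just . show)
```

For this instance the two laws reduce to the left-identity and associativity laws of Kleisli composition for Maybe. For example, `comap (safeTail >=> safeHead) showP` and `(comap safeTail . comap safeHead) showP` both map `[1, 2, 3]` to `Just "2"` and `[1]` to `Nothing`.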

Definition 2. A monadic profunctor is a profunctor p (in the sense of Definition 1) such that p u is a monad for all u. In terms of type class constraints, this means there is an instance Profunctor p and for all u there is a Monad (p u) instance. Thus, we represent monadic profunctors by the following empty class (which inherits all its methods from its superclasses):

**class** (Profunctor p, ForallF Monad p) ⇒ Profmonad p

Monadic profunctors must obey the following laws about the interaction between profunctor and monad operations:

comap f (return y) = return y

comap f (py >>= kz) = comap f py >>= (λ y → comap f (kz y))

(for all f :: u → Maybe v, py :: p v y, kz :: y → p v z). These laws are equivalent to saying that comap lifts (partial) functions into monad morphisms. In Haskell, these laws are obtained *for free* by parametricity [34]. This means that every contravariant functor and monad is in fact a monadic profunctor, thus the following universal instance is lawful:

**instance** (Profunctor p, ForallF Monad p) ⇒ Profmonad p

<sup>2</sup> As of GHC 8.6, the QuantifiedConstraints extension allows universal quantification in constraints, written as forall u. Functor (p u), but for simplicity we use the constraint constructor ForallF from the constraints package: http://hackage.haskell.org/package/constraints.

Corollary 1. Biparsers form a monadic profunctor as there is an instance of Monad (P u) and Profunctor P satisfying the requisite laws.

Lastly, we introduce a useful piece of terminology (mentioned in the previous section on biparsers) for describing values of a profunctor of a particular form:

Definition 3. A value p :: P u v of a profunctor P is called *aligned* if u = v.

#### 3.1 Constructing Monadic Profunctors

Our examples (parsers/printers, lenses, and generators/predicates) share monadic profunctors as an abstraction, making it possible to write different kinds of bidirectional transformations monadically. Underlying these definitions of monadic profunctors is a common structure, which we explain here using biparsers, and which will be replayed in Sect. 5 for lenses and Sect. 6 for bigenerators.

There are two simple ways in which a covariant functor m (resp. a monad) gives rise to a profunctor (resp. a monadic profunctor). The first is by constructing a profunctor in which the contravariant parameter is discarded, i.e., p u v = m v; the second is as a function type from the contravariant parameter u to m v, i.e., p u v = u → m v. These are standard mathematical constructions, and the latter appears in the Haskell profunctors package with the name Star. Our core construction is based on these two ways of creating a profunctor, which we call Fwd and Bwd respectively:
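A minimal sketch of the two constructions, written to be consistent with how Fwd and Bwd are pattern-matched in the instances of this section. The field names unFwd and unBwd are assumptions of the sketch.

```haskell
-- Fwd discards the contravariant parameter u: a value of Fwd m u v is just an m v.
-- (The field name unFwd is an assumption of this sketch.)
data Fwd m u v = Fwd { unFwd :: m v }

-- Bwd is a function from the contravariant parameter u into m v.
-- (The field name unBwd is an assumption of this sketch.)
data Bwd m u v = Bwd { unBwd :: u -> m v }
```

For instance, `Fwd (Just 'a') :: Fwd Maybe Int Char` ignores its Int parameter entirely, whereas `Bwd (\n -> [n, n + 1]) :: Bwd [] Int Int` consumes it.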

The naming reflects the idea that these two constructions will together capture a bidirectional transformation and are related by domain-specific round-tripping properties in our framework. Both Fwd and Bwd map any functor into a profunctor by the following type class instances:

```
instance Functor m ⇒ Functor (Fwd m u) where
  fmap f (Fwd x) = Fwd (fmap f x)
instance Functor m ⇒ Profunctor (Fwd m) where
  comap f (Fwd x) = Fwd x
instance Functor m ⇒ Functor (Bwd m u) where
  fmap f (Bwd x) = Bwd ((fmap f) ◦ x)
instance (Monad m, MonadPartial m) ⇒ Profunctor (Bwd m) where
  comap f (Bwd x) = Bwd ((toFailure ◦ f) >=> x)
```
There is an additional constraint here for Bwd, enforcing that the monad m is a member of the MonadPartial class which we define as:

**class** MonadPartial m **where** toFailure :: Maybe a → m a

This provides an interface for monads which can internalise a notion of failure, as captured at the top-level by Maybe in comap.

Furthermore, Fwd and Bwd both map any monad into a monadic profunctor:
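One plausible rendering of these monad instances is sketched below. It is self-contained, so it repeats the data types and supplies the Applicative instances that current GHC versions require; the field names unFwd and unBwd and the examples halve and halveTwice are assumptions of the sketch.

```haskell
data Fwd m u v = Fwd { unFwd :: m v }
data Bwd m u v = Bwd { unBwd :: u -> m v }

instance Functor m => Functor (Fwd m u) where
  fmap f (Fwd x) = Fwd (fmap f x)

instance Applicative m => Applicative (Fwd m u) where
  pure = Fwd . pure
  Fwd f <*> Fwd x = Fwd (f <*> x)

-- Fwd simply delegates to the underlying monad.
instance Monad m => Monad (Fwd m u) where
  Fwd py >>= kz = Fwd (py >>= unFwd . kz)

instance Functor m => Functor (Bwd m u) where
  fmap f (Bwd x) = Bwd (fmap f . x)

instance Applicative m => Applicative (Bwd m u) where
  pure y = Bwd (\_ -> pure y)
  Bwd f <*> Bwd x = Bwd (\u -> f u <*> x u)

-- Bwd passes the same input u to both sides of the bind (reader-like behaviour).
instance Monad m => Monad (Bwd m u) where
  Bwd my >>= kz = Bwd (\u -> my u >>= \y -> unBwd (kz y) u)

-- Hypothetical example: halving fails on odd numbers.
halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing

-- Halve the input, halve the result again, and return the sum of both results.
halveTwice :: Bwd Maybe Int Int
halveTwice = do
  a <- Bwd halve
  b <- Bwd (\_ -> halve a)
  return (a + b)
```

With these definitions, `unBwd halveTwice 12` evaluates to `Just 9`, while `unBwd halveTwice 10` is `Nothing` because the intermediate value 5 is odd.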


The product of two monadic profunctors is also a monadic profunctor. This follows from the fact that the product of two monads is a monad and the product of two contravariant functors is a contravariant functor.

```
data (:*:) p q u v = (:*:) { pfst :: p u v, psnd :: q u v }
```

```
instance (Monad (p u), Monad (q u)) ⇒ Monad ((p :*: q) u) where
  return y = return y :*: return y
  py :*: qy >>= kz = (py >>= pfst ◦ kz) :*: (qy >>= psnd ◦ kz)
instance (ForallF Functor (p :*: q), Profunctor p, Profunctor q)
      ⇒ Profunctor (p :*: q) where
  comap f (py :*: qy) = comap f py :*: comap f qy
```
#### 3.2 Deriving Biparsers as Monadic Profunctor Pairs

We can redefine biparsers in terms of the above data types, their instances, and two standard monads, the state and writer monads:

```
type State s a = s → (a, s)
type WriterT w m a = m (a, w)
type Biparser = Fwd (State String) :*: Bwd (WriterT String Maybe)
```
The backward direction composes the writer monad with the Maybe monad using WriterT (the writer monad transformer, equivalent to composing two monads with a distributive law). Thus the backwards component of Biparser corresponds to printers (which may fail) and the forwards component to parsers:

Bwd (WriterT String Maybe) u v ≅ u → Maybe (v, String)

Fwd (State String) u v ≅ String → (v, String)

For the above code to work in Haskell, the State and WriterT types need to be defined via either a **data** type or **newtype** in order to allow type class instances on partially applied type constructors. We abuse the notation here for simplicity but define smart constructors and deconstructors for the actual implementation:<sup>3</sup>

```
parse :: Biparser u v → (String → (v, String))
print :: Biparser u v → (u → Maybe (v, String))
mkBP :: (String → (v, String)) → (u → Maybe (v, String)) → Biparser u v
```
The monadic profunctor definition for biparsers now comes for free from the constructions in Sect. 3.1 along with the following instance of MonadPartial for the writer monad transformer with the Maybe monad:

```
instance Monoid w ⇒ MonadPartial (WriterT w Maybe) where
  toFailure Nothing = WriterT Nothing
  toFailure (Just a) = WriterT (Just (a, mempty))
```
In a similar manner, we will use this monadic profunctor construction to define monadic bidirectional transformations for lenses (Sect. 5) and bigenerators (Sect. 6).

The example biparsers from Sect. 2.1 can be easily redefined using the structure here. For example, the primitive biparser char becomes:

```
char :: Biparser Char Char
char = mkBP (λ (c : s) → (c, s)) (λ c → Just (c, [c]))
```
*Codec library.* The codec library [8] provides a general type for bidirectional programming isomorphic to our composite type Fwd r :\*: Bwd w:

```
data Codec r w c a = Codec { codecIn :: r a, codecOut :: c → w a }
```
Though the original codec library was developed independently, its current form is a result of this work. In particular, we contributed to the package by generalising its original type (codecOut :: c → w ()) to the one above, and provided Monad and Profunctor instances to support monadic bidirectional programming with codecs.

# 4 Reasoning about Bidirectionality

So far we have seen how the monadic profunctor structure provides a way to define biparsers using familiar operations and syntax: monads and **do**-notation. This structuring allows both the forwards and backwards components of a biparser to be defined simultaneously in a single compact definition.

This section studies the interaction of monadic profunctors with the *round-tripping laws* that relate the two components of a bidirectional program. For every bidirectional transformation we can define dual properties: *backward round tripping* (going backwards-then-forwards) and *forward round tripping* (going forwards-then-backwards). In each BX domain, such properties also capture additional domain-specific information flow inherent to the transformations. We use biparsers as the running example. We then apply the same principles to our other examples in Sects. 5 and 6. For brevity, we use Bp as an alias for Biparser.

<sup>3</sup> *Smart constructors* (and dually *smart deconstructors*) are just functions that hide boilerplate code for constructing and deconstructing data types.

Definition 4. A biparser p :: Bp u u is *backward round tripping* if for all x :: u and s, s' :: String then (recalling that print p :: u → Maybe (v, String)):

fmap snd (print p x) = Just s ⟹ parse p (s ++ s') = (x, s').

That is, if a biparser p when used as a printer (going backwards) on an input value x produces a string s, then using p as a parser on a string with prefix s and suffix s' yields the original input value x and the remaining input s'.

Note that backward round tripping is defined for *aligned* biparsers (of type Bp u u) since the same value x is used as both the input of the printer (typed by the first type parameter of Bp) and as the expected output of the parser (typed by the second type parameter of Bp).

The dual property is *forward* round tripping: a source string s is parsed (going forwards) into some value x which when printed produces the initial source s:

Definition 5. A biparser p :: Bp u u is *forward round tripping* if for every x :: u and s :: String we have that:

Proposition 1. The biparser char :: Bp Char Char (Sect. 3.2) is both backward and forward round tripping. Proof by expanding definitions and algebraic reasoning.
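Proposition 1 can also be spot-checked on sample inputs. The sketch below uses a direct record representation of biparsers; the field names parseBP and printBP and the checkers backwardOn and forwardOn are assumptions. So is the reading of forward round tripping used here: if parsing s yields x and consumes all of s, then printing x yields s. Such checks only test finitely many inputs and do not replace the proof.

```haskell
-- A biparser as a parser paired with a printer (names are assumptions of this sketch).
data Bp u v = Bp
  { parseBP :: String -> (v, String)
  , printBP :: u -> Maybe (v, String)
  }

-- The primitive single-character biparser. Its parser is partial:
-- it raises an error on the empty string.
char :: Bp Char Char
char = Bp parseChar (\c -> Just (c, [c]))
  where
    parseChar (c : s) = (c, s)
    parseChar []      = error "char: unexpected end of input"

-- Backward round tripping (Definition 4), tested on one value x and one suffix s'.
backwardOn :: Eq u => Bp u u -> u -> String -> Bool
backwardOn p x s' = case fmap snd (printBP p x) of
  Just s  -> parseBP p (s ++ s') == (x, s')
  Nothing -> True  -- the implication holds vacuously when printing fails

-- Forward round tripping, tested on one source string s, under the assumed
-- reading that the parser must consume all of s.
forwardOn :: Eq u => Bp u u -> String -> Bool
forwardOn p s = case parseBP p s of
  (x, "") -> fmap snd (printBP p x) == Just s
  _       -> True  -- vacuous when input remains
```

For example, `all (\x -> backwardOn char x "rest") "abc"` and `all (forwardOn char) ["a", "b", "xy"]` both evaluate to `True`.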

Note, in some applications, forward round tripping is too strong. Here it requires that every printed value corresponds to at most one source string. This is often not the case as ASTs typically discard formatting and comments so that pretty-printed code is lexically different to the original source. However, different notions of equality enable more reasonable forward round-tripping properties.

Although one can check round-tripping properties of biparsers by expanding their definitions and the underlying monadic profunctor operations, a more scalable approach is provided if a round-tripping property is *compositional* with respect to the monadic profunctor operations, i.e., if these operations preserve the property. Compositional properties are easier to enforce and check since only the individual atomic components need round-tripping proofs. Such properties are then guaranteed "by construction" for programs built from those components.

#### 4.1 Compositional Properties of Monadic Bidirectional Programming

Let us first formalize compositionality as follows. A *property* $\mathcal{R}$ over a monadic profunctor P is a family of subsets $\mathcal{R}^u_v$ of P u v indexed by types u and v.

Definition 6. A property R over a monadic profunctor P is *compositional* if the monadic profunctor operations are closed over R, i.e., the following conditions hold for all types u, v, w:

1. For all x :: v, (return x) ∈ $\mathcal{R}^u_v$ (comp-return)
2. For all p :: P u v and k :: v → P u w,

$$\left(p \in \mathcal{R}^{u}_{v}\right) \wedge \left(\forall v.\ (k\ v) \in \mathcal{R}^{u}_{w}\right) \implies (p \mathbin{>\!\!>\!\!=} k) \in \mathcal{R}^{u}_{w} \tag{comp-bind}$$

3. For all p :: P u' v and f :: u → Maybe u',

$$p \in \mathcal{R}^{u'}_{v} \implies (\mathsf{comap}\ f\ p) \in \mathcal{R}^{u}_{v} \tag{comp-comap}$$

Unfortunately for biparsers, forward and backward round tripping as defined above are *not* compositional: return is not backward round tripping and >>= does not preserve forward round tripping. Furthermore, these two properties are restricted to biparsers of type Bp u u (i.e., aligned biparsers) but compositionality requires that the two type parameters of the monadic profunctor can differ in the case of comap and (>>=). This suggests that we need to look for more general properties that capture the full gamut of possible biparsers.

We first focus on backward round tripping. Informally, backward round tripping states that if you print (going backwards) and parse the resulting output (going forwards) then you get back the initial value. However, in a general biparser p :: Bp u v, the input type of the printer u differs from the output type of the parser v, so we cannot compare them. But our intent for printers is that what we actually print is a fragment of u, a fragment which is given as the output of the printer. By thus comparing the outputs of both the parser and printer, we obtain the following variant of backward round tripping:

Definition 7. A biparser p :: Bp u v is *weak backward round tripping* if for all x :: u, y :: v, and s, s' :: String then:

```
print p x = Just (y, s) =⇒ parse p (s ++ s') = (y, s')
```
Removing backward round tripping's restriction to aligned biparsers and using the result y :: v of the printer gives us a property that *is* compositional:

Proposition 2. Weak backward round tripping of biparsers is compositional.

Proposition 3. The primitive biparser char is weak backward round tripping.

Corollary 2. Propositions 2 & 3 imply string is weak backward round tripping.

This property is "weak" as it does not constrain the relationship between the input u of the printer and its output v. In fact, there is no hope for a compositional property to do so: the monadic profunctor combinators do not enforce a relationship between them. However, we can regain compositionality for the stronger backward round-tripping property by combining the weak compositional property with an additional non-compositional property on the relationship between the printer's input and output. This relationship is represented by the function that results from ignoring the printed string, which amounts to removing the main effect of the printer. Thus we call this operation a *purification*:

purify :: forall u v. Bp u v → u → Maybe v

purify p u = fmap fst (print p u)

Ultimately, when a biparser is aligned (p :: Bp u u) we want an input to the printer to be returned in its output, i.e., purify p should equal λx → Just x. If this is the case, we recover the original backward round tripping property:

Theorem 1. If p :: P u u is weak backward round tripping, and for all x :: u. purify p x = Just x, then p is backward round tripping.

Thus, for any biparser p, we can get backward round tripping by proving that its atomic subcomponents are weak backward round tripping, and proving that purify p x = Just x. The interesting aspect of the purification condition here is that it renders irrelevant the domain-specific effects of the biparser, i.e., those related to manipulating source strings. This considerably simplifies any proof. Furthermore, the definition of purify is a *monadic profunctor homomorphism* which provides a set of equations that can be used to expedite the reasoning.
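As an illustration of this proof recipe, the following sketch builds a two-character biparser monadically and lets us inspect its pure projection. The record representation, the names parseBP, printBP and pairBP, and the Monad instance (one plausible unfolding of the product construction of Sect. 3) are assumptions of the sketch.

```haskell
import Control.Monad (ap, liftM)

data Bp u v = Bp
  { parseBP :: String -> (v, String)
  , printBP :: u -> Maybe (v, String)
  }

instance Functor (Bp u) where
  fmap = liftM

instance Applicative (Bp u) where
  pure y = Bp (\s -> (y, s)) (\_ -> Just (y, ""))
  (<*>) = ap

instance Monad (Bp u) where
  p >>= k = Bp parser printer
    where
      -- Parse with p, then continue on the remaining input.
      parser s = let (y, s1) = parseBP p s in parseBP (k y) s1
      -- Print with p and with the continuation on the same input u,
      -- concatenating the printed strings.
      printer u = do
        (y, out1) <- printBP p u
        (z, out2) <- printBP (k y) u
        Just (z, out1 ++ out2)

comap :: (u -> Maybe u') -> Bp u' v -> Bp u v
comap f p = Bp (parseBP p) (\u -> f u >>= printBP p)

char :: Bp Char Char
char = Bp parseChar (\c -> Just (c, [c]))
  where
    parseChar (c : s) = (c, s)
    parseChar []      = error "char: unexpected end of input"

-- Ignore the printed string, keeping only the printer's returned value.
purify :: Bp u v -> u -> Maybe v
purify p u = fmap fst (printBP p u)

-- An aligned biparser for pairs of characters.
pairBP :: Bp (Char, Char) (Char, Char)
pairBP = do
  a <- comap (Just . fst) char
  b <- comap (Just . snd) char
  return (a, b)
```

Here `purify pairBP ('x', 'y')` evaluates to `Just ('x', 'y')`, which is the identity projection on that input, and accordingly `parseBP pairBP "xyz"` returns `(('x', 'y'), "z")` for the string `"xy"` printed from `('x', 'y')` followed by the suffix `"z"`.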

Definition 8. A *monadic profunctor homomorphism* between monadic profunctors P and Q is a polymorphic function proj :: P u v → Q u v such that:

$$\begin{array}{rcl}
\mathsf{proj}\ (\mathsf{comap}_P\ f\ p) & \equiv & \mathsf{comap}_Q\ f\ (\mathsf{proj}\ p) \\
\mathsf{proj}\ (p \mathbin{>\!\!>\!\!=}_P k) & \equiv & (\mathsf{proj}\ p) \mathbin{>\!\!>\!\!=}_Q (\lambda x \to \mathsf{proj}\ (k\ x)) \\
\mathsf{proj}\ (\mathsf{return}_P\ x) & \equiv & \mathsf{return}_Q\ x
\end{array}$$

Proposition 4. The purify :: Bp u v → u → Maybe v operation for biparsers (above) is a monadic profunctor homomorphism between Bp and the monadic profunctor PartialFun u v = u → Maybe v.

Corollary 3. (of Theorem 1 with Corollary 2 and Proposition 4) The biparser string is backward round tripping.

*Proof.* We first prove (in Appendix B [36]) the following properties of the biparsers char, int, and replicateBp :: Int → Bp u v → Bp [u] [v] (writing proj for purify):

proj char n ≡ Just n (4.1)

proj int n ≡ Just n (4.2)

proj (replicateBp (length xs) p) xs ≡ mapM (proj p) xs (4.3)

From these and the homomorphism properties we can prove proj string = Just:

```
proj string xs
        ≡ proj (comap length int >>= λn → replicateBp n char) xs
  Prop.4 ≡ (comap length (proj int) >>= λn → proj (replicateBp n char)) xs
   (4.2) ≡ (comap length Just >>= λn → proj (replicateBp n char)) xs
  Def.2 ≡ proj (replicateBp (length xs) char) xs
   (4.3) ≡ mapM (proj char) xs
   (4.1) ≡ mapM Just xs
{monad} ≡ Just xs
```
Combining proj string = Just with Corollary 2 (string is weak backward round tripping) enables Theorem 1, proving that string is backward round tripping.

The other two core examples in this paper also permit a definition of purify. We capture the general pattern as follows:

Definition 9. A *purifiable monadic profunctor* is a monadic profunctor P with a homomorphism proj from P to the monadic profunctor of partial functions - → Maybe -. We say that proj p is the *pure projection* of p.

Definition 10. A pure projection proj p :: u → Maybe v is called the *identity projection* when proj p x = Just x for all x :: u.

Here and in Sects. 5 and 6, identity projections enable compositional roundtripping properties to be derived from more general non-compositional properties, as seen above for backward round tripping of biparsers.

We have neglected forward round tripping, which is not compositional, not even in a weakened form. However, we can generalise compositionality with conditions related to *injectivity*, enabling a generalisation of forward round tripping. We call the generalised meta-property *quasicompositionality*.

#### 4.2 Quasicompositionality for Monadic Profunctors

An injective function $f : A \to B$ is a function for which there exists a left inverse $f^{-1} : B \to A$, i.e., where $f^{-1} \circ f = \mathrm{id}$. We can see this pair of functions as a simple kind of bidirectional program, with a forward round-tripping property (assuming f is the forwards direction). We can lift the notion of injectivity to the monadic profunctor setting and capture forward round-tripping properties that are preserved by the monadic profunctor operations, given some additional injectivity-like restriction. We first formalise the notion of an *injective arrow*.

Informally, an injective arrow k :: v → m w produces an output from which the input can be recalculated:

Definition 11. Let m be a monad. A function k :: v → m w is an *injective arrow* if there exists k' :: w → v (the *left arrow inverse* of k) such that for all x :: v:

k x >>= λy → return (x, y) ≡ k x >>= λy → return (k' y, y)
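The equation of Definition 11 can be tested on sample inputs. The sketch below specialises the monad m to Maybe; the names injectiveOn, succNat and absM are assumptions, and a finite test of this kind is evidence rather than a proof.

```haskell
-- Checks the injective-arrow equation for k with candidate left arrow
-- inverse k' on a finite list of sample inputs.
injectiveOn :: (Eq v, Eq w) => (v -> Maybe w) -> (w -> v) -> [v] -> Bool
injectiveOn k k' = all ok
  where
    ok x = (k x >>= \y -> return (x, y)) == (k x >>= \y -> return (k' y, y))

-- Injective: the input is recovered from the output by subtracting one.
-- On negative inputs the arrow fails, and the equation holds trivially.
succNat :: Int -> Maybe Int
succNat n = if n >= 0 then Just (n + 1) else Nothing

-- Not injective: the sign of the input is lost in the output.
absM :: Int -> Maybe Int
absM n = Just (abs n)
```

Here `injectiveOn succNat (subtract 1) [-2 .. 5]` is `True`, whereas `injectiveOn absM id [-3 .. 3]` is `False` because the candidate inverse cannot recover negative inputs.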

Next, we define *quasicompositionality* which extends the compositionality meta-property with the requirement for >>= to be applied to injective arrows:

Definition 12. Let P be a monadic profunctor. A property $\mathcal{R}^u_v$ ⊆ P u v indexed by types u and v is *quasicompositional* if the following conditions hold:

1. For all x :: v, (return x) ∈ $\mathcal{R}^u_v$ (qcomp-return)
2. For all p :: P u v and k :: v → P u w, if k is an injective arrow,

$$\left(p \in \mathcal{R}^{u}_{v}\right) \wedge \left(\forall v.\ (k\ v) \in \mathcal{R}^{u}_{w}\right) \implies (p \mathbin{>\!\!>\!\!=} k) \in \mathcal{R}^{u}_{w} \tag{qcomp-bind}$$

3. For all p :: P u' v and f :: u → Maybe u',

$$p \in \mathcal{R}^{u'}_{v} \implies (\mathsf{comap}\ f\ p) \in \mathcal{R}^{u}_{v} \tag{qcomp-comap}$$

We now formulate a weakening of forward round tripping. As with weak backward round tripping, we rely on the idea that the printer *outputs* both a string and the value that was printed, so that we need to compare the outputs of both the parser and the printer, as opposed to comparing the output of the parser with the input of the printer as in (strong) forward round tripping. If running the parser component of a biparser on a string s01 yields a value y and a remaining string s1, and the printer outputs that same value y along with a string s0, then s0 is the prefix of s01 that was consumed by the parser, i.e., s01 = s0 ++ s1.

Definition 13. A biparser p :: Bp u v is *weak forward round tripping* if for all x :: u, y :: v, and s0, s1, s01 :: String then:

parse p s01 = (y, s1) ∧ print p x = Just (y, s0) ⟹ s01 = s0 ++ s1

Proposition 5. Weak forward round tripping is quasicompositional.

*Proof.* We sketch the qcomp-bind case, where p = (m >>= k) for some m and k that are weak forward round tripping. From parse (m >>= k) s01 = (y, s1), it follows that there exist z, s such that parse m s01 = (z, s) and parse (k z) s = (y, s1). Similarly, print (m >>= k) x = Just (y, s0) implies that there exist z', s0', s1' such that print m x = Just (z', s0') and print (k z') x = Just (y, s1') and s0 = s0' ++ s1'. Because k is an injective arrow, we have z = z' (see appendix). We then use the assumption that m and k are weak forward round tripping on m and on k z, and deduce that s01 = s0' ++ s and s = s1' ++ s1, therefore s01 = s0 ++ s1.

Proposition 6. The char biparser is weak forward round tripping.

Corollary 4. Propositions 5 and 6 imply that string is weak forward round tripping if we restrict the parser to inputs whose digits do not contain redundant leading zeros.

*Proof.* All of the right operands of >>= in the definition of string are injective arrows, apart from λds → return (read ds) at the end of the auxiliary int biparser. Indeed, the read function is not injective since multiple strings may parse to the same integer: for example, read "0" = read "00" = 0. But the pre-condition to the proposition (no redundant leading zero digits) restricts the input strings so that read is injective. The rest of the proof is a corollary of Propositions 5 and 6.

Thus, quasicompositionality gives us scalable reasoning for weak forward round tripping, which holds by construction for biparsers composed using injective arrows: we just need to prove this property for individual atomic biparsers. Similarly to backward round tripping, we can prove forward round tripping by combining weak forward round tripping with the identity projection property:

Theorem 2. If p :: P u u is weak forward round-tripping, and for all x :: u, purify p x = Just x, then p is forward round tripping.

Corollary 5. The biparser string is forward round tripping by the above theorem (with identity projection shown in the proof of Corollary 3) and Corollary 4.

In summary, for any BX we can consider two round-tripping properties: forwards-then-backwards and backwards-then-forwards, called just *forward* and *backward* here respectively. Whilst combinator-based approaches can guarantee round tripping by construction, we have made a trade-off to get greater expressivity in the monadic approach. However, we regain the ability to reason about bidirectional transformations in a manageable, scalable way if round-tripping properties are compositional. Unfortunately, due to the monadic profunctor structuring, this tends not to be the case. Instead, weakened round-tripping properties can be compositional or quasicompositional (adding injectivity). In such cases, we recover the stronger property by proving a simple property on aligned transformations: that the backwards direction faithfully reproduces its input as its output (*identity projection*). Appendix C in our extended manuscript [36] compares this reasoning approach to a proof of backwards round tripping for separately implemented parsers and printers (not using our combined monadic approach).

# 5 Monadic Bidirectional Programming for Lenses

Lenses are a common object of study in bidirectional programming, comprising a pair of functions (get : S → V, put : V → S → S) satisfying the *well-behaved lens* laws shown in Sect. 1. Previously, when considering the monadic structure of parsers and printers, the starting point was that parsers already have a well-known monadic structure. The challenge came in finding a reasonable monadic characterisation for printers that was compatible with the parser monad. In the end, this construction was expressed by a product of two monadic profunctors Fwd m and Bwd n for monads m and n. For lenses we are in the same position: the forwards direction (get) is already a monad, namely the reader monad. The backwards direction put is not a monad since it is contravariant in its parameter; the same situation as printers. We can apply the same approach of "monadisation" used for parsers and printers, giving the following new data type for lenses:

```
data L s u v = L { get :: s → v, put :: u → s → (v, s) }
```
The result of put is paired with a covariant parameter v (the result type of get) in the same way as monadic printers. Instead of mapping a view and a source to a source, put now maps values of a different type u, which we call a *pre-view*, along with a source s into a pair of a view v and source s. This definition can be structured as a monadic profunctor via a pair of Fwd and Bwd constructions:

**type** L s = Fwd (Reader s) :\*: Bwd (State s)

Thus by the results of Sect. 3, we now have a monadic profunctor characterisation of lenses that allows us to compose lenses via the monadic interface.

Ideally, get and put should be total, but this is impossible without a way to restrict the domains. In particular, there is the known problem of "duplication" [23], where source data may appear more than once in the view, and a necessary condition for put to be well-behaved is that the duplicates remain equal amid view updates. This problem is inherent to all bidirectional transformations, and bidirectional languages have to rule out inconsistent updates of duplicates either statically [13] or dynamically [23]. To remedy this, we capture both partiality of get and a predicate on sources in put for additional dynamic checking. This is provided by the following Fwd and Bwd monadic profunctors:

Going forwards, *getting* a view v from a source s may fail if there is no view for the current source. Going backwards, *putting* a pre-view u updates some source s (via the state transformer StateT s), but with some further structure returned, provided by WriterT (s → Bool) Maybe (similar to the writer transformer used for biparsers, Sect. 3.2). The Maybe here captures the possibility that put can fail. The WriterT (s → Bool) structure provides a predicate which detects the "duplication" issue mentioned earlier. Informally, the predicate can be used to check that previously modified locations in the source are not modified again. For example, if a lens has a source made up of a bit vector, and a put sets bit i to 1, then the returned predicate will return True for all bit vectors where bit i is 1, and False otherwise. This predicate can then be used to test whether further put operations on the source have modified bit i.
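Unfolding the type constructors named above (a partial get, and a put built from StateT s over WriterT (s → Bool) Maybe) suggests the following direct representation. This is a sketch under assumptions: the record type, the field names getL and putL, and the example bitLens are not taken from the text, although bitLens follows the bit-vector example just described.

```haskell
-- A lens with partial get and a put that returns the view, the updated
-- source, and a predicate on sources (names are assumptions of this sketch).
data L s u v = L
  { getL :: s -> Maybe v
  , putL :: u -> s -> Maybe ((v, s), s -> Bool)
  }

-- Lens focusing on bit i of a bit vector. After putting b at position i,
-- the returned predicate holds exactly for bit vectors whose bit i equals b.
bitLens :: Int -> L [Bool] Bool Bool
bitLens i = L get' put'
  where
    inRange bits = i >= 0 && i < length bits
    get' bits
      | inRange bits = Just (bits !! i)
      | otherwise    = Nothing
    put' b bits
      | inRange bits =
          Just ((b, take i bits ++ [b] ++ drop (i + 1) bits), \bits' -> get' bits' == Just b)
      | otherwise = Nothing
```

Putting True at index 1 of `[False, False, False]` yields the source `[False, True, False]` together with a predicate that accepts this new source and rejects the old one, so a later put that resets bit 1 can be detected.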

Similarly to biparsers, a pre-view u can be understood as *containing* the view v that is to be merged with the source, and which is returned with the updated source. Ultimately, we wish to form lenses of matching input and output types (i.e., L s v v) satisfying the standard lens well-behavedness laws, modulo explicit management of partiality via Maybe and testing for conflicts via the predicate:

```
put l x s = Just ((_, s'), p') ∧ p' s' =⇒ get l s' = Just x (L-PutGet)
      get l s = Just x =⇒ put l x s = Just ((_, s), _) (L-GetPut)
```
L-PutGet and L-GetPut are backward and forward round tripping respectively. Some lenses, such as the later example, are not defined for all views. In that case we may say that the lens is backward/forward round tripping in some subset P ⊆ u when the above properties only hold when x is an element of P.

For every source type s, the lens type L s is automatically a monadic profunctor by its definition as the pairing of Fwd and Bwd (Sect. 3.1), and the following instance of MonadPartial for handling failure and instance of Monoid to satisfy the requirements of the writer monad:

```
instance MonadPartial (StateT s (WriterT (s → Bool) Maybe)) where
 toFailure Nothing = StateT (λ_ → WriterT Nothing)
 toFailure (Just x) = StateT (λs → WriterT (Just ((x , s), mempty)))
instance Monoid (s → Bool) where
 mempty = λ_ → True
 mappend h j = λs0 → h s0 && j s0
```
A simple lens example operates on key-value maps. For keys of type Key and values of type Value, we have the following source type and a simple lens:

The get component of the atKey lens does a lookup of the key k in a map, producing Maybe of a Value. The put component inserts a value for key k. When the key already exists, put overwrites its associated value.

Due to our approach, multiple calls to atKey can be composed monadically, giving a lens that gets/sets multiple key-value pairs at once. The list of keys and the list of values are passed separately, and are expected to be the same length.
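A self-contained sketch of how such key-focused lenses might be written and composed. The concrete Key and Value types, the direct record representation of L, the Monad instance, and the helpers headM and tailM are assumptions made for illustration.

```haskell
import Control.Monad (ap, liftM)
import qualified Data.Map as Map

type Key = String   -- assumption: the text leaves Key and Value abstract
type Value = Int
type Src = Map.Map Key Value

data L s u v = L
  { get :: s -> Maybe v
  , put :: u -> s -> Maybe ((v, s), s -> Bool)
  }

instance Functor (L s u) where
  fmap = liftM

instance Applicative (L s u) where
  pure x = L (\_ -> Just x) (\_ s -> Just ((x, s), const True))
  (<*>) = ap

instance Monad (L s u) where
  l >>= k = L getBoth putBoth
    where
      getBoth s = get l s >>= \y -> get (k y) s
      -- Thread the source through both puts and conjoin their predicates.
      putBoth u s = do
        ((y, s1), q1) <- put l u s
        ((z, s2), q2) <- put (k y) u s1
        Just ((z, s2), \t -> q1 t && q2 t)

comap :: (u -> Maybe u') -> L s u' v -> L s u v
comap f l = L (get l) (\u s -> f u >>= \u' -> put l u' s)

-- Lens focused on key k: get looks the key up, put inserts or overwrites.
atKey :: Key -> L Src Value Value
atKey k = L (Map.lookup k) putKey
  where
    putKey v m = Just ((v, Map.insert k v m), \m' -> Map.lookup k m' == Just v)

headM :: [a] -> Maybe a
headM (x : _) = Just x
headM []      = Nothing

tailM :: [a] -> Maybe [a]
tailM (_ : xs) = Just xs
tailM []       = Nothing

-- Gets or sets several keys at once; values are supplied as a list.
atKeys :: [Key] -> L Src [Value] [Value]
atKeys []       = return []
atKeys (k : ks) = do
  x  <- comap headM (atKey k)
  xs <- comap tailM (atKeys ks)
  return (x : xs)
```

In this sketch, putting `[10, 30]` through `atKeys ["a", "c"]` updates both keys and returns a predicate satisfied by the new source. Putting `[1, 2]` through `atKeys ["a", "a"]` succeeds but returns a predicate that the resulting source fails, which is how the duplication conflict shows up dynamically.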

We refer interested readers to our implementation [12] for more examples, including further examples involving trees.

*Round tripping.* We apply the reasoning framework of Sect. 4, taking the standard lens laws as the starting point (neither of which are compositional).

We first weaken backward round tripping to be compositional. Informally, the property expresses the idea that if we put some value x in a source s, resulting in a source s', then what we get from s' is x. However, two important changes are needed to adapt to our generalised type of lenses and to ensure compositionality. First, the value x that was put is now to be found in the output of put, whereas there is no way to constrain the input of put because its type u is abstract. Second, by sequentially composing lenses such as in l >>= k, the output source s' of put l will be further modified by put (k x), so this round-tripping property must constrain all potential modifications of s'. In fact, the predicate p ensures exactly that the view get l has not changed and is still x. It is not even necessary to refer to s', which is just one source for which we expect p to be True.

Definition 14. A lens l :: L s u v is *weak backward round tripping* if for all x :: u, y :: v, for all sources s, s', and for all p :: s → Bool, we have:

put l x s = Just ((y, \_), p) ∧ p s' ⟹ get l s' = Just y

Theorem 3. Weak backward round tripping is a compositional property.

Again, we complement this weakened version of round tripping with the notion of purification.

Proposition 7. Our lens type L is a *purifiable* monadic profunctor (Definition 9), with a family of pure projections proj s indexed by a source s, defined:

proj :: s → L s u v → (u → Maybe v)

proj s = λl u → fmap (fst ◦ fst) (put l u s)

Theorem 4. If a lens l :: L s u u is weak backward round tripping and has identity projections on some subset P ⊆ u (i.e., for all s, x then x ∈ P ⇒ proj s l x = Just x) then l is also backward round tripping on all x ∈ P.

To demonstrate, we apply this result to atKeys :: [Key] → L Src [Value] [Value].

Proposition 8. The lens atKey k is weak backward round tripping.

Proposition 9. The lens atKey k has identity projection: proj z (atKey k) = Just.

Our lens atKeys ks is therefore weak backward round tripping by construction. We now interpret/purify atKeys ks as a partial function, which is actually the identity function when restricted to lists of the same length as ks.

Proposition 10. For all vs :: [Value] such that length vs = length ks, and for all s :: Src then proj s (atKeys ks) vs = Just vs.

Corollary 6. By the above results, atKeys ks :: L Src [Value] [Value] for all ks is backward round tripping on lists of length length ks.

The other direction, forward round tripping, follows a similar story. We first restate it as a quasicompositional property.

Definition 15. A lens l :: L s u v is *weak forward round tripping* if for all x :: u, y :: v, for all sources s, s', and for all p :: s → Bool, we have:

get l s = Just y ∧ put l x s = Just ((y, s'), \_) ⟹ s = s'

Theorem 5. Weak forward round tripping is a quasicompositional property.

Along with identity projection, this gives the original forward L-GetPut property.

Theorem 6. If a lens l is weak forward round tripping and has identity projections on some subset P (i.e., for all s, x then x ∈ P ⇒ proj s l x = Just x) then l is also forward round tripping on P.

We can thus apply this result to our example (details omitted).

Proposition 11. For all ks, the lens atKeys ks :: L Src [Value] [Value] is forward round tripping on lists of length length ks.

# 6 Monadic Bidirectional Programming for Generators

Lastly, we capture the novel notion of *bidirectional generators* (*bigenerators*) extending random generators in property-based testing frameworks like *QuickCheck* [10] to a bidirectional setting. The forwards direction generates values conforming to a specification; the backwards direction checks whether values conform to a predicate. We capture the two together via our monadic profunctor pair as:

The forwards direction of a bigenerator is a generator, while the backwards direction is a partial function u → Maybe v. A value G u v represents a subset of v, where generate is a generator of values in that subset and check maps pre-views u to members of the generated subset. In the backwards direction, check g defines a predicate on u, which is true if and only if check g u is Just of some value. The function toPredicate extracts this predicate from the backward direction:

```
toPredicate :: G u v → u → Bool
toPredicate g x = case check g x of Just _ → True; Nothing → False
```
The bigenerator type G is automatically a monadic profunctor due to our construction (Sect. 3). Thus, monad and profunctor instances come for free, modulo (un)wrapping of constructors and given a trivial instance of MonadPartial:

**instance** MonadPartial Maybe **where** toFailure = id

Due to space limitations, we refer readers to Appendix E [36] for an example of a compositionally-defined bigenerator that produces binary search trees.

*Round tripping.* A random generator can be interpreted as the set of values it may generate, while a predicate represents the set of values satisfying it. For a bigenerator g, we write x ∈ generate g when x is a possible output of the generator. The generator of a bigenerator g should match its predicate toPredicate g. This requirement equates to round-tripping properties: a bigenerator is *sound* if every value which it can generate satisfies the predicate (forward round tripping); a bigenerator is *complete* if every value which satisfies the predicate can be generated (backward round tripping). Completeness is often more important than soundness in testing because unsound tests can be filtered out by the predicate, but completeness determines the potential adequacy of testing.

Definition 16. A bigenerator g :: G u u is *complete* (backward round tripping) when toPredicate g x = True implies x ∈ generate g.

Definition 17. A bigenerator g :: G u u is *sound* (forward round tripping) if for all x :: u, x ∈ generate g implies that toPredicate g x = True.

Similarly to backward round tripping of biparsers and lenses, completeness can be split into a compositional weak completeness and a purifiable property.

As before, the compositional weakening of completeness relates the forward and backward components by their outputs, which have the same type.

Definition 18. A bigenerator g :: G u v is *weak-complete* when

check g x = Just y ⟹ y ∈ generate g.

Theorem 7. Weak completeness is compositional.

In a separate step, we connect the input of the backward direction, *i.e.*, the checker, by reasoning directly about its pure projection (via a more general form of identity projection) which is defined to be the checker itself:

Theorem 8. A bigenerator g :: G u u is complete if it is weak-complete and its checker satisfies a pure projection property: check g x = Just x' ⇒ x = x'
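The reasoning behind this theorem is short. Spelled out as a chain of implications, with x' denoting the value returned by the checker:

$$
\begin{aligned}
\mathit{toPredicate}\ g\ x = \mathit{True}
&\implies \exists x'.\ \mathit{check}\ g\ x = \mathit{Just}\ x' && \text{(definition of toPredicate)}\\
&\implies x' \in \mathit{generate}\ g && \text{(weak completeness)}\\
&\implies x \in \mathit{generate}\ g && \text{(pure projection: } x = x'\text{)}
\end{aligned}
$$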

Thus to prove completeness of a bigenerator g :: G u u, we first have weak completeness by construction, and we can then show that check g is a restriction of the identity function, interpreting all bigenerators simply as partial functions.

Considering the other direction, soundness, there is unfortunately no decomposition into a quasicompositional property and a property on pure projections. To see why, let bool be a random uniform bigenerator of booleans, then consider for example, comap isTrue bool and comap isTrue (return True), where isTrue True = Just True and isTrue False = Nothing. Both satisfy any quasicompositional property satisfied by bool, and both have the same pure projection isTrue, and yet the former is unsound—it can generate False, which is rejected by isTrue—while the latter is sound. This is not a problem in practice, as unsoundness, especially in small scale, is inconsequential in testing. But it does raise an intellectual challenge and an interesting point in the design space, where ease of reasoning has been traded for greater expressivity in the monadic approach.
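The counterexample can be replayed in a deliberately simplified model. The sketch below is an assumption-laden illustration rather than the paper's construction: a generator is represented by the finite list of values it may produce (probabilities are ignored), `G` is a plain record instead of a monadic profunctor pair, and `constTrue` stands in for `return True`.

```haskell
-- Simplified model (an assumption of this sketch): a generator is the finite
-- list of values it may produce; the checker is a partial function.
data G u v = G
  { generate :: [v]
  , check    :: u -> Maybe v
  }

toPredicate :: G u v -> u -> Bool
toPredicate g x = case check g x of
  Just _  -> True
  Nothing -> False

-- Pre-compose the checker with a partial function; the generator is unchanged.
comap :: (u -> Maybe u') -> G u' v -> G u v
comap f g = G { generate = generate g, check = \x -> f x >>= check g }

-- Soundness in this finite model: every generated value passes the predicate.
sound :: G u u -> Bool
sound g = all (toPredicate g) (generate g)

isTrue :: Bool -> Maybe Bool
isTrue True  = Just True
isTrue False = Nothing

-- May generate either boolean; its checker accepts everything.
bool :: G Bool Bool
bool = G { generate = [False, True], check = Just }

-- Stand-in for `return True`: always generates True, checker ignores its input.
constTrue :: G Bool Bool
constTrue = G { generate = [True], check = \_ -> Just True }
```

Here `comap isTrue bool` and `comap isTrue constTrue` have the same checker on both inputs, yet `sound` distinguishes them, which is the obstacle described above.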

# 7 Discussion and Related Work

Bidirectional transformations are a widely applicable technique used in many domains [11]. Among language-based solutions, the lens framework is most influential [3,4,13,14,24,29]. Broadly speaking, combinators are used as programming constructs with which complex lenses are created by combining simpler ones. The combinators preserve round tripping, and therefore the resulting programs are correct by construction. A problem with lens languages is that they tend to be disconnected from more general programming. Lenses can only be constructed by very specialised combinators and are not subject to existing abstraction mechanisms. Our approach allows bidirectional transformations to be built using standard components of functional programming, and gives a reasoning framework for studying compositionality of round-tripping properties.

The framework of *applicative lenses* [18] uses a function representation of lenses to lift the point-free restriction of the combinator-based languages, and enables bidirectional programming with explicit recursion and pattern matching. Note that the use of "applicative" in applicative lenses refers to the traditional sense of programming with λ-abstractions and functional applications, which is not directly related to applicative functors. In a subsequent work, the authors developed a language known as HOBiT [20], which went further in featuring proper binding of variables. Despite the success in supporting λ-abstractions and function applications in programming bidirectional transformations, none of the languages have explored advanced patterns such as monadic programming.

The work on *monadic lenses* [1] investigates lenses with effects. For instance, a "put" could require additional input to resolve conflicts. Representing effects with monads helps reformulate the laws of round-tripping. In contrast, we made the type of lenses itself a monad, and showed how they can be composed monadically. Our method is applicable to monadic lenses, yielding what one might call *monadic monadic lenses*: monadically composable lenses with monadic effects. We conjecture that laws for monadic lenses can be adapted to this setting with similar compositionality properties, reusing our reasoning framework.

Other work leverages profunctors for bidirectionality. Notably, a *Profunctor optic* [26] between a source type s and a view type v is a function of type pvv → pss, for an abstract profunctor p. Profunctor optics and our monadic profunctors offer orthogonal composition patterns: profunctor optics can be composed "vertically" using function composition, whereas monadic profunctor composition is "horizontal" providing sequential composition. In both cases, composition in the other direction can only be obtained by breaking the abstraction.

It is folklore in the Haskell community that profunctors can be combined with applicative functors [22]. The pattern is sometimes called a *monoidal* profunctor. The codec library [8] mentioned in Sect. 3 prominently features two applications of this applicative programming style: binary serialisation (a form of parsing/printing) and conversion to and from JSON structures (analogous to lenses above). Opaleye [28], an EDSL of SQL queries for Postgres databases, uses an interface of monoidal profunctors to implement generic operations such as transformations between Haskell datatypes and database queries and responses.

Our framework adapts gracefully to applicative programming, a restricted form of monadic programming. By separating the input type from the output type, we can reuse the existing interface of applicative functors without modification. Besides our generalisation to monads, purification and verifying round-tripping properties via (quasi)compositionality are novel in our framework.

Rendel and Ostermann proposed an interface for programming parsers and printers together [30], but they were unable to reuse the existing structure of Functor, Applicative and Alternative classes (because of the need to handle types that are both covariant and contravariant), and had to reproduce the entire hierarchy separately. In contrast, our approach reuses the standard type class hierarchy, further extending the expressive power of bidirectional programming in Haskell. FliPpr [17,19] is an invertible language that generates a parser from a definition of a pretty printer. In this paper, our biparser definitions are more similar to those of parsers than printers. This makes sense as it has been established that many parsers are monadic. Similar to the case of HOBiT, there is no discussion of monadic programming in the FliPpr work.

Previous approaches to unifying random generators and predicates mostly focused on deriving generators from predicates. One general technique evaluates predicates lazily to drive generation (random or enumerative) [7,9], but one loses control over the resulting distribution of generated values. *Luck* [15] is a domain-specific language blending narrowing and constraint solving to specify generators as predicates with user-provided annotations to control the probability distribution. In contrast, our programs can be viewed as generators annotated with left inverses with which to derive predicates. This reversed perspective comes with trade-offs: high-level properties would be more naturally expressed in a declarative language of predicates, whereas it is *a priori* more convenient to implement complex generation strategies in a specialised framework for random generators.

*Conclusions.* This paper advances the expressive power of bidirectional programming; we showed that the classic bidirectional patterns of parsers/printers and lenses can be restructured in terms of *monadic profunctors* to provide sequential composition, with associated reasoning techniques. This opens up a new area in the design of embedded domain-specific languages for BX programming, that does not restrict programmers to stylised interfaces. Our example of bigenerators broadened the scope of BX programming from transformations (converting between two data representations) to non-transformational applications.

To demonstrate the applicability of our approach to real code, we have developed two bidirectional libraries [12], one extending the attoparsec monadic parser combinator library to biparsers and one extending QuickCheck to bigenerators. One area for further work is studying biparsers with *lookahead*. Currently lookahead can be expressed in our extended attoparsec, but understanding its interaction with (quasi)compositional round-tripping is further work.

However, this is not the final word on sequentially composable BX programs. In all three applications, round-tripping properties are similarly split into weak round tripping, which is weaker than the original property but compositional, and purifiable, which is equationally friendly. An open question is whether an underlying structure can be formalised, perhaps based on an adjunction model, that captures bidirectionality even more concretely than monadic profunctors.

Acknowledgments. We thank the anonymous reviewers for their helpful comments. The second author was supported partly by EPSRC grant EP/M026124/1.

# References



# **Counters in Kappa: Semantics, Simulation, and Static Analysis**

Pierre Boutillier<sup>1</sup>, Ioana Cristescu<sup>2</sup>, and Jérôme Feret<sup>3</sup>(✉)

<sup>1</sup> Harvard Medical School, Boston, USA Pierre Boutillier@hms.harvard.edu

<sup>2</sup> Inria Rennes - Bretagne Atlantique, Rennes, France ioana-domnina.cristescu@inria.fr

<sup>3</sup> DI-ENS (INRIA/ENS/CNRS/PSL\*), Paris, France feret@ens.fr

**Abstract.** Site-graph rewriting languages, such as Kappa or BNGL, offer parsimonious ways to describe highly combinatorial systems of mechanistic interactions among proteins. These systems may then be simulated efficiently. Yet, the modeling of mechanisms that involve counting (the number of phosphorylated sites, for instance) requires an exponential number of rules in Kappa. In BNGL, updating the set of the potential applications of rules in the current state of the system comes down to the sub-graph isomorphism problem (which is NP-complete).

In this paper, we extend Kappa to deal both parsimoniously and efficiently with counters. We propose a single push-out semantics for Kappa with counters. We show how to compile Kappa with counters into Kappa without counters (without requiring an exponential number of rules). We design a static analysis, based on affine relationships, to identify the meaning of counters and bound their ranges accordingly.

# **1 Introduction**

Site-graph rewriting is a paradigm for modeling mechanistic interactions among proteins. In Kappa [18] and BNGL [3,40], rewriting rules describe how instances of proteins may bind and unbind, and how each protein may activate the interaction sites of the others by changing their properties. Sophisticated signaling cascades may be described. The long-term behavior of such models usually emerges from competition for shared resources, proteins with multiple phosphorylation sites, scaffolds, separation of scales, and non-linear feedback loops.

It is often desirable to add more structure to states in order to describe generic mechanisms more compactly. In this paper, we consider extending Kappa with counters with numerical values. As opposed to the properties of classical Kappa sites, which offer no structure, counters allow for expressive preconditions (such as the value of a counter is less than 2), but also for generic update functions (such as incrementing or decrementing the current value of a counter by a given value independently of its current value). Without counters, such update functions would require one rule per potential value of the counter. This raises efficiency issues for the simulation and also blurs any potential reasoning on the causality of the system.

**Fig. 1.** Three representations for the phosphorylation of a site. We assume that the rate of phosphorylation of a site in a protein in which exactly *k* sites are already phosphorylated, is equal to the value *f*(*k*). The function *f* is left as a parameter of the model. In (a), we do not use counters. In order to get the number of sites that are already phosphorylated, we have to document the state of all the sites of the protein. In this rule, there are exactly 2 sites already phosphorylated, thus the rate of the rule is equal to *f*(2). In (b), we use a counter to encode the number of sites already phosphorylated. The variable *k*, that is introduced by the notation @*k*, contains the number of sites that are phosphorylated before the application of the rule. Thus, the rate of the rule is equal to *f*(*k*). In the right hand side, the notation +1 indicates that the counter is incremented at each application of the rule. The rule in (b) summarizes exactly 8 rules of the kind of the one in (a) (it defines the phosphorylation of the site *a* regardless of the states of the three other phosphorylation sites). In (c), we abstract away the sites and keep only the counter. The notation @*k* binds the variable *k* to the value of the counter. The left hand side also indicates that the rule may be applied only if the value of the counter is less than or equal to 3 (so that at least one site is not already phosphorylated). The right hand side specifies that the value of the counter is incremented at each application of the rule and that after the application of a rule, the value of the counter is always less than or equal to 4. The rule in (c) stands for 32 rules of the kind of the one in (a) (it depends neither on which site is phosphorylated, nor on the state of the three other sites).

However, adding counters cannot be done without consequences. The efficiency of Kappa simulations mainly relies on two ingredients. Firstly, Kappa graphs are rigid [16,39]: an embedding from a connected site-graph into a site-graph, when it exists, is fully determined by the image of one node. Thanks to rigidity, searching for the occurrences of a sub-graph in another graph (up to isomorphism) may be done without backtracking (once a first node has been placed), and embeddings can be described in memory very concisely. Secondly, the representation of the set of potential applications of rules relies on a categorical construction [6] that optimizes sharing among patterns. Yet this construction cannot cope with the more expressive patterns that involve counters. In order to efficiently simulate models with counters, we need an efficient encoding that preserves rigidity and that uses classical site-graph patterns.

Let us consider a case study so as to illustrate the need for counters in Kappa. This example is inspired by the behavior of the protein *KaiC*, which is involved in the synchronization of the proteins in the circadian clock. We consider one kind of protein with n identified sites that can get phosphorylated. In the protein *KaiC*, n is equal to 6. We take n equal to 4 to make the graphical representation lighter. We will let n tend to infinity so as to empirically estimate the combinatorial complexity of several encoding schemes.

The rate of phosphorylation/dephosphorylation of each site depends on the number of sites that are already phosphorylated. In Fig. 1(a), we provide the example of a rule that phosphorylates the site *a* of the protein, assuming that the sites *b* and *c* are already phosphorylated and that the site *d* is not. Proteins are depicted as rectangles. Sites are depicted clockwise from the site *a* to the site *d* starting at the top left corner of the protein. Phosphorylation states are depicted with a black mark when the site is phosphorylated, and with a white mark otherwise. To fully encode this model in Kappa, we would require n · 2<sup>n</sup> rules. Indeed, we need to decide whether this is a phosphorylation or a dephosphorylation (2 possibilities), then on which site to apply the transformation (n possibilities), then what the state of the other sites is (2<sup>n−1</sup> possibilities). This combinatorial complexity may be reduced by means of counters. We consider a fresh site (this site is depicted on the right of the protein) and we assume that this site takes numerical values. Writing each rule carefully, we can enforce that the value of this site is always equal to the number of the sites that are phosphorylated in the protein instance. Thanks to this invariant, describing our model requires 2 · n rules according to whether we describe a phosphorylation or a dephosphorylation (2 possibilities) and to which site the transformation is applied (n possibilities). An example of a rule for the phosphorylation of the site *a* is given in Fig. 1(b). The notation @k assigns the value of the counter before the application of the rule to the variable k. Then the rate of the rule may depend on the value of k. This way, we can make the rate of phosphorylation depend on the number of sites already phosphorylated in the protein. Since there are only n sites that may be phosphorylated, it is straightforward to see that the counter may range only between the values 0 and n.
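The rule counts above can be checked by brute-force enumeration. This is a small illustrative sketch; the type `Direction` and the two function names are hypothetical and only serve to count rules, not to represent Kappa syntax.

```haskell
-- A rule without counters is identified by its direction, its target site,
-- and the phosphorylation states of the remaining n-1 sites.
data Direction = Phosphorylate | Dephosphorylate deriving (Eq, Show)

-- All rules needed without counters, for a protein with n sites.
rulesWithoutCounters :: Int -> [(Direction, Int, [Bool])]
rulesWithoutCounters n =
  [ (d, site, others)
  | d <- [Phosphorylate, Dephosphorylate]
  , site <- [1 .. n]
  , others <- mapM (const [False, True]) [1 .. n - 1]
  ]

-- With one counter per protein: one rule per direction and per site.
rulesWithCounter :: Int -> [(Direction, Int)]
rulesWithCounter n =
  [ (d, site) | d <- [Phosphorylate, Dephosphorylate], site <- [1 .. n] ]
```

For n = 4, `length (rulesWithoutCounters 4)` evaluates to 64, that is 4 · 2<sup>4</sup>, whereas `length (rulesWithCounter 4)` evaluates to 8.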

If only the number of phosphorylated sites matters, we can go even further: we need just one counter and two rules, one for phosphorylating a new site (e.g., see Fig. 1(c)) and one for dephosphorylating it. The value of the counter is no longer related explicitly to a number of phosphorylated sites, thus we need another way to specify that the value of the counter is bounded. We do this by specifying in the precondition of the phosphorylation rule that it may be applied only if the value of the counter is less than or equal to n − 1, which entails that the value of the counter may range only between the values 0 and n.
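The effect of the guard on the range of the counter can be observed by exploring the reachable counter values. The sketch below is a hypothetical model of the two counter-only rules; in particular, the guard k ≥ 1 on dephosphorylation is an assumption of this sketch, since the text only states the guard of the phosphorylation rule.

```haskell
import Data.List (nub, sort)

-- One step of the counter-only model for a protein with n sites.
step :: Int -> Int -> [Int]
step n k =
  [ k + 1 | k <= n - 1 ]   -- phosphorylation, guarded by k <= n - 1
  ++ [ k - 1 | k >= 1 ]    -- dephosphorylation, assumed to require k >= 1

-- Counter values reachable from 0, computed by a worklist exploration.
reachable :: Int -> [Int]
reachable n = go [0] [0]
  where
    go seen [] = sort seen
    go seen (k : todo) =
      let new = nub [ k' | k' <- step n k, k' `notElem` seen ]
      in go (seen ++ new) (todo ++ new)
```

For n = 4, `reachable 4` yields exactly the values 0 to 4, in agreement with the bound stated above.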

A parsimonious description of the mechanistic interactions in a model not only eases the process of writing the model, enhances readability, and leads to more efficient simulation, but may also provide a better grain of observation of the system behavior. In Fig. 2, we illustrate this by looking at three causal traces that denote the same execution, but for three different encodings. Intuitively, causal traces [14,15] are inspired by event structures [43]. They describe sets of traces seen up to permutation of concurrent computation steps. The level of representation for the potential configurations of each protein impacts the way causality is defined, because what is tested in rules depends on the representation level. In our case study, the phosphorylation of each site is intuitively causally independent: one site may be phosphorylated whatever the state of the other sites is. Without counters, the only way to specify that the rate of phosphorylation depends on the number of the sites that are already phosphorylated is to detail the state of every site of the protein in the precondition of the rule. This induces spurious causal relations (e.g., see Fig. 2(a)). Utilizing counters relaxes this constraint. However, it is important to equip counters with arithmetic. Without arithmetic, a rule may only set the value of a counter to a constant value. Thus, for implementing counter increment, rules have to enumerate the potential values of the counter before their applications, and set the value of this counter accordingly. This again induces spurious causal relations (e.g., see Fig. 2(b)). With arithmetic, incrementing counters becomes a generic operation that may be applied independently of the current value of the counter. As a result, the phosphorylation of the four sites can be seen as causally independent (e.g., see Fig. 2(c)). This faithfully represents the fact that the phosphorylation of the four sites may happen in arbitrary order.

*Contribution.* Now we describe the main contributions of this paper.

In Sect. 2, we formalize a single push-out (SPO) semantics for Kappa with counters. Having a categorical framework dealing with counters, as opposed to implementing counters as syntactic sugar, is important. Firstly, this semantics will serve as a reference for the formal specification of the behavior of counters. Secondly, the categorical setting of Kappa provides efficient ways to define causality [14,15], symmetries [25], and some sound symbolic reasonings on the behavior of the number of occurrences of patterns [1,26] that are used in model reduction. Including counters in the categorical semantics of Kappa allows for extending the definition of these concepts to Kappa with counters for free.

Yet different encodings of counters may be necessary to extend other tools for Kappa. In Sect. 3, we propose a couple of translations from Kappa with counters into Kappa without counters. The goal is to simulate models with counters efficiently without modifying the implementation of the Kappa simulator, KaSim [17]. The first encoding requires counters to be bounded from below and it supports only two kinds of preconditions over counters: a rule may require the value of a counter to be equal to a given value, or to be greater than a given value. Requiring the value of a counter to be less than a given value is not supported. The second encoding supports equality and inequality (in both directions) tests. But it requires the value of each counter to be bounded also from above.

Static analysis is needed not only to prove these requirements, but also to retrieve the meaning of counters. In Sect. 4, we introduce a generic abstract interpretation framework [9] to infer the properties of reachable states of a model. This framework is parametric with respect to a class of properties. In Sect. 5, we instantiate this framework with a relational numerical analysis aiming at relating the value of each counter to its interpretation with respect to the state of the other sites. This is used to detect and prove bounds on the range of counters.
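As an illustration on the case study with four sites, write $s_a, s_b, s_c, s_d \in \{0, 1\}$ for the phosphorylation state of each site (1 when phosphorylated) and $x$ for the value of the counter. These symbols are notation introduced here for the example only. The kind of affine relationship the analysis aims at, and the bounds that follow from it, would then read:

$$
x = s_a + s_b + s_c + s_d
\quad\Longrightarrow\quad
0 \le x \le 4 .
$$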

**Fig. 2.** Three causal traces. Each causal trace is made of a set of partially ordered computation steps. Roughly speaking, a computation step precedes another one if the former is necessary to perform the latter. Each computation step is denoted as an arrow labeled with the rule that implements it. In (a), counters are not used. Every rule tests the full configuration of the protein. At this level of representation, the *k*th phosphorylation causally precedes the (*k* + 1)th one, whatever the order in which the sites have been phosphorylated. In (b), an additional site is used to record the number of phosphorylated sites in its internal state. With this encoding, the number of phosphorylated sites cannot be incremented without testing explicitly the internal state of the additional site. As a consequence, here again, at this level of representation, each phosphorylation causally depends on the previous one. In (c), we use the expressiveness of arithmetic. We use generic rules to increment the counter regardless of its current value. Hence, at this level of representation, the phosphorylations of the four sites become independent, which flattens the causal trace.

*Related Works.* Many modeling languages support arbitrary data-types. In Spatial-Kappa [41], counters encode the discrete position of agents. More generally, in Chromar [29] and in colored Petri nets [30,35], agents may be tagged with values in arbitrary auxiliary programming languages. In ML-Rules [28], agents with attributes continuously diffuse within compartments and collide to interact.

We have different motivations. Our goal is to enrich the state of proteins with some redundant information, so as to reduce the number of rules that are necessary to describe their mechanistic interactions. Also we want to avoid too expressive data-types, which could not be integrated within simulation, causality analysis, and static analysis tools, without altering their performance. For instance, analysis of colored Petri nets usually relies on unfolding them into classical ones. Unfolding rule sets into classical ones does not scale because the number of rules would become intractable. Thus we need tools which deal directly with counters.

An encoding of two-counter machines has been proposed to show that most problems in Kappa are undecidable [19,34]. We represent counters the same way in our first encoding, but we provide atomic implementation for more primitives.

The number of isomorphism classes of connected components that may occur in Kappa models during simulation is usually huge (if not infinite), which prevents the use of agent-centric approaches [4]. For instance, one of the first non-toy models written in Kappa involved more than 10<sup>19</sup> kinds of bio-molecular complexes [16,26]. Kappa follows a rule-centric approach which allows for the description and the execution of models independently from the number of potential complexes. Also, Kappa does not allow describing the diffusion of molecules. Instead, the state of the system is assumed to satisfy the well-mixed assumption. This provides efficient ways to represent and update the distribution of potential computation steps along a simulation [6,17].

Equivalent sites [3] or hyperlinks [31] offer promising solutions to extend the decision procedures to extract minimal causal traces in the case of counters, but the rigidity of graphs is lost. Our encodings rely neither on the use of equivalent sites, nor on expanding the rules into more refined and more numerous ones. Hence our encodings preserve the efficiency of the simulation.

Our analysis is based on the use of affine relationships [32]. It relates counter values to the state of the other variables. Such relationships look like the ones that help in understanding and proving the correctness of semaphores [20,21]. We use the decision procedure that is described in [23,24] to deduce bounds on the values of counters from the affine relationships. The cost of each atomic computation is cubic with respect to the number of variables. Abstract multi-sets [27,38] may succeed in expressing the properties of interest, but they require a parameter setting a bound on the values that they can abstract precisely. In practice, their time-cost is exponential as soon as this bound is not chosen big enough. Our abstraction has an infinite height. It uses widening [11] and reduction [12] to discover the bounds of interest automatically. Octagons [36,37] have a cubic complexity, but they cannot express the properties involving more than two variables, which are required in our context. Polyhedra [13] express all the properties needed, but at an exponential time-cost in practice.

#### **2 Kappa**

In this section, we enrich the syntax and the operational semantics of Kappa so as to cope with counters. We focus on the single push-out (SPO) semantics.

#### **2.1 Signature**

Firstly we define the signature of a model.

**Definition 1 (signature).** *The signature of a model is defined as a tuple* Σ = (Σ<sub>ag</sub>, Σ<sub>site</sub>, Σ<sub>int</sub>, Σ<sup>int</sup><sub>ag-st</sub>, Σ<sup>lnk</sup><sub>ag-st</sub>, Σ<sup>\$</sup><sub>ag-st</sub>, *Prop*\$, *Update*\$) *where:*


*For every* G ∈ *Prop*\$*, we assume that for every function* f ∈ *Update*\$*, the set* {f(k) | k ∈ G} *belongs to the set Prop*\$*, and that for every element* k ∈ G*, the set* {k} *belongs to the set Prop*\$ *as well.*

Agent types in Σ<sub>ag</sub> denote the agents of interest, the different kinds of proteins for instance. A site identifier in Σ<sub>site</sub> represents an identified locus for a capability of interaction. Each agent type *A* ∈ Σ<sub>ag</sub> is associated with a set of sites Σ<sup>int</sup><sub>ag-st</sub>(*A*) with an internal state (i.e. a property), a set of sites Σ<sup>lnk</sup><sub>ag-st</sub>(*A*) which may be linked, and a set of sites Σ<sup>\$</sup><sub>ag-st</sub>(*A*) with a counter. We assume without any loss of generality that the three sets Σ<sup>lnk</sup><sub>ag-st</sub>(*A*), Σ<sup>int</sup><sub>ag-st</sub>(*A*), and Σ<sup>\$</sup><sub>ag-st</sub>(*A*) are pairwise disjoint. The set *Prop*\$ contains the set of valid conditions that may be checked on the value of counters, whereas the set *Update*\$ contains all the possible update functions for the value of counters. We assume that every singleton that is included in a valid condition is a valid condition as well. In this way, a valid condition may be refined to a fully specified value. Additionally, the image of a valid condition is required to be valid, so that the post-condition obtained by applying an update function to a valid precondition is valid as well.

*Example 1 (running example). We define the signature for our case study as the tuple* (Σ<sub>ag</sub>, Σ<sub>site</sub>, Σ<sub>int</sub>, Σ<sup>int</sup><sub>ag-st</sub>, Σ<sup>lnk</sup><sub>ag-st</sub>, Σ<sup>\$</sup><sub>ag-st</sub>, *Prop*\$, *Update*\$) *where:*

*1.* Σ*ag* := {*P*}*;*

*2.* Σ*site* := {*a*, *b*, *c*, *d*, *x*}*;*


*The agent type P denotes the only kind of proteins. It has four sites a, b, c, d carrying an internal state and one site x carrying a counter.*

In the rest of the paper, we assume that a signature Σ is given.

#### **2.2 Site-Graphs**

Site-graphs describe both patterns and chemical mixtures. Their nodes are typed agents with some sites which may carry internal and binding states, and counters.

**Fig. 3.** Four site-graphs *G*<sub>1</sub>, *G*<sub>2</sub>, *G*<sub>3</sub>, and *G*<sub>4</sub>.

**Definition 2 (site-graph).** *A site-graph is a tuple* G = (A,*type*, S,L, pκ, cκ) *where:*


$$\mathcal{S} \subseteq \{(n, i) \mid n \in \mathcal{A}, i \in \Sigma_{ag\text{-}st}(type(n))\},$$

*4.* L *maps the set:*

$$\{(n, i) \in \mathcal{S} \mid i \in \Sigma_{ag\text{-}st}^{lnk}(type(n))\}$$

*to the set:*

$$\{(n, i) \in \mathcal{S} \mid i \in \Sigma_{ag\text{-}st}^{lnk}(type(n))\} \cup \{\dashv, -\},$$

*such that:*


For a site-graph G, we write A<sub>G</sub> for its set of agents, *type*<sub>G</sub> for its typing function, S<sub>G</sub> for its set of sites, and L<sub>G</sub> for its set of links. Given a site-graph G, we write S<sup>*lnk*</sup><sub>G</sub> (resp. S<sup>*int*</sup><sub>G</sub>, resp. S<sup>\$</sup><sub>G</sub>) for its set of binding sites (resp. property sites, resp. counters), that is to say the set of the sites (n, i) such that i ∈ Σ<sup>*lnk*</sup><sub>*ag-st*</sub>(*type*<sub>G</sub>(n)) (resp. i ∈ Σ<sup>*int*</sup><sub>*ag-st*</sub>(*type*<sub>G</sub>(n)), resp. i ∈ Σ<sup>\$</sup><sub>*ag-st*</sub>(*type*<sub>G</sub>(n))).

Let us consider a binding site (n, i) ∈ S<sup>*lnk*</sup><sub>G</sub>. Whenever L<sub>G</sub>(n, i) = ⊣, the site (n, i) is free. Various levels of information may be given about the sites that are bound. Whenever L<sub>G</sub>(n, i) = −, the site (n, i) is bound to an unspecified site. Whenever L<sub>G</sub>(n, i) = (n′, i′) (and hence L<sub>G</sub>(n′, i′) = (n, i)), the sites (n, i) and (n′, i′) are bound together.

A *chemical mixture* is a site-graph in which the state of each site is fully specified. Formally, a site-graph G is a chemical mixture if and only if the three following properties are satisfied:


3. every counter has a single value (i.e. for every counter (n, i) ∈ S<sup>\$</sup><sub>G</sub>, cκ<sub>G</sub>(n, i) is a singleton).

*Example 2 (running example). In Fig. 3, we give a graphical representation of the four site-graphs,* G1*,* G2*,* G3*, and* G<sup>4</sup> *that are defined as follows:*

*1.* (a) $\mathcal{A}_{G_1} = \{1\}$, (b) $type_{G_1} = [1 \mapsto P]$, (c) $\mathcal{S}_{G_1} = \{(1, a), (1, x)\}$, (d) $\mathcal{L}_{G_1} = \emptyset$, (e) $p\kappa_{G_1} = [(1, a) \mapsto \circ]$, (f) $c\kappa_{G_1} = [(1, x) \mapsto \{k \in \mathbb{Z} \mid k \le 2\}]$;

*2.* (a) $\mathcal{A}_{G_2} = \{1\}$, (b) $type_{G_2} = [1 \mapsto P]$, (c) $\mathcal{S}_{G_2} = \{(1, x)\}$, (d) $\mathcal{L}_{G_2} = \emptyset$, (e) $p\kappa_{G_2} = []$, (f) $c\kappa_{G_2} = [(1, x) \mapsto \{k \in \mathbb{Z} \mid k \le 2\}]$;

*3.* (a) $\mathcal{A}_{G_3} = \{1\}$, (b) $type_{G_3} = [1 \mapsto P]$, (c) $\mathcal{S}_{G_3} = \{(1, a), (1, x)\}$, (d) $\mathcal{L}_{G_3} = \emptyset$, (e) $p\kappa_{G_3} = [(1, a) \mapsto \bullet]$, (f) $c\kappa_{G_3} = [(1, x) \mapsto \{k \in \mathbb{Z} \mid k \le 3\}]$;

*4.* (a) $\mathcal{A}_{G_4} = \{1\}$, (b) $type_{G_4} = [1 \mapsto P]$, (c) $\mathcal{S}_{G_4} = \{(1, a), (1, b), (1, c), (1, d), (1, x)\}$, (d) $\mathcal{L}_{G_4} = \emptyset$, (e) $p\kappa_{G_4} = [(1, a) \mapsto \circ, (1, b) \mapsto \bullet, (1, c) \mapsto \bullet, (1, d) \mapsto \circ]$, (f) $c\kappa_{G_4} = [(1, x) \mapsto \{2\}]$.

*The white site on the side of proteins is always the site x. The other sites, starting from the top-left one denote the sites a, b, c, and d clockwise.*

#### **2.3 Sliding Embeddings**

In classical Kappa, two site-graphs may be related by structure-preserving injections, which are called embeddings. Here, we extend their definition to cope with counters. There are two main issues: a rule may require the value of a given counter to belong to a non-singleton set; also updating counters may involve arithmetic computations. The smaller the set of the potential values for a counter is, the more information we have. Thus, embeddings may map the potential values of a given counter into a subset. In order to cope with update functions, we equip embeddings with some arithmetic functions which explain how to get from the value of the counter in the source of the embedding to its value in the target. This way, our embeddings not only define instances of site-graphs, but they also contain the information to compute the values of counters.

**Fig. 4.** Three sliding embeddings from the site-graph *G*<sub>2</sub> respectively into the site-graphs *G*<sub>3</sub>, *G*<sub>1</sub>, and *G*<sub>4</sub>. Only the second and the third embeddings are pure.

**Definition 3 (sliding embedding).** *A* sliding embedding *from a site-graph* G *into a site-graph* H *is a pair* (h<sub>e</sub>, h<sub>\$</sub>) *where* h<sub>e</sub> *is a function of agents* h<sub>e</sub> : A<sub>G</sub> → A<sub>H</sub> *and* h<sub>\$</sub> *is a function mapping the counters of the site-graph* G *to update functions* h<sub>\$</sub> : S<sup>\$</sup><sub>G</sub> → *Update*<sub>\$</sub> *such that for all agent identifiers* m*,* n*,* n′ ∈ A<sub>G</sub> *and for all site identifiers* i ∈ Σ<sub>*ag-st*</sub>(*type*<sub>G</sub>(n))*,* i′ ∈ Σ<sub>*ag-st*</sub>(*type*<sub>G</sub>(n′))*, the following properties are satisfied:*

*1. if* m ≠ n*, then* h<sub>e</sub>(m) ≠ h<sub>e</sub>(n)*;*

*2.* *type*<sub>G</sub>(n) = *type*<sub>H</sub>(h<sub>e</sub>(n))*;*

*3. if* (n, i) ∈ S<sub>G</sub>*, then* (h<sub>e</sub>(n), i) ∈ S<sub>H</sub>*;*


Two sliding embeddings between site-graphs, from E to F, and from F to G respectively, compose to form a sliding embedding from E to G (functions compose pair-wise). A sliding embedding (he, h\$) such that h\$ maps each counter to the identity function is called a *pure embedding*. A pure embedding from E to F is denoted as . Pure embeddings compose. Two site-graphs E and F are isomorphic if and only if there exist a pure embedding from E to F and a pure embedding from F to E. A pure embedding between two isomorphic site-graphs is called an isomorphism. When it exists, the unique pure embedding (he, h\$) from a site-graph E into the site-graph F such that A<sup>E</sup> ⊆ A<sup>F</sup> and he(n) = n for every agent n ∈ A<sup>E</sup>, is called the *inclusion* from E to F and is denoted as iE,F or as . In such a case, we say that the site-graph E is included in the site-graph F. The inclusion from a site-graph into itself always exists and is called an identity embedding.

*Example 3 (running example). We show in Fig. 4 three sliding embeddings from the site-graph* G<sub>2</sub> *respectively into the site-graphs* G<sub>3</sub>*,* G<sub>1</sub>*, and* G<sub>4</sub>*. The first of these three sliding embeddings is assumed to increment the value of the counter of the site x. The last two embeddings are pure.*
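The treatment of counters in Example 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: counter conditions are represented as intervals of integers, update functions are restricted to integer shifts, and the compatibility condition (the condition in the target must be included in the image of the condition in the source by the update function) is our reading of the informal discussion at the beginning of this section, not a quoted item of Definition 3. The names `Interval`, `shift`, `included`, and `counter_condition_ok` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Non-empty convex set of integers; None stands for an infinite bound."""
    lo: Optional[int]
    hi: Optional[int]


def shift(s: Interval, k: int) -> Interval:
    """Image of the set s by the update function x -> x + k."""
    lo = None if s.lo is None else s.lo + k
    hi = None if s.hi is None else s.hi + k
    return Interval(lo, hi)


def included(a: Interval, b: Interval) -> bool:
    """True when the set a is a subset of the set b."""
    lo_ok = b.lo is None or (a.lo is not None and a.lo >= b.lo)
    hi_ok = b.hi is None or (a.hi is not None and a.hi <= b.hi)
    return lo_ok and hi_ok


def counter_condition_ok(source: Interval, target: Interval, k: int) -> bool:
    """Assumed condition: the target set refines the shifted source set."""
    return included(target, shift(source, k))


# Counter conditions of the site x in the site-graphs of Example 2.
G1_X = Interval(None, 2)
G2_X = Interval(None, 2)
G3_X = Interval(None, 3)
G4_X = Interval(2, 2)
```

With these definitions, the embedding from G<sub>2</sub> into G<sub>3</sub> is accepted with the shift `k = 1`, whereas the two pure embeddings (into G<sub>1</sub> and G<sub>4</sub>) are accepted with `k = 0`.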

Let L, R, and D be three site-graphs, such that R is included in D, and let φ be a sliding embedding from L into D. Then there exist a site-graph D′ that is included in L and a sliding embedding ψ from D′ to R such that i<sub>R,D</sub>ψ = φi<sub>D′,L</sub> and such that D′ is *maximal* (w.r.t. inclusion among site-graphs) for this property. The triple (D′, i<sub>D′,L</sub>, ψ) is called the *pull-back* of the pair (φ, i<sub>R,D</sub>).

**Fig. 5.** Composition of partial sliding embeddings.

**Fig. 6.** Rule application.

Let L, R, and D be three site-graphs such that D is included in L. A *partial sliding embedding* from L into R is defined as a pair made of the inclusion iD,L and a sliding embedding from D to R. Sliding embeddings may be considered as partial sliding embeddings with the inclusion as the identity embedding. Partial sliding embeddings compose by the means of a pull-back (e.g. see Fig. 5(b)).

#### **2.4 Rules**

Rules represent transformations between site-graphs. For the sake of simplicity, we only use a fragment of Kappa (we assume here that there are no *side effects*). Rules may break and create bonds between pairs of sites, change the properties of sites, and update the value of counters. They may also create and remove agents. When an agent is created, all its sites must be fully specified: binding sites may be either free or bound to a specific site, and each counter must be given a single value. So as to ensure that there is no side effect when an agent is removed, we also assume that the binding sites of removed agents are fully specified. These requirements are formalized as follows:

**Definition 4 (rule).** *A* rule *is a partial sliding embedding such that:*

*1. (modified agents) for all agents* n ∈ A<sup>D</sup> *such that* he(n) ∈ A<sup>R</sup> *and for every site identifier* i ∈ Σ*site*(*type*L(n))*,*

	- *(a) the site* (n, i) *belongs to the set* S<sub>R</sub>*;*
	- *(b) if the site* (n, i) *belongs to the set* S<sup>*lnk*</sup><sub>R</sub>*, then the binding state* L<sub>R</sub>(n, i) *belongs to the set* S<sup>*lnk*</sup><sub>R</sub> ∪ {⊣}*;*
	- *(c) if the site* (n, i) *belongs to the set* S<sup>\$</sup><sub>R</sub>*, then* cκ<sub>R</sub>(n, i) *is a singleton.*

In Definition 4, each agent that is *modified* occurs on both sides of a rule. Constraint 1a ensures that both sides document the same sites. Constraint 1b ensures that, if the binding state of a site is modified, then it has to be fully specified (either free, or bound to a specific site) on both sides of the rule. Constraint 1c ensures that the post-condition associated with a counter is the direct image of its precondition by its update function. Constraint 2 ensures that the agents that are *removed* have their binding sites fully specified. Constraint 3a ensures that, in the agents that are *created*, all the sites are documented. Besides, constraint 3b requires that the state of each of their binding sites is either free or bound to a specific site. Constraint 3c ensures that their counters have a single value.

An example of a rule is given in Fig. 6(a).

A rule is usually denoted as (leaving the common region and the sliding embedding implicit). Rules are applied to site-graphs via pure embeddings using the *single push-out* construction [22].

**Definition 5 (rule application** [14]**).** *Let* r *be a rule ,* L *be a sitegraph, and* h<sup>L</sup> *be a pure embedding from* L *to* L *. Then, there exists a rule and a pure embedding such that the following properties are satisfied (e. g. see Fig. 6(c)):*


*Moreover, whenever the site-graph* L *is a chemical mixture, the site-graph* R *is a chemical mixture as well.*

We write $L \xrightarrow{r} R$ for a transition from the state $L$ into the state $R$ via an application of a rule $r$. Usually, transition labels also mention the pure embedding ($h_L$ here), but we omit it since we do not use it in the rest of the paper.

*Example 4 (running example). An example of rule application is depicted in Fig. 6. We consider the rule* r *that takes a protein with the site a unphosphorylated and a counter with a value at least equal to* 2*, and that phosphorylates the site a while incrementing the counter by* 1 *(e. g. see Fig. 6(a)). Note that the update function of the counter is written next to its post-condition in the right hand side of the rule. We apply the rule to a protein with the sites b and c phosphorylated, the site d unphosphorylated, and the counter equal to* 2 *(e. g. see Fig. 6(b)). The result is a protein with the sites a, b, and c phosphorylated, the site d unphosphorylated and the counter equal to* 3 *(e. g. see Fig. 6(d)).*
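The rule application of Example 4 can be replayed on a toy representation. The sketch below is only illustrative: a protein is assumed to be a Python dictionary mapping each site to its state (`"u"` for unphosphorylated, `"p"` for phosphorylated, an integer for the counter `x`), and the function name `apply_rule` is hypothetical. It ignores embeddings and multi-agent patterns.

```python
def apply_rule(agent):
    """Apply the rule of Example 4 to one protein.

    Precondition: the site a is unphosphorylated and the counter x is at
    least 2. Effect: phosphorylate a and increment x by 1.
    Returns the new protein, or None when the precondition does not hold.
    """
    if agent["a"] != "u" or agent["x"] < 2:
        return None
    result = dict(agent)          # the input state is left untouched
    result["a"] = "p"
    result["x"] = agent["x"] + 1
    return result


# The protein of Fig. 6(b): b and c phosphorylated, counter equal to 2.
protein = {"a": "u", "b": "p", "c": "p", "d": "u", "x": 2}
after = apply_rule(protein)
```

Here `after` describes a protein with the sites a, b, and c phosphorylated, the site d unphosphorylated, and the counter equal to 3, as in Fig. 6(d).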

*A model* M *over a given signature* Σ is defined as the pair (G<sub>0</sub>, R) where G<sub>0</sub> is a chemical mixture, representing the initial state, and R is a set of rules. Each rule is associated with a functional rate which maps each potential tuple of values for the counters of the left hand side of the rule to a non-negative real number. We write C(M) for the set of states obtained from G<sub>0</sub> by applying a potentially empty sequence of rules in R.

#### **3 Encoding Counters**

In this section, we introduce two encodings from Kappa with counters into Kappa without counters. As explained in Sect. 1, our goal is to preserve the rigidity of site-graphs and to avoid the blow-up of the number of rules in the target model. This is mandatory to preserve the good performance of the Kappa simulator. Both encodings rely on syntactic restrictions over the preconditions and the update functions that may be applied to counters, and on semantic ones about the potential range of counters. In Sects. 4 and 5, we provide a static analysis to check whether or not these semantic assumptions hold.

#### **3.1 Encoding the Value of Counters as Unbounded Chains of Agents**

In this encoding, each counter is bound to a chain of fictitious agents, the length of which minus 1 denotes the value of the counter (another encoding not requiring the subtraction is possible, but it would require side effects). Encoding counters as chains of agents has already been used in the implementation of two-counter machines in Kappa [19,34]. We slightly extend these works to implement more atomic operations over counters. We assume that the value of counters is bounded from below. For the sake of simplicity, we assume that counters range in N, but arbitrary lower bounds may be considered by shifting each value accordingly. We denote by Ω<sub>1</sub> the set of the site-graphs that have a counter with a negative value. They are considered as erroneous states, since they may not be encoded with chains of agents.

Only two kinds of guards are handled. A rule may require the value of a counter to be equal to a given number, or to be greater than or equal to a given number. Rules testing whether a value is less than a given number require unfolding each such rule into several ones (one per potential value). Also, when the rate of a rule depends on the value of some counters, we unfold each rule according to the value of these counters, so that the rate of each unfolded rule is a constant (for efficiency reasons, the Kappa simulator requires all the instances of a given rule in a given simulation state to have the same rate). For update functions, we only consider constant functions and the functions that increase/decrease the value of counters by a fixed value.

**Fig. 7.** Encoding the value of counters as unbounded chains of agents.

Testing whether the value of a counter is equal to (resp. greater than or equal to) n can be done by requiring the corresponding chain to contain exactly (resp. at least) n + 1 agents (e.g. see Figs. 7(b) and (c)). Incrementing (resp. decrementing) the value of a counter is modeled by inserting (resp. removing) agents at the beginning of its chain (e.g. see Fig. 7(d), resp. Fig. 7(e)). Setting a counter to a fixed value requires detaching its full chain in order to create a new one of the appropriate length (e.g. see Fig. 7(f)). In such a case, the former chain remains as junk. Thus the state of the model must be understood up to the insertion of junk agents. We introduce the function *gc*<sub>1</sub> that removes every chain of spurious agents not bound to any counter. We denote as $[\![G]\!]^{g}_{1}$ (resp. $[\![r]\!]^{r}_{1}$) the encoding of a site-graph G (resp. of a rule r).
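The operations of this first encoding can be mimicked on ordinary Python lists. The sketch below is an illustration only, not the Kappa encoding itself: a chain is assumed to be a list of placeholder agents, and all the names (`AGENT`, `encode`, `is_at_least`, `assign`, and so on) are hypothetical.

```python
AGENT = "link"  # placeholder for one fictitious chain agent


def encode(value):
    """Chain bound to a counter: value + 1 fictitious agents."""
    if value < 0:
        raise ValueError("negative counters cannot be encoded as chains")
    return [AGENT] * (value + 1)


def decode(chain):
    """Value of the counter: length of the chain minus 1."""
    return len(chain) - 1


def is_equal(chain, n):
    """Guard 'counter = n': the chain has exactly n + 1 agents."""
    return len(chain) == n + 1


def is_at_least(chain, n):
    """Guard 'counter >= n': the chain has at least n + 1 agents."""
    return len(chain) >= n + 1


def increment(chain, k=1):
    """Insert k agents at the beginning of the chain."""
    return [AGENT] * k + chain


def decrement(chain, k=1):
    """Remove k agents at the beginning of the chain."""
    if len(chain) - k < 1:
        raise ValueError("the counter would become negative")
    return chain[k:]


def assign(chain, value):
    """Detach the former chain, which becomes junk, and build a fresh one."""
    return encode(value), chain
```

For instance, `assign(encode(4), 1)` returns a fresh chain of 2 agents together with the former chain of 5 agents, which plays the role of the junk that the function *gc*<sub>1</sub> discards.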

#### **3.2 Encoding the Value of Counters as Circular Lists of Agents**

In this second encoding, each counter is bound to a ring of agents. Each such agent has three binding sites *zero*, *pred*, and *next*, and a property site *value* which may be activated, or not. In a ring, agents are connected circularly through their sites *pred* and *next*. Exactly one agent per ring is bound to a counter and exactly one agent per ring has the site *value* activated. The value of the counter is encoded by the distance between the agent bound to the counter and the agent that is activated, scanning the agents by following the direction given by the site *next* of each agent (clockwise in the graphical representation). We have to consider that counter values are bounded from above and below. Without any loss of generality, we assume that the length of each ring is the same, that is to say that counters range from 0 to n − 1, for a given n ∈ N. We denote by Ω<sub>2</sub> the set of the site-graphs with at least one counter not satisfying these bounds.

**Fig. 8.** Encoding the value of counters as circular lists of agents.

Compared to the first encoding, this one may additionally cope with testing that a counter has a value less than a given constant without having to unfold the rule. Both encodings may deal with the same update functions. Testing whether a counter is equal to a value is done by requiring that the activated agent is at the appropriate distance from the agent that is connected to the counter (e.g. see Fig. 8(b)). It is worth noting that the intermediary agents are required to be not activated. This is not mandatory for the soundness of the encoding; it is an optimization that helps the simulator detect early that no embedding may associate a given agent of the left hand side of a rule to a given agent in the current state of the system. Inequalities are handled by checking that enough agents, starting from the one that is connected to the counter and in the direction specified by the direction of the inequality, are not activated (e.g. see Fig. 8(c)). Incrementing/decrementing the value of a counter is modeled by making the counter glide along the ring (e.g. see Figs. 8(d) and (e)). Special care has to be taken to ensure that the activated agent never crosses the agent linked to the counter (which would cause a numerical wrap-around). Assigning a given value to a counter requires entirely removing the ring and replacing it with a fresh one (e.g. see Fig. 8(f)). It may be efficiently implemented without memory allocation. As in the first encoding, when the rate of a rule depends on the value of some counters, we unfold each rule according to the value of these counters, so that the rate of each unfolded rule is constant.

We introduce the function *gc*<sub>2</sub> as the identity function over site-graphs (there are no junk agents in this encoding). We denote as $[\![G]\!]^{g}_{2}$ (resp. $[\![r]\!]^{r}_{2}$) the encoding without counters of a site-graph G (resp. of a rule r).
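The ring encoding can also be sketched with a small Python class. This is a simplified model under assumptions of our own: the ring is a list of Booleans (one per agent, true for the activated one), the agent bound to the counter is identified by an index, the site *next* is modeled by the index increasing modulo n, and incrementing is modeled by moving the counter backwards along the ring. The class `Ring` and its method names are hypothetical.

```python
class Ring:
    """Ring of n agents encoding a counter that ranges from 0 to n - 1."""

    def __init__(self, n, value=0):
        if not 0 <= value < n:
            raise ValueError("value out of range")
        self.n = n
        self.anchor = 0  # index of the agent bound to the counter
        self.activated = [i == value for i in range(n)]

    def _at(self, distance):
        """Is the agent at this distance from the anchor activated?"""
        return self.activated[(self.anchor + distance) % self.n]

    def value(self):
        """Distance from the anchor to the activated agent, following next."""
        return next(d for d in range(self.n) if self._at(d))

    def at_least(self, c):
        """Guard 'counter >= c': the first c agents are not activated."""
        return not any(self._at(d) for d in range(min(max(c, 0), self.n)))

    def less_than(self, c):
        """Guard 'counter < c': the agents at distance c or more are not activated."""
        return not any(self._at(d) for d in range(max(c, 0), self.n))

    def add(self, k):
        """Shift the counter by k (k may be negative) without wrap-around."""
        if not 0 <= self.value() + k < self.n:
            raise OverflowError("the counter would leave the range of the ring")
        self.anchor = (self.anchor - k) % self.n
```

For example, with `ring = Ring(5, value=2)`, calling `ring.add(2)` yields the value 4, and a further `ring.add(1)` raises `OverflowError`, which corresponds to a state in Ω<sub>2</sub>.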

#### **3.3 Correspondence**

The following theorem states that, whenever there is no numerical overflow and providing that junk agents are neglected, the semantics of Kappa with counters and the semantics of their encodings are in bisimulation.

**Theorem 1 (correspondence).** *Let* i *be either* 1 *or* 2*. Let* G *be a fully specified site-graph such that* G ∉ Ω<sub>i</sub> *and* r *be a rule. The two following properties are satisfied:*


#### **3.4 Benchmarks**

The experimental evaluation of the impact of both encodings on the performance of the simulator KaSim [6,17] is presented in Fig. 9. We focus on the example that has been presented in Sect. 1. We plot the number of events that are simulated per second of CPU time. For the sake of comparison, we also provide the simulation efficiency of the simulator NFSim [40] on the models written in BNGL with equivalent sites (with a linear number of rules only).

We notice that, with KaSim, the direct approach (without counters) is the most efficient when there are fewer than 9 phosphorylation sites. We explain the overhead of the encodings by the fact that each of them uses spurious agents that have to be allocated in memory and relies on rules with bigger left hand sides. Nevertheless, this overhead is reasonable if we consider the gain in conciseness in the description of the models. The versions of the models with counters rely on a linear number of rules, which makes models easier to read, document, and update. For more phosphorylation sites, the simulation time for models written without counters blows up very quickly, due to the large number of rules. The simulation of the models with counters scales much better for both encodings.

Models can be concisely described in BNGL without using counters, by means of equivalent sites. Each version of the model uses n indistinguishable sites and only a linear number of rules is required. However, detecting the potential applications of rules in the case of equivalent sites relies on the sub-graph isomorphism problem on general graphs, which prevents the approach from scaling to large values of n. We observe that the efficiency of NFSim on this family of examples is not as good as that of KaSim (whichever of the three modeling methods is used). We also observe a very quick deterioration of the performance starting at n equal to 5.

**Fig. 9.** Efficiency of the simulation for the example in Sect. 1 with *n* ranging between 1 and 14. We test the simulator KaSim with a version of the models written without counters and versions of the models according to both encodings (including the *n* phosphorylation sites). For the sake of comparison, we also report the efficiency of the simulator NFSim with the same model but written in BNGL by means of equivalent sites. For each version of the model and each simulation method, we run 15 simulations of 10<sup>5</sup> events on an initial state made of 100 agents and we plot the number of computation steps computed on average per second of CPU time on a log scale. Every simulation has been performed on a machine with 4 processors (Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40 GHz) and 126 GB of RAM, running Ubuntu 18.04.

#### **4 Generic Abstraction of Reachable States**

So far, we have provided two encodings to compile Kappa with counters into Kappa without counters. These encodings are sound under some assumptions over the range of counters. Now we propose a static analysis not only to check that these conditions are satisfied, but also to infer the meaning of the counters (in our case study, that they are equal to the number of phosphorylated sites).

Firstly, we provide a generic abstraction to capture the properties of the states that a Kappa model may potentially take. Our abstraction is parametric with respect to the class of properties. It will be instantiated in Sect. 5. Our analysis is not complete: not all the properties of the program are discovered; nevertheless, the result is sound: all the properties that are captured, are correct.

#### **4.1 Collecting Semantics**

Let $Q$ be the set of all the site-graphs. We are interested in the set $C(M)$ of all the states that a model $M = (G_0, \mathcal{R})$ may take in 0, 1, or more computation steps. This is the collecting semantics [7]. By [33], it may be expressed as the least fixpoint of the $\cup$-complete endomorphism $F$ on the complete lattice $\wp(Q)$ that is defined as $F(X) = \{G_0\} \cup \{q' \mid \exists q \in X, r \in \mathcal{R} \text{ such that } q \xrightarrow{r} q'\}$. By [42], the collecting semantics is also equal to the meet of all the post-fixpoints of the function $F$ (i.e. $C(M) = \bigcap \{X \in \wp(Q) \mid F(X) \subseteq X\}$), that is to say the strongest inductive invariant of our model that is satisfied by the initial state.
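When the set of reachable states is finite, the least fixpoint above can be computed by plain iteration from the empty set. The following sketch illustrates this on a toy transition system; it is not a Kappa interpreter. A state is assumed to be the number of phosphorylated sites of a single protein, each rule is assumed to be a Python function mapping a state to the list of its successors, and the names `collecting_semantics`, `phosphorylate`, and `dephosphorylate` are hypothetical.

```python
def collecting_semantics(initial, rules):
    """Least fixpoint of F(X) = {initial} | {q2 | q in X, r in rules, q -r-> q2}.

    The loop terminates only when finitely many states are reachable.
    """
    reachable = frozenset()
    while True:
        successors = {q2 for q in reachable for rule in rules for q2 in rule(q)}
        nxt = frozenset({initial} | successors)   # nxt = F(reachable)
        if nxt == reachable:                      # F(X) = X: fixpoint reached
            return reachable
        reachable = nxt


N_SITES = 3  # toy model: a protein with three phosphorylation sites


def phosphorylate(q):
    """Successors of q by phosphorylating one more site, if any is left."""
    return [q + 1] if q < N_SITES else []


def dephosphorylate(q):
    """Successors of q by dephosphorylating one site, if any is phosphorylated."""
    return [q - 1] if q > 0 else []


states = collecting_semantics(0, [phosphorylate, dephosphorylate])
```

The iterates are ∅, {0}, {0, 1}, {0, 1, 2}, and {0, 1, 2, 3}, at which point the iteration is stationary.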

#### **4.2 Generic Abstraction**

The collecting semantics is usually not decidable. We use the Abstract Interpretation framework [9,10] to compute a sound approximation of it.

**Definition 6 (abstraction).** *A tuple* $A = (Q^\sharp, \sqsubseteq, \gamma, \sqcup, \bot, I^\sharp, t^\sharp, \nabla)$ *is called an abstraction when all the following conditions are satisfied:*

	- *(b)* $\forall (q^\sharp_n)_{n \in \mathbb{N}} \in (Q^\sharp)^{\mathbb{N}}$*, the sequence* $(q^\nabla_n)_{n \in \mathbb{N}}$ *that is defined as* $q^\nabla_0 = q^\sharp_0$ *and* $q^\nabla_{n+1} = q^\nabla_n \nabla q^\sharp_{n+1}$ *for every integer* $n \in \mathbb{N}$*, is ultimately stationary.*

The set $Q^\sharp$ is an abstract domain. It captures the properties of interest, and abstracts away the others. Each property $q^\sharp \in Q^\sharp$ is mapped by the concretization function $\gamma$ to the set $\gamma(q^\sharp)$ of the concrete states that satisfy this property. The pre-order $\sqsubseteq$ describes the amount of information which is known about the properties that we approximate. We use a pre-order to allow some concrete properties to be described by several unrelated abstract elements. The abstract union $\sqcup$ is used to gather the information described by a finite number of abstract elements. It does not necessarily compute the least upper bound of a finite set of abstract elements (this least upper bound may not even exist). The abstract element $\bot$ provides the basis for abstract iterations. The concretization function is strict, which means that it maps the element $\bot$ to the empty set. The abstract property $I^\sharp$ is satisfied by the initial state. The function $t^\sharp$ is used to mimic concrete rewriting steps in the abstract. The operator $\nabla$ is called a widening. It ensures the convergence of the analysis in finitely many iterations.

Given an abstraction $(Q^\sharp, \sqsubseteq, \gamma, \sqcup, \bot, I^\sharp, t^\sharp, \nabla)$, the abstract counterpart $F^\sharp$ to $F$ is defined as $F^\sharp(q^\sharp) = \bigsqcup \left(\{q^\sharp, I^\sharp\} \cup \{t^\sharp(q^\sharp, r) \mid r \in \mathcal{R}\}\right)$. The function $F^\sharp$ satisfies the soundness condition $\forall q^\sharp \in Q^\sharp, [F \circ \gamma](q^\sharp) \subseteq [\gamma \circ F^\sharp](q^\sharp)$. Following [7], we compute a sound and decidable approximation of our abstract semantics by using the widening operator $\nabla$. The abstract iteration [10,11] of $F^\sharp$ is defined by the following induction: $F^\nabla_0 = \bot$ and, for each integer $n \in \mathbb{N}$, $F^\nabla_{n+1} = F^\nabla_n$ whenever $F^\sharp(F^\nabla_n) \sqsubseteq F^\nabla_n$, and $F^\nabla_{n+1} = F^\nabla_n \nabla F^\sharp(F^\nabla_n)$ otherwise.

**Theorem 2 (Termination and soundness).** *The abstract iteration is ultimately stationary and its limit* $F^\nabla$ *satisfies* $C(M) \subseteq \gamma(F^\nabla)$*.*

*Proof.* By construction, $F^\sharp(F^\nabla) \sqsubseteq F^\nabla$. Since $\gamma$ is monotonic, it follows that $\gamma(F^\sharp(F^\nabla)) \subseteq \gamma(F^\nabla)$. Since $F \circ \gamma \mathrel{\dot{\subseteq}} \gamma \circ F^\sharp$, we get $F(\gamma(F^\nabla)) \subseteq \gamma(F^\nabla)$. So $\gamma(F^\nabla)$ is a post-fixpoint of $F$. By [42], we have $\mathit{lfp}\, F \subseteq \gamma(F^\nabla)$.
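The abstract iteration can be illustrated with a deliberately simple abstract domain. The sketch below uses plain intervals with the standard interval widening, which is a swapped-in textbook domain and not the affine-relationship domain used in this paper. An abstract element is assumed to be either `BOTTOM` or a pair `(lo, hi)` of bounds, each rule is assumed to be given directly as an abstract transfer function, and all names (`join`, `leq`, `widen`, `abstract_iteration`, `increment_rule`) are hypothetical.

```python
import math

BOTTOM = None  # abstract element whose concretization is the empty set


def join(a, b):
    """Abstract union of two intervals."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def leq(a, b):
    """Pre-order: a is at least as precise as b."""
    if a is BOTTOM:
        return True
    if b is BOTTOM:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def widen(a, b):
    """Standard interval widening: unstable bounds are pushed to infinity."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)


def abstract_iteration(init, transfers):
    """Iterate q := q widen F#(q) from BOTTOM until F#(q) is below q."""
    def f_sharp(q):
        result = join(q, init)
        for transfer in transfers:
            result = join(result, transfer(q))
        return result

    q = BOTTOM
    while not leq(f_sharp(q), q):
        q = widen(q, f_sharp(q))
    return q


def increment_rule(q):
    """Abstract effect of a rule that increments a counter when it is at most 3."""
    if q is BOTTOM or q[0] > 3:
        return BOTTOM
    return (q[0] + 1, min(q[1], 3) + 1)


limit = abstract_iteration((0, 0), [increment_rule])
```

Starting from a counter equal to 0, the iterates are `BOTTOM`, `(0, 0)`, and `(0, inf)`. The limit is sound, since the concrete values 0 to 4 all belong to it, but it does not find the upper bound 4: plain widening trades precision for termination, which is why the analysis described in this paper also relies on reduction.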

#### **4.3 Coalescent Product**

Two abstractions may be combined pair-wise to form a new one. The result is a coalescent product that defines a mutual induction over both abstractions.

**Definition 7 (coalescent product).** *The coalescent product between two abstractions* $(Q^\sharp_1, \sqsubseteq_1, \gamma_1, \sqcup_1, \bot_1, I^\sharp_1, t^\sharp_1, \nabla_1)$ *and* $(Q^\sharp_2, \sqsubseteq_2, \gamma_2, \sqcup_2, \bot_2, I^\sharp_2, t^\sharp_2, \nabla_2)$ *is defined as the tuple* $(Q^\sharp, \sqsubseteq, \gamma, \sqcup, \bot, I^\sharp, t^\sharp, \nabla)$ *where*


**Theorem 3 (Soundness of the coalescent product).** *The coalescent product of two abstractions is an abstraction as well.*

We notice that if either abstraction proves that the precondition of a rule is not satisfiable, then this rule is discarded in the other abstraction (hence the term coalescent). By mutual induction, the composite abstraction may detect which rules may be safely discarded along the iterations of the analysis.
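The coalescence described above can be sketched for the transfer function alone. The code below is an illustration built from this remark, not a transcription of Definition 7: each component abstraction is assumed to expose a transfer function that returns `BOTTOM` when it proves the precondition of the rule unsatisfiable, and the product then discards the rule in both components. The two toy components (the set of possible states of the site a, and an interval for the counter) and all the names are hypothetical.

```python
BOTTOM = None  # abstract element whose concretization is the empty set


def coalescent_transfer(t1, t2):
    """Combine two transfer functions; BOTTOM in one component discards the rule."""
    def transfer(q, rule):
        if q is BOTTOM:
            return BOTTOM
        q1, q2 = q
        r1 = t1(q1, rule)
        r2 = t2(q2, rule)
        if r1 is BOTTOM or r2 is BOTTOM:
            return BOTTOM
        return (r1, r2)
    return transfer


def site_a_transfer(states, rule):
    """Component 1: set of the possible internal states of the site a."""
    if rule["a_before"] not in states:
        return BOTTOM
    return frozenset({rule["a_after"]})


def counter_transfer(interval, rule):
    """Component 2: interval (lo, hi) of the possible values of the counter."""
    lo, hi = interval
    if hi < rule["min_counter"]:
        return BOTTOM
    return (max(lo, rule["min_counter"]) + rule["delta"], hi + rule["delta"])


# The rule of Example 4: a goes from unphosphorylated to phosphorylated,
# the counter must be at least 2 and is incremented by 1.
RULE = {"a_before": "u", "a_after": "p", "min_counter": 2, "delta": 1}
transfer = coalescent_transfer(site_a_transfer, counter_transfer)
```

For instance, `transfer((frozenset({"u"}), (0, 1)), RULE)` is `BOTTOM`: the counter component proves that the guard cannot hold, so the rule is also discarded for the component tracking the site a.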

We may now define an analysis modularly with respect to the class of considered properties. We use the coalescent product to extend the existing static analyzer KaSa [5] with a new abstraction dedicated to the range of counters.

#### **5 Numerical Abstraction**

Now we specialize our generic abstraction to detect and prove safe bounds on the range of counters. In general, this requires relating the value of the counters to the state of other sites. Our approach consists in translating each protein configuration into a vector of integers and in abstracting each rule by its potential effect on these vectors. We obtain an integer linear programming problem that we solve by choosing an appropriate abstract domain.

The set of convex parts of $\mathbb{Z}$ is written as $I_{\mathbb{Z}}$. We assume that guards on counters are elements of $I_{\mathbb{Z}}$ and that each update function either sets counters to a constant value, or increments/decrements counters by a constant value.

#### **5.1 Encoding States and Preconditions**

We propose to translate each agent into a set of numerical constraints. A protein of type *A* is associated with one variable $\chi^\lambda_i$ for each binding site $i$ and each binding state $\lambda$, one variable $\chi^\iota_i$ for each property site $i$ and each internal state identifier $\iota$, and one variable $val_i$ for each counter $i$.

**Definition 8 (numerical variables).** *Let* $A \in \Sigma_{ag}$ *be an agent type. We define the set* $\mathcal{V}ar_A$ *as the set of variables* $\mathcal{V}ar^{lnk}_A \cup \mathcal{V}ar^{int}_A \cup \mathcal{V}ar^{\$}_A$ *where:*

*1.* $\mathcal{V}ar_{A}^{lnk} = \{ \chi_{i}^{\lambda} \mid i \in \Sigma_{ag\text{-}st}^{lnk}(A), \lambda \in \{\dashv\} \cup \{ (A', i') \mid A' \in \Sigma_{ag}, i' \in \Sigma_{ag\text{-}st}^{lnk}(A') \}\}$*;*

*2.* $\mathcal{V}ar_{A}^{int} = \{ \chi_{i}^{\iota} \mid i \in \Sigma_{ag\text{-}st}^{int}(A), \iota \in \Sigma_{int} \}$*;*

*3.* $\mathcal{V}ar_{A}^{\$} = \{ val_{i} \mid i \in \Sigma_{ag\text{-}st}^{\$}(A) \}$*.*

Intuitively, variables of the form $\chi^\lambda_i$ (resp. $\chi^\iota_i$) take the value 1 if the binding (resp. internal) state of the site $i$ is $\lambda$ (resp. $\iota$) and the value 0 otherwise, whereas the variables of the form $val_i$ take the value of the counter $i$.

Each agent of type *A* may be translated into a function mapping each variable in the set $\mathcal{V}ar_A$ into a subset of the set $\mathbb{Z}$. Such a function is called a guard.

**Definition 9 (Encoding of agents).** *Let* $G$ *be a site-graph and* $n$ *be an agent in* $\mathcal{A}_G$*. We denote by* $A$ *the type* $type_G(n)$*. We define as follows the function* $guard_G(n)$ *from the set* $\mathcal{V}ar_A$ *into the set* $I_{\mathbb{Z}}$*:*


*4.* $guard_G(n)(val_i)$ *is equal to the set* $c\kappa_G(n, i)$ *whenever* $(n, i) \in \mathcal{S}^{\$}_G$ *and to the set* $\mathbb{Z}$ *otherwise.*

The variable $\chi^{\dashv}_i$ takes the value $\{1\}$ if we know that the site $i$ is free, the value $\{0\}$ if we know that it is bound, and the value $\{0, 1\}$ if we do not know whether the site is free or not. The same holds for binding types: the variable $\chi^{(A', i')}_i$ takes the value $\{1\}$ if we know that the site is bound to the site $i'$ of an agent of type $A'$, the value $\{0\}$ if we know that this is not the case, and the value $\{0, 1\}$ otherwise. Property sites work the same way. Lastly, the variable $val_i$ takes as value the set attached to the counter, or the value $\mathbb{Z}$ if the site is not mentioned in the agent. We notice that when $n$ is a fully-specified agent of type $A$, the function $guard_G(n)$ maps every variable in the set $\mathcal{V}ar_A$ to a singleton.

*Example 5 (running example).* We provide the translation of the unique agent of the site-graph G<sub>1</sub> (e.g. see Fig. 3(a)) and the one of the unique agent of the site-graph G<sub>4</sub> (e.g. see Fig. 3(d)).

The agent of the site-graph G<sub>1</sub> is translated as follows:

$$\begin{cases} \chi\_a^\circ = \{1\}; \chi\_a^\bullet = \{0\};\\ \chi\_b^\circ = \{0, 1\}; \chi\_b^\bullet = \{0, 1\};\\ \chi\_c^\circ = \{0, 1\}; \chi\_c^\bullet = \{0, 1\};\\ \chi\_d^\circ = \{0, 1\}; \chi\_d^\bullet = \{0, 1\};\\ val\_x = \{z \in \mathbb{Z} \mid z \le 2\} \end{cases}.$$

According to the first two constraints, the site *a* is unphosphorylated. According to the next six, the sites *b*, *c*, and *d* have an unspecified state. According to the last constraint, the value of the counter must be less than or equal to 2.

The translation of the agent of the site-graph G<sub>4</sub> is obtained the same way:

$$\begin{cases} \chi\_a^\circ = \{1\}; \chi\_a^\bullet = \{0\}; \\ \chi\_b^\circ = \{0\}; \chi\_b^\bullet = \{1\}; \\ \chi\_c^\circ = \{0\}; \chi\_c^\bullet = \{1\}; \\ \chi\_d^\circ = \{1\}; \chi\_d^\bullet = \{0\}; \\ val\_x = \{2\} \end{cases}.$$

This means that the sites *b* and *c* are phosphorylated while the sites *a* and *d* are not. According to the last constraint, the value of the counter is equal to 2.

#### **5.2 Encoding Rules**

In Kappa, a rule may be applied only when its precondition is satisfied. Moreover, the application of a rule modifies the state of some sites in agents. We translate each rule into a tuple of guards that encodes its precondition, a set of non-invertible assignments (when a site is given a new state that does not depend on the former one), and a set of invertible assignments (when the new state of a site depends on the previous one). Such a distinction is important as we want to establish relationships among the values of some variables [32]: a non-invertible assignment completely hides the former value of a variable. This is not the case with invertible assignments, for which relationships may be propagated more easily. The agents that are created (which have no precondition) and the ones that are removed (which disappear) receive special treatment.

**Definition 10 (Encoding of rules).** *Each rule is associated with the tuple* (*pre*<sub>r</sub>, *not-invert*<sub>r</sub>, *invert*<sub>r</sub>, *new*<sub>r</sub>) *where:*


*Example 6 (running example).* The encoding of the rule of Fig. 6(a) is given as follows:

– the function *pre*<sub>r</sub> maps the agent 1 to the following set of constraints:

$$\begin{cases} \chi\_a^\circ = \{1\}; \chi\_a^\bullet = \{0\};\\ \chi\_b^\circ = \{0, 1\}; \chi\_b^\bullet = \{0, 1\};\\ \chi\_c^\circ = \{0, 1\}; \chi\_c^\bullet = \{0, 1\};\\ \chi\_d^\circ = \{0, 1\}; \chi\_d^\bullet = \{0, 1\};\\ val\_x = \{z \in \mathbb{Z} \mid z \le 2\} \end{cases};$$


The guard specifies that the site *a* must be unphosphorylated and the value of the counter less than or equal to 2. Applying the rule modifies the value of three variables. The site *a* gets phosphorylated. This is a non-invertible modification that sets the variable χ<sup>◦</sup><sub>a</sub> to the constant value 0 and the variable χ<sup>•</sup><sub>a</sub> to the constant value 1. The counter *x* is incremented. This is an invertible modification that is encoded by incrementing the value of the variable *val*<sub>x</sub>.
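The encoding of Example 6 can be illustrated on concrete agent states. The following Python sketch is only an illustration under stated assumptions: the variable names (`chi_a_o` for χ<sup>◦</sup><sub>a</sub>, `chi_a_p` for χ<sup>•</sup><sub>a</sub>), the interval representation with `None` for an unbounded side, and the helper names `satisfies` and `apply_rule` are hypothetical and are not part of the Kappa platform.

```python
# A concrete agent state maps each variable to an integer.
# A guard maps each variable to an interval (lo, hi); None means unbounded.

def in_range(value, interval):
    lo, hi = interval
    return (lo is None or lo <= value) and (hi is None or value <= hi)

def satisfies(state, guard):
    """True when every guarded variable of the state lies in its allowed range."""
    return all(in_range(state[var], rng) for var, rng in guard.items())

def apply_rule(state, pre, not_invert, invert):
    """Return the updated state, or None when the precondition fails."""
    if not satisfies(state, pre):
        return None
    new_state = dict(state)
    for var, shift in invert.items():       # invertible: depends on the old value
        new_state[var] = state[var] + shift
    for var, const in not_invert.items():   # non-invertible: overwrites the old value
        new_state[var] = const
    return new_state

# Precondition of Example 6: site a unphosphorylated, counter at most 2.
pre = {"chi_a_o": (1, 1), "chi_a_p": (0, 0), "val_x": (None, 2)}
for site in "bcd":
    pre[f"chi_{site}_o"] = (0, 1)
    pre[f"chi_{site}_p"] = (0, 1)
not_invert = {"chi_a_o": 0, "chi_a_p": 1}   # site a becomes phosphorylated
invert = {"val_x": 1}                       # the counter is incremented

# Hypothetical concrete agent: only site b is phosphorylated, counter equal to 1.
agent = {"chi_a_o": 1, "chi_a_p": 0, "chi_b_o": 0, "chi_b_p": 1,
         "chi_c_o": 1, "chi_c_p": 0, "chi_d_o": 1, "chi_d_p": 0, "val_x": 1}
after = apply_rule(agent, pre, not_invert, invert)
```

Applying the rule a second time to `after` fails, because the site *a* is no longer unphosphorylated.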

#### **5.3 Generic Numerical Abstract Domain**

We are now ready to define a generic numerical abstraction.

**Definition 11 (Numerical domain).** *A numerical abstract domain is a family* (A<sup>𝒩</sup><sub>A</sub>)<sub>A∈Σ<sub>ag</sub></sub> *of tuples* (𝒟<sup>𝒩</sup><sub>A</sub>, ⊑<sup>𝒩</sup><sub>A</sub>, γ<sub>A</sub>, ⊔<sup>𝒩</sup><sub>A</sub>, ⊥<sup>𝒩</sup><sub>A</sub>, ⊤<sup>𝒩</sup><sub>A</sub>, *g*<sup>𝒩</sup><sub>A</sub>, *forget*<sup>𝒩</sup><sub>A</sub>, δ<sup>𝒩</sup><sub>A</sub>, ∇<sup>𝒩</sup><sub>A</sub>) *that satisfy the following conditions, for every agent type* A ∈ Σ<sub>ag</sub>*:*

	- *(b)* $\forall (\rho^\sharp\_n)\_{n \in \mathbb{N}} \in (\mathcal{D}^{\mathcal{N}}\_A)^{\mathbb{N}}$*, the sequence* $(\rho^\nabla\_n)\_{n \in \mathbb{N}}$ *that is defined as* $\rho^\nabla\_0 = \rho^\sharp\_0$ *and* $\rho^\nabla\_{n+1} = \rho^\nabla\_n \mathbin{\nabla^{\mathcal{N}}\_A} \rho^\sharp\_{n+1}$ *for every integer* $n \in \mathbb{N}$*, is ultimately stationary.*

#### **5.4 Numerical Abstraction**

The following theorem explains how to build an abstraction (as defined in Sect. 4) from a numerical abstract domain. We introduce an operator ↑ to extend the domain of functions with default values. Given a function f, a value v, and a superset X of the domain of f, we write ↑<sup>v</sup><sub>X</sub> f for the extension of the function f that maps each element x ∈ X \ *dom*(f) to the value v. We also write *set*<sub>A</sub> for the function mapping each pair (f, X<sup>♯</sup>), where f is a partial function from the set 𝒱*ar*<sub>A</sub> into the set of the convex parts of ℤ and X<sup>♯</sup> is an abstract property in 𝒟<sup>𝒩</sup><sub>A</sub>, to the abstract property *g*<sup>𝒩</sup><sub>A</sub>(↑<sup>ℤ</sup><sub>𝒱*ar*<sub>A</sub></sub> f, *forget*<sup>𝒩</sup><sub>A</sub>(*dom*(f), X<sup>♯</sup>)). The function *set*<sub>A</sub> forgets all the information about the variables in the domain of the function f, and reassigns their range to their image by f in the abstract.

**Theorem 4.** *Let* (𝒟<sup>𝒩</sup><sub>A</sub>, ⊑<sup>𝒩</sup><sub>A</sub>, γ<sub>A</sub>, ⊔<sup>𝒩</sup><sub>A</sub>, ⊥<sup>𝒩</sup><sub>A</sub>, ⊤<sup>𝒩</sup><sub>A</sub>, *g*<sup>𝒩</sup><sub>A</sub>, *forget*<sup>𝒩</sup><sub>A</sub>, δ<sup>𝒩</sup><sub>A</sub>, ∇<sup>𝒩</sup><sub>A</sub>)<sub>A∈Σ<sub>ag</sub></sub> *be a numerical abstract domain. The tuple* (Q<sup>♯</sup>, ⊑, γ, ⊔, ⊥, I<sup>♯</sup>, t<sup>♯</sup>, ∇) *that is defined by:*


$$
\sqcup\_A^{\mathcal{N}}(\{X^\sharp(A)\} \cup \{\operatorname{resh}(r, A) \cup \operatorname{updated}(r, A, X^\sharp)\},
$$

*with:*

	- *set*<sub>A</sub>(*not-invert*<sub>r</sub>(n), δ<sup>𝒩</sup><sub>A</sub>(↑<sup>0</sup><sub>A</sub> *invert*<sub>r</sub>(n), *g*<sup>𝒩</sup><sub>A</sub>(*pre*<sub>r</sub>(n), X<sup>♯</sup>(A)))) *for each agent* n ∈ 𝒜<sub>D</sub> *with type*<sub>D</sub>(n) = A*;*

*is a generic abstraction.*

Most of the constructions of the abstraction are standard. The expression *g*<sup>𝒩</sup><sub>A</sub>(*pre*<sub>r</sub>(n), X<sup>♯</sup>(*type*<sub>L</sub>(n))) refines the abstract information about the potential configurations of the n-th agent in the left-hand side of the rule, by taking into account its precondition. Whenever a bottom element is obtained for at least one agent, the precondition of the rule is not satisfiable and the rule is discarded at this moment of the iteration. Otherwise, the information about each agent is updated. Starting from the result of the refinement of the abstract element by the precondition, the function δ<sup>𝒩</sup><sub>A</sub> applies the invertible transformations ↑<sup>0</sup><sub>A</sub> *invert*<sub>r</sub>(n) (the function ↑<sup>0</sup><sub>A</sub> extends the domain of the function *invert*<sub>r</sub>(n) by specifying that the variables not in the domain of this function remain unchanged), and the function *set*<sub>A</sub> applies the non-invertible ones, *not-invert*<sub>r</sub>(n).
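The composition of the guard, the invertible update, and the non-invertible update can be sketched on a plain interval domain. This is a minimal illustration, not the analysis implemented in Kappa: it models intervals only (no relational component), and the function names `guard`, `delta`, `forget`, `set_`, and `transfer`, as well as the variable names, are assumptions made for this example.

```python
import math

BOTTOM = None                      # the unreachable abstract element
TOP = (-math.inf, math.inf)        # no information: the whole of Z

def meet(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def guard(pre, env):
    """Refine each variable by the precondition; BOTTOM if unsatisfiable."""
    out = dict(env)
    for var, rng in pre.items():
        refined = meet(env.get(var, TOP), rng)
        if refined is None:
            return BOTTOM
        out[var] = refined
    return out

def delta(shifts, env):
    """Invertible updates; a variable without a shift moves by 0."""
    return {var: (lo + shifts.get(var, 0), hi + shifts.get(var, 0))
            for var, (lo, hi) in env.items()}

def forget(variables, env):
    return {var: (TOP if var in variables else rng) for var, rng in env.items()}

def set_(assign, env):
    """Forget the assigned variables, then constrain them to their new range."""
    return guard(assign, forget(set(assign), env))

def transfer(pre, invert, not_invert, env):
    refined = guard(pre, env)
    if refined is BOTTOM:
        return BOTTOM              # the rule cannot fire from this abstract state
    return set_(not_invert, delta(invert, refined))

# Abstract state: nothing known about site a, counter between 0 and 5.
env = {"chi_a_o": (0, 1), "chi_a_p": (0, 1), "val_x": (0, 5)}
pre = {"chi_a_o": (1, 1), "chi_a_p": (0, 0), "val_x": (-math.inf, 2)}
invert = {"val_x": 1}
not_invert = {"chi_a_o": (0, 0), "chi_a_p": (1, 1)}
out = transfer(pre, invert, not_invert, env)
```

In this sketch the precondition first narrows the counter to [0, 2], the invertible increment shifts it to [1, 3], and the non-invertible assignments overwrite the two variables of the site *a*.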

The domain of intervals [8] and the one of affine relationships [32] provide all the primitives required by Definition 11. We use their product, in which all primitives are defined pair-wise, except the guards, whose output is refined by using the algorithm that is described in [23]. We use widening with thresholds [2] for intervals so as to avoid infinite bounds when possible. In this way, we obtain a domain in which all operations are cubic with respect to the number of variables.

This is a very good trade-off. A relational domain is required. Other relational domains are either too imprecise [37], or too costly [13], or both [27,38].
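To illustrate why widening with thresholds helps avoid infinite bounds, here is a small Python sketch on a single interval. The threshold set, the toy transfer function `step` (a counter incremented while it is at most n − 1), and all function names are assumptions chosen for the illustration; they are not taken from the Kappa implementation.

```python
import math

def widen(old, new, thresholds):
    """Interval widening with thresholds: an unstable bound jumps to the
    nearest enclosing threshold, and to infinity only if there is none."""
    lo, hi = old
    if new[0] < lo:
        lo = max((t for t in thresholds if t <= new[0]), default=-math.inf)
    if new[1] > hi:
        hi = min((t for t in thresholds if t >= new[1]), default=math.inf)
    return (lo, hi)

def step(iv, n=5):
    """Toy transfer: join the interval with its increment guarded by x <= n - 1."""
    lo, hi = iv
    if lo > n - 1:
        return iv
    inc = (lo + 1, min(hi, n - 1) + 1)
    return (min(lo, inc[0]), max(hi, inc[1]))

def iterate(step_fn, start, thresholds):
    """Iterate with widening until the sequence becomes stationary."""
    current = start
    while True:
        nxt = widen(current, step_fn(current), thresholds)
        if nxt == current:
            return current
        current = nxt

with_thresholds = iterate(step, (0, 0), [0, 5])
without_thresholds = iterate(step, (0, 0), [])
```

With the threshold 5 available, the iteration stabilises on the interval [0, 5]; without thresholds, the upper bound is extrapolated to infinity.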

#### **5.5 Benchmarks**

We run our analysis on the family of models of Sect. 1 for n ranging between 1 and 25. For each version of the model, the protein is made of n phosphorylation sites and a counter. Moreover, our analysis always discovers that the counter ranges between 0 and n. CPU time is plotted in Fig. 10.

**Fig. 10.** Efficiency of the static analysis for the example in Sect. 1 with *n* ranging between 1 and 25. Every analysis has successfully computed the exact range of the counter. The analysis has been performed on a MacBook Pro with a 2.8 GHz Intel Core i7 and 16 GB of RAM, running macOS High Sierra version 10.13.6.

### **6 Conclusion**

When potential protein transformations depend on the number of sites satisfying a given property, counters offer a convenient way to describe generic mechanisms while avoiding the explosion in the number of rules. We have extended the semantics of Kappa to deal with counters. We have proposed some encodings to remove counters while preserving the performance of the Kappa simulator. In particular, graphs remain rigid and the number of rules remains the same. Then, we have introduced a static analysis to bound the range of counters.

It is quite common to find proteins with more than 40 phosphorylation sites. Without our contributions, the modeler has no choice but to assume these proteins to be active only when all their sites are phosphorylated. This is a harsh simplification. Modeling simplifications are usually made not only because detailed knowledge is missing, but also because the corresponding models cannot be described, executed, or analyzed efficiently. Yet these simplifications are made without any clue about their impact on the behavior of the systems. By providing ways of describing and handling some complex details, we offer modelers the means to incorporate these details and to test their impact empirically.

Our framework is fully integrated within the Kappa modeling platform, which is open-source and usable online (https://kappalanguage.org). It is worth noting that we have taken two radically different approaches to deal with counters in simulation and in static analysis. Encodings are good for simulation, but they tend to obfuscate the properties of interest, hence drastically damaging the capability of the static analysis to infer useful properties about them. The extension of the categorical semantics provides a parsimonious definition of causality between computation steps, as well as means to reason symbolically on the behavior of the number of occurrences of patterns. As future work, we will extend existing decision procedures [14,15] that compute minimal causal traces to cope with counters. It is very likely that a third approach will be required. We suggest using the traces obtained by simulation, then translating the counters in these traces thanks to equivalent sites, and applying existing decision procedures to the traces that will be obtained this way.

# **References**


in Theoretical Computer Science. An EATCS Series, 2nd edn. Springer, Heidelberg (1996). https://doi.org/10.1007/978-3-662-03241-1



# **One Step at a Time: A Functional Derivation of Small-Step Evaluators from Big-Step Counterparts**

Ferdinand Vesely<sup>1,2(✉)</sup> and Kathleen Fisher<sup>1</sup>

<sup>1</sup> Tufts University, Medford, USA, {fvesely,kfisher}@eecs.tufts.edu

<sup>2</sup> Swansea University, Swansea, UK, f.vesely@swansea.ac.uk

**Abstract.** Big-step and small-step are two popular flavors of operational semantics. Big-step is often seen as a more natural transcription of informal descriptions, as well as being more convenient for some applications such as interpreter generation or optimization verification. Small-step allows reasoning about non-terminating computations, concurrency and interactions. It is also generally preferred for reasoning about type systems. Instead of having to manually specify equivalent semantics in both styles for different applications, it would be useful to choose one and derive the other in a systematic or, preferably, automatic way.

Transformations of small-step semantics into big-step have been investigated in various forms by Danvy and others. However, it appears that a corresponding transformation from big-step to small-step semantics has not had the same attention. We present a fully automated transformation that maps big-step evaluators written in direct style to their small-step counterparts. Many of the steps in the transformation, which include CPS-conversion, defunctionalisation, and various continuation manipulations, mirror those used by Danvy and his co-authors. For many standard languages, including those with either call-by-value or call-by-need and those with state, the transformation produces small-step semantics that are close in style to handwritten ones. We evaluate the applicability and correctness of the approach on 20 languages with a range of features.

**Keywords:** Structural operational semantics · Big-step semantics · Small-step semantics · Interpreters · Transformation · Continuation-passing style · Functional programming

# **1 Introduction**

Operational semantics allow language designers to precisely and concisely specify the meaning of programs. Such semantics support formal type soundness proofs [29], give rise (sometimes automatically) to simple interpreters [15,27] and debuggers [14], and document the correct behavior for compilers. There are two popular approaches for defining operational semantics: big-step and small-step. *Big-step semantics* (also referred to as *natural* or *evaluation* semantics) relate initial program configurations directly to final results in one "big" evaluation step. In contrast, *small-step semantics* relate intermediate configurations consisting of the term currently being evaluated and auxiliary information. The initial configuration corresponds to the entire program, and the final result, if there is one, can be obtained by taking the transitive-reflexive closure of the small-step relation. Thus, computation progresses as a series of "small steps."

The two styles have different strengths and weaknesses, making them suitable for different purposes. For example, big-step semantics naturally correspond to definitional interpreters [23], meaning many big-step semantics can essentially be transliterated into a reasonably efficient interpreter in a functional language. Big-step semantics are also more convenient for verifying program optimizations and compilation – using big-step, semantic preservation can be verified (for terminating programs) by induction on the derivation [20,22].

In contrast, small-step semantics are often better suited for stepping through the evaluation of an example program, and for devising a type system and proving its soundness via the classic syntactic method using progress and preservation proofs [29]. As a result, researchers sometimes develop multiple semantic specifications and then argue for their equivalence [3,20,21]. In an ideal situation, the specifier writes down a single specification and then derives the others.

Approaches to deriving big-step semantics from a small-step variant have been investigated on multiple occasions, starting from semantics specified as either interpreters or rules [4,7,10,12,13]. An obvious question is: what about the reverse direction?

This paper presents a systematic, mechanised transformation from a big-step interpreter into its small-step counterpart. The overall transformation consists of multiple stages performed on an interpreter written in a functional programming language. For the most part, the individual transformations are well known. The key steps in this transformation are to explicitly represent control flow as *continuations*, to defunctionalise these continuations to obtain a datatype of reified continuations, to "tear off" recursive calls to the interpreter, and then to return the reified continuations, which represent the rest of the computation. This process effectively produces a stepping function. The remaining work consists of finding translations from the reified continuations to equivalent terms in the source language. If such a term cannot be found, we introduce a new term constructor. These new constructors correspond to the intermediate auxiliary forms commonly found in handwritten small-step definitions.

We define the transformations on our *evaluator definition language* – an extension of λ-calculus with call-by-value semantics. The language is untyped and, crucially, includes tagged values (variants) and a case analysis construct for building and analysing object language terms. Our algorithm takes as input a big-step interpreter written in this language in the usual style: a main function performing case analysis on a top-level term constructor and recursively calling itself or auxiliary functions. As output, we return the resulting small-step interpreter which we can "pretty-print" as a set of small-step rules in the usual style. Hence our algorithm provides a fully automated path from a restricted class of big-step semantic specifications written as interpreters to corresponding small-step versions.

To evaluate our algorithm, we have applied it to 20 different languages with various features, including languages based on call-by-name and call-by-value λ-calculi, as well as a core imperative language. We extend these base languages with conditionals, loops, and exceptions.

We make the following contributions:


# **2 Overview**

In this section, we provide an overview of the transformation steps on a simple example language. The diagram in Fig. 1 shows the transformation pipeline. As the initial step, we first convert the input big-step evaluator into continuation-passing style (CPS). We limit the conversion to the *eval* function itself and leave all other functions in direct style. The resulting continuations take a value as input and advance the computation. In the generalization step, we modify these continuations so that they take an arbitrary term and evaluate it to a value before continuing as before. With this modification, each continuation handles both the general non-value case and the value case itself. The next stage lifts a carefully chosen set of free variables as arguments to continuations, which allows us to define all of them at the same scope level. After generalization and argument lifting, we can invoke continuations directly to switch control, instead of passing them as arguments to the *eval* function. Next we defunctionalize the continuations, converting them into a set of tagged values together with an *apply* function capturing their meaning. This transformation enables the next step, in which we remove recursive tail-calls to *apply*. This allows us to interrupt the interpreter and make it return a continuation or a term: effectively, it yields a stepping function, which is the essence of a small-step semantics. The remainder of the pipeline converts continuations to terms, performs simplifications, and then converts the CPS evaluator back to direct style to obtain the final small-step interpreter. This interpreter can be pretty-printed as a set of small-step rules.

**Fig. 1.** Transformation overview

Our example language is a λ-calculus with call-by-value semantics. Fig. 2 gives its syntax and big-step rules. We use environments to give meaning to variables. The only values in this language are closures, formed by packaging a λ-abstraction with an environment.

$$\begin{array}{c} x \in Var \qquad \rho \in Env = Var \to Val \\ v ::= \mathsf{clo}(x, e, \rho) \\ e ::= \mathsf{val}(v) \mid \mathsf{var}(x) \mid \mathsf{lam}(x, e) \mid \mathsf{app}(e\_1, e\_2) \\[6pt] \dfrac{}{\rho \vdash \mathsf{val}(v) \Downarrow v} \qquad \dfrac{\rho(x) = v}{\rho \vdash \mathsf{var}(x) \Downarrow v} \qquad \dfrac{}{\rho \vdash \mathsf{lam}(x, e) \Downarrow \mathsf{clo}(x, e, \rho)} \\[6pt] \dfrac{\rho \vdash e\_1 \Downarrow \mathsf{clo}(x, e, \rho') \qquad \rho \vdash e\_2 \Downarrow v\_2 \qquad \rho'[x \mapsto v\_2] \vdash e \Downarrow v}{\rho \vdash \mathsf{app}(e\_1, e\_2) \Downarrow v} \end{array}$$

**Fig. 2.** Example: Call-by-value λ-calculus, abstract syntax and big-step semantics

We will now give a series of interpreters to illustrate the transformation process. We formally define the syntax of the meta-language in which we write these interpreters in Section 3, but we believe for readers familiar with functional programming the language is intuitive enough to not require a full explanation at this point. Shaded text highlights (often small) changes to subsequent interpreters.

*Big-Step Evaluator.* We start with an interpreter corresponding directly to the big-step semantics given in Fig. 2. We represent environments as functions – the empty environment returns an error for any variable. The body of the *eval* function consists of a pattern match on the top-level language term. Function abstractions are evaluated to closures by packaging them with the current environment. The only term that requires recursive calls to *eval* is application: both its arguments are evaluated in the current environment, and then its first argument is pattern-matched against a closure, the body of which is then evaluated to a value in an extended environment using a third recursive call to *eval*.

```
let empty = λx. error() in
let update x v ρ = λx′. let xx′ = (== x x′) in if xx′ then v else (ρ x′) in
let rec eval e ρ =
  case e of {
    val(v) → v |
    var(x) → let v = (ρ x) in v |
    lam(x, e′) → clo(x, e′, ρ) |
    app(e1, e2) →
         let v1 = (eval e1 ρ) in
         let v2 = (eval e2 ρ) in
         case v1 of {
           clo(x, e′, ρ′) →
                let ρ″ = (update x v2 ρ′) in
                let v = (eval e′ ρ″) in
                v
         }
  }
```
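For readers who prefer to experiment, the big-step evaluator can be transliterated into Python. This is a sketch under stated assumptions, not the paper's implementation: terms are encoded as tagged tuples such as `("app", e1, e2)`, closures as `("clo", x, body, env)`, and the names `eval_big`, `identity`, and `const` are introduced here purely for illustration.

```python
def empty(x):
    # The empty environment reports an error for any variable.
    raise KeyError(f"unbound variable {x}")

def update(x, v, env):
    # Environments are functions from variable names to values.
    return lambda y: v if y == x else env(y)

def eval_big(e, env):
    tag = e[0]
    if tag == "val":
        return e[1]
    if tag == "var":
        return env(e[1])
    if tag == "lam":
        return ("clo", e[1], e[2], env)
    if tag == "app":
        v1 = eval_big(e[1], env)
        v2 = eval_big(e[2], env)
        _, x, body, clo_env = v1            # v1 is expected to be a closure
        return eval_big(body, update(x, v2, clo_env))
    raise ValueError(f"unknown term constructor {tag}")

identity = ("lam", "x", ("var", "x"))
const = ("lam", "x", ("lam", "y", ("var", "x")))
# (const identity) const evaluates to the closure of the identity function.
result = eval_big(("app", ("app", const, identity), const), empty)
```

As in the evaluator above, application is the only case that calls the evaluator recursively, three times in total.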
*CPS Conversion.* Our first transformation introduces a continuation argument to *eval*, capturing the "rest of the computation" [9,26,28]. Instead of returning the resulting value directly, *eval* will pass it to the continuation. For our example we need to introduce three continuations – all of them in the case for **app**. The continuation *kapp*<sub>1</sub> captures what remains to be done after evaluating the first argument of **app**, *kapp*<sub>2</sub> captures the computation remaining after evaluating the second argument, and *kclo*<sub>1</sub> the computation remaining after the closure body is fully evaluated. This final continuation simply applies the top-level continuation to the resulting value and might seem redundant; however, its utility will become apparent in the following step. Note that the CPS conversion is limited to the *eval* function, leaving any other functions in the program intact.

```
let rec eval e ρ k =
  case e of {
    val(v) → (k v) |
    var(x) → let v = (ρ x) in (k v) |
    lam(x, e′) → (k clo(x, e′, ρ)) |
    app(e1, e2) →
       letcont kapp1 v1 =
           letcont kapp2 v2 =
             case v1 of {
               clo(x, e′, ρ′) →
                    let ρ″ = (update x v2 ρ′) in
                    letcont kclo1 v = (k v) in
                    (eval e′ ρ″ (λv. (kclo1 v)))
             } in
             (eval e2 ρ (λv2. (kapp2 v2))) in
        (eval e1 ρ (λv1. (kapp1 v1)))
  }
```
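The same CPS conversion can be sketched in Python, again with tagged tuples for terms and closures. The encoding and the name `eval_cps` are assumptions for this illustration; note also that Python does not eliminate tail calls, so this sketch is only suitable for small terms.

```python
def empty(x):
    raise KeyError(f"unbound variable {x}")

def update(x, v, env):
    return lambda y: v if y == x else env(y)

def eval_cps(e, env, k):
    tag = e[0]
    if tag == "val":
        return k(e[1])
    if tag == "var":
        return k(env(e[1]))
    if tag == "lam":
        return k(("clo", e[1], e[2], env))
    if tag == "app":
        def kapp1(v1):                      # after evaluating the operator
            def kapp2(v2):                  # after evaluating the operand
                _, x, body, clo_env = v1
                env2 = update(x, v2, clo_env)
                def kclo1(v):               # after evaluating the closure body
                    return k(v)
                return eval_cps(body, env2, kclo1)
            return eval_cps(e[2], env, kapp2)
        return eval_cps(e[1], env, kapp1)
    raise ValueError(f"unknown term constructor {tag}")

identity = ("lam", "x", ("var", "x"))
const = ("lam", "x", ("lam", "y", ("var", "x")))
result = eval_cps(("app", ("app", const, identity), const), empty, lambda v: v)
```

Passing the identity function as the initial continuation recovers the value that the direct-style evaluator would return.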
*Generalization.* Next, we modify the continuation definitions so that they handle both the case when the term is a value (the original case) and the case where it is still a term that needs to be evaluated. To achieve this goal, we introduce a case analysis on the input. If the continuation's argument is a value, the evaluation will proceed as before. Otherwise it will call *eval* with itself as the continuation argument. Intuitively, the latter case will correspond to a congruence rule in the resulting small-step semantics and we refer to these as *congruence cases* in the rest of this paper.

```
let rec eval e ρ k = case e of {
    val(v) → (k val(v)) |
    var(x) → let v = (ρ x) in (k val(v)) |
    lam(x, e′) → (k val(clo(x, e′, ρ))) |
    app(e1, e2) →
         letcont kapp1 e1 =
           case e1 of {
              val(v1) →
                 ...
                           case v1 of {
                             clo(x, e′, ρ′) →
                                  let ρ″ = (update x v2 ρ′) in
                                  letcont kclo1 e =
                                    case e of {
                                       val(v) → (k val(v)) |
                                       ELSE(e) → (eval e ρ″ (λe′. (kclo1 e′)))
                                    } in
                                  (eval e′ ρ″ (λv. (kclo1 v)))
                 ...
              ELSE(e1) → (eval e1 ρ (λe1′. (kapp1 e1′)))
           } in
         (eval e1 ρ (λv1. (kapp1 v1)))
  }
```
*Argument Lifting.* The free variables inside each continuation can be divided into those that depend on the top-level term and those that parameterize the evaluation. The former category contains variables dependent on subterms of the top-level term, either by standing for a subterm itself, or by being derived from it. In our example, for *kapp*<sub>1</sub>, it is the variable e<sub>2</sub>, i.e., the right argument of **app**; for *kapp*<sub>2</sub>, the variable v<sub>1</sub>, the value resulting from evaluating the left argument; and for *kclo*<sub>1</sub>, it is the environment obtained by extending the closure's environment by binding the closure variable to the operand value (ρ″ derived from v<sub>2</sub>). We lift variables that fall into the first category, that is, variables derived from the input term. We leave variables that parameterize the evaluation, such as the input environment or the store, unlifted. The rationale is that, eventually, we want the continuations to act as term constructors and they need to carry information not contained in arguments passed to *eval*.

```
let rec eval e ρ k = case e of {
    ...
    app(e1, e2) →
        letcont kapp1 e2 e1 =
                 ...
                 letcont kapp2 v1 e2 =
                    ...
                                 letcont kclo1 ρ′ e =
                                   case e of {
                                     val(v) → (k val(v)) |
                                     ELSE(e) → (eval e ρ′ (λe′. (kclo1 ρ′ e′)))
                                   } in
                                 (eval e′ ρ″ (λv. (kclo1 ρ″ v)))
                          } |
                      ELSE(e2) → (eval e2 ρ (λe2′. (kapp2 v1 e2′)))
                    } in
                 (eval e2 ρ (λv2. (kapp2 v1 v2))) |
             ELSE(e1) → (eval e1 ρ (λe1′. (kapp1 e2 e1′)))
           } in
        (eval e1 ρ (λv1. (kapp1 e2 v1)))
  }
```
*Continuations Switch Control.* Since continuations now handle the full evaluation of their argument themselves, they can be used to switch stages in the evaluation of a term. Observe how in the resulting evaluator below, the evaluation of an **app** term progresses through stages initiated by *kapp*<sub>1</sub>, *kapp*<sub>2</sub>, and finally *kclo*<sub>1</sub>.

```
let rec eval e ρ k = case e of {
    ...
    app(e1, e2) →
        letcont kapp1 e2 e1 =
           ...
                 letcont kapp2 v1 e2 =
                    ...
                    letcont kclo1 ρ′ e =
                      ...
                    in (kclo1 ρ″ e′)
                    ...
                 in (kapp2 v1 e2) |
           ...
        in (kapp1 e2 e1)
  }
```
*Defunctionalization.* In the next step, we defunctionalize continuations. For each continuation, we introduce a constructor with the corresponding number of arguments. The *apply* function gives the meaning of each defunctionalized continuation.

```
let rec apply eval ek ρ k = case ek of {
    kapp1(e2, e1) →
         case e1 of {
           val(v1) → (apply eval kapp2(v1, e2) ρ k) |
           ELSE(e1) → (eval e1 ρ (λe1′. (apply eval kapp1(e2, e1′) ρ k)))
         } |
    kapp2(v1, e2) →
         case e2 of {
           val(v2) →
                case v1 of {
                  clo(x, e′, ρ′) →
                      let ρ″ = (update x v2 ρ′)
                      in (apply eval kclo1(ρ″, e′) ρ k)
                } |
           ELSE(e2) → (eval e2 ρ (λe2′. (apply eval kapp2(v1, e2′) ρ k)))
         } |
    kclo1(ρ′, e) →
         case e of {
           val(v) → (k val(v)) |
           ELSE(e) → (eval e ρ′ (λe′. (apply eval kclo1(ρ′, e′) ρ k)))
         }
  } in
let rec eval e ρ k = case e of {
    val(v) → (k val(v)) |
    var(x) → let v = (ρ x) in (k val(v)) |
    lam(x, e′) → (k val(clo(x, e′, ρ))) |
    app(e1, e2) → (apply eval kapp1(e2, e1) ρ k)
  }
```
*Remove Tail-Calls.* We can now move from a recursive evaluator to a stepping function by modifying the continuation arguments passed to *eval* in congruence cases. Instead of calling *apply* on the defunctionalized continuation, we return the defunctionalized continuation itself. Note that we leave intact those calls to *apply* that switch control between different continuations (e.g., in the definition of *eval*).

```
let rec apply eval ek ρ k = case ek of {
    kapp1(e2, e1) →
         case e1 of {
           val(v1) → (apply eval kapp2(v1, e2) ρ k) |
           ELSE(e1) → (eval e1 ρ (λe1′. (k kapp1(e2, e1′))))
         } |
    kapp2(v1, e2) →
         case e2 of {
           val(v2) → ... (apply eval kclo1(ρ″, e′) ρ k) |
           ELSE(e2) → (eval e2 ρ (λe2′. (k kapp2(v1, e2′))))
         } |
    kclo1(ρ′, e) →
         case e of {
           val(v) → (k val(v)) |
           ELSE(e) → (eval e ρ′ (λe′. (k kclo1(ρ′, e′))))
         }
  } in ...
```
*Convert Continuations into Terms.* At this point, we have a stepping function that returns either a term or a continuation, but we want a function returning only terms. The most straightforward approach to achieving this goal would be to introduce a term constructor for each defunctionalized continuation constructor. However, many of these continuation constructors can be trivially expressed using constructors already present in the object language. We want to avoid introducing redundant terms, so we aim to reuse existing constructors as much as possible. In our example we observe that **kapp1**(e<sub>2</sub>, e<sub>1</sub>) corresponds to **app**(e<sub>1</sub>, e<sub>2</sub>), while **kapp2**(v<sub>1</sub>, e<sub>2</sub>) corresponds to **app**(**val**(v<sub>1</sub>), e<sub>2</sub>). We might also observe that **kclo1**(ρ′, e) would correspond to **app**(**clo**(x, e, ρ), **val**(v<sub>2</sub>)) if ρ′ = update x v<sub>2</sub> ρ. Our current implementation doesn't handle such cases, however, and so we introduce **kclo1** as a new term constructor.

```
let rec apply eval ek ρ k = case ek of {
  kapp1(e2, e1) →
       case e1 of {
         val(v1) → (apply eval kapp2(v1, e2) ρ k) |
         ELSE(e1) → (eval e1 ρ (λe1′. (k app(e1′, e2))))
       } |
  kapp2(v1, e2) →
       case e2 of {
         val(v2) →
              case v1 of {
                clo(x, e′, ρ′) → let ρ″ = (update x v2 ρ′) in kclo1(ρ″, e′)
              } |
         ELSE(e2) → (eval e2 ρ (λe2′. (k app(val(v1), e2′))))
       } |
  kclo1(ρ′, e) →
       case e of {
         val(v) → (k val(v)) |
         ELSE(e) → (eval e ρ′ (λe′. (k kclo1(ρ′, e′))))
       }
} in
let rec eval e ρ k = case e of {
  ...
  kclo1(ρ′, e′) → (apply eval kclo1(ρ′, e′) ρ k)
}
```
*Inlining and Simplification.* Next, we eliminate the *apply* function by inlining its applications and simplifying the result. At this point we have obtained a small-step interpreter in continuation-passing style.

```
let rec eval e ρ k = case e of {
  ...
  app(e1, e2) →
       case e1 of {
         val(v1) →
              case e2 of {
                val(v2) →
                     case v1 of {
                       clo(x, e′, ρ′) → let ρ″ = (update x v2 ρ′) in kclo1(ρ″, e′)
                     } |
                ELSE(e2) → (eval e2 ρ (λe2′. (k app(val(v1), e2′))))
              } |
         ELSE(e1) → (eval e1 ρ (λe1′. (k app(e1′, e2))))
       } |
  kclo1(ρ′, e′) →
       case e′ of {
         val(v) → (k val(v)) |
         ELSE(e) → (eval e ρ′ (λe′. (k kclo1(ρ′, e′))))
       }
}
```
*Convert to Direct Style and Remove the Value Case.* The final transformation is to convert our small-step interpreter back to direct style. We also remove the value case **val**(*v*) → **val**(*v*), as we usually do not want values to step.

```
let rec eval e ρ = case e of {
  var(x) → let v = (ρ x) in val(v) |
  lam(x, e′) → val(clo(x, e′, ρ)) |
  app(e1, e2) →
       case e1 of {
         val(v1) →
              case e2 of {
                val(v2) →
                     case v1 of {
                       clo(x, e′, ρ′) → let ρ″ = (update x v2 ρ′) in kclo1(ρ″, e′)
                     } |
                ELSE(e2) → let e2′ = (eval e2 ρ) in app(val(v1), e2′)
              } |
         ELSE(e1) → let e1′ = (eval e1 ρ) in app(e1′, e2)
       } |
  kclo1(ρ′, e′) →
       case e′ of {
         val(v) → val(v) |
         ELSE(e) → let e′ = (eval e ρ′) in kclo1(ρ′, e′)
       }
}
```
*Small-Step Evaluator.* Fig. 3 shows the small-step rules corresponding to our last interpreter. Barring the introduction of the **kclo1** constructor, the resulting semantics is essentially identical to one we would write manually.

$$\begin{array}{ll}
(1)\ \dfrac{v = \rho\ x}{\rho \vdash \mathbf{var}(x) \to \mathbf{val}(v)} &
(2)\ \dfrac{}{\rho \vdash \mathbf{lam}(x, e') \to \mathbf{val}(\mathbf{clo}(x, e', \rho))} \\[2ex]
(3)\ \dfrac{\rho'' = \mathit{update}\ x\ v\_2\ \rho'}{\rho \vdash \mathbf{app}(\mathbf{val}(\mathbf{clo}(x, e', \rho')), \mathbf{val}(v\_2)) \to \mathbf{kclo1}(\rho'', e')} &
(4)\ \dfrac{\rho \vdash e\_2 \to e\_2'}{\rho \vdash \mathbf{app}(\mathbf{val}(v\_1), e\_2) \to \mathbf{app}(\mathbf{val}(v\_1), e\_2')} \\[2ex]
(5)\ \dfrac{\rho \vdash e\_1 \to e\_1'}{\rho \vdash \mathbf{app}(e\_1, e\_2) \to \mathbf{app}(e\_1', e\_2)} &
(6)\ \dfrac{}{\rho \vdash \mathbf{kclo1}(\rho', \mathbf{val}(v)) \to \mathbf{val}(v)} \\[2ex]
(7)\ \dfrac{\rho' \vdash e \to e'}{\rho \vdash \mathbf{kclo1}(\rho', e) \to \mathbf{kclo1}(\rho', e')} &
\end{array}$$

**Fig. 3.** Resulting small-step semantics
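
As an illustration, the rules of Fig. 3 can be transcribed into a stepping function in a general-purpose language. The following Python sketch is not output of our transformation; the tuple encoding of terms, the dictionary representation of environments, and the names `update`, `step`, and `run` are assumptions made for this example only.

```python
def update(x, v, env):
    """Return a copy of env in which x is bound to v."""
    new_env = dict(env)
    new_env[x] = v
    return new_env


def step(e, env):
    """Perform one small step of term e in environment env (rules 1-7)."""
    tag = e[0]
    if tag == "var":                                   # rule 1
        return ("val", env[e[1]])
    if tag == "lam":                                   # rule 2
        _, x, body = e
        return ("val", ("clo", x, body, env))
    if tag == "app":
        _, e1, e2 = e
        if e1[0] != "val":                             # rule 5
            return ("app", step(e1, env), e2)
        if e2[0] != "val":                             # rule 4
            return ("app", e1, step(e2, env))
        _, x, body, clo_env = e1[1]                    # rule 3 (operator must be a closure)
        return ("kclo1", update(x, e2[1], clo_env), body)
    if tag == "kclo1":
        _, clo_env, body = e
        if body[0] == "val":                           # rule 6
            return body
        return ("kclo1", clo_env, step(body, clo_env)) # rule 7
    raise ValueError(f"term does not step: {e!r}")


def run(e, env=None):
    """Iterate step until a value is reached and return that value."""
    env = {} if env is None else env
    while e[0] != "val":
        e = step(e, env)
    return e[1]
```

Iterating `step` until a **val** term is reached, as `run` does, recovers the result that the original big-step evaluator would compute for terminating terms.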

#### **3 Big-Step Specifications**

We define our transformations on an untyped extended λ-calculus with call-by-value semantics that allows the straightforward definition of big- and small-step interpreters. We call this language an *evaluator definition language* (EDL).

#### **3.1 Evaluator Definition Language**

Table 1 gives the syntax of EDL. We choose to restrict ourselves to A-normal form, which greatly simplifies our partial CPS conversion without compromising readability. Our language has the usual call-by-value semantics, with arguments being evaluated left-to-right. All of the examples of the previous section were written in this language.

Our language has three forms of let-binding constructs: the usual (optionally recursive) **let**, a let-construct for evaluator definition, and a let-construct for defining continuations. The behavior of all three constructs is the same; however, we treat them differently during the transformations. The **leteval** construct also comes with the additional static restriction that it may appear only once (i.e., there can be only one evaluator). The **leteval** and **letcont** forms are recursive by default, while **let** has an optional **rec** specifier to create a recursive binding. For simplicity, our language does not offer implicit mutual recursion, so mutual recursion has to be made explicit by inserting additional arguments. We do this when we generate the *apply* function during defunctionalization.

*Notation and Presentation.* We use vector notation to denote syntactic lists belonging to a particular sort. For example, $\vec{e}$ and $\vec{ae}$ are lists of elements of, respectively, *Expr* and *AExpr*, while $\vec{x}$ is a list of variables. Separators can be spaces (e.g., function arguments) or commas (e.g., constructor arguments or configuration components). We expect the actual separator to be clear from the context. In let bindings, f x<sub>1</sub> ... x<sub>n</sub> = e and f = λx<sub>1</sub> ... x<sub>n</sub>. e are both syntactic sugar for f = λx<sub>1</sub>. ... λx<sub>n</sub>. e.

**Table 1.** Syntax of the evaluator definition language.

| Sort | | Production | Description |
|---|---|---|---|
| *Expr* e | ::= | **let** bn = ce **in** e | let-binding |
| | \| | **let rec** bn = ce **in** e | recursive let-binding |
| | \| | **leteval** x = ce **in** e | evaluator definition |
| | \| | **letcont** k = ce **in** e | continuation definition |
| | \| | ce | |
| *CExpr* ce | ::= | (ae ae ...) | application |
| | \| | **case** ae **of** { *cas* \| ... \| *cas* } | pattern matching |
| | \| | **if** ae **then** e **else** e | conditional |
| | \| | ae | |
| *AExpr* ae | ::= | v \| op | value, operator |
| | \| | x \| k | variable, continuation variable |
| | \| | λbn. e | λ-abstraction |
| | \| | c(ae, ..., ae) | constructor application |
| | \| | ⟨ae, ..., ae⟩ | configuration expression |
| *Binder* bn | ::= | x \| ⟨x, ..., x⟩ | variable, configuration |
| *Case* *cas* | ::= | c(x, ..., x) → e | constructor pattern |
| | \| | **ELSE**(x) → e | default pattern |
| *Value* v | ::= | n \| b \| c(v, ..., v) \| ⟨v, ..., v⟩ \| **abs**(λx.e, ρ) | |

# **4 Transformation Steps**

In this section, we formally define each of the transformation steps informally described in Section 2. For each transformation function, we list only the most relevant cases; the remaining cases trivially recurse on the A-normal form (ANF) abstract syntax. We annotate functions with *E*, *CE*, and *AE* to indicate the corresponding ANF syntactic classes. We omit annotations when a function only operates on a single syntactic class. For readability, we annotate meta-variables to hint at their intended use: ρ stands for read-only entities (such as environments), whereas σ stands for read-write or "state-like" entities of a configuration (e.g., stores or exception states). These can be mixed with our notation for syntactic lists, so, for example, $\vec{x}^{\sigma}$ is a sequence of variables referring to state-like entities, while $\vec{ae}^{\rho}$ is a sequence of a-expressions corresponding to read-only entities.

#### **4.1 CPS Conversion**

The first stage of the process is a *partial* CPS conversion [8,25] to make control flow in the evaluator explicit. We limit this transformation to the main evaluator function, i.e., only the function *eval* will take an additional continuation argument and will pass results to it. Because our input language is already in ANF, the conversion is relatively easy to express. In particular, applications of the evaluator are always **let**-bound to a variable (or appear in a tail position), which makes constructing the current continuation straightforward. Below are the relevant clauses of the conversion. For this transformation we assume the following easily checkable properties:


The conversion is defined as three mutually recursive functions with the following signatures:

$$\begin{aligned} \mathsf{cps}\_E &: \mathit{Expr} \to (\mathit{CExpr} \to \mathit{Expr}) \to \mathit{Expr} \\ \mathsf{cps}\_{CE} &: \mathit{CExpr} \to (\mathit{CExpr} \to \mathit{Expr}) \to \mathit{Expr} \\ \mathsf{cps}\_{AE} &: \mathit{AExpr} \to \mathit{AExpr} \end{aligned}$$

In the equations, $\mathcal{K}$, $\mathcal{I}$, $\mathcal{A}\_k$ : *CExpr* → *Expr* are meta-continuations; $\mathcal{I}$ injects a *CExpr* into *Expr*.

$$\begin{aligned} &\mathsf{cps}\_E\,[\![\mathbf{leteval}\ \mathit{eval}\ \vec{bn} = e\_1\ \mathbf{in}\ e\_2]\!]\ \mathcal{K} = \\ &\quad \mathbf{leteval}\ \mathit{eval}\ \vec{bn}\ k = (\mathsf{cps}\_E\,[\![e\_1]\!]\ \mathcal{A}\_k)\ \mathbf{in}\ (\mathsf{cps}\_E\,[\![e\_2]\!]\ \mathcal{K}) \\ &\quad \text{where } k \text{ is a fresh continuation variable} \end{aligned}$$

$$\begin{aligned} &\mathsf{cps}\_E\,[\![\mathbf{let}\ bn = (\mathit{eval}\ ae\_1\ \vec{ae})\ \mathbf{in}\ e]\!]\ \mathcal{K} = \\ &\quad \mathbf{letcont}\ k\ bn = \mathsf{cps}\_E\,[\![e]\!]\ \mathcal{K}\ \mathbf{in}\ \mathsf{cps}\_{CE}\,[\![(\mathit{eval}\ ae\_1\ \vec{ae})]\!]\ \mathcal{A}\_k \\ &\quad \text{where } k \text{ is a fresh continuation variable} \end{aligned}$$

$$\mathsf{cps}\_E\,[\![\mathbf{let}\ bn = ce\ \mathbf{in}\ e]\!]\ \mathcal{K} = \mathsf{renorm}\,[\![\mathbf{let'}\ bn = \mathsf{cps}\_{CE}\,[\![ce]\!]\ \mathcal{I}\ \mathbf{in}\ \mathsf{cps}\_E\,[\![e]\!]\ \mathcal{K}]\!]$$

$$\begin{aligned} &\mathsf{cps}\_{CE}\,[\![(\mathit{eval}\ ae\_1\ \vec{ae})]\!]\ \mathcal{K} = (\mathit{eval}\ \mathsf{cps}\_{AE}\,[\![ae\_1]\!]\ \mathsf{cps}\_{AE}\,[\![\vec{ae}]\!]\ (\lambda x.\ \mathcal{K}\,[\![x]\!])) \\ &\quad \text{where } x \text{ is a fresh variable} \end{aligned}$$

$$\mathsf{cps}\_{CE}\,[\![ae]\!]\ \mathcal{K} = \mathcal{K}\,[\![\mathsf{cps}\_{AE}\,[\![ae]\!]]\!]$$

$$\begin{aligned} \mathsf{cps}\_{AE}\,[\![\lambda x.\, e]\!] &= \lambda x.\, (\mathsf{cps}\_E\,[\![e]\!]\ \mathcal{I}) \\ \mathsf{cps}\_{AE}\,[\![ae]\!] &= ae \end{aligned}$$

where, for any k, $\mathcal{A}\_k$ is defined as

$$\begin{aligned} \mathcal{A}\_k\,[\![ae]\!] &= k\ ae \\ \mathcal{A}\_k\,[\![ce]\!] &= \mathbf{let}\ x = ce\ \mathbf{in}\ k\ x \quad \text{where } x \text{ is fresh} \end{aligned}$$

and

$$\begin{aligned} \mathsf{renorm}\,[\![\mathbf{let'}\ x = ce\ \mathbf{in}\ e]\!] &= \mathbf{let}\ x = ce\ \mathbf{in}\ e \\ \mathsf{renorm}\,[\![\mathbf{let'}\ x = (\mathbf{let}\ x' = ce\ \mathbf{in}\ e')\ \mathbf{in}\ e]\!] &= \mathbf{let}\ x' = ce\ \mathbf{in}\ \mathsf{renorm}\,[\![\mathbf{let'}\ x = e'\ \mathbf{in}\ e]\!] \end{aligned}$$

In the above equations, **let**' is a pseudo-construct used to make renormalization more readable. In essence, it is a non-ANF version of **let** where the bound expression is generalized to *Expr*. Note that renorm only works correctly if x′ ∉ fv(e), which is implied by our assumption that all bound variables are distinct.
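
The renormalization step can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: expressions are encoded as tuples, `("let", x, bound, body)` stands for a let-binding, any other tuple is treated as an opaque computation, and all bound variables are assumed to be distinct.

```python
def renorm(x, bound, body):
    """Flatten let' x = bound in body so that no let is bound to another let.

    Relies on all bound variables being distinct, so floating an inner
    binding outwards cannot capture a free variable of body.
    """
    if isinstance(bound, tuple) and bound[0] == "let":
        _, x1, ce, e1 = bound
        # let' x = (let x1 = ce in e1) in body
        #   ==> let x1 = ce in renorm(let' x = e1 in body)
        return ("let", x1, ce, renorm(x, e1, body))
    # bound is already a plain computation: an ordinary let suffices
    return ("let", x, bound, body)
```

For instance, binding `x` to `let y = f a in let z = g y in h z` yields the flat chain `let y = f a in let z = g y in let x = h z in ...`.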

#### **4.2 Generalization of Continuations**

The continuations resulting from the above CPS conversion expect to be applied to value terms. The next step is to generalize (or "lift") the continuations so that they recursively call the evaluator to evaluate non-value arguments. In other words, assuming the term type can be factored into values and computations V + C, we convert each continuation k with the type V → V into a continuation k′ : V + C → V using the following schema:

$$\mathbf{let\ rec}\ k'\ t = \mathbf{case}\ t\ \mathbf{of}\ \mathit{inl}\ v \to k\ v \mid \mathit{inr}\ c \to \mathit{eval}\ c\ k'$$

The recursive clauses will correspond to congruence rules in the resulting small-step semantics.
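
The schema can be illustrated with a small Python sketch. It assumes a toy term language in which plain integers play the role of values and `("succ", t)` is the only computation form; the names `is_value`, `generalize`, and `evaluate` are hypothetical and chosen for this example. Unlike our EDL presentation, values are not wrapped in a **val** constructor here.

```python
def is_value(t):
    """In this toy language, values are plain integers."""
    return isinstance(t, int)


def generalize(k, evaluate):
    """Lift k : V -> V to k' : V + C -> V, following the schema."""
    def k_prime(t):
        if is_value(t):               # inl v -> k v
            return k(t)
        return evaluate(t, k_prime)   # inr c -> eval c k'
    return k_prime


def evaluate(t, k):
    """Toy CPS evaluator whose only computation form is ("succ", t)."""
    tag, arg = t
    if tag != "succ":
        raise ValueError(f"unknown term: {t!r}")
    # The continuation awaiting the argument is generalized, so it accepts
    # arg whether or not arg has already been evaluated.
    return generalize(lambda v: k(v + 1), evaluate)(arg)
```

The call to `evaluate` inside `k_prime` is the recursive clause that later becomes a congruence rule.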

The transformation works by finding the unique application site of the continuation and then inserting the corresponding call to *eval* in the non-value case.

$$\begin{aligned} &\mathsf{gencont}\_E\,[\![\mathbf{letcont}\ k\ \langle x, \vec{x}^{\sigma} \rangle = e\_k\ \mathbf{in}\ e]\!] = \\ &\quad \mathbf{letcont}\ k\ \langle \hat{x}, \vec{x}^{\sigma} \rangle = \\ &\qquad \mathbf{case}\ \hat{x}\ \mathbf{of}\ \{ \\ &\qquad\quad \mathbf{val}(x) \to e\_k \mid \\ &\qquad\quad \mathbf{ELSE}(\hat{x}) \to \mathit{eval}\ \langle \hat{x}, \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ ae\_k \\ &\qquad \} \\ &\quad \text{if } \mathsf{findApp}\ k\ e = \mathit{eval}\ \langle \Box, \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ ae\_k \end{aligned}$$

where


Following the CPS conversion, each named continuation is applied exactly once in e, so findApp k e is total and returns the continuation's unique use site. Moreover, because the continuation was originally defined and let-bound at that use site, all free variables in findApp k e are also free in the definition of k.

When performing this generalization transformation, we also modify tail positions in *eval* that return a value so that they wrap their result in the **val** constructor. That is, if the continuation parameter of *eval* is k, then we rewrite all sites applying k to a configuration as follows:

$$k\ \langle ae, \vec{ae}^{\sigma} \rangle \;\Rightarrow\; k\ \langle \mathbf{val}(ae), \vec{ae}^{\sigma} \rangle$$

#### **4.3 Argument Lifting in Continuations**

In the next phase, we partially lift free variables in continuations to make them explicit arguments. We perform a *selective* lifting in that we avoid lifting non-term arguments to the evaluation function. These arguments represent entities that parameterize the evaluation of a term. If an entity is modified during evaluation, the modified entity variable gets lifted. In the running example of Section 2, such a lifting occurred for **kclo1**.

Function lift specifies the transformation at the continuation definition site:

$$\begin{aligned} &\mathsf{lift}\ \Xi\ \Delta\,[\![\mathbf{letcont}\ k = \lambda x.\, e\_k\ \mathbf{in}\ e]\!] = \\ &\quad \mathbf{letcont}\ k = \lambda x\_1 \ldots x\_n\ x.\, (\mathsf{lift}\ \Xi'\ \Delta'\,[\![e\_k]\!])\ \mathbf{in}\ (\mathsf{lift}\ \Xi'\ \Delta'\,[\![e]\!]) \end{aligned}$$

where

$$\begin{array}{l} -\ \Xi' = \Xi \cup \{k\} \\ -\ \{x\_1, \ldots, x\_n\} = \mathsf{fv}\ e\_k \cup \left(\bigcup\_{g \in (\mathsf{dom}\,\Delta\, \cap\, \mathsf{fv}\ e\_k)} \Delta(g)\right) - \Xi' \\ -\ \Delta' = \Delta[k \mapsto \{x\_1, \ldots, x\_n\}] \end{array}$$

and at the continuation application site – recall that continuations are always applied fully, but at this point they are only applied to one argument:

$$\mathsf{lift}\ \Xi\ \Delta\,[\![k\ ae]\!] = k\ x\_1 \ldots x\_n\ (\mathsf{lift}\ \Xi\ \Delta\,[\![ae]\!])$$

if k ∈ dom Δ and Δ(k) = (x<sub>1</sub>, ..., x<sub>n</sub>).

Our lifting function is a restricted version of a standard argument-lifting algorithm [19]. The first restriction is that we do not lift all free variables, since we do not aim to float and lift the continuations to the top-level of the program, only to the top-level of the evaluation function. The other difference is that we can use a simpler way to compute the set of lifted parameters due to the absence of mutual recursion between continuations. The correctness of this can be proved using the approach of Fischbach [16].
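
The computation of the lifted parameters at a continuation definition site can be sketched as follows. This Python fragment is an illustration only: it assumes that the free variables of the continuation body are already available as a set, that Δ is a dictionary from continuation names to parameter lists, that Ξ is the set of names that must not be lifted, and that the set difference applies to the whole union. The function name `lifted_params` and the sorted ordering of parameters are choices made for this example.

```python
def lifted_params(k, fv_body, delta, xi):
    """Compute the lifted parameters x1..xn of continuation k.

    fv_body -- free variables of the body e_k
    delta   -- maps already-lifted continuations to their lifted parameters
    xi      -- names that are never lifted
    Returns the parameter list together with the extended delta and xi.
    """
    xi_new = set(xi) | {k}                     # Xi' = Xi ∪ {k}
    params = set(fv_body)                      # fv e_k
    for g in fv_body:
        if g in delta:                         # g ∈ dom Δ ∩ fv e_k
            params |= set(delta[g])            # ∪ Δ(g)
    params -= xi_new                           # − Xi'
    ordered = sorted(params)                   # fix an order for application sites
    delta_new = dict(delta)
    delta_new[k] = ordered                     # Δ' = Δ[k ↦ {x1..xn}]
    return ordered, delta_new, xi_new
```

At an application site of `k`, the parameters recorded in the extended Δ are then passed explicitly before the original argument.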

#### **4.4 Continuations Switch Control Directly**

At this point, continuations handle the full evaluation of a term themselves. Instead of calling *eval* with the continuation as an argument, we can call the continuation directly to switch control between evaluation stages of a term. We will replace original *eval* call sites with direct applications of the corresponding continuations. The recursive call to *eval* in congruence cases of continuations will be left untouched, as this is where the continuation's argument will be evaluated to a value. Following from the continuation generalization transformation, this call to *eval* is with the same arguments as in the original site (which we are now replacing). In particular, *eval* is invoked with the same $\vec{ae}^{\rho}$ arguments in the continuation body as in the original call site.

$$\begin{aligned} &\mathsf{directcont}\_E\,[\![\mathbf{letcont}\ k = ce\ \mathbf{in}\ e]\!]\ K = \\ &\quad \mathbf{letcont}\ k = \mathsf{directcont}\_{CE}\,[\![ce]\!]\ K\ \mathbf{in}\ \mathsf{directcont}\_E\,[\![e]\!]\ (K \uplus \{k\}) \\ &\mathsf{directcont}\_{CE}\,[\![\mathit{eval}\ \langle ae, \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ (k\ \vec{x})]\!]\ K = k\ \vec{x}\ \langle ae, \vec{ae}^{\sigma} \rangle \qquad \text{if } k \in K \end{aligned}$$

#### **4.5 Defunctionalization**

Now we can move towards a first-order representation of continuations which can be further converted into term constructions. We defunctionalize continuations by first collecting all continuations in *eval*, then introducing corresponding constructors (the syntax), and finally generating an *apply* function (the semantics). The collection function accumulates continuation names and their definitions. At the same time it removes the definitions.

$$\begin{aligned} &\mathsf{collect}\_E\,[\![\mathbf{letcont}\ k = ce\ \mathbf{in}\ e]\!] = (\{(k, ce')\} \cup K\_{ce} \cup K\_e,\ e') \\ &\quad \text{where } (K\_{ce}, ce') = \mathsf{collect}\_{CE}\,[\![ce]\!] \\ &\quad \phantom{\text{where }} (K\_e, e') = \mathsf{collect}\_E\,[\![e]\!] \end{aligned}$$

We reuse continuation names for constructors. The *apply* function is generated by simply generating a case analysis on the constructors and reusing the argument names from the continuation function arguments. In addition to the defunctionalized continuations, the generated *apply* function will take the same arguments as *eval*. Because of the absence of mutual recursion in our meta-language, *apply* takes *eval* as an argument.

$$\begin{aligned} &\mathsf{genApply}\ \vec{x}^{\rho}\ \vec{x}^{\sigma}\ k\_{top}\ \{(k\_1, \lambda p\_{1,1} \ldots p\_{1,i}.\ e\_1), \ldots, (k\_n, \lambda p\_{n,1} \ldots p\_{n,j}.\ e\_n)\} = \\ &\quad \lambda\, \mathit{eval}\ \langle x\_k, \vec{x}^{\sigma} \rangle\ \vec{x}^{\rho}\ k\_{top}. \\ &\qquad \mathbf{case}\ x\_k\ \mathbf{of}\ \{ \\ &\qquad\quad k\_1(p\_{1,1}, \ldots, p\_{1,i}) \to e\_1 \mid \\ &\qquad\quad \ldots \mid \\ &\qquad\quad k\_n(p\_{n,1}, \ldots, p\_{n,j}) \to e\_n \\ &\qquad \} \end{aligned}$$

Now we need a way to replace calls to continuations with corresponding calls to *apply*. For $\vec{ae}^{\rho}$ and k<sub>*top*</sub> we use the arguments passed to *eval* or *apply* (depending on where we are replacing).

$$\mathsf{replace}\_{CE}\,[\![k\ \vec{ae}\_k\ \langle ae, \vec{ae}^{\sigma} \rangle]\!]\ (\vec{x}^{\rho}, k\_{top}) = \mathit{apply}\ \mathit{eval}\ \langle k(\vec{ae}\_k, ae), \vec{ae}^{\sigma} \rangle\ \vec{x}^{\rho}\ k\_{top}$$

Finally, the complete defunctionalization is defined in terms of the above three functions.
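
To make the idea concrete, the following Python sketch applies classic defunctionalization to a toy CPS evaluator for addition. It is an illustration under simplifying assumptions rather than an instance of genApply: the term encoding, the constructor names `kid`, `kadd1`, and `kadd2`, and the function names are hypothetical, and, because Python supports mutual recursion, the `apply_cont` function does not need to take the evaluator as an argument. The sketch also stores the parent continuation inside each constructor, whereas our *apply* receives k<sub>*top*</sub> as a separate argument.

```python
def eval_ho(e, k):
    """Higher-order CPS evaluator: continuations are Python closures."""
    if e[0] == "num":
        return k(e[1])
    _, e1, e2 = e
    return eval_ho(e1, lambda v1: eval_ho(e2, lambda v2: k(v1 + v2)))


def eval_df(e, k):
    """Defunctionalized evaluator: continuations are first-order data."""
    if e[0] == "num":
        return apply_cont(k, e[1])
    _, e1, e2 = e
    # The closure (λv1. ...) is replaced by a constructor holding its free variables.
    return eval_df(e1, ("kadd1", e2, k))


def apply_cont(k, v):
    """Give meaning to each continuation constructor by case analysis."""
    tag = k[0]
    if tag == "kid":                      # initial (identity) continuation
        return v
    if tag == "kadd1":                    # left operand done, evaluate the right one
        _, e2, parent = k
        return eval_df(e2, ("kadd2", v, parent))
    if tag == "kadd2":                    # both operands done, add and return
        _, v1, parent = k
        return apply_cont(parent, v1 + v)
    raise ValueError(f"unknown continuation: {k!r}")
```

Both evaluators compute the same result; the defunctionalized one makes every intermediate evaluation state inspectable as a data value.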

#### **4.6 Remove Self-recursive Tail-Calls**

This is the transformation that converts a recursive evaluator into a stepping function. The transformation itself is very simple: we replace the self-recursive calls to *apply* in congruence cases.

$$\begin{aligned} &\mathsf{derec}\_{CE}\,[\![\mathit{eval}\ \langle ae, \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ (\lambda \langle x', \vec{x}^{\sigma\prime} \rangle.\ \mathit{apply}\ \mathit{eval}\ \langle c^{\mathsf{k}}(\vec{ae}, x'), \vec{x}^{\sigma\prime} \rangle\ \vec{ae}^{\rho\prime}\ k)]\!] = \\ &\quad \mathit{eval}\ \langle ae, \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ (\lambda \langle x', \vec{x}^{\sigma\prime} \rangle.\ k\ \langle c^{\mathsf{k}}(\vec{ae}, x'), \vec{x}^{\sigma\prime} \rangle) \end{aligned}$$

Note that we still leave those invocations of *apply* that serve to switch control through the stages of evaluation. Unless a continuation constructor becomes a part of the output language, its application will be inlined in the final phase of our transformation.

#### **4.7 Convert Continuations to Terms**

After defunctionalization, we effectively have two sorts of terms: those constructed using the original constructors and those constructed using continuation constructors. Terms in these two sorts are given their semantics by the *eval* and *apply* functions, respectively. To get only one evaluator function at the end of our transformation process, we will join these two sorts, adding extra continuation constructors as new term constructors. We could simply merge *apply* into *eval*; however, this would give us many overlapping constructors. For example, in Section 2, we established that **kapp1**(e<sub>2</sub>, e<sub>1</sub>) ≈ **app**(e<sub>1</sub>, e<sub>2</sub>) and **kapp2**(v<sub>1</sub>, e<sub>2</sub>) ≈ **app**(**val**(v<sub>1</sub>), e<sub>2</sub>). The inference of equivalent term constructors is guided by the following simple principle. For each continuation term c<sup>k</sup>(ae<sub>1</sub>, ..., ae<sub>n</sub>) we are looking for a term c′(ae′<sub>1</sub>, ..., ae′<sub>m</sub>) such that, for all $\vec{ae}^{\sigma}$, $\vec{ae}^{\rho}$, and ae<sub>k</sub>,

$$\begin{aligned} &\mathit{apply}\ \mathit{eval}\ \langle c^{\mathsf{k}}(ae\_1, \ldots, ae\_n), \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ ae\_k \\ &\quad = \mathit{eval}\ \langle c'(ae\_1', \ldots, ae\_m'), \vec{ae}^{\sigma} \rangle\ \vec{ae}^{\rho}\ ae\_k \end{aligned}$$

In our current implementation, we use a conservative approach where, starting from the cases in *eval*, we search for continuations reachable along a control flow path. Variables appearing in the original term are instantiated along the way. Moreover, we collect variables dependent on configuration entities (state). If control flow is split based on information derived from the state, we automatically include any continuation constructors reachable from that point as new constructors in the resulting language and interpreter. This, together with how information flows from the top-level term to subterms in congruence cases, preserves the coupling between state and corresponding subterms between steps.

If, starting from an input term $c(\vec{x})$, an invocation of *apply* on a continuation term $c^{\mathsf{k}}(\vec{ae}\_k)$ is reached, and if, after instantiating the variables in the input term $c(\vec{ae})$, the sets of their free variables are equal, then we can introduce a translation from $c^{\mathsf{k}}(\vec{ae}\_k)$ into $c(\vec{ae})$. If such a direct path is not found, c<sup>k</sup> will become a new term constructor in the language and a case in *eval* is introduced such that the above equation is satisfied.

#### **4.8 Inlining, Simplification and Conversion to Direct Style**

To finalize the generation of a small-step interpreter, we inline all invocations of *apply* and simplify the final program. After this, the interpreter will consist of only the *eval* function, still in continuation-passing style. To convert the interpreter to direct style, we substitute (λx.x) for *eval*'s continuation variable and reduce the new redexes. Then we remove the continuation argument, performing rewrites following the scheme:

$$\mathit{eval}\ \vec{ae}\ (\lambda bn.\ e) \;\Rightarrow\; \mathbf{let}\ bn = \mathit{eval}\ \vec{ae}\ \mathbf{in}\ e$$

Finally, we remove the reflexive case on values (i.e., **val**(v) → **val**(v)). At this point we have a small-step interpreter in direct style.
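
The effect of this last rewrite can be seen on a toy stepping function for addition, sketched here in Python. The term encoding and the names `step_cps` and `step` are assumptions of this example; `step` is obtained from `step_cps` by instantiating the continuation with the identity, turning each `step_cps(e, lambda x: ...)` into a local binding, and dropping the reflexive value case.

```python
def step_cps(e, k):
    """One small step in continuation-passing style; k receives the stepped term."""
    if e[0] == "val":
        return k(e)                                   # reflexive value case
    _, e1, e2 = e
    if e1[0] != "val":
        return step_cps(e1, lambda e1p: k(("add", e1p, e2)))
    if e2[0] != "val":
        return step_cps(e2, lambda e2p: k(("add", e1, e2p)))
    return k(("val", e1[1] + e2[1]))


def step(e):
    """Direct-style stepping function without the value case."""
    if e[0] != "add":
        raise ValueError("values do not step")
    _, e1, e2 = e
    if e1[0] != "val":
        e1p = step(e1)                                # let e1' = eval e1 in ...
        return ("add", e1p, e2)
    if e2[0] != "val":
        e2p = step(e2)                                # let e2' = eval e2 in ...
        return ("add", e1, e2p)
    return ("val", e1[1] + e2[1])
```

On every non-value term, `step(e)` agrees with `step_cps(e, lambda x: x)`.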

#### **4.9 Removing Vacuous Continuations**

After performing the above transformation steps, we may end up with some redundant term constructors, which we call "empty" or vacuous. These are constructors which only have one argument and their semantics is equivalent to the argument itself, save for an extra step which returns the computed value. In other words, they are unary constructs which only have two rules in the resulting small-step semantics matching the following pattern.

$$\dfrac{}{\rho \vdash c(\mathbf{val}(v)) \to \mathbf{val}(v)} \qquad \dfrac{\rho \vdash e \to e'}{\rho \vdash c(e) \to c(e')}$$

Such a construct will result from a continuation, which, even after generalization and argument lifting, merely evaluates its sole argument and returns the corresponding value:

**letcont rec** k<sub>i</sub> e = **case** e **of** { **val**(v) → k v | **ELSE**(e) → *eval* e (λe′. k<sub>i</sub> e′) }

These continuations can be easily identified and removed once argument lifting is performed, or at any point in the transformation pipeline, up until *apply* is absorbed into *eval*.
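To make the pattern concrete, the following Python sketch checks whether a constructor is vacuous in a given set of small-step rules. It is an illustration only, not the implementation used in this work: the rule encoding (tagged tuples for terms, plain strings for metavariables) and the function name `is_vacuous` are assumptions made for this example.

```python
# Terms are tuples ("ctor", arg, ...); metavariables are plain strings.
# A rule is (premise, lhs, rhs), where premise is None (axiom) or a
# pair (t, t') standing for the structural premise  t -> t'.

def is_vacuous(ctor, rules):
    """True if `ctor` is unary and its only rules are
         c(val(v)) -> val(v)      and      t -> t'  /  c(t) -> c(t')."""
    own = [r for r in rules if r[1][0] == ctor]
    if len(own) != 2:
        return False
    has_return = has_congruence = False
    for premise, lhs, rhs in own:
        if len(lhs) != 2:                      # constructor is not unary
            return False
        arg = lhs[1]
        if (premise is None and isinstance(arg, tuple)
                and arg[0] == "val" and rhs == arg):
            has_return = True                  # returns the computed value
        elif (premise is not None and isinstance(arg, str)
                and premise[0] == arg and rhs == (ctor, premise[1])):
            has_congruence = True              # steps the sole argument
    return has_return and has_congruence


rules = [
    (None,        ("kid", ("val", "v")),            ("val", "v")),
    (("t", "t'"), ("kid", "t"),                     ("kid", "t'")),
    (None,        ("kclo1", "rho1", ("val", "v")),  ("val", "v")),
    (("t", "t'"), ("kclo1", "rho1", "t"),           ("kclo1", "rho1", "t'")),
]
```

Here the hypothetical constructor `kid` is reported as vacuous, whereas `kclo1` is not, because it carries an environment as an additional argument.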

### **4.10 Detour: Generating Pretty-Big-Step Semantics**

It is interesting to see what kind of semantics we get by rearranging or removing some steps of the above process. If, after CPS conversion, we do not generalize the continuations, but instead just lift their arguments and defunctionalize them,<sup>1</sup> we obtain a *pretty-big-step* [6] interpreter. The distinguishing feature of pretty-big-step semantics is that constructs which would normally have rules with multiple premises are factorized into intermediate constructs. As observed by Charguéraud, each intermediate construct corresponds to an intermediate state of the interpreter, which is why, in turn, they naturally correspond to continuations. Here are the pretty-big-step rules generated from the big-step semantics in Fig. 2 (Section 2).

<sup>1</sup> The complete transformation to pretty-big-step style involves these steps: 1. CPS conversion, 2. argument lifting, 3. removal of vacuous continuations, 4. defunctionalization, 5. merging of apply and eval, and 6. conversion to direct style.

$$\frac{}{\rho \vdash \mathsf{val}(v) \Downarrow_{\mathsf{PB}} v} \qquad \frac{v = \rho\ x}{\rho \vdash \mathsf{var}(x) \Downarrow_{\mathsf{PB}} v} \qquad \frac{\rho \vdash e_1 \Downarrow_{\mathsf{PB}} v_1 \quad \rho \vdash \mathsf{kapp1}(e_2, v_1) \Downarrow_{\mathsf{PB}} v}{\rho \vdash \mathsf{app}(e_1, e_2) \Downarrow_{\mathsf{PB}} v}$$

$$\frac{\rho \vdash e_2 \Downarrow_{\mathsf{PB}} v_2 \quad \rho \vdash \mathsf{kapp2}(v_1, v_2) \Downarrow_{\mathsf{PB}} v}{\rho \vdash \mathsf{kapp1}(e_2, v_1) \Downarrow_{\mathsf{PB}} v} \qquad \frac{\rho'' = \mathsf{update}\ x\ v_2\ \rho' \quad \rho'' \vdash e \Downarrow_{\mathsf{PB}} v}{\rho \vdash \mathsf{kapp2}(\mathsf{clo}(x, e, \rho'), v_2) \Downarrow_{\mathsf{PB}} v}$$

As we can see, the evaluation of **app** now proceeds through two intermediate constructs, **kapp1** and **kapp2**, which correspond to continuations introduced in the CPS conversion. The evaluation of **app**(e<sub>1</sub>, e<sub>2</sub>) starts by evaluating e<sub>1</sub> to v<sub>1</sub>. Then **kapp1** is responsible for evaluating e<sub>2</sub> to v<sub>2</sub>. Finally, **kapp2** evaluates the closure body just as the third premise of the original rule for **app**. Save for a different order of arguments, the resulting intermediate constructs and their rules are identical to Charguéraud's examples.
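Pretty-big-step rules of this shape can be read directly as a recursive evaluator in which every rule has at most one premise that evaluates a subterm, followed by a call on an intermediate construct. The following Python sketch is one such reading under stated assumptions: environments are dictionaries rather than functions, terms are tagged tuples, and a `lam` case producing closures is added so that the example is self-contained. The names `eval_pb` and `update` are hypothetical, and the argument order of `kapp2` is assumed.

```python
def update(x, v, env):
    """Return a copy of env extended with the binding x -> v."""
    new_env = dict(env)
    new_env[x] = v
    return new_env


def eval_pb(env, t):
    """Pretty-big-step style evaluator for a call-by-value lambda-calculus."""
    tag = t[0]
    if tag == "val":
        return t[1]
    if tag == "var":
        return env[t[1]]
    if tag == "lam":                        # assumed rule: build a closure
        _, x, body = t
        return ("clo", x, body, env)
    if tag == "app":                        # evaluate e1, continue as kapp1
        _, e1, e2 = t
        v1 = eval_pb(env, e1)
        return eval_pb(env, ("kapp1", e2, v1))
    if tag == "kapp1":                      # evaluate e2, continue as kapp2
        _, e2, v1 = t
        v2 = eval_pb(env, e2)
        return eval_pb(env, ("kapp2", v1, v2))
    if tag == "kapp2":                      # evaluate the closure body
        _, (_, x, body, closure_env), v2 = t
        return eval_pb(update(x, v2, closure_env), body)
    raise ValueError(f"unknown term constructor: {tag}")


identity = ("lam", "x", ("var", "x"))
const_fn = ("lam", "x", ("lam", "y", ("var", "x")))
```

For instance, `eval_pb({}, ("app", identity, ("val", 42)))` goes through `kapp1` and `kapp2` before evaluating the body `var(x)` in the extended environment.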

### **4.11 Pretty-Printing**

For the purpose of presenting and studying the original and transformed semantics, we add a final pretty-printing phase. This amounts to generating inference rules corresponding to the control flow in the interpreter. This pretty-printing stage can be applied to both the big-step and small-step interpreters and was used to generate many of the rules in this paper, as well as for generating the appendix of the full version of this paper [1].

### **4.12 Correctness**

A correctness proof for the full pipeline is not part of our current work. However, several of these steps (partial CPS conversion, partial argument lifting, defunctionalization, conversion to direct style) are instances of well-established techniques. In other cases, such as generalization of continuations (Section 4.2) and removal of self-recursive tail-calls (Section 4.6), we have informal proofs using equational reasoning [1]. The proof for tail-call removal is currently restricted to compositional interpreters.

# **5 Evaluation**

We have evaluated our approach to deriving small-step interpreters on a range of example languages. Table 2 presents an overview of example big-step specifications and their properties, together with their derived small-step counterparts. A full listing of the input and output specifications for these case studies appears in the appendix to the full version of the paper, which is available online [1].

**Table 2.** Overview of transformed example languages. Input is a given big-step interpreter and our transformations produce a small-step counterpart as output automatically. "Prems" columns only list structural premises: those that check for a big or small step. Unless otherwise stated, environments are used to give meaning to variables and they are represented as functions.


For our case studies, we have used call-by-value and call-by-name λ-calculi, and a simple imperative language as base languages and extended them with some common features. Overall, the small-step specifications (as well as the corresponding interpreters) resulting from our transformation are very similar to ones we could find in the literature. The differences are either well justified—for example, by different handling of value terms—or they are due to new term constructors which could be potentially eliminated by a more powerful translation.

We evaluated the correctness of our transformation experimentally, by comparing runs of the original big-step and the transformed small-step interpreters, as well as by inspecting the interpreters themselves. In a few cases, we proved the transformation correct by transcribing the input and output interpreters in Coq (as an evaluation relation coupled with a proof of determinism) and proving them equivalent. From the examples in Table 2, we have done so for "Call-by-value", "Exceptions as state", and a simplified version of "CBV, exceptions as state".

We make a few observations about the resulting semantics here.

*New Auxiliary Constructs.* In languages that use an environment to look up values bound to variables, new constructs are introduced to keep the updated environment as context. These constructs are simple: they have two arguments – one for the environment (context) and one for the term to be evaluated in that environment. A congruence rule will ensure steps of the term argument in the given context and another rule will return the result. The construct **kclo1** from the λ-calculus based examples is a typical example.

$$\frac{}{\rho \vdash \mathsf{kclo1}(\rho', \mathsf{val}(v)) \to \mathsf{val}(v)} \qquad \frac{\rho' \vdash t \to t'}{\rho \vdash \mathsf{kclo1}(\rho', t) \to \mathsf{kclo1}(\rho', t')}$$

As observed in Section 2, if the environment ρ″ is a result of updating an environment ρ′ with a binding of x to v, then the **app** rule

$$\frac{\rho'' = \mathsf{update}\ x\ v\ \rho'}{\rho \vdash \mathsf{app}(\mathsf{clo}(x, e, \rho'), v) \to \mathsf{kclo1}(\rho'', e)}$$

and the above two rules can be replaced with the following rules for **app**:

$$\frac{}{\rho \vdash \mathsf{app}(\mathsf{clo}(x, v, \rho'), v_2) \to v} \qquad \frac{\rho'' = \mathsf{update}\ x\ v_2\ \rho' \qquad \rho'' \vdash e \to e'}{\rho \vdash \mathsf{app}(\mathsf{clo}(x, e, \rho'), v_2) \to \mathsf{app}(\mathsf{clo}(x, e', \rho'), v_2)}$$
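The role of **kclo1** as a carrier for the updated environment can be seen in a small stepping function. The Python sketch below is an illustration under stated assumptions, not the generated interpreter: terms are tagged tuples, environments are dictionaries, application evaluates left to right, and the `var` and `lam` cases as well as the helper names `step`, `run` and `is_val` are hypothetical.

```python
def is_val(t):
    return t[0] == "val"


def step(env, t):
    """One small step of a call-by-value lambda-calculus with kclo1."""
    tag = t[0]
    if tag == "var":
        return ("val", env[t[1]])
    if tag == "lam":
        _, x, body = t
        return ("val", ("clo", x, body, env))
    if tag == "app":
        _, e1, e2 = t
        if not is_val(e1):                       # congruence on the function
            return ("app", step(env, e1), e2)
        if not is_val(e2):                       # congruence on the argument
            return ("app", e1, step(env, e2))
        _, x, body, closure_env = e1[1]
        # enter the closure body, remembering the updated environment
        return ("kclo1", {**closure_env, x: e2[1]}, body)
    if tag == "kclo1":
        _, inner_env, body = t
        if is_val(body):                         # return the result
            return body
        # congruence: step the body in the stored environment
        return ("kclo1", inner_env, step(inner_env, body))
    raise ValueError(f"no rule applies to {tag}")


def run(env, t):
    """Iterate step until a value is reached; return all intermediate terms."""
    trace = [t]
    while not is_val(t):
        t = step(env, t)
        trace.append(t)
    return trace


example_trace = run({}, ("app", ("lam", "x", ("var", "x")), ("val", 42)))
```

In `example_trace`, the third term is a `kclo1` node holding the environment in which `x` is bound to 42. The outer environment passed to `step` is ignored while the body is being reduced.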

Loops are another common kind of construct that gives rise to a recurring pattern of extra auxiliary constructs. For example, the "While" language listed in Table 2 contains a while-loop with the following big-step rules:

$$\frac{\langle e\_b, \sigma \rangle \Downarrow \langle \mathbf{false}, \sigma' \rangle}{\langle \mathbf{while}(e\_b, c), \sigma \rangle \Downarrow \langle \mathbf{skip}, \sigma' \rangle}$$

$$\frac{\langle e_b, \sigma \rangle \Downarrow \langle \mathbf{true}, \sigma' \rangle \quad \langle c, \sigma' \rangle \Downarrow \langle \mathbf{skip}, \sigma'' \rangle \quad \langle \mathbf{while}(e_b, c), \sigma'' \rangle \Downarrow \langle v, \sigma''' \rangle}{\langle \mathbf{while}(e_b, c), \sigma \rangle \Downarrow \langle v, \sigma''' \rangle}$$

The automatic transformation of these rules introduces two extra constructs, **kwhile1** and **ktrue1**. The former ensures the full evaluation of the condition expression, keeping a copy of it together with the while's body. The latter construct ensures the full evaluation of while's body, keeping a copy of the body together with the condition expression.

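$$\frac{}{\langle \mathbf{while}(e_b, c), \sigma \rangle \to \langle \mathbf{kwhile1}(c, e_b, e_b), \sigma \rangle}$$

$$\frac{}{\langle \mathbf{kwhile1}(c, e_b, \mathbf{true}), \sigma \rangle \to \langle \mathbf{ktrue1}(e_b, c, c), \sigma \rangle} \qquad \frac{}{\langle \mathbf{kwhile1}(c, e_b, \mathbf{false}), \sigma \rangle \to \langle \mathbf{skip}, \sigma \rangle}$$

$$\frac{\langle t, \sigma \rangle \to \langle t', \sigma' \rangle}{\langle \mathbf{kwhile1}(c, e_b, t), \sigma \rangle \to \langle \mathbf{kwhile1}(c, e_b, t'), \sigma' \rangle}$$

$$\frac{}{\langle \mathbf{ktrue1}(e_b, c, \mathbf{skip}), \sigma \rangle \to \langle \mathbf{while}(e_b, c), \sigma \rangle} \qquad \frac{\langle t, \sigma \rangle \to \langle t', \sigma' \rangle}{\langle \mathbf{ktrue1}(e_b, c, t), \sigma \rangle \to \langle \mathbf{ktrue1}(e_b, c, t'), \sigma' \rangle}$$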

We observe that in a language with a conditional and a sequencing construct we can find terms corresponding to **kwhile1** and **ktrue1:**

$$\begin{aligned} \mathbf{kwhile1}(c, e_b, e_b') &\approx \mathbf{if}(e_b', \mathbf{seq}(c, \mathbf{while}(e_b, c)), \mathbf{skip}) \\ \mathbf{ktrue1}(e_b, c, c') &\approx \mathbf{seq}(c', \mathbf{while}(e_b, c)) \end{aligned}$$

The small-step semantics of **while** could then be simplified to a single rule.

$$\frac{}{\langle \mathbf{while}(e_b, c), \sigma \rangle \to \langle \mathbf{if}(e_b, \mathbf{seq}(c, \mathbf{while}(e_b, c)), \mathbf{skip}), \sigma \rangle}$$
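The six generated rules for **while**, **kwhile1** and **ktrue1** translate almost literally into a stepping function. The following Python sketch illustrates this under stated assumptions: states are dictionaries, and the two base constructs `lt(x, n)` (compare a variable with a constant) and `inc(x)` (increment a variable) are hypothetical stand-ins for the expression and command sublanguages, which are not shown here.

```python
TRUE, FALSE, SKIP = ("true",), ("false",), ("skip",)


def step(t, s):
    """One small step on the configuration <t, s>; returns <t', s'>."""
    tag = t[0]
    if tag == "lt":                               # hypothetical base expression
        _, x, n = t
        return (TRUE if s[x] < n else FALSE), s
    if tag == "inc":                              # hypothetical base command
        _, x = t
        return SKIP, {**s, x: s[x] + 1}
    if tag == "while":
        _, eb, c = t
        return ("kwhile1", c, eb, eb), s          # start evaluating the condition
    if tag == "kwhile1":
        _, c, eb, cond = t
        if cond == TRUE:
            return ("ktrue1", eb, c, c), s        # start evaluating the body
        if cond == FALSE:
            return SKIP, s
        cond2, s2 = step(cond, s)                 # congruence on the condition
        return ("kwhile1", c, eb, cond2), s2
    if tag == "ktrue1":
        _, eb, c, body = t
        if body == SKIP:
            return ("while", eb, c), s            # body finished: loop again
        body2, s2 = step(body, s)                 # congruence on the body
        return ("ktrue1", eb, c, body2), s2
    raise ValueError(f"no rule applies to {tag}")


def run(t, s):
    """Step until skip is reached; return the final state and step count."""
    steps = 0
    while t != SKIP:
        t, s = step(t, s)
        steps += 1
    return s, steps


final_state, step_count = run(("while", ("lt", "x", 3), ("inc", "x")), {"x": 0})
```

Both auxiliary constructs keep pristine copies of the condition and the body next to the copy currently being reduced, which is what allows `ktrue1` to rebuild the original loop once the body has become `skip`.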

Our current, straightforward way of deriving term–continuation equivalents is not capable of finding these equivalences. In future work, we want to explore external tools, such as SMT solvers, to facilitate searching for translations from continuations to terms. This search could possibly be limited to a specific term depth.

*Exceptions as Values.* We tested our transformations with two ways of representing exceptions in big-step semantics currently supported by our input language: as values and as state. Representing exceptions as values appears to be more common and is used, for example, in the big-step specification of Standard ML [24], or in [6] in connection with *pretty big-step semantics*. Given a big-step specification (or interpreter) in this style, the generated small-step semantics handles exceptions correctly (based on our experiments). However, since exceptions are just values, propagation to top-level is spread out across multiple steps – depending on the depth of the term which raised the exception. The following example illustrates this behavior.

$$\begin{aligned} \mathsf{add}(1, \mathsf{add}(2, \mathsf{add}(\mathsf{raise}(3), \mathsf{raise}(4)))) &\to \mathsf{add}(1, \mathsf{add}(2, \mathsf{add}(\mathsf{exc}(3), \mathsf{raise}(4)))) \\ &\to \mathsf{add}(1, \mathsf{add}(2, \mathsf{exc}(3))) \to \mathsf{add}(1, \mathsf{exc}(3)) \to \mathsf{exc}(3) \end{aligned}$$

Since we expect the input semantics to be deterministic and the propagation of exceptions in the resulting small-step follows the original big-step semantics, this "slow" propagation is not a problem, even if it does not take advantage of "fast" propagation via labels or state. A possible solution we are considering for future work is to let the user flag values in the big-step semantics and translate such values as labels on arrows or a state change to allow propagating them in a single step.
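The step-by-step propagation can be reproduced with a few lines of code. The Python sketch below is an illustration rather than the generated interpreter: it assumes left-to-right evaluation of **add**, treats both numbers and `exc` terms as values, and uses hypothetical helper names (`step`, `trace`, `is_value`).

```python
def is_value(t):
    # exceptions are ordinary values in this style
    return t[0] in ("num", "exc")


def step(t):
    tag = t[0]
    if tag == "raise":
        _, e = t
        if e[0] == "num":
            return ("exc", e[1])              # raising turns a number into exc
        if e[0] == "exc":
            return e                          # an exception in the argument wins
        return ("raise", step(e))
    if tag == "add":
        _, e1, e2 = t
        if e1[0] == "exc":
            return e1                         # propagate one level upwards
        if not is_value(e1):
            return ("add", step(e1), e2)
        if e2[0] == "exc":
            return e2                         # propagate one level upwards
        if not is_value(e2):
            return ("add", e1, step(e2))
        return ("num", e1[1] + e2[1])
    raise ValueError(f"no rule applies to {tag}")


def trace(t):
    """All terms visited while stepping t to a value."""
    out = [t]
    while not is_value(t):
        t = step(t)
        out.append(t)
    return out


def num(n):
    return ("num", n)


term = ("add", num(1),
        ("add", num(2),
         ("add", ("raise", num(3)), ("raise", num(4)))))
steps = trace(term)
```

Under these assumptions `steps` contains five terms, that is, four transitions: one to raise the exception and one for each enclosing `add` it has to cross. The second operand `raise(4)` is never evaluated.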

*Exceptions as State.* Another approach to specifying exceptions is to use a flag in the configuration. Rules may be specified so that they only apply if the incoming state has no exception indicated. As with the exceptions-as-values approach, propagation rules have to be written to terminate a computation early if a computation of a subterm indicates an exception. Observe the exception propagation rule for **add** and the exception handling rule for **try**.

$$\frac{\langle e_1, \sigma, \mathbf{ok} \rangle \Downarrow \langle v_1, \sigma', \mathbf{ex} \rangle}{\langle \mathbf{add}(e_1, e_2), \sigma, \mathbf{ok} \rangle \Downarrow \langle \mathbf{skip}, \sigma', \mathbf{ex} \rangle}$$

$$\frac{\langle e_1, \sigma, \mathbf{ok} \rangle \Downarrow \langle v_1, \sigma', \mathbf{ex} \rangle \quad \langle e_2, \sigma', \mathbf{ok} \rangle \Downarrow \langle v_2, \sigma'', \mathbf{ok} \rangle}{\langle \mathbf{try}(e_1, e_2), \sigma, \mathbf{ok} \rangle \Downarrow \langle v_2, \sigma'', \mathbf{ok} \rangle}$$

Using state to propagate exceptions is mentioned in connection with small-step SOS in [4]. While this approach has the potential advantage of manifesting the currently raised exception immediately at the top-level, it also poses a problem of locality. If an exception is reinserted into the configuration, it might become decoupled from the original site. This can result, for example, in the wrong handler catching the exception in a following step. Our transformation deals with this style of exceptions naturally by preserving more continuations in the final interpreter. After being raised, an exception is inserted into the state and propagated to top-level by congruence rules. However, it will only be caught after the corresponding subterm has been evaluated, or rather, a value has been propagated upwards to signal a completed computation. This behavior corresponds to exception handling in big-step rules, only it is spread out over multiple steps. Continuations are kept in the final language to correspond to stages of computation and thus, to preserve the locality of a raised exception. A handler will only handle an exception once the raising subterm has become a value. Hence, the exception will be intercepted by the innermost handler – even if the exception is visible at the top-level of a step.

Based on our experiments, the exception-as-state handling in the generated small-step interpreters is a truthful unfolding of the big-step evaluation process. This is further supported by our ad-hoc proofs of equivalence between input and output interpreters. However, the generated semantics suffers from a blowup in the number of rules and moves away from the usual small-step propagation and exception handling in congruence rules. We see this as a shortcoming of the transformation. To overcome this, we briefly experimented with a case-floating stage, which would result in catching exceptions in the congruence cases of continuations. Using such a transformation, the resulting interpreter would more closely mirror the standard small-step treatment of exceptions as signals. However, the conditions under which this transformation should be triggered need to be considered carefully and we leave this for future work.

*Limited Non-determinism.* In the present work, our aim was to only consider deterministic semantics implemented as an interpreter in a functional programming language. However, since cases of the interpreter are considered independently in the transformation, some forms of non-determinism in the input semantics get translated correctly. For example, the following internal choice construct (cf. CSP's ⊓ operator [5,17]) gets transformed correctly. The straightforward big-step rules are transformed into small-step rules as expected. Of course, one has to keep in mind that these rules are interpreted as ordered, that is, the first rule in both styles will always apply.


# **6 Related Work**

In their short paper [18], the authors propose a direct syntactic way of deriving small-step rules from big-step ones. Unlike our approach, based on manipulating control flow in an interpreter, their transformation applies to a set of inference rules. While axioms are copied over directly, for conditional rules a stack is added to the configuration to keep track of evaluation. For each conditional big-step rule, an auxiliary construct and 4 small-step rules are generated. Results of "premise computations" are accumulated and side-conditions are only discharged at the end of such a computation sequence. For this reason, we can view the resulting semantics more as a "leap" semantics, which makes it less suitable for a semantics-based interpreter or debugger. A further disadvantage is that the resulting semantics is far removed from a typical small-step specification with a higher potential for blow-up as 4 rules are introduced for each conditional rule. On the other hand, the delayed unification of meta-variables and discharging of side-conditions potentially makes the transformation applicable to a wider array of languages, including those where control flow is not as explicit.

In [2], the author explores an approach to constructing abstract machines from big-step (natural) specifications. It applies to a class of big-step specifications called *L-attributed big-step semantics*, which allows for sufficiently interesting languages. The extracted abstract machines use a stack of evaluation contexts to keep track of the stages of computations. In contrast, our transformed interpreters rebuild the context via congruence rules in each step. While this is less efficient as a computation strategy, the intermediate results of the computation are visible in the context of the original program, in line with usual SOS specifications.

A significant body of work has been developed on transformations that take a form of small-step semantics (usually an interpreter) and produce a big-step-style interpreter. The relation between semantic specifications, interpreters and abstract machines has been thoroughly investigated, mainly in the context of reduction semantics [10–13,26]. In particular, our work was inspired by and is based on Danvy's work on refocusing in reduction semantics [13] and on use of CPS conversion and defunctionalization to convert between representations of control in interpreters [11].

A more direct approach to deriving big-step semantics from small-step is taken by the authors of [4], where a small-step Modular SOS specification is transformed into a pretty-big-step one. This is done by introducing reflexivity and transitivity rules into a specification, along with a "refocus" rule which effectively compresses a transition sequence into a single step. The original small-step rules are then specialized with respect to these new rules, yielding refocused rules in the style of pretty-big-step semantics [6]. A related approach is by Ciobâcă [7], where big-step rules are generated for a small-step semantics. The big-step rules are, again, close to a pretty-big-step style.

# **7 Conclusion and Future Work**

We have presented a stepwise functional derivation of a small-step interpreter from a big-step one. This derivation proceeds through a sequence of, mostly basic, transformation steps. First, the big-step evaluation function is converted into continuation-passing style to make control-flow explicit. Then, the continuations are generalized (or lifted) to handle non-value inputs. The non-value cases correspond to congruence rules in small-step semantics. After defunctionalization, we remove self-recursive calls, effectively converting the recursive interpreter into a stepping function. The final major step of the transformation is to decide which continuations will have to be introduced as new auxiliary terms into the language. We have evaluated our approach on several languages covering different features. For most of these, the transformation yields small-step semantics which are close to ones we would normally write by hand.

We see this work as an initial exploration of automatic transformations of big-step semantics into small-step counterparts. We identified a few areas where the current process could be significantly improved. These include applying better equational reasoning to identify terms equivalent to continuations, or transforming exceptions as state in a way that would avoid introducing many intermediate terms and would better correspond to usual signal handling in small-step SOS. Another research avenue is to fully verify the transformations in an interactive theorem prover, with the possibility of extracting a correct transformer from the proofs.

**Acknowledgements.** We would like to thank Jeanne-Marie Musca, Brian LaChance and the anonymous referees for their useful comments and suggestions. This work was supported in part by DARPA award FA8750-15-2-0033.

# **References**



Program Semantics

# **Extended Call-by-Push-Value: Reasoning About Effectful Programs and Evaluation Order**

Dylan McDermott and Alan Mycroft

Computer Laboratory, University of Cambridge, Cambridge, UK {Dylan.McDermott,Alan.Mycroft}@cl.cam.ac.uk

**Abstract.** Traditionally, reasoning about programs under varying evaluation regimes (call-by-value, call-by-name etc.) was done at the meta-level, treating them as term rewriting systems. Levy's call-by-push-value (CBPV) calculus provides a more powerful approach for reasoning, by treating CBPV terms as a common intermediate language which captures both call-by-value and call-by-name, and by allowing equational reasoning about changes to evaluation order between or within programs.

We extend CBPV to additionally deal with call-by-need, which is nontrivial because of shared reductions. This allows the equational reasoning to also support call-by-need. As an example, we then prove that call-by-need and call-by-name are equivalent if nontermination is the only side-effect in the source language.

We then show how to incorporate an effect system. This enables us to exploit static knowledge of the potential effects of a given expression to augment equational reasoning; thus a program fragment might be invariant under change of evaluation regime only because of knowledge of its effects.

**Keywords:** Evaluation order · Call-by-need · Call-by-push-value · Logical relations · Effect systems

# **1 Introduction**

Programming languages based on the λ-calculus have different semantics depending on the reduction strategy employed. Three common variants are call-by-value, call-by-name and call-by-need (with the third sometimes also referred to as "lazy evaluation" when data constructors defer evaluation of arguments until the data structure is traversed). Reasoning about such programs and their equivalence under varying reduction strategies can be difficult as we have to reason about meta-level reduction strategies and not merely at the object level.

Levy [17] introduced *call-by-push-value* (CBPV) to improve the situation. CBPV is a calculus with separated notions of value and computation. A characteristic feature is that each CBPV program encodes its own evaluation order. It is best seen as an *intermediate language* into which lambda-calculus-based *source-language* programs can be translated. Moreover, CBPV is powerful enough that programs employing call-by-value or call-by-name (or even a mixture) can be simply translated into it, giving an object-calculus way to reason about the meta-level concept of reduction order.

However, CBPV does not enable us to reason about call-by-need evaluation. An intuitive reason is that call-by-need has "action at a distance" in that reduction of one subterm causes reduction of all other subterms that originated as copies during variable substitution. Indeed call-by-need is often framed using mutable stores (graph reduction [32], or reducing a thunk which is accessed by multiple pointers [16]). CBPV does not allow these to be encoded.

This work presents *extended call-by-push-value* (ECBPV), a calculus similar to CBPV, but which can capture call-by-need reduction in addition to call-by-value and call-by-name. Specifically, ECBPV adds an extra primitive M need <u>x</u>. N which runs N, with M being evaluated the first time <u>x</u> is used. On subsequent uses of <u>x</u>, the result of the first run is returned immediately. The term M is evaluated at most once. We give the syntax and type system of ECBPV, together with an equational theory that expresses when terms are considered equal.

A key justification for an intermediate language that can express several evaluation orders is that it enables equivalences between the evaluation orders to be proved. If there are no (side-)effects at all in the source language, then call-by-need, call-by-value and call-by-name should be semantically equivalent. If the only effect is nondeterminism, then need and value (but not name) are equivalent. If the only effect is nontermination then need and name (but not value) are equivalent. We show that ECBPV can be used to prove such equivalences by proving the latter using an argument based on *Kripke logical relations of varying arity* [12].
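The way effects separate the three evaluation orders can be observed by counting how often an argument is evaluated. The following Python sketch is an informal illustration at the source-language level, not part of the calculus: arguments are modelled as zero-argument functions, call-by-need is modelled with a cache, and all names (`call_by_value`, `call_by_name`, `call_by_need`, `count_runs`) are hypothetical.

```python
def call_by_value(arg, body):
    v = arg()                      # evaluate eagerly, exactly once
    return body(lambda: v)


def call_by_name(arg, body):
    return body(arg)               # re-evaluate at every use


def call_by_need(arg, body):
    cache = []
    def x():
        if not cache:              # evaluate on first use only
            cache.append(arg())
        return cache[0]
    return body(x)


def count_runs(strategy, body):
    """Apply body to an argument that records each of its evaluations."""
    runs = []
    def arg():
        runs.append(1)             # observable effect of evaluating the argument
        return 21
    result = strategy(arg, body)
    return result, len(runs)


def unused(x):
    return 0                       # never uses its argument


def twice(x):
    return x() + x()               # uses its argument two times
```

With `unused`, the argument is evaluated once under call-by-value and not at all under call-by-name and call-by-need. If evaluating the argument could diverge, only call-by-value would diverge. With `twice`, call-by-name evaluates the argument two times, whereas the other two evaluate it once. A counting effect of this kind therefore distinguishes all three strategies.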

These equivalences rely on the *language* being restricted to particular effects. However, one may wish to switch evaluation order for *subprograms* restricted to particular effects, even if the language itself does not have such a restriction. To allow reasoning to be applied to these cases, we add an *effect system* [20] to ECBPV, which allows the side-effects of subprograms to be statically estimated. This allows us to determine which parts of a program are invariant under changes in evaluation order. As we will see, support for call-by-need (and action at a distance more generally) makes describing an effect system significantly more difficult than for call-by-value.

*Contributions.* We make the following contributions:


– We refine the type system of ECBPV so that its types also carry effect information (Sect. 4). This allows equivalences between evaluation orders to be exploited, both at ECBPV and source level, when subprograms are statically limited to particular effects.

# **2 Extended Call-by-Push-Value**

We describe an extension to call-by-push-value with support for call-by-need. The primary difference between ordinary CBPV and ECBPV is the addition of a primitive that allows computations to be added to the environment, so that they are evaluated only the first time they are used. Before describing this change, we take a closer look at CBPV and how it supports call-by-value and call-by-name.

CBPV stratifies terms into *values*, which do not have side-effects, and *computations*, which might. Evaluation order is irrelevant for values, so we are only concerned with how computations are sequenced. There is exactly one primitive that causes the evaluation of more than one computation, which is the computation M to x. N. This means run the computation M, bind the result to x, and then run the computation N. (It is similar to M >>= \x -> N in Haskell.) The evaluation order is fixed: M is always eagerly evaluated. This construct can be used to implement call-by-value: to apply a function, eagerly evaluate the argument and then evaluate the body of the function. No other constructs cause the evaluation of more than one computation.

To allow more control over evaluation order, CBPV allows computations to be thunked. The term thunk M is a value that contains the thunk of the computation M. Thunks can be duplicated (to allow a single computation to be evaluated more than once), and can be converted back into computations with force V. This allows call-by-name to be implemented: arguments to functions are thunked computations. Arguments are used by forcing them, so that the computation is evaluated every time the argument is used. Effectively, there is a construct M name <u>x</u>. N, which evaluates M each time the variable <u>x</u> is used by N, rather than eagerly evaluating. (The variable <u>x</u> is underlined here to indicate that it refers to a computation rather than a value: uses of it may have side-effects.)

To support call-by-need, extended call-by-push-value adds another construct M need <u>x</u>. N. This term runs the computation N, with the computation M being evaluated the first time <u>x</u> is used. On subsequent uses of <u>x</u>, the result of the first run is returned immediately. The computation M is evaluated at most once. This new construct adds the "action at a distance" missing from ordinary CBPV.

We briefly mention that adding general mutable references to call-by-push-value would allow call-by-need to be encoded. However, reasoning about evaluation order would be difficult, and so we do not take this option.

#### **2.1 Syntax**

The syntax of extended call-by-push-value is given in Fig. 1. The highlighted parts are new here. The rest of the syntax is similar to CBPV.<sup>1</sup>

$$\begin{aligned} V, W &::= c \mid x \mid (V_1, V_2) \mid \mathsf{fst}\,V \mid \mathsf{snd}\,V \mid \mathsf{inl}\,V \mid \mathsf{inr}\,V \\ &\mid \mathsf{case}\,V\,\mathsf{of}\,\{\mathsf{inl}\,x.W_1, \mathsf{inr}\,y.W_2\} \mid \mathsf{thunk}\,M \\ M, N &::= \boxed{\underline{x}} \mid \mathsf{force}\,V \mid \lambda\{i.M_i\}_{i \in I} \mid i\,\text{'}M \mid \lambda x.M \mid V\,\text{'}M \mid \mathsf{return}\,V \\ &\mid M\,\mathsf{to}\,x.N \mid \boxed{M\,\mathsf{need}\,\underline{x}.N} \\ A, B &::= \mathsf{unit} \mid A_1 \times A_2 \mid A_1 + A_2 \mid \mathsf{U}\,\underline{C} \\ \underline{C}, \underline{D} &::= \prod_{i \in I} \underline{C}_i \mid A \to \underline{C} \mid \mathsf{Fr}\,A \\ \Gamma &::= \diamond \mid \Gamma, x : A \mid \boxed{\Gamma, \underline{x} : \mathsf{Fr}\,A} \end{aligned}$$

**Fig. 1.** Syntax of ECBPV

We assume two sets of variables: *value variables* x, y, ... and *computation variables* <u>x</u>, <u>y</u>, .... While ordinary CBPV does not include computation variables, they do not of themselves add any expressive power to the calculus. The ability to use call-by-need in ECBPV comes from the need construct used to bind the variable.<sup>2</sup>

There are two kinds of terms, *value terms* V, W which do not have side-effects (in particular, are strongly normalizing), and *computation terms* M, N which might have side-effects. Value terms include constants c, and specifically the constant () of type **unit**. There are no constant computation terms; value constants suffice (see Sect. 3 for an example). The value term thunk M suspends the computation M; the computation term force V runs the suspended computation V. Computation terms also include I-ary tuples λ{i. M<sub>i</sub>}<sub>i∈I</sub> (where I ranges over *finite* sets); the ith projection of a tuple M is i'M. Functions send values to computations, and are computations themselves. Application is written V'M, where V is the argument and M is the function to apply. The term return V is a computation that just returns the value V, without causing any side-effects. Eager sequencing of computations is given by M to x. N, which evaluates M until it returns a value, then places the result in x and evaluates N. For example, in M to x. return (x, x), the term M is evaluated once, and the result is duplicated. In M to x. return (), the term M is still evaluated once, but its result is never used. Syntactically, both to and need (explained below) are right-associative (so M<sub>1</sub> to x. M<sub>2</sub> to y. M<sub>3</sub> means M<sub>1</sub> to x. (M<sub>2</sub> to y. M<sub>3</sub>)).

<sup>1</sup> The only difference is that eliminators of product and sum types are value terms rather than computation terms (which makes value terms slightly more general). Levy [17] calls this CBPV with *complex values*.

<sup>2</sup> Computation variables are not strictly required to support call-by-need (since we can use *x* : **U** (**Fr** *A*) instead of <u>*x*</u> : **Fr** *A*), but they simplify reasoning about evaluation order, and therefore we choose to include them.

The primary new construct is M need x. N. This term evaluates N. The first time x is evaluated (due to a use of x inside N) it behaves the same as the computation M. If M returns a value V , then subsequent uses of x behave the same as return V . Hence only the first use of x will evaluate M. If x is not used then M is not evaluated at all. The computation variable x bound inside the term is primarily used by eagerly sequencing it with other computations. For example,

$$M \,\mathsf{need}\, \underline{x}.\ \underline{x} \,\mathsf{to}\, y.\ \underline{x} \,\mathsf{to}\, z.\ \mathsf{return}\, (y, z)$$

uses x twice: once where the result is bound to y, and once where the result is bound to z. Only the first of these uses will evaluate M, so this term has the same semantics as M to x. return(x, x). The term M need x. return () does not evaluate M at all, and has the same semantics as return ().
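The difference between to and need can be made concrete with a small executable model. The sketch below is not part of the calculus: it assumes that a computation of returner type is modelled as a zero-argument Python callable, and the names `ret`, `to`, `need`, `log` and `M` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical model (an assumption, not the paper's semantics): a computation
# of returner type is a zero-argument callable; calling it performs its
# effects and returns a value.

def ret(v):
    """return V: yields v without any effects."""
    return lambda: v

def to(m, body):
    """M to x. N: run M now (exactly once), bind the result, then run N."""
    return lambda: body(m())()

def need(m, body):
    """M need x. N: run N; M runs at most once, on the first use of x."""
    def run():
        cache = []                 # empty until x is first used
        def x():
            if not cache:
                cache.append(m())  # first use: evaluate M and remember the result
            return cache[0]        # later uses behave like return V
        return body(x)()
    return run

log = []

def M():
    log.append("M")                # the observable side effect of M
    return 42

# M need x. x to y. x to z. return (y, z)
uses_twice = need(M, lambda x: to(x, lambda y: to(x, lambda z: ret((y, z)))))
# M to x. return (x, x)
eager_pair = to(M, lambda x: ret((x, x)))
# M need x. return ()
unused = need(M, lambda x: ret(()))
```

In this model, running `uses_twice` or `eager_pair` records a single occurrence of `"M"` in `log`, while running `unused` records none, matching the informal description above.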

With the addition of need it is not in general possible to determine the order in which computations are executed statically. Uses of computation variables are given statically, but not all of these actually evaluate the corresponding computation dynamically. In general, the set of uses of computation variables that actually cause effects depends on run-time behaviour. This will be important when describing the effect system in Sect. 4.

The standard capture-avoiding substitution of value variables in value terms is denoted V[x ↦ W]. We similarly have substitutions of value variables in computation terms, computation variables in value terms, and computation variables in computation terms. Finally, we define the call-by-name construct mentioned above as syntactic sugar for other CBPV primitives:

$$M \,\mathsf{name}\, \underline{x}.\, N \ := \ \mathsf{thunk}\, M \,\text{‘}\, \lambda y.\, N[\underline{x} \mapsto \mathsf{force}\, y]$$

where y is not free in N.
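The unfolding of name can be illustrated in the same style. This is a minimal sketch under the assumption that a computation is a zero-argument Python callable and that a thunk is represented by the suspended callable itself; `ret`, `to`, `thunk`, `force`, `name`, `calls` and `M` are hypothetical names, not part of ECBPV.

```python
# Hypothetical model: a computation is a zero-argument callable, and a thunk
# is represented by the suspended callable itself.

def ret(v):
    return lambda: v                     # return V

def to(m, body):
    return lambda: body(m())()           # M to x. N

def thunk(m):
    return m                             # thunk M: suspend M without running it

def force(v):
    return v                             # force V: the computation stored in V

def name(m, body):
    """M name x. N, unfolded as: apply (lambda y. N[x := force y]) to thunk M."""
    function = lambda y: body(force(y))  # lambda y. N[x := force y]
    return function(thunk(m))            # application of the function to thunk M

calls = []

def M():
    calls.append("M")                    # effect: record each evaluation of M
    return len(calls)

# M name x. x to y. x to z. return (y, z)
pair = name(M, lambda x: to(x, lambda y: to(x, lambda z: ret((y, z)))))
```

Because every use of the bound variable forces the thunk again, running `pair` evaluates `M` twice in this model, in contrast to the need construct.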

Types are stratified into *value types* A, B and *computation types* <u>C</u>, <u>D</u>. Value types include the unit type, products and sum types. (It is easy to add further base types; we omit Levy's empty types for simplicity.) Value types also include *thunk types* **U** <u>C</u>, which are introduced by thunk M and eliminated by force V. Computation types include I-ary product types ∏<sub>i∈I</sub> <u>C</u><sub>i</sub> for finite I, function types A → <u>C</u>, and *returner types* **Fr** A. The latter are introduced by return V, and are the only types of computation that can appear on the left of either to or need (which are the eliminators of returner types). The type constructors **U** and **Fr** form an *adjunction* in categorical models. Finally, contexts Γ map value variables to value types, and computation variables to computation types of the form **Fr** A. This restriction is due to the fact that the only construct that binds computation variables is need, which only sequences computations of returner type. Allowing computation variables to be associated with other forms of computation type in typing contexts is therefore unnecessary. Typing contexts are ordered lists.

The syntax is parameterized by a *signature*, containing the constants c.

**Definition 1 (Signature).** *A* signature 𝒦 *consists of a set* 𝒦<sub>A</sub> *of constants of type* A *for each value type* A*. All signatures contain* () ∈ 𝒦<sub><b>unit</b></sub>*.*

#### **2.2 Type System**

The type system of extended call-by-push-value is a minor extension of the type system of ordinary call-by-push-value. Assume a fixed signature 𝒦. There are two typing judgements, one for value types and one for computation types. The rules for the value typing judgement Γ ⊢<sub>v</sub> V : A and the computation typing judgement Γ ⊢ M : <u>C</u> are given in Fig. 2. Rules that add a new variable to the typing context implicitly require that the variable does not already appear in the context. The type system admits the usual weakening and substitution properties for both value and computation variables.

$$\boxed{\Gamma \vdash\_{\mathbf{v}} V : A}$$

$$\dfrac{}{\Gamma \vdash\_{\mathbf{v}} x : A}\ \text{if } (x : A) \in \Gamma \qquad \dfrac{}{\Gamma \vdash\_{\mathbf{v}} c : A}\ \text{if } c \in \mathcal{K}\_A \qquad \dfrac{\Gamma \vdash M : \underline{C}}{\Gamma \vdash\_{\mathbf{v}} \mathsf{thunk}\, M : \mathbf{U}\, \underline{C}}$$

$$\dfrac{\Gamma \vdash\_{\mathbf{v}} V\_1 : A\_1 \quad \Gamma \vdash\_{\mathbf{v}} V\_2 : A\_2}{\Gamma \vdash\_{\mathbf{v}} (V\_1, V\_2) : A\_1 \times A\_2} \qquad \dfrac{\Gamma \vdash\_{\mathbf{v}} V : A\_1 \times A\_2}{\Gamma \vdash\_{\mathbf{v}} \mathsf{fst}\, V : A\_1} \qquad \dfrac{\Gamma \vdash\_{\mathbf{v}} V : A\_1 \times A\_2}{\Gamma \vdash\_{\mathbf{v}} \mathsf{snd}\, V : A\_2}$$

$$\dfrac{\Gamma \vdash\_{\mathbf{v}} V : A\_1}{\Gamma \vdash\_{\mathbf{v}} \mathsf{inl}\, V : A\_1 + A\_2} \qquad \dfrac{\Gamma \vdash\_{\mathbf{v}} V : A\_2}{\Gamma \vdash\_{\mathbf{v}} \mathsf{inr}\, V : A\_1 + A\_2}$$

$$\dfrac{\Gamma \vdash\_{\mathbf{v}} V : A\_1 + A\_2 \quad \Gamma, x : A\_1 \vdash\_{\mathbf{v}} W\_1 : B \quad \Gamma, y : A\_2 \vdash\_{\mathbf{v}} W\_2 : B}{\Gamma \vdash\_{\mathbf{v}} \mathsf{case}\, V \,\mathsf{of}\, \{\mathsf{inl}\, x.W\_1, \mathsf{inr}\, y.W\_2\} : B}$$

$$\boxed{\Gamma \vdash M : \underline{C}}$$

$$\dfrac{}{\Gamma \vdash \underline{x} : \mathbf{Fr}\, A}\ \text{if } (\underline{x} : \mathbf{Fr}\, A) \in \Gamma \qquad \dfrac{\Gamma \vdash\_{\mathbf{v}} V : A}{\Gamma \vdash \mathsf{return}\, V : \mathbf{Fr}\, A} \qquad \dfrac{\Gamma \vdash\_{\mathbf{v}} V : \mathbf{U}\, \underline{C}}{\Gamma \vdash \mathsf{force}\, V : \underline{C}}$$

$$\dfrac{(\Gamma \vdash M\_i : \underline{C}\_i)\_{i \in I}}{\Gamma \vdash \lambda\{i.M\_i\}\_{i \in I} : \prod\_{i \in I} \underline{C}\_i} \qquad \dfrac{\Gamma \vdash M : \prod\_{i \in I} \underline{C}\_i}{\Gamma \vdash i\,\text{‘}M : \underline{C}\_i} \qquad \dfrac{\Gamma, x : A \vdash M : \underline{C}}{\Gamma \vdash \lambda x.M : A \to \underline{C}} \qquad \dfrac{\Gamma \vdash\_{\mathbf{v}} V : A \quad \Gamma \vdash M : A \to \underline{C}}{\Gamma \vdash V\,\text{‘}M : \underline{C}}$$

$$\dfrac{\Gamma \vdash M : \mathbf{Fr}\, A \quad \Gamma, x : A \vdash N : \underline{C}}{\Gamma \vdash M \,\mathsf{to}\, x.N : \underline{C}} \qquad \dfrac{\Gamma \vdash M : \mathbf{Fr}\, A \quad \Gamma, \underline{x} : \mathbf{Fr}\, A \vdash N : \underline{C}}{\Gamma \vdash M \,\mathsf{need}\, \underline{x}.N : \underline{C}}$$

**Fig. 2.** Typing rules for ECBPV

It should be clear that ECBPV is actually an extension of call-by-push-value. CBPV terms embed as terms that never use the highlighted forms. We translate call-by-need by encoding call-by-need functions as terms of the form

> λx′. (force x′) need <u>x</u>. M

where x′ is not free in M. This is a call-by-push-value function that accepts a thunk as an argument. The thunk is added to the context, and the body of the function is executed. The first time the argument is used (via <u>x</u>), the computation inside the thunk is evaluated. Subsequent uses do not run the computation again. A translation based on this idea from a call-by-need source language is given in detail in Sect. 3.2.
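The shape of this encoding can be sketched in an executable model. The following is an illustration only: it assumes computations are zero-argument Python callables and thunks are the suspended callables themselves, and the names `ret`, `to`, `need`, `force`, `by_need`, `evaluations` and `argument` are hypothetical.

```python
# Hypothetical model: computations are zero-argument callables and a thunk is
# the suspended callable itself.

def ret(v):
    return lambda: v                     # return V

def to(m, body):
    return lambda: body(m())()           # M to x. N

def need(m, body):
    def run():                           # M need x. N
        cache = []
        def x():
            if not cache:
                cache.append(m())        # M is evaluated on the first use only
            return cache[0]
        return body(x)()
    return run

def force(v):
    return v                             # force V

def by_need(body):
    """lambda x'. (force x') need x. M, where body builds M from the variable x."""
    return lambda x_prime: need(force(x_prime), body)

evaluations = []

def argument():
    evaluations.append("arg")            # effect: record evaluation of the argument
    return 3

double = by_need(lambda x: to(x, lambda a: to(x, lambda b: ret(a + b))))
constant = by_need(lambda x: ret(0))
```

Applying `double` to the thunk `argument` and running the result evaluates the argument once even though it is used twice, and `constant` never evaluates it.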

#### **2.3 Equational Theory**

In this section, we present the *equational theory* of extended call-by-push-value. This is an extension of the equational theory for CBPV given by Levy [17] to support our new constructs. It consists of two judgement forms, one for values and one for computations:

$$
\Gamma \vdash\_{\mathbf{v}} V \equiv W : A \qquad \Gamma \vdash M \equiv N : \underline{C}
$$

These mean both terms are well typed, and are considered equal by the equational theory. We frequently omit the context and type when they are obvious or unimportant.

The definition is given by the axioms in Fig. 3. Note that these axioms only hold when the terms they mention have suitable types, and when suitable constraints on free variables are satisfied. For example, the second sequencing axiom holds only if x is not free in N. These conditions are left implicit in the figure. The judgements are additionally reflexive (assuming the typing holds), symmetric and transitive. They are also closed under all possible congruence rules. There are no restrictions on congruence related to evaluation order. None are necessary because ECBPV terms make the evaluation order explicit: all sequencing of computations uses to and need. Finally, note that enriching the signature with additional constants will in general require additional axioms capturing their behaviour; Sect. 3 exemplifies this for constants ⊥<sup>A</sup> representing nontermination.

For the equational theory to capture call-by-need, we might expect computation terms that are not of the form return V to never be duplicated, since they should not be evaluated more than once. There are two exceptions to this. Such terms can be duplicated in the axioms that duplicate value terms (such as the β laws for sum types). In this case, the syntax ensures such terms are thunked. This is correct because we should allow these terms to be executed once in each separate execution of a computation (and separate executions arise from duplication of thunks). We are only concerned with duplication *within* a single computation. Computation terms can also be duplicated across multiple elements of a tuple λ{i. M<sub>i</sub>}<sub>i∈I</sub> of computation terms. This is also correct, because only one component of a tuple can be used within a single computation (without thunking), so the effects still will not happen twice. (There is a similar consideration for functions, which can only be applied once.) The remainder of the axioms never duplicate need-bound terms that might have effects.

**Fig. 3.** Equational theory of ECBPV

The majority of the axioms of the equational theory are standard. Only the axioms involving need are new; these are highlighted. The first new sequencing axiom (in Fig. 3c) is the crucial one. It states that if a computation will next evaluate x, where x is a computation variable bound to M, then this is the same as evaluating M, and then using the result for subsequent uses of x. In particular, this axiom (together with the η law for **Fr**) implies that M need x. x ≡ M.

The second sequencing axiom does *garbage collection* [22]: if a computation bound by need is not used (because the variable does not appear), then the binding can be dropped. This equation implies, for example, that

$$M\_1 \text{ need } \underline{x}\_1. M\_2 \text{ need } \underline{x}\_2. \cdots \cdot M\_n \text{ need } \underline{x}\_n. \text{ return } () \equiv \text{ return } ().$$
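As a worked instance for n = 2, the garbage-collection axiom can be applied twice, first under the outer binding (by congruence) and then at the top level:

$$M\_1 \,\mathsf{need}\, \underline{x}\_1.\, (M\_2 \,\mathsf{need}\, \underline{x}\_2.\, \mathsf{return}\, ()) \;\equiv\; M\_1 \,\mathsf{need}\, \underline{x}\_1.\, \mathsf{return}\, () \;\equiv\; \mathsf{return}\, ()$$

The first step uses that the variable bound to M<sub>2</sub> is not free in return (), and the second that the variable bound to M<sub>1</sub> is not free in return ().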

The next four sequencing axioms (two from CBPV and two new) state that binding a computation with to or need commutes with the remaining forms of computation terms. These allow to and need to be moved to the outside of other constructs *except* thunks. The final four axioms (one from CBPV and three new) capture associativity and commutativity involving need and to; again these parallel the existing simple associativity axiom for to.

Note that associativity between different evaluation orders is not necessarily valid. In particular, we do not have

$$(M\_1 \text{ to } x.M\_2) \text{ need } \underline{y}.M\_3 \quad \equiv \quad M\_1 \text{ to } x.(M\_2 \text{ need } \underline{y}.M\_3).$$

(The first term might not evaluate M1, the second always does.) This is usually the case when evaluation orders are mixed [26].
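The failure of this mixed associativity can be observed in a small executable model. The sketch assumes that computations are zero-argument Python callables with effects recorded in a list; `ret`, `to`, `need`, `log`, `M1`, `M2`, `M3` and `run_logged` are hypothetical names chosen for illustration.

```python
# Hypothetical model: computations are zero-argument callables; effects are
# recorded by appending to a shared log.

def ret(v):
    return lambda: v                     # return V

def to(m, body):
    return lambda: body(m())()           # M to x. N (eager)

def need(m, body):
    def run():                           # M need x. N (lazy, at most once)
        cache = []
        def x():
            if not cache:
                cache.append(m())
            return cache[0]
        return body(x)()
    return run

log = []

def M1():
    log.append("M1")                     # the effect we want to observe
    return 1

M2 = lambda x: ret(x + 1)                # M2, which may mention x
M3 = lambda y: ret(())                   # M3, which never uses y

lhs = need(to(M1, M2), M3)               # (M1 to x. M2) need y. M3
rhs = to(M1, lambda x: need(M2(x), M3))  # M1 to x. (M2 need y. M3)

def run_logged(computation):
    """Run a computation from a clean log and return (result, effects)."""
    log.clear()
    result = computation()
    return result, list(log)
```

Both terms return the unit value here, but `run_logged(lhs)` records no effects whereas `run_logged(rhs)` records the evaluation of `M1`, so the two terms are distinguishable whenever `M1` has an observable effect.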

These final two groups allow computation terms to be placed in normal forms where bindings of computations are on the outside. (Compare this with the translation of source-language *answers* given in Sect. 3.2.) Finally, the β law for need (in Fig. 3a) parallels the usual β law for to: it gives the behaviour of computation terms that return values without having any effects.

The above equational theory induces a notion of *contextual equivalence* ≅<sub>ctx</sub> between ECBPV terms. Two terms are contextually equivalent when they have no observable differences in behaviour. When we discuss *equivalences between evaluation orders* in Sect. 3, ≅<sub>ctx</sub> is the notion of *equivalence between terms* that we consider.

Contextual equivalence is defined as follows. The *ground types* G are the value types that do not contain thunks:

$$G ::= \mathbf{unit} \mid G\_1 \times G\_2 \mid G\_1 + G\_2$$
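The grammar of ground types amounts to a simple recursive check on value types. The following sketch assumes a hypothetical encoding of ECBPV types as Python dataclasses (`Unit`, `Prod`, `Sum`, `Fr`, `U`); neither the encoding nor the function name `is_ground` comes from the paper.

```python
from dataclasses import dataclass

# Hypothetical encoding of a fragment of the ECBPV types as a small AST.

@dataclass(frozen=True)
class Unit:                 # unit
    pass

@dataclass(frozen=True)
class Prod:                 # A1 x A2
    left: object
    right: object

@dataclass(frozen=True)
class Sum:                  # A1 + A2
    left: object
    right: object

@dataclass(frozen=True)
class Fr:                   # Fr A (a computation type)
    value: object

@dataclass(frozen=True)
class U:                    # U C (a thunk type)
    computation: object

def is_ground(a):
    """True when the value type contains no thunk type, following the grammar of G."""
    if isinstance(a, Unit):
        return True
    if isinstance(a, (Prod, Sum)):
        return is_ground(a.left) and is_ground(a.right)
    return False            # U C, or anything that is not a value type

BOOL = Sum(Unit(), Unit())  # the two-element sum type unit + unit
```

For example, `is_ground(Prod(BOOL, Unit()))` holds, while any type mentioning `U`, such as `Prod(Unit(), U(Fr(BOOL)))`, is rejected.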

A *value-term context* 𝒞[−] is a computation term with a single hole (written −), which occurs in a position where a value term is expected. We write 𝒞[V] for the computation term that results from replacing the hole with V. Similarly, *computation-term contexts* <u>𝒞</u>[−] are computation terms with a single hole where a computation term is expected, and <u>𝒞</u>[M] is the term in which the hole is replaced by M. Contextual equivalence says that the terms cannot be distinguished by closed computations that return ground types. (Recall that ⋄ is the empty typing context.)

**Definition 2 (Contextual equivalence).** *There are two judgement forms of* contextual equivalence*.*

*1. Between value terms:* Γ ⊢<sub>v</sub> V ≅<sub>ctx</sub> W : A *if* Γ ⊢<sub>v</sub> V : A*,* Γ ⊢<sub>v</sub> W : A*, and for all ground types* G *and value-term contexts* 𝒞[−] *such that* ⋄ ⊢ 𝒞[V] : **Fr** G *and* ⋄ ⊢ 𝒞[W] : **Fr** G *we have*

$$\diamond \vdash \mathcal{C}[V] \equiv \mathcal{C}[W] : \mathbf{Fr} \, G$$

*2. Between computation terms:* Γ ⊢ M ≅<sub>ctx</sub> N : <u>C</u> *if* Γ ⊢ M : <u>C</u>*,* Γ ⊢ N : <u>C</u>*, and for all ground types* G *and computation-term contexts* <u>𝒞</u>[−] *such that* ⋄ ⊢ <u>𝒞</u>[M] : **Fr** G *and* ⋄ ⊢ <u>𝒞</u>[N] : **Fr** G *we have*

$$\diamond \vdash \underline{\mathcal{C}}[M] \equiv \underline{\mathcal{C}}[N] : \mathbf{Fr} \, G$$

# **3 Call-by-Name and Call-by-Need**

Extended call-by-push-value can be used to prove equivalences between evaluation orders. In this section we prove a classic example: if the only effect in the source language is nontermination, then call-by-name is equivalent to call-by-need. We do this in two stages.

First, we show that call-by-name is equivalent to call-by-need *within* ECBPV (Sect. 3.1). Specifically, we show that

$$M \,\mathsf{name}\, \underline{x}.\, N \;\cong\_{\text{ctx}}\; M \,\mathsf{need}\, \underline{x}.\, N$$

(Recall that M name <u>x</u>. N is syntactic sugar for thunk M ‘ λy. N[<u>x</u> ↦ force y].)

Second, an important corollary is that the meta-level reduction strategies are equivalent (Sect. 3.2). We show this by describing a lambda-calculus-based source language together with a call-by-name and a call-by-need operational semantics and giving sound (see Theorem 2) call-by-name and call-by-need translations into ECBPV. The former is based on the translation into the monadic metalanguage given by Moggi [25] (we expect Levy's translation [17] to work equally well). The call-by-need translation is new here, and its existence shows that ECBPV does indeed subsume call-by-need. We then show that given any source-language expression, the two translations give contextually equivalent ECBPV terms.

To model nontermination being our sole source-language effect, we use the ECBPV signature which contains a constant ⊥<sub>A</sub> : **U** (**Fr** A) for each value type A, representing a thunked diverging computation. It is likely that our proofs still work if we have general fixed-point operators as constants, but for simplicity we do not consider this here. The constants ⊥<sub>A</sub> enable us to define a diverging computation Ω<sub><u>C</u></sub> for each computation type <u>C</u>:

$$\Omega\_{\mathbf{Fr}A} \coloneqq \mathsf{force} \perp\_A \qquad \Omega\_{\prod\_{i \in I} \underline{C}\_i} \coloneqq \lambda \{ i . \Omega\_{\underline{C}\_i} \}\_{i \in I} \qquad \Omega\_{A \to \underline{C}} \coloneqq \lambda x . \Omega\_{\underline{C}}$$
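As a worked instance, unfolding the definition at a function type whose result is a binary product of returner types gives the following. Here the index set {1, 2}, the value types B<sub>1</sub> and B<sub>2</sub>, and the explicit listing of the two tuple components are assumptions made only for this example.

$$\Omega\_{A \to \prod\_{i \in \{1,2\}} \mathbf{Fr}\, B\_i} \;=\; \lambda x.\, \Omega\_{\prod\_{i \in \{1,2\}} \mathbf{Fr}\, B\_i} \;=\; \lambda x.\, \lambda\{1.\, \Omega\_{\mathbf{Fr} B\_1},\ 2.\, \Omega\_{\mathbf{Fr} B\_2}\} \;=\; \lambda x.\, \lambda\{1.\, \mathsf{force} \perp\_{B\_1},\ 2.\, \mathsf{force} \perp\_{B\_2}\}$$

Divergence is thus pushed inwards through functions and tuples until a returner type is reached, where the constant ⊥ is forced.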

We characterise nontermination by augmenting the equational theory of Sect. 2.3 with the axiom

$$\Gamma \vdash \Omega\_{\mathbf{Fr}A} \,\mathsf{to}\, x.\, M \equiv \Omega\_{\underline{C}} : \underline{C} \tag{\text{Omega}}$$

for each context Γ, value type A and computation type C. In other words, diverging as part of a larger computation causes the entire computation to diverge. This is the only change to the equational theory we need to represent nontermination. In particular, we do not add additional axioms involving need.
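The intended reading of the Omega axiom, and the reason name and need are expected to agree when divergence is the only effect, can be explored in a toy model. The sketch makes a strong simplifying assumption: nontermination is represented by raising a Python exception (`Diverge`), since real divergence cannot be observed by a test. All names (`ret`, `to`, `need`, `name`, `omega`, `observe`, `ignores_x`, `uses_x`) are hypothetical.

```python
class Diverge(Exception):
    """Stand-in for nontermination (an assumption of this sketch)."""

def ret(v):
    return lambda: v                     # return V

def to(m, body):
    return lambda: body(m())()           # M to x. N

def need(m, body):
    def run():                           # M need x. N
        cache = []
        def x():
            if not cache:
                cache.append(m())
            return cache[0]
        return body(x)()
    return run

def name(m, body):
    return body(m)                       # M name x. N: every use of x re-runs M

def omega():
    raise Diverge()                      # plays the role of the diverging computation

def observe(computation):
    """Classify the behaviour of a closed computation in this model."""
    try:
        return ("returns", computation())
    except Diverge:
        return ("diverges",)

ignores_x = lambda x: ret(0)                    # body that never uses x
uses_x = lambda x: to(x, lambda v: ret(v))      # body that uses x once
```

In this model, sequencing `omega` with to always diverges, as the axiom requires, while name and need give the same observation for both bodies: a returned value when the variable is unused, and divergence when it is used.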

#### **3.1 The Equivalence at the Object (Internal) Level**

In this section, we show our primary result that

$$M \,\mathsf{name}\, \underline{x}.\, N \;\cong\_{\text{ctx}}\; M \,\mathsf{need}\, \underline{x}.\, N$$

As is usually the case for proofs of contextual equivalence, we use *logical relations* to get a strong enough inductive hypothesis for the proof to go through. However, unlike the usual case, it does not suffice to relate *closed* terms. To see why, consider a closed term M of the form

$$\Omega\_{\mathbf{Fr}A} \,\mathsf{need}\, \underline{x}.\, N\_1 \,\mathsf{to}\, y.\, N\_2$$

If we relate only closed terms, then we do not learn anything about N<sub>1</sub> itself (since <u>x</u> may be free in it). We could attempt to proceed by considering the closed term Ω<sub><b>Fr</b> A</sub> need <u>x</u>. N<sub>1</sub>. For example, if this returns a value V then <u>x</u> cannot have been evaluated and M should have the same behaviour as Ω<sub><b>Fr</b> A</sub> need <u>x</u>. N<sub>2</sub>[y ↦ V]. However, we get stuck when proving the last step. This is only a problem because Ω<sub><b>Fr</b> A</sub> is a nonterminating computation: every terminating computation of returner type has the form return V (up to ≡), and when these are bound using need we can eliminate the binding using the equation

$$\mathsf{return}\, V \,\mathsf{need}\, \underline{x}.\, M \equiv M[\underline{x} \mapsto \mathsf{return}\, V].$$

The solution is to relate terms that may have free computation variables (we do not need to consider free value variables). The free computation variables should be thought of as referring to nonterminating computations (because we can remove the bindings of variables that refer to terminating computations). We relate open terms using *Kripke logical relations of varying arity*, which were introduced by Jung and Tiuryn [12] to study lambda definability.

We need a number of definitions first. A context Γ′ *weakens* another context Γ, written Γ′ ▷ Γ, whenever Γ is a sublist of Γ′. For example, (Γ, <u>x</u> : **Fr** A) ▷ Γ. We define Term<sup>Γ</sup><sub>A</sub> as the set of equivalence classes (up to the equational theory ≡) of terms of value type A in context Γ, and similarly define <u>Term</u><sup>Γ</sup><sub><u>D</u></sub> for computation types:

$$\text{Term}\_A^{\Gamma} \coloneqq \{ [V]\_{\equiv} \mid \Gamma \vdash\_{\mathbf{v}} V : A \} \qquad \underline{\text{Term}}\_{\underline{D}}^{\Gamma} \coloneqq \{ [M]\_{\equiv} \mid \Gamma \vdash M : \underline{D} \}$$

Since weakening is admissible for both typing judgements, Γ′ ▷ Γ implies that Term<sup>Γ</sup><sub>A</sub> ⊆ Term<sup>Γ′</sup><sub>A</sub> and <u>Term</u><sup>Γ</sup><sub><u>D</u></sub> ⊆ <u>Term</u><sup>Γ′</sup><sub><u>D</u></sub> (note the contravariance).

A *computation context*, ranged over by Δ, is a typing context that maps variables to computation types (i.e. has the form <u>x</u><sub>1</sub> : **Fr** A<sub>1</sub>, …, <u>x</u><sub>n</sub> : **Fr** A<sub>n</sub>). Variables in computation contexts refer to nonterminating computations for the proof of contextual equivalence. A *Kripke relation* is a family of binary relations indexed by computation contexts that respects weakening of terms:

**Definition 3 (Kripke relation).** *A* Kripke relation R *over a value type* A *(respectively a computation type* <u>D</u>*) is a family of relations* R<sup>Δ</sup> ⊆ Term<sup>Δ</sup><sub>A</sub> × Term<sup>Δ</sup><sub>A</sub> *(respectively* R<sup>Δ</sup> ⊆ <u>Term</u><sup>Δ</sup><sub><u>D</u></sub> × <u>Term</u><sup>Δ</sup><sub><u>D</u></sub>*) indexed by computation contexts* Δ *such that whenever* Δ′ ▷ Δ *we have* R<sup>Δ</sup> ⊆ R<sup>Δ′</sup>*.*

Note that we consider binary relations on equivalence classes of terms because we want to relate pairs of terms up to ≡ (to prove contextual equivalence). The relations we define are *partial equivalence relations* (i.e. symmetric and transitive), though we do not explicitly use this fact.

We need the Kripke relations we define over computation terms to be closed under sequencing with nonterminating computations. (For the rest of this section, we omit the square brackets around equivalence classes.)

**Definition 4.** *A Kripke relation* R *over a computation type* C *is* closed under sequencing *if each of the following holds:*


$$\begin{array}{ll}(N\ \mathsf{need}\ \underline{y}.M,\ N\ \mathsf{need}\ \underline{y}.M') & (M[\underline{y}\mapsto N],\ M'[\underline{y}\mapsto N])\\(M[\underline{y}\mapsto N],\ N\ \mathsf{need}\ \underline{y}.M') & (N\ \mathsf{need}\ \underline{y}.M,\ M'[\underline{y}\mapsto N])\end{array}$$

For the first case of the definition, recall that the computation variables in Δ refer to nonterminating computations. Hence the behaviour of M and M′ is irrelevant (they are never evaluated), and we do not need to assume they are related.<sup>3</sup> The second case implies (using axiom Omega) that

$$(\Omega\_{\mathbf{Fr}A} \text{ to } y. \, M, \, \Omega\_{\mathbf{Fr}A} \text{ to } y. \, M') \in R^{\Delta}$$

<sup>3</sup> This is why it suffices to consider only computation contexts. If we had to relate *M* to *M′* then we would need to consider relations between terms with free value variables.

$$\boxed{R\_A^{\Delta} \subseteq \text{Term}\_A^{\Delta} \times \text{Term}\_A^{\Delta}} \qquad \boxed{R\_{\underline{C}}^{\Delta} \subseteq \underline{\text{Term}}\_{\underline{C}}^{\Delta} \times \underline{\text{Term}}\_{\underline{C}}^{\Delta}}$$

$$R\_{\mathbf{Fr}A} \coloneqq \text{the smallest closed-under-sequencing Kripke relation such that}$$

$$(V, V') \in R\_A^{\Delta} \implies (\mathsf{return} \, V, \mathsf{return} \, V') \in R\_{\mathbf{Fr}A}^{\Delta}$$

$$R\_{\prod\_{i \in I} \underline{C}\_i}^{\Delta} \coloneqq \{ (M, M') \mid \forall i \in I. \, (i\,\text{‘}M, i\,\text{‘}M') \in R\_{\underline{C}\_i}^{\Delta} \}$$

$$R\_{A \to \underline{C}}^{\Delta} \coloneqq \{ (M, M') \mid \forall \Delta', V, V'. \, \Delta' \rhd \Delta \land (V, V') \in R\_A^{\Delta'} \implies (V\,\text{‘}M, V'\,\text{‘}M') \in R\_{\underline{C}}^{\Delta'} \}$$

**Fig. 4.** Definition of the logical relation

mirroring the first case. The third case is the most important. It is similar to the first (it is there to ensure that the relation is closed under the primitives used to combine computations). However, since we are showing that need is contextually equivalent to substitution, we also want these to be related. We have to consider computation variables in the definition (as possible terms N) only because of our use of Kripke logical relations. For ordinary logical relations, there would be no free variables to consider.

The key part of the proof of contextual equivalence is the definition of the Kripke logical relation, which is a family of relations indexed by value and computation types. It is defined in Fig. 4 by induction on the structure of the types. In the figure, we again omit square brackets around equivalence classes.

The definition of the logical relation on ground types (**unit**, sum types and product types) is standard. Since the only way to use a thunk is to force it, the definition on thunk types just requires the two forced computations to be related.

For returner types, we want any pair of computations that return related values to be related. We also want the relation to be closed under sequencing, in order to show the fundamental lemma (below) for to and need. We therefore define R<sub><b>Fr</b> A</sub> as the smallest such Kripke relation. For products of computation types the definition is similar to products of value types: we require that each of the projections are related. For function types, we require as usual that related arguments are sent to related results. For this to define a Kripke relation, we have to quantify over all computation contexts Δ′ that weaken Δ, because of the contravariance of the argument.

The relations we define are Kripke relations. Using the sequencing axioms of the equational theory, and the β and η laws for computation types, we can show that R<sub><u>C</u></sub> is closed under sequencing for each computation type <u>C</u>. These facts are important for the proof of the fundamental lemma.

*Substitutions* are given by the following grammar:

$$
\sigma ::= \diamond \mid \sigma, x \mapsto V \mid \sigma, \underline{x} \mapsto M
$$

We have a typing judgement Δ ⊢ σ : Γ for substitutions, meaning in the context Δ the terms in σ have the types given in Γ. This is defined as follows:

$$\dfrac{}{\Delta \vdash \diamond : \diamond} \qquad \dfrac{\Delta \vdash \sigma : \Gamma \quad \Delta \vdash\_{\mathbf{v}} V : A}{\Delta \vdash (\sigma, x \mapsto V) : (\Gamma, x : A)} \qquad \dfrac{\Delta \vdash \sigma : \Gamma \quad \Delta \vdash M : \mathbf{Fr}\, A}{\Delta \vdash (\sigma, \underline{x} \mapsto M) : (\Gamma, \underline{x} : \mathbf{Fr}\, A)}$$

We write V[σ] and M[σ] for the applications of the substitution σ to value terms V and computation terms M. These are defined by induction on the structure of the terms. The key property of the substitution typing judgement is that if Δ ⊢ σ : Γ, then Γ ⊢<sub>v</sub> V : A implies Δ ⊢<sub>v</sub> V[σ] : A and Γ ⊢ M : <u>C</u> implies Δ ⊢ M[σ] : <u>C</u>. The equational theory gives us an obvious pointwise equivalence relation ≡ on well-typed substitutions. We define sets Subst<sup>Δ</sup><sub>Γ</sub> of equivalence classes of substitutions, and extend the logical relation by defining R<sup>Δ</sup><sub>Γ</sub> ⊆ Subst<sup>Δ</sup><sub>Γ</sub> × Subst<sup>Δ</sup><sub>Γ</sub>:

$$\begin{aligned} \text{Subst}\_{\Gamma}^{\Delta} &:= \{ [\sigma]\_{\equiv} \mid \Delta \vdash \sigma : \Gamma \} \\ R\_{\diamond}^{\Delta} &:= \{ (\diamond, \diamond) \} \\ R\_{\Gamma, x:A}^{\Delta} &:= \{ ((\sigma, \, x \mapsto V), (\sigma', \, x \mapsto V')) \mid (\sigma, \sigma') \in R\_{\Gamma}^{\Delta} \land (V, V') \in R\_{A}^{\Delta} \} \\ R\_{\Gamma, \underline{x}: \mathbf{Fr}A}^{\Delta} &:= \{ ((\sigma, \, \underline{x} \mapsto M), (\sigma', \, \underline{x} \mapsto M')) \mid (\sigma, \sigma') \in R\_{\Gamma}^{\Delta} \land (M, M') \in R\_{\mathbf{Fr}A}^{\Delta} \} \end{aligned}$$

As usual, the logical relations satisfy a *fundamental lemma*.

#### **Lemma 1 (Fundamental)**

*1. For all value terms* Γ ⊢<sub>v</sub> V : A*,*

$$(\sigma, \sigma') \in R\_{\Gamma}^{\Delta} \quad \Rightarrow \quad (V[\sigma], V[\sigma']) \in R\_{A}^{\Delta}$$

*2. For all computation terms* Γ ⊢ M : <u>C</u>*,*

$$(\sigma, \sigma') \in R\_{\Gamma}^{\Delta} \quad \Rightarrow \quad (M[\sigma], M[\sigma']) \in R\_{\underline{C}}^{\Delta}$$

The proof is by induction on the structure of the terms. We use the fact that each R<sub><u>C</u></sub> is closed under sequencing for the to and need cases. For the latter, we also use the fact that the relations respect weakening of terms.

We also have the following two facts about the logical relation. The first roughly is that name is related to need by the logical relation, and is true because of the additional pairs that are related in the definition of closed-under-sequencing (Definition 4).

**Lemma 2.** *For all computation terms* Γ ⊢ M : **Fr** A *and* Γ, <u>x</u> : **Fr** A ⊢ N : <u>C</u> *we have*

$$(\sigma, \sigma') \in R\_{\Gamma}^{\Delta} \quad \Rightarrow \quad ((N[\underline{x} \mapsto M])[\sigma], (M \,\mathsf{need}\, \underline{x}.N)[\sigma']) \in R\_{\underline{C}}^{\Delta}$$

The second fact is that related terms are contextually equivalent.

#### **Lemma 3**

*1. For all value terms* Γ ⊢<sub>v</sub> V : A *and* Γ ⊢<sub>v</sub> V′ : A*, if* (V[σ], V′[σ′]) ∈ R<sup>Δ</sup><sub>A</sub> *for all* (σ, σ′) ∈ R<sup>Δ</sup><sub>Γ</sub> *then*

$$\Gamma \vdash\_{\mathbf{v}} V \cong\_{\text{ctx}} V' : A$$

*2. For all computation terms* Γ ⊢ M : <u>C</u> *and* Γ ⊢ M′ : <u>C</u>*, if* (M[σ], M′[σ′]) ∈ R<sup>Δ</sup><sub><u>C</u></sub> *for all* (σ, σ′) ∈ R<sup>Δ</sup><sub>Γ</sub> *then*

$$\Gamma \vdash M \cong\_{\text{ctx}} M' : \underline{C}$$

This gives us enough to achieve the goal of this section.

**Theorem 1.** *For all computation terms* Γ ⊢ M : **Fr** A *and* Γ, <u>x</u> : **Fr** A ⊢ N : <u>C</u>*, we have*

$$\Gamma \vdash M \,\mathsf{name}\, \underline{x}.N \cong\_{\text{ctx}} M \,\mathsf{need}\, \underline{x}.N : \underline{C}$$

#### **3.2 The Meta-level Equivalence**

In this section, we show that the equivalence between call-by-name and call-by-need also holds on the meta-level; this is a consequence of the object-level theorem, rather than something that is proved from scratch as it would be in a term rewriting system.

To do this, we describe a simple lambda-calculus-based source language with divergence as the only side-effect and give it a call-by-name and a call-by-need operational semantics. We then describe two translations from the source language into ECBPV. The first is a call-by-name translation based on the embedding of call-by-name in Moggi's [25] monadic metalanguage. The second is a call-by-need translation that uses our new constructs. The latter witnesses the fact that ECBPV does actually support call-by-need. Finally, we show that the two translations give contextually equivalent ECBPV terms.

The syntax, type system and operational semantics of the source language are given in Fig. 5. Most of this is standard. We include only booleans and function types for simplicity. In expressions, we include a constant diverge<sub>A</sub> for each type A, representing a diverging computation. (As before, it should not be difficult to replace these with general fixed-point operators.) In typing contexts, we assume that all variables are distinct, and omit the required side-condition from the figure. There is a single set of variables x, y, …; we implicitly map these to ECBPV value or computation variables as required.

$$\begin{array}{rclrcl} \text{if true then } e\_2 \text{ else } e\_3 & \stackrel{\text{name}}{\leadsto} & e\_2 & \text{diverge}\_A & \stackrel{\text{name}}{\leadsto} & \text{diverge}\_A \\ \text{if false then } e\_2 \text{ else } e\_3 & \stackrel{\text{name}}{\leadsto} & e\_3 & \text{if } \text{diverge}\_{\mathbf{bool}} \text{ then } e\_2 \text{ else } e\_3 & \stackrel{\text{name}}{\leadsto} & \text{diverge}\_A \\ (\lambda x.e)\, e' & \stackrel{\text{name}}{\leadsto} & e[x \mapsto e'] & \text{diverge}\_{A \to B}\, e' & \stackrel{\text{name}}{\leadsto} & \text{diverge}\_B \end{array}$$

$$\dfrac{e\_1 \stackrel{\text{name}}{\leadsto} e'\_1}{\text{if } e\_1 \text{ then } e\_2 \text{ else } e\_3 \stackrel{\text{name}}{\leadsto} \text{if } e'\_1 \text{ then } e\_2 \text{ else } e\_3} \qquad \dfrac{e\_1 \stackrel{\text{name}}{\leadsto} e'\_1}{e\_1\, e\_2 \stackrel{\text{name}}{\leadsto} e'\_1\, e\_2}$$

$$\begin{array}{rclrcl} \text{if true then } e\_2 \text{ else } e\_3 & \stackrel{\text{need}}{\leadsto} & e\_2 & \text{diverge}\_A & \stackrel{\text{need}}{\leadsto} & \text{diverge}\_A \\ \text{if false then } e\_2 \text{ else } e\_3 & \stackrel{\text{need}}{\leadsto} & e\_3 & E[\text{diverge}\_A] & \stackrel{\text{need}}{\leadsto} & \text{diverge}\_B \\ (\lambda x.E[x])\, v & \stackrel{\text{need}}{\leadsto} & (\lambda x.E[v])\, v & & & \\ (\lambda x.a)\, e\_1\, e\_2 & \stackrel{\text{need}}{\leadsto} & (\lambda x.a\, e\_2)\, e\_1 & & & \\ (\lambda x.E[x])\, ((\lambda y.a)\, e) & \stackrel{\text{need}}{\leadsto} & (\lambda y.(\lambda x.E[x])\, a)\, e & & & \end{array}$$

$$\dfrac{e \stackrel{\text{need}}{\leadsto} e'}{E[e] \stackrel{\text{need}}{\leadsto} E[e']}$$

**Fig. 5.** The source language

The call-by-name operational semantics is straightforward; its small-step reductions are written $e \stackrel{\text{name}}{\leadsto} e'$.

The call-by-need operational semantics is based on Ariola and Felleisen [2]. The only differences between the source language and Ariola and Felleisen's calculus are the addition of booleans, diverge<sub>A</sub>, and a type system. It is likely that we can translate other call-by-need calculi, such as those of Launchbury [16] and Maraist et al. [22]. Call-by-need small-step reductions are written $e \stackrel{\text{need}}{\leadsto} e'$.


The first two reduction axioms (on the left) of the call-by-need semantics (Fig. 5d) are obvious. The third axiom is the most important: it states that if the subexpression currently being evaluated is a variable x, and the environment maps x to a source-language value v, then that use of x can be replaced with v. Note that E[v] may contain other uses of x; the replacement only occurs when the value is actually needed. This axiom roughly corresponds to the first sequencing axiom of the equational theory of ECBPV (in Fig. 3c). The fourth and fifth axioms of the call-by-need operational semantics rearrange the environment into a standard form. Both use a syntactic restriction to answers so that each expression has at most one reduct (this restriction is not needed to ensure that $\stackrel{\text{need}}{\leadsto}$ captures call-by-need). The rule on the right of Fig. 5d states that the reduction relation is a congruence (a needed subexpression can be reduced).
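The operational difference between the two strategies can be illustrated outside the calculus. The following Python sketch is not the semantics of Fig. 5: it assumes that a call-by-name argument can be modelled as a zero-argument function that is re-run at every use, and a call-by-need argument as a memoised thunk that is run at most once and then shared, in the spirit of the third axiom. The names `NeedThunk`, `counted` and `use_twice` are hypothetical.

```python
class NeedThunk:
    """Call-by-need argument: evaluated at most once, then shared."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
            self._compute = None  # the suspended computation is no longer needed
        return self._value


def counted(log, value):
    """Build a computation that records each of its evaluations in `log`."""
    def compute():
        log.append("eval")
        return value
    return compute


def use_twice(force):
    """Models the body of (lambda x. if x then x else false): x is used twice."""
    return force() if force() else False


name_log, need_log = [], []
by_name = use_twice(counted(name_log, True))                   # argument re-run per use
by_need = use_twice(NeedThunk(counted(need_log, True)).force)  # argument shared
```

Both calls return the same result, but the call-by-name version evaluates the argument once per use, whereas the call-by-need version evaluates it a single time. The log plays the role of an observable effect, which is why the two strategies can only be identified when such effects are absent.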

The two translations from the source language to ECBPV are given in Fig. 6. The translation of types (Fig. 6a) is shared between call-by-name and call-by-need. The two translations differ only for contexts and expressions. Types A are translated into value types ⦇A⦈. The type **bool** becomes the two-element sum type **unit** + **unit**. The translation of a function type A → B is a thunked CBPV function type. The argument is a thunk of a computation that returns an ⦇A⦈, and the result is a computation that returns a ⦇B⦈.

$$(\!|\mathbf{bool}|\!) := \mathbf{unit} + \mathbf{unit} \qquad (\!|A \to B|\!) := \mathbf{U}\,(\mathbf{U}\,(\mathbf{Fr}\,(\!|A|\!)) \to \mathbf{Fr}\,(\!|B|\!))$$

**Fig. 6.** Translation from the source language to ECBPV

For call-by-name (Fig. 6b), contexts Γ are translated into contexts ⦇Γ⦈<sup>name</sup> that contain thunks of computations. We could also have used contexts containing computation variables (omitting the thunks), but choose to use thunks to keep the translation as close as possible to previous translations into call-by-push-value. A well-typed expression Γ ⊢ e : A is translated into an ECBPV computation term ⦇e⦈<sup>name</sup> that returns ⦇A⦈, in context ⦇Γ⦈<sup>name</sup>. The translation of variables just forces the relevant variable in the context. The diverging computations diverge<sub>A</sub> just use the diverging constants from our ECBPV signature. The translations of true and false are simple: they are computations that immediately return one of the elements of the sum type **unit** + **unit**. The translation of if e<sub>1</sub> then e<sub>2</sub> else e<sub>3</sub> first evaluates ⦇e<sub>1</sub>⦈<sup>name</sup>, then uses the result to choose between ⦇e<sub>2</sub>⦈<sup>name</sup> and ⦇e<sub>3</sub>⦈<sup>name</sup>. Lambdas are translated into computations that just return a thunked computation. Finally, application first evaluates the computation that returns a thunk of a function, and then forces this function, passing it a thunk of the argument.

For call-by-need (Fig. 6c), contexts Γ are translated into contexts ⦇Γ⦈<sup>need</sup>, containing computations that return values. The computations in the context are all bound using need. An expression Γ ⊢ e : A is translated to a computation ⦇e⦈<sup>need</sup> that returns ⦇A⦈ in the context ⦇Γ⦈<sup>need</sup>. The typing is therefore similar to call-by-name. The key case is the translation of lambdas. These become computations that immediately return a thunk of a function. The function places the computation given as an argument onto the context using need, so that it is evaluated at most once, before executing the body. The remainder of the cases are similar to call-by-name.

Under the call-by-need translation, the expression (λx. e<sub>1</sub>) e<sub>2</sub> is translated into a term that executes the computation ⦇e<sub>1</sub>⦈<sup>need</sup>, and executes ⦇e<sub>2</sub>⦈<sup>need</sup> only when needed. This is the case because, by the β rules for thunks, functions, and returner types:

$$(\!| (\lambda x.\, e\_1)\, e\_2 |\!)^{\text{need}} \equiv (\!| e\_2 |\!)^{\text{need}} \text{ need } \underline{x}.\ (\!| e\_1 |\!)^{\text{need}}$$

As a consequence, translations of answers are particularly simple: they have the following form (up to ≡):

$$M\_1 \text{ need } \underline{x}\_1.\ M\_2 \text{ need } \underline{x}\_2.\ \cdots\ M\_n \text{ need } \underline{x}\_n.\ \text{return } V$$

which intuitively means the value V in the environment mapping each x<sub>i</sub> to M<sub>i</sub>.

It is easy to see that both translations produce terms with the correct types. We prove that both translations are *sound*: if $e \stackrel{\text{name}}{\leadsto} e'$ then $(\!|e|\!)^{\text{name}} \equiv (\!|e'|\!)^{\text{name}}$, and if $e \stackrel{\text{need}}{\leadsto} e'$ then $(\!|e|\!)^{\text{need}} \equiv (\!|e'|\!)^{\text{need}}$. To do this for call-by-need, we first look at translations of evaluation contexts. The following lemma says the translation captures the idea that the hole in an evaluation context corresponds to the term being evaluated.

**Lemma 4.** *Define, for each evaluation context* E[−]*, the term* $\mathcal{E}\_y(\!|E[-]|\!)^{\text{need}}$ *by:*

$$\begin{aligned} \mathcal{E}\_y(\!|[-]|\!)^{\text{need}} &:= \text{return } y \\ \mathcal{E}\_y(\!|\text{if } E[-] \text{ then } e\_2 \text{ else } e\_3|\!)^{\text{need}} &:= \mathcal{E}\_y(\!|E[-]|\!)^{\text{need}} \text{ to } x.\ \text{force}\,(\text{case } x \text{ of } \{\text{inl } z.\ \text{thunk } (\!|e\_2|\!)^{\text{need}},\ \text{inr } z.\ \text{thunk } (\!|e\_3|\!)^{\text{need}}\}) \\ \mathcal{E}\_y(\!|E[-]\, e\_2|\!)^{\text{need}} &:= \mathcal{E}\_y(\!|E[-]|\!)^{\text{need}} \text{ to } x.\ (\text{thunk } (\!|e\_2|\!)^{\text{need}}) \,\text{‘}\, \text{force } x \\ \mathcal{E}\_y(\!|(\lambda x.\, E[x])\, E'[-]|\!)^{\text{need}} &:= \mathcal{E}\_y(\!|E'[-]|\!)^{\text{need}} \text{ need } \underline{x}.\ (\!|E[x]|\!)^{\text{need}} \\ \mathcal{E}\_y(\!|(\lambda x.\, E[-])\, e\_2|\!)^{\text{need}} &:= (\!|e\_2|\!)^{\text{need}} \text{ need } \underline{x}.\ \mathcal{E}\_y(\!|E[-]|\!)^{\text{need}} \end{aligned}$$

*For each expression* e *we have:*

$$(\!|E[e]|\!)^{\text{need}} \equiv (\!|e|\!)^{\text{need}} \text{ to } y.\ \mathcal{E}\_y(\!|E[-]|\!)^{\text{need}}$$

This lemma omits the typing of expressions for presentational purposes. It is easy to add suitable constraints on typing. Soundness is now easy to show:

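**Theorem 2 (Soundness).** *For any two well-typed source-language expressions* Γ ⊢ e : A *and* Γ ⊢ e′ : A*:*

1. *if* $e \stackrel{\text{name}}{\leadsto} e'$ *then* $(\!|e|\!)^{\text{name}} \equiv (\!|e'|\!)^{\text{name}}$*;*
2. *if* $e \stackrel{\text{need}}{\leadsto} e'$ *then* $(\!|e|\!)^{\text{need}} \equiv (\!|e'|\!)^{\text{need}}$*.*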

Now that we have sound call-by-name and call-by-need translations, we can state the meta-level equivalence formally. Suppose we are given a possibly open source-language expression Γ ⊢ e : B. Recall that the call-by-need translation uses a context containing computation variables (i.e. ⦇Γ⦈<sup>need</sup>) and the call-by-name translation uses a context containing value variables, which map to thunks of computations. We have two ECBPV computation terms of type **Fr** ⦇B⦈ in context ⦇Γ⦈<sup>need</sup>: one is just ⦇e⦈<sup>need</sup>, the other is ⦇e⦈<sup>name</sup> with all of its variables substituted with thunked computations. The theorem then states that these are contextually equivalent.

**Theorem 3 (Equivalence between call-by-name and call-by-need).** *For all source-language expressions* e *satisfying* x<sub>1</sub> : A<sub>1</sub>, …, x<sub>n</sub> : A<sub>n</sub> ⊢ e : B

$$(\!|e|\!)^{\text{name}}[x\_1 \mapsto \text{thunk } \underline{x}\_1, \ldots, x\_n \mapsto \text{thunk } \underline{x}\_n] \;\cong\_{\text{ctx}}\; (\!|e|\!)^{\text{need}}$$

*Proof.* The proof of this theorem is by induction on the typing derivation of e. The interesting case is lambda abstraction, where we use the internal equivalence between call-by-name and call-by-need (Theorem 1).

# **4 An Effect System for Extended Call-by-Push-Value**

The equivalence between call-by-name and call-by-need in the previous section is predicated on the only effect in the language being nontermination. However, suppose the primitives of the language have various effects (which means that in general the equivalence fails) but a given subprogram may be statically shown to have at most nontermination effects. In this case, we should be allowed to exploit the equivalence on the subprogram, interchanging call-by-need and call-by-name locally, even if the rest of the program uses other effects. In this section, we describe an *effect system* [20] for ECBPV, which statically estimates the side-effects of expressions, allowing us to exploit equivalences which hold only within subprograms. Effect systems can also be used for other purposes, such as proving the correctness of effect-dependent program transformations [7,29]. The ECBPV effect system also allows these.

Call-by-need makes statically estimating effects difficult. Computation variables bound using need might have effects on their first use, but on subsequent uses do not. Hence to precisely determine the effects of a term, we must track which variables have been used. McDermott and Mycroft [23] show how to achieve this for a call-by-need effect system; their technique can be adapted to ECBPV. Here we take a simpler approach. By slightly restricting the *effect algebras* we consider, we remove the need to track variable usage information, while still ensuring the effect information is not an underestimate (an underestimate would enable incorrect transformations). This can reduce the precision of the effect information obtained, but for our use case (determining equivalences between evaluation orders) this is not an issue, since we primarily care about which effects are used (rather than e.g. how many times they are used).

#### **4.1 Effects**

The effect system is parameterized by an *effect algebra*, which specifies the information that is tracked. Different effect algebras can be chosen for different applications. There are various forms of effect algebra. We follow Katsumata [15] and use *preordered monoids*, which are the most general.

**Definition 5 (Preordered monoid).** *A* preordered monoid (F, ≤, ·, 1) *consists of a monoid* (F, ·, 1) *and a preorder* ≤ *on* F*, such that the binary operation* · *is monotone in each argument separately.*

Since we do not track variable usage information, we might misestimate the effect of a call-by-need computation variable evaluated for a second time (whose true effect is 1). To ensure this misestimate is an overestimate, we assume that the effect algebra is *pointed* (which is the case for most applications).

**Definition 6 (Pointed preordered monoid).** *A preordered monoid* (F, ≤ , ·, 1) *is* pointed *if for all* f ∈ F *we have* 1 ≤ f*.*

The elements f of the set F are called *effects*. Each effect abstractly represents some potential side-effecting behaviours. The order ≤ provides *approximation* of effects. When f ≤ f′, the behaviours represented by f are included in those represented by f′. The binary operation · represents sequencing of effects, and 1 is the effect of a side-effect-free expression.

Traditional (*Gifford-style*) effect systems have some set Σ of *operations* (for example, Σ := {read, write}), and use the preordered monoid (PΣ, ⊆, ∪, ∅). In these cases, an effect f is just a set of operations. If a computation has effect f then f contains all of the operations the computation *may* perform. They can therefore be used to enforce that computations do not use particular operations. Another example is the preordered monoid (ℕ<sup>+</sup>, ≤, ·, 1), which can be used to count the number of possible results a nondeterministic computation can return (or to count the number of times an operation is used).
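The Gifford-style example can be checked mechanically. The sketch below builds the preordered monoid (PΣ, ⊆, ∪, ∅) for Σ = {read, write} and verifies the monoid laws, monotonicity of sequencing (Definition 5), and pointedness (Definition 6) by exhaustive enumeration. The function names are hypothetical, and the brute-force check is only feasible because the carrier is finite and small.

```python
from itertools import chain, combinations

OPS = ("read", "write")  # the example operation set Sigma from the text


def powerset(ops):
    """All subsets of `ops`, as frozensets."""
    subsets = chain.from_iterable(combinations(ops, k) for k in range(len(ops) + 1))
    return [frozenset(s) for s in subsets]


EFFECTS = powerset(OPS)
UNIT = frozenset()  # the effect 1 of a side-effect-free computation


def leq(f, g):
    """The preorder: subset inclusion."""
    return f <= g


def seq(f, g):
    """Sequencing: union of the operations that may be performed."""
    return f | g


def is_monoid():
    unit_ok = all(seq(UNIT, f) == f == seq(f, UNIT) for f in EFFECTS)
    assoc_ok = all(seq(seq(f, g), h) == seq(f, seq(g, h))
                   for f in EFFECTS for g in EFFECTS for h in EFFECTS)
    return unit_ok and assoc_ok


def is_monotone():
    """Sequencing is monotone in each argument separately."""
    return all(leq(seq(f, h), seq(g, h)) and leq(seq(h, f), seq(h, g))
               for f in EFFECTS for g in EFFECTS if leq(f, g) for h in EFFECTS)


def is_pointed():
    """The unit is below every effect."""
    return all(leq(UNIT, f) for f in EFFECTS)
```

Pointedness holds here because the empty set is a subset of every set of operations.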

For our main example, where we wish to establish whether the effects of an expression are restricted to nontermination, we use the two-element preorder {diveff ≤ ⊤}, with join for sequencing and diveff as the unit 1. The effect diveff means side-effects restricted to (at most) nontermination, and ⊤ means unrestricted side-effects. Thus we would enable the equivalence between call-by-name and call-by-need when the effect is diveff, and not when it is ⊤. All of these examples are pointed. Others can be found in the literature.
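The two-element effect algebra is small enough to write out in full. In the sketch below, `TOP` stands for the unrestricted effect (the top element of the preorder), sequencing is the join, and `may_swap_name_and_need` is a hypothetical helper that encodes the rule of thumb stated above: the evaluation-order equivalence is only enabled for effect diveff.

```python
DIVEFF, TOP = "diveff", "top"  # TOP stands for the unrestricted effect
ORDER = {DIVEFF: 0, TOP: 1}    # diveff is below TOP


def leq(f, g):
    return ORDER[f] <= ORDER[g]


def seq(f, g):
    """Sequencing is the join: the result is TOP as soon as either side is."""
    return f if ORDER[f] >= ORDER[g] else g


def may_swap_name_and_need(effect_of_argument):
    """The equivalence is only enabled for arguments with effect diveff."""
    return effect_of_argument == DIVEFF
```

Since diveff is both the unit and the least element, this algebra is pointed, and sequencing any computation with an unrestricted one yields an unrestricted effect.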

**Fig. 7.** Subtyping in the ECBPV effect system

#### **4.2 Effect System and Signature**

The effect system includes effects within types. Specifically, each computation of returner type will have some side-effects when it is run, and hence each returner type **Fr** A is annotated with an element f of F. We write the annotated type as ⟨f⟩A. Formally we replace the grammar of ECBPV computation types (and similarly, the grammar of typing contexts) with

$$\begin{aligned} \underline{C}, \underline{D} &::= \prod\_{i \in I} \underline{C}\_i \mid A \to \underline{C} \mid \boxed{\langle f \rangle A} \\ \Gamma &::= \diamond \mid \Gamma, x : A \mid \Gamma, \underline{x} : \boxed{\langle f \rangle A} \end{aligned}$$

(The highlighted parts indicate the differences.) The grammar used for value types is unchanged, except that it uses the new syntax of computation types.

The definition of ECBPV signature is similarly extended to contain the effect algebra as well as the set of constants:

**Definition 7 (Signature).** *A* signature (F, K) *consists of a pointed preordered monoid* (F, ≤, ·, 1) *of effects and, for each value type* A*, a set* K<sup>A</sup> *of constants of type* A*, including* () ∈ K**unit***.*

We assume a fixed effect system signature for the remainder of this section.

Since types contain effects, which have a notion of subeffecting, there is a natural notion of subtyping. We define (in Fig. 7) two subtyping relations: A <:<sup>v</sup> B for value types and C <: D for computation types.

We treat the type constructor ⟨f⟩ as an operation on computation types by defining computation types ⟨f⟩C.

$$\langle f \rangle \left( \prod\_{i \in I} \underline{C}\_i \right) \coloneqq \prod\_{i \in I} \langle f \rangle \underline{C}\_i \qquad \langle f \rangle (A \to \underline{C}) \coloneqq A \to \langle f \rangle \underline{C} \qquad \langle f \rangle (\langle f' \rangle A) \coloneqq \langle f \cdot f' \rangle A$$

This is an *action* of the preordered monoid on computation types. Its purpose is to give the typing rule for sequencing of computations. The sequencing of a computation with effect f with a computation of type C has type ⟨f⟩C.
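The three defining equations of the action can be read as a recursive function on the syntax of computation types. The following sketch is an illustration under simplifying assumptions: value types are represented by plain strings, the sequencing operation of the effect algebra is passed in as the parameter `seq`, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Returner:       # the annotated returner type <f> A
    effect: object
    value_type: str   # assumption: value types are just names here


@dataclass(frozen=True)
class Function:       # the function type A -> C
    argument: str
    result: "CompType"


@dataclass(frozen=True)
class Product:        # the product of C_i over i in I
    components: tuple  # tuple of (index, CompType) pairs


CompType = Union[Returner, Function, Product]


def act(f, c, seq):
    """Compute <f>C by pushing f down to the returner types inside C."""
    if isinstance(c, Returner):
        return Returner(seq(f, c.effect), c.value_type)
    if isinstance(c, Function):
        return Function(c.argument, act(f, c.result, seq))
    if isinstance(c, Product):
        return Product(tuple((i, act(f, ci, seq)) for i, ci in c.components))
    raise TypeError(f"not a computation type: {c!r}")


# Usage with Gifford-style effects, where sequencing is set union.
union = lambda f, g: f | g
c = Function("bool", Returner(frozenset({"write"}), "unit"))
result = act(frozenset({"read"}), c, union)
```

In the usage example, acting with {read} on bool → ⟨{write}⟩unit leaves the argument type untouched and yields the returner effect {read, write}.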

**Fig. 8.** Effect system modifications to ECBPV

The typing judgements have exactly the same form as before (except for the new syntax of types). The majority of the typing rules, including all of the rules for value terms, are also unchanged. The only rules we change are those for computation variables, return, to and need, which are replaced with the first four rules in Fig. 8. We also add two subtyping rules, one for values and one for computations. These are the last two rules of Fig. 8.

The equational theory does not need to be changed to use it with the new effect system (except that the types appearing in each axiom now include effect information). For each axiom of the equational theory, the two terms still have the same type in the effect system. In particular, for the axiom

$$M \text{ need } \underline{x}. \underline{x} \text{ to } y. N \equiv M \text{ to } y. N[\underline{x} \mapsto \text{return } y].$$

if Γ ⊢ M : ⟨f⟩A and Γ, x : ⟨f⟩A, y : A ⊢ N : C then the left-hand side has type ⟨f⟩C. For the right-hand side, we have Γ, y : A ⊢ N[x ↦ return y] : C, because of the assumption that the preordered monoid is pointed (which implies return y can have *any* effect by subtyping, not just the unit effect 1). Hence the right-hand side also has type ⟨f⟩C. This axiom is the reason for our pointedness requirement. In particular, if we drop need from the language, the pointedness requirement is not required. Thus the rules we give also describe a fully general effect system for CBPV in which the effect algebra can be any preordered monoid.
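The step that relies on pointedness can be spelled out as a single use of subsumption. The derivation below is a sketch: it assumes that return y is given the unit effect 1 by its typing rule, that subtyping on returner types follows the order on effects, and that the computation subsumption rule has the expected shape (the figures defining these rules are not reproduced here).

$$\frac{\Gamma, y : A \vdash \text{return } y : \langle 1 \rangle A \qquad \langle 1 \rangle A <: \langle f \rangle A \ \ (\text{since } 1 \le f)}{\Gamma, y : A \vdash \text{return } y : \langle f \rangle A}$$

With this typing, return y has the same type ⟨f⟩A as the computation variable it replaces, so the substitution preserves the type of N.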

#### **4.3 Exploiting Effect-Dependent Equivalences**

Our primary goal in adding an effect system to ECBPV is to exploit (local, effect-justified) equivalences between evaluation orders even without a whole-language restriction on effects. We sketch how to do this for our example.

When proving the equivalence between call-by-name and call-by-need in Sect. 3 we assumed that the only constants in the language were () and ⊥<sub>A</sub> : **U** (**Fr** A). To relax this restriction, we use the effect algebra with preorder {diveff ≤ ⊤} described above, and change the type of ⊥<sub>A</sub> from **U** (**Fr** A) to **U** (⟨diveff⟩A). We can include other effectful constants, and give them the effect ⊤ (e.g. write : **U** (V → ⟨⊤⟩**unit**)).

The statement of the internal (object-level) equivalence becomes:

$$\begin{aligned} &\text{if } \Gamma \vdash M : \langle \text{diveff} \rangle A \text{ and } \Gamma, \underline{x} : \langle \text{diveff} \rangle A \vdash N : \underline{C} \text{ then} \\ &\Gamma \vdash M \text{ name } \underline{x}.\, N \cong\_{\text{ctx}} M \text{ need } \underline{x}.\, N : \underline{C} \end{aligned}$$

The premise restricts the effect of M to diveff so that nontermination is its only possible side-effect. To prove this equivalence, we need a logical relation for the effect system, which means we have to define a Kripke relation R<sup>f</sup><sub>A</sub> for each effect f. For R<sup>diveff</sup><sub>A</sub> we use the same definition as before (the definition of R<sub>**Fr** A</sub>). The definition of R<sup>⊤</sup><sub>A</sub> depends on the specific other effects included.

To state and prove a meta-level equivalence for a source language that includes other side-effects, we need to define an effect system for the source language. This would use the same effect algebra as the ECBPV effect system, and be such that the translation of source language expressions preserves effects. To do this for the source language of Sect. 3, we replace the syntax of function types with $\langle f \rangle A \xrightarrow{f'} B$, where f is the effect of the argument (required due to lazy evaluation), and f′ is the latent effect of the function (the effect it has after application). The translation is then

$$(\!| \langle f \rangle A \xrightarrow{f'} B |\!) := \mathbf{U}\,(\mathbf{U}\,(\langle f \rangle (\!|A|\!)) \to \langle f' \rangle (\!|B|\!))$$

Just as for the object-level equivalence, the statement of the meta-level equivalence similarly requires the source-language expression to have the effect diveff. We omit the details here.

#### **5 Related Work**

*Metalanguages for Evaluation Order.* Call-by-push-value is similar to Moggi's monadic metalanguage [25], except for the distinction between computations and values. Both support several evaluation orders, but neither supports call-by-need. *Polarized* type theories [34] also take the approach of stratifying types into several kinds to capture multiple evaluation orders. Downen and Ariola [10] recently described how to capture call-by-need using polarity. They take a different approach to ours, by splitting up terms according to their evaluation order, rather than whether they might have effects. This means they have three kinds of type, resulting in a more complex language than ours. They also do not apply their language to reasoning about the differences between evaluation orders, which was the primary motivation for ECBPV. It is not clear whether their language can also be used for this purpose.

Multiple evaluation orders can also be captured in a Moggi-style language by using *joinads* instead of monads [28]. It is possible that there is some joinad structure implicit in extended call-by-push-value.

*Reasoning About Call-by-Need.* The majority of work on reasoning about call-by-need source languages has concentrated on operational semantics based on environments [16], graphs [30,32], and answers [2,3,9,22]. However, these do not compare call-by-need with other evaluation orders. Apart from McDermott and Mycroft's effect system [23], the only type-based analyses of a lazy source language we know of are [31,33].

*Logical Relations.* Kripke logical relations have previously been applied to the problems of lambda definability [12] and normalization [1,11]. Previous proofs of contextual equivalence relate only closed terms. We were forced to relate open terms because of the need construct.

Reasoning about effects using logical relations often runs into a difficulty in ensuring the relations are closed under sequencing of computations. We are able to work around this due to our specific choice of effects. It is possible that considering other effects would require a technique such as Lindley and Stark's *leapfrog method* [18,19].

*Effect Systems.* Effect systems have a long history, starting with Gifford-style effect systems [20]. We use preordered monoids as effect algebras following Katsumata [15]. Almost all of the previous work on effect systems has concentrated on call-by-value only. Kammar and Plotkin [13,14] describe a Gifford-style call-by-push-value effect system, though their formulation does not generalise to other effect algebras. Our effect system is the first general effect system for a CBPV-like language. The only previous work on call-by-need effects is [23].

There has also been much work on reasoning about program transformations using effect systems, e.g. [4–8,29]. We expect it to be possible to recast much of this in terms of extended call-by-push-value, and therefore apply these transformations for various evaluation orders.

# **6 Conclusions and Future Work**

We have described extended call-by-push-value, a calculus that can be used for reasoning about several evaluation orders. In particular, ECBPV supports call-by-need via the addition of the construct M need x. N. This allows us to prove that call-by-name and call-by-need reduction are equivalent if nontermination is the only effect in the source language, both inside the language itself, and on the meta-level. We proved the latter by giving two translations of a source language into ECBPV: one that captures call-by-name reduction, and one that captures call-by-need reduction. We also defined an effect system for ECBPV. The effect system statically bounds the side-effects of terms, allowing equivalences between evaluation orders to be used without restricting the entire language to particular effects. We close with a description of possible future work.

*Other Equivalences Between Evaluation Orders.* We have proved one example of an equivalence between evaluation orders using ECBPV, but there are others that we might also expect to hold. For example, we would expect call-by-need and call-by-value to be equivalent if the effects are restricted to nondeterminism, allocating state, and reading from state (but not writing). It should be possible to use ECBPV to prove these by defining suitable logical relations. More generally, it might be possible to characterize when particular equivalences hold in terms of the algebraic properties of the effects we restrict to.

*Denotational Semantics.* Using logical relations to prove contextual equivalence between terms directly is difficult. Adequate denotational semantics would allow us to reduce proofs of contextual equivalence to proofs of equalities in the model. Composing the denotational semantics with the call-by-need translation would also result in a call-by-need denotational semantics for the source language. Some potential approaches to describing the denotational semantics of ECBPV are Maraist et al.'s [21] translation into an affine calculus, combined with a semantics of linear logic [24], and also continuation-passing-style translations [27]. None of these consider side-effects however.

**Acknowledgements.** We gratefully acknowledge the support of an EPSRC studentship, and thank the anonymous reviewers for helpful comments.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Effectful Normal Form Bisimulation**

Ugo Dal Lago<sup>1,2</sup> and Francesco Gavazzo<sup>1,2</sup>

<sup>1</sup> University of Bologna, Bologna, Italy <sup>2</sup> Inria Sophia Antipolis, Sophia Antipolis Cedex, France ugo.dallago@unibo.it, francesco.gavazzo@gmail.com

**Abstract.** Normal form bisimulation, also known as open bisimulation, is a coinductive technique for higher-order program equivalence in which programs are compared by looking at their essentially infinitary tree-like normal forms, i.e. at their Böhm or Lévy-Longo trees. The technique has been shown to be useful not only when proving metatheorems about λ-calculi and their semantics, but also when looking at concrete examples of terms. In this paper, we show that there is a way to generalise normal form bisimulation to calculi with algebraic effects, à la Plotkin and Power. We show that some mild conditions on monads and relators, which have already been shown to guarantee effectful applicative bisimilarity to be a congruence relation, are enough to prove that the obtained notion of bisimilarity, which we call effectful normal form bisimilarity, is a congruence relation, and thus sound for contextual equivalence. Additionally, contrary to applicative bisimilarity, normal form bisimilarity allows for enhancements of the bisimulation proof method, hence providing a powerful reasoning principle for effectful programming languages.

# **1 Introduction**

The study of program equivalence has always been one of the central tasks of programming language theory: giving satisfactory definitions and methodologies for it can be fruitful in contexts like program verification and compiler optimisation design, but also helps in understanding the *nature* of the programming language at hand. This is particularly true when dealing with higher-order languages, in which giving satisfactory notions of program equivalence is well-known to be hard. Indeed, the problem has been approached in many different ways. One can define program equivalence through denotational semantics, thus relying on a model. One could also proceed following the route traced by Morris [51], and define programs to be *contextually* equivalent when they behave the same in every context, this way taking program equivalence as the *largest* adequate congruence.

Both these approaches have their drawbacks, the first one relying on the existence of a (not too coarse) denotational model, the latter quantifying over all contexts, and thus making concrete proofs of equivalence hard. Among the many alternative techniques the research community has been proposing along the years, one can cite logical relations and applicative bisimilarity [1,4,8], both based on the idea that equivalent higher-order terms should behave the same when fed with any (pair of related) inputs. This way, terms are compared mimicking any possible action a discriminating context could possibly perform on the tested terms. In other words, the universal quantification on all possible contexts, although not *explicitly* present, is anyway *implicitly* captured by the bisimulation or logical game.

Thanks to the ANR projects 14CE250005 ELICA and 16CE250011 REPAS.

Starting from the pioneering work by Böhm, another way of defining program equivalence has proved extremely useful not only when giving metatheorems about λ-calculi and programming languages, but also when proving concrete programs to be (contextually) equivalent. What we are referring to, of course, is the notion of a *Böhm tree* of a λ-term e (see [5] for a formal definition), which is a possibly infinite tree representing the *head normal form* h of e, if e has one, but also analyzing the arguments to the head variable of h in a coinductive way. The celebrated Böhm Theorem, also known as Separation Theorem [11], stipulates that two terms are contextually equivalent *if and only if* their respective (appropriately η-equated) Böhm trees are the same.

The notion of equivalence induced by Böhm trees can be characterised without any reference to trees, by means of a suitable bisimilarity relation [37,65]. Additionally, Böhm trees can also be defined when λ-terms are *not* evaluated to their *head* normal form, like in the classical theory of λ-calculus, but to their *weak head* normal form (like in the call-by-name λ-calculus [37,65]), or to their *eager* normal form (like in the call-by-value λ-calculus [38]). In both cases, the notion of program equivalence one obtains by comparing the syntactic structure of trees admits an elegant coinductive characterisation as a suitable bisimilarity relation. The family of bisimilarity relations thus obtained goes under the generic name of *normal form bisimilarity*.

Real world functional programming languages, however, come equipped not only with higher-order functions, but also with *computational effects*, turning them into *impure* languages in which functions cannot be seen merely as turning an input to an output. This requires switching to a new model, which cannot be the usual, pure, λ-calculus. Indeed, program equivalence in effectful λ-calculi [49,56] has been studied by way of denotational semantics [18,20,31], logical relations [10,14], applicative bisimilarity [13,16,36], and normal form bisimilarity [20,41]. While the denotational semantics, logical relation semantics, and applicative bisimilarity of effectful calculi have been studied in the abstract [15,25,30], the same cannot be said about normal form bisimilarity. Particularly relevant for our purposes is [15], where a notion of applicative bisimilarity for generic algebraic effects, called *effectful applicative bisimilarity*, based on the (standard) notion of a monad, and on the (less standard) notion of a *relator* [71] or *lax extension* [6,26], is introduced.

Intuitively, a relator is an abstraction axiomatising the structural properties of relation lifting operations. This way, relators allow for an abstract description of the possible ways a relation between programs can be lifted to a relation between (the results of) effectful computations, the latter being described through monads and algebraic operations. Several concrete notions of program equivalence, such as pure, nondeterministic and probabilistic applicative bisimilarity [1,16,36,52] can be analysed using relators. Additionally, besides their prime role in the study of effectful applicative bisimilarity, relators have also been used to study logic-based equivalences [67] and applicative distances [23] for languages with generic algebraic effects.
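As a concrete illustration of relation lifting (an example chosen here, not one given in the text), consider nondeterminism modelled by finite sets of possible results. A standard way to lift a relation between results to a relation between sets of results is to require that every element on the left is related to some element on the right, and symmetrically for the bisimulation variant. The sketch below implements both liftings; the function names are hypothetical.

```python
def lift_sim(rel, xs, ys):
    """xs is simulated by ys: each x in xs is related to some y in ys."""
    return all(any(rel(x, y) for y in ys) for x in xs)


def lift_bisim(rel, xs, ys):
    """Symmetric lifting: additionally, each y in ys is matched by some x in xs."""
    return lift_sim(rel, xs, ys) and all(any(rel(x, y) for x in xs) for y in ys)


def same_parity(m, n):
    """A sample relation between results."""
    return m % 2 == n % 2
```

For instance, with `same_parity` the sets {1, 3} and {5, 2} are related by the simulation lifting but not by the symmetric one, because the result 2 has no even counterpart on the left.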

The main contribution of [15] consists in devising a set of axioms on monads and relators (summarised in the notions of a Σ*-continuous monad* and a Σ*-continuous relator*) which are both satisfied by many concrete examples, and that abstractly guarantee that the associated notion of applicative bisimilarity is a congruence.

In this paper, we show that an abstract notion of normal form (bi)simulation can indeed be given for calculi with algebraic effects, thus defining a theory analogous to [15]. Remarkably, we show that the defining axioms of Σ-continuous monads and Σ-continuous relators guarantee the resulting notion of normal form (bi)similarity to be a (pre)congruence relation, thus enabling compositional reasoning about program equivalence and refinement. Given that these axioms have already been shown to hold in many relevant examples of calculi with effects, our work shows that there is a way to "cook up" notions of *effectful* normal form bisimulation *without* having to reprove congruence of the obtained notion of program equivalence: this comes somehow for free. Moreover, this holds both when call-by-name and when call-by-value program evaluation is considered, although in this paper we will mostly focus on the latter, since the call-by-value reduction strategy is more natural in the presence of computational effects<sup>1</sup>.

Compared to (effectful) applicative bisimilarity, as well as to other standard operational techniques—such as contextual and CIU equivalence [47,51], or logical relations [55,61]—(effectful) normal form bisimilarity has the major advantage of being an *intensional* program equivalence, equating programs according to the syntactic structure of their (possibly infinitary) normal forms. As a consequence, in order to deem two programs as normal form bisimilar, it is sufficient to test them in isolation, i.e. independently of their interaction with the environment. This way, we obtain easier proofs of equivalence between (effectful) programs. Additionally, normal form bisimilarity allows for enhancements of the bisimulation proof method [60], hence qualifying as a powerful and effective tool for program equivalence.

Intensionality represents a major difference between normal form bisimilarity and applicative bisimilarity, where the environment interacts with the tested programs by passing them arbitrary input arguments (thus making applicative bisimilarity an *extensional* notion of program equivalence). Testing programs in isolation has, however, its drawbacks. In fact, although we prove effectful normal form bisimilarity to be a sound proof technique for (effectful) applicative bisimilarity (and thus for contextual equivalence), full abstraction fails, as already observed in the case of the pure λ-calculus [3,38] (nonetheless, it is worth mentioning that full abstraction results are known to hold for calculi with a rich expressive power [65,68]).

<sup>1</sup> Besides, as we will discuss in Sect. 6.4, the formal analysis of call-by-name normal form bisimilarity strictly follows the corresponding (more challenging) analysis of call-by-value normal form bisimilarity.

In light of these observations, we devote some energy to studying some concrete examples which highlight the weaknesses of applicative bisimilarity, on the one hand, and the strengths of normal form bisimilarity, on the other hand.

This paper is structured as follows. In Sect. 2 we informally discuss examples of (pairs of) programs which are operationally equivalent, but whose equivalence cannot be readily established using standard operational methods. Throughout this paper, we will show how effectful normal form bisimilarity allows for handy proofs of such equivalences. Section 3 is dedicated to mathematical preliminaries, with a special focus on (selected) examples of monads and algebraic operations. In Sect. 4 we define our vehicle calculus Λ<sub>Σ</sub>, an untyped λ-calculus enriched with algebraic operations, to which we give call-by-value monadic operational semantics. Section 5 introduces relators and their main properties. In Sect. 6 we introduce *effectful eager normal form (bi)similarity*, the call-by-value instantiation of effectful normal form (bi)similarity, and its main metatheoretical properties. In particular, we prove effectful eager normal form (bi)similarity to be a (pre)congruence relation (Theorem 2) included in effectful applicative (bi)similarity (Proposition 5). Additionally, we prove soundness of eager normal bisimulation up-to context (Theorem 3), a powerful enhancement of the bisimulation proof method that allows for handy proofs of program equivalence. Finally, in Sect. 6.4 we briefly discuss how to modify our theory to deal with call-by-name calculi.

# **2 From Applicative to Normal Form Bisimilarity**

In this section, we provide some examples of (pairs of) programs which can be shown equivalent by effectful normal form bisimilarity, giving evidence of the flexibility and strength of the proposed technique. We will focus on examples drawn from fixed point theory, simply because these, being infinitary in nature, are quite hard to deal with using "finitary" techniques like contextual equivalence or applicative bisimilarity.

*Example 1.* Our first example comes from the ordinary theory of pure, untyped λ-calculus. Let us consider Curry's and Turing's call-by-value fixed point combinators Y and Z:

$$Y \triangleq \lambda y. \Delta \Delta, \quad Z \triangleq \Theta \Theta, \quad \Delta \triangleq \lambda x. y(\lambda z. x xz), \quad \Theta \triangleq \lambda x. \lambda y. y(\lambda z. xxyz).$$

It is well known that Y and Z are contextually equivalent, although proving such an equivalence from first principles is doomed to be hard. For that reason, one usually looks at proof techniques for contextual equivalence. Here we consider applicative bisimilarity [1]. As in the pure λ-calculus applicative bisimilarity coincides with the intersection of applicative similarity and its converse, for the sake of the argument we discuss which difficulties one faces when trying to prove Z to be applicatively similar to Y .

Let us try to construct an applicative simulation R relating Y and Z. Clearly we need to have (Y, Z) ∈ R. Since Y evaluates to λy.ΔΔ, and Z evaluates to λy.y(λz.ΘΘyz), in order for R to be an applicative simulation, we need to show that for any value v, (Δ[v/y]Δ[v/y], v(λz.ΘΘvz)) ∈ R. Since the result of the evaluation of Δ[v/y]Δ[v/y] is the same as that of v(λz.Δ[v/y]Δ[v/y]z), we have reached a point in which we are stuck: in order to ensure (Y, Z) ∈ R, we need to show that (v(λz.Δ[v/y]Δ[v/y]z), v(λz.ΘΘvz)) ∈ R. However, the value v being provided by the environment, no information on it is available. That is, we have no information on how v tests its input program. In particular, given any context C[−], we can consider the value λx.C[x], meaning that proving Y and Z to be applicatively bisimilar is almost as hard as proving them to be contextually equivalent from first principles.

As we will see, proving Z to be normal form similar to Y is straightforward, since in order to test λy.ΔΔ and λy.y(λz.ΘΘyz), we simply test their subterms ΔΔ and y(λz.ΘΘyz), thus not allowing the environment to influence computations.
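The two evaluation facts used in Example 1 can be checked with a single call-by-value β-step each. The following worked calculation is only an illustration; the shorthand Δ<sub>v</sub> for Δ[v/y] is introduced here and is not part of the original notation.

$$\begin{aligned} Z = \Theta\Theta &= (\lambda x.\lambda y.y(\lambda z.xxyz))\,\Theta \;\to\; \lambda y.y(\lambda z.\Theta\Theta yz), \\ \Delta_v\Delta_v &= (\lambda x.v(\lambda z.xxz))\,\Delta_v \;\to\; v(\lambda z.\Delta_v\Delta_v z). \end{aligned}$$

In both cases the argument (Θ, respectively Δ<sub>v</sub>) is a λ-abstraction, hence a value, so the β-step is allowed under call-by-value evaluation.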

*Example 2.* Our next example is a refinement of Example 1 to a probabilistic setting, as proposed in [66] (but in a call-by-name setting). We consider a variation of Turing's call-by-value fixed point combinator which, at any iteration, can probabilistically decide whether to start another iteration (following the pattern of the standard Turing's fixed point combinator) or to turn for good into Y , where Y and Δ are defined as in Example 1:

$$Z \triangleq \Theta \Theta,\qquad \qquad \Theta \triangleq \lambda x.\lambda y.(y(\lambda z.\Delta \Delta z) \text{ or } y(\lambda z.xxyz)).$$

Notice that the constructor **or** behaves as a (fair) probabilistic choice operator, hence acting as an effect producer. It is natural to ask whether these new versions of Y and Z are still equivalent. However, following insights from the previous example, it is not hard to see that the equivalence between Y and Z cannot be readily proved by means of standard operational methods such as probabilistic contextual equivalence [16], probabilistic CIU equivalence and logical relations [10], and probabilistic applicative bisimilarity [13,16]. All the aforementioned techniques require testing programs in a given environment (such as a whole context or an input argument), and are thus ineffective in handling fixed point combinators such as Y and Z.

We will give an elementary proof of the equivalence between Y and Z in Example 17, and a more elegant proof relying on a suitable *up-to context* technique in Example 18. In [66], the call-by-name counterparts of Y and Z are proved to be equivalent using probabilistic environmental bisimilarity. The notion of an environmental bisimulation [63] involves both an environment storing pairs of terms played during the bisimulation game, and a clause universally quantifying over pairs of terms in the evaluation context closure of such an environment<sup>2</sup>, thus making environmental bisimilarity a rather heavy technique to use. Our proof of the equivalence of Y and Z is simpler: in fact, our notion of effectful normal form bisimulation does not involve any universal quantification over all possible closed function arguments (like applicative bisimilarity), or their evaluation context closure (like environmental bisimilarity), or closed instantiation of uses (like CIU equivalence).

*Example 3.* Our third example concerns call-by-name calculi and shows how our notion of normal form bisimilarity can handle even intricate recursion schemes. We consider the following argument-switching probabilistic fixed point combinators:

$$\begin{aligned} P & \triangleq AA, & A & \triangleq \lambda x. \lambda y. \lambda z. (y(xxyz) \text{ or } z(xxzy)),\\ Q & \triangleq BB, & B & \triangleq \lambda x. \lambda y. \lambda z. (y(xxzy) \text{ or } z(xxyz)). \end{aligned}$$

We easily see that P and Q satisfy the following (informal) program equations:

$$(*) \qquad Pef = e(Pef) \text{ or } f(Pfe), \qquad \qquad Qef = e(Qfe) \text{ or } f(Qef).$$

Again, proving the equivalence between P and Q using applicative bisimilarity is problematic. In fact, testing the applicative behaviour of P and Q requires reasoning about the behaviour of e.g. e(Pef), which in turn requires reasoning about the (arbitrary) term e, on which no information is provided. The (essentially infinitary) normal forms of P and Q, however, can be proved to be essentially the same by reasoning about the syntactical structure of P and Q. Moreover, our *up-to context* technique enables an elegant and concise proof of the equivalence between P and Q (Sect. 6.4).
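As an illustration, the first informal program equation of Example 3 can be recovered by unfolding P with three call-by-name β-steps. This is a sketch only, assuming that e and f contain no free occurrences of x, y, z (otherwise bound variables are renamed first):

$$\begin{aligned} Pef = AAef &\to (\lambda y.\lambda z.(y(AAyz) \text{ or } z(AAzy)))\,ef \\ &\to (\lambda z.(e(AAez) \text{ or } z(AAze)))\,f \\ &\to e(AAef) \text{ or } f(AAfe) \;=\; e(Pef) \text{ or } f(Pfe). \end{aligned}$$

The same three steps with B in place of A give Qef = e(Qfe) or f(Qef): the only difference is that the recursive calls under e and f have their two arguments exchanged.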

*Example 4.* Our last example discusses the use of the cost monad as an *instrument* to facilitate a more intensional analysis of programs. In fact, we can use the ticking operation **tick** to perform cost analysis. For instance, we can consider the following variation of Curry's and Turing's fixed point combinators of Example 1, obtained by adding the operation symbol **tick** after every λ-abstraction.

$$\begin{aligned} Y & \triangleq \lambda y. \mathbf{tick}(\Delta \Delta), & \qquad \Delta \triangleq \lambda x. \mathbf{tick}(y(\lambda z. \mathbf{tick}(xxz))),\\ Z & \triangleq \Theta \Theta, & \qquad \Theta \triangleq \lambda x. \mathbf{tick}(\lambda y. \mathbf{tick}(y(\lambda z. \mathbf{tick}(xxyz)))). \end{aligned}$$

Every time a β-redex (λx.**tick**(e))v is reduced, the ticking operation **tick** increases an imaginary cost counter by one unit. Using ticking, we can provide a more intensional analysis of the relationship between Y and Z, along the lines of Sands' improvement theory [62].

<sup>2</sup> Meaning that two terms e<sub>1</sub>, e<sub>2</sub> are tested for their applicative behaviour against all terms of the form E[e], E[e′], for any pair of terms (e, e′) stored in the environment.

### **3 Preliminaries: Monads and Algebraic Operations**

In this section we recall some basic definitions and results needed in the rest of the paper. Unfortunately, there is no hope to be comprehensive, and thus we assume the reader to be familiar with basic domain theory [2] (in particular with the notions of ω-complete (pointed) partial order, or ω-cppo for short, and of monotone and continuous functions), basic order theory [19], and basic category theory [46]. Additionally, we assume the reader to be acquainted with the notion of a Kleisli triple [46] **T** = ⟨T, η, −<sup>†</sup>⟩. As it is customary, we use the notation f<sup>†</sup> : TX → TY for the Kleisli extension of f : X → TY, and reserve the letter η to denote the unit of **T**. Due to their equivalence, oftentimes we refer to Kleisli triples as monads.

Concerning notation, we try to follow [46] and [2], with the only exception that we use the notation (x<sub>n</sub>)<sub>n</sub> to denote an ω-chain x<sub>0</sub> ⊑ ··· ⊑ x<sub>n</sub> ⊑ ··· in a domain (X, ⊑, ⊥). The notation **T** = ⟨T, η, −<sup>†</sup>⟩ for an arbitrary Kleisli triple is standard, but it is not very handy when dealing with multiple monads at the same time. To fix this issue, we sometimes use the notation **T** = ⟨T, t, −<sup>**T**</sup>⟩ to denote a Kleisli triple. Additionally, when unambiguous we omit subscripts. Finally, we denote by Set the category of sets and functions, and by Rel the category of sets and relations. We reserve the symbol 1 to denote the identity function. Unless explicitly stated, we assume functors (and monads) to be functors (and monads) on Set. As a consequence, we write *functors* to refer to endofunctors on Set.

We use monads to give operational semantics to our calculi. Following Moggi [49,50], we model notions of computation as monads, meaning that we use monads as mathematical models of the kind of (side) effects computations may produce. The following are examples of monads modelling relevant notions of computation. Due to space constraints, we omit several interesting examples such as the output, the exception, and the nondeterministic/powerset monad, for which the reader is referred to e.g. [50,73].

*Example 5 (Partiality).* Partial computations are modelled by the partiality (also called maybe) monad **M** = ⟨M, m, −<sup>**M**</sup>⟩. The carrier MX of **M** is defined as {just x | x ∈ X} ∪ {⊥}, where ⊥ is a special symbol denoting divergence. The unit and Kleisli extension of **M** are defined as follows:

$$\mathsf{m}(x) \triangleq just \ x, \qquad \qquad f^{\mathbf{M}}(just \ x) \triangleq f(x), \qquad \qquad f^{\mathbf{M}}(\bot) \triangleq \bot.$$
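The unit and Kleisli extension of the partiality monad can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `BOTTOM`, `just`, `unit_m`, `kleisli_m` and the sample function `half` are hypothetical and do not come from the paper, and ⊥ is represented by a dedicated sentinel object.

```python
# Sketch of the partiality (maybe) monad M from Example 5.
# Assumption: BOTTOM is a dedicated sentinel object standing for ⊥.
BOTTOM = object()

def just(x):
    """Tag a proper result, mirroring the element `just x` of MX."""
    return ("just", x)

def unit_m(x):
    """Unit: m(x) = just x."""
    return just(x)

def kleisli_m(f):
    """Kleisli extension: turn f : X -> MY into f^M : MX -> MY."""
    def f_ext(mx):
        if mx is BOTTOM:      # f^M(⊥) = ⊥
            return BOTTOM
        _tag, x = mx          # f^M(just x) = f(x)
        return f(x)
    return f_ext

def half(n):
    """Hypothetical partial function: defined only on even integers."""
    return just(n // 2) if n % 2 == 0 else BOTTOM

half_ext = kleisli_m(half)
print(half_ext(just(8)))                      # ('just', 4)
print(half_ext(half_ext(just(6))) is BOTTOM)  # True: 6 -> 3 -> ⊥
```

Composing `half_ext` with itself shows how divergence propagates: once an intermediate result is ⊥, every later stage returns ⊥ as well.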

*Example 6 (Probabilistic Nondeterminism).* In this example we assume sets to be countable<sup>3</sup>. The (discrete) distribution monad **D** = ⟨D, d, −<sup>**D**</sup>⟩ has carrier DX ≜ {μ : X → [0, 1] | Σ<sub>x</sub> μ(x) = 1}, whereas the maps d and −<sup>**D**</sup> are defined as follows (where y ≠ x):

$$\mathsf{d}(x)(x) \triangleq 1,\qquad \mathsf{d}(x)(y) \triangleq 0,\qquad f^{\mathbf{D}}(\mu)(y) \triangleq \sum_{x \in X} \mu(x) \cdot f(x)(y).$$

<sup>3</sup> Although this is not strictly necessary, for simplicity we work with distributions over countable sets only, as the sets of values and normal forms are countable.

Oftentimes, we write a distribution μ as a weighted formal sum. That is, we write μ as the sum<sup>4</sup> Σ<sub>i∈I</sub> p<sub>i</sub> · x<sub>i</sub> such that μ(x) = Σ<sub>x<sub>i</sub>=x</sub> p<sub>i</sub>. **D** models probabilistic total computations, according to the rationale that a (total) probabilistic program evaluates to a distribution over values, the latter describing the possible results of the evaluation. Finally, we model probabilistic partial computations using the monad **DM** = ⟨DM, dm, −<sup>**DM**</sup>⟩. The carrier of **DM** is defined as DMX ≜ D(MX), whereas the unit dm is defined in the obvious way. For f : X → DMY, define:

$$f^{\mathbf{DM}}(\mu)(y) \triangleq \sum_{x \in X} \mu(just\ x) \cdot f(x)(y) + \mu(\bot) \cdot \mathsf{d}(\bot)(y).$$

It is easy to see that **DM** is isomorphic to the subdistribution monad.
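A minimal sketch of the distribution monad **D** of Example 6, assuming that a distribution with finite support is represented as a Python dictionary from outcomes to exact probabilities (outcomes of probability 0 are left out, as in the weighted formal sums). The names `unit_d`, `kleisli_d`, `step` and `coin` are hypothetical.

```python
from fractions import Fraction

# Sketch of the discrete distribution monad D from Example 6.
# Assumption: a finitely supported distribution is a dict mapping
# outcomes to exact probabilities; zero-probability outcomes are omitted.

def unit_d(x):
    """Unit d(x): the Dirac distribution at x."""
    return {x: Fraction(1)}

def kleisli_d(f):
    """Kleisli extension: f^D(mu)(y) = sum over x of mu(x) * f(x)(y)."""
    def f_ext(mu):
        out = {}
        for x, p in mu.items():
            for y, q in f(x).items():
                out[y] = out.get(y, Fraction(0)) + p * q
        return out
    return f_ext

def step(n):
    """Hypothetical probabilistic step: stay at n or move to n + 1, fairly."""
    return {n: Fraction(1, 2), n + 1: Fraction(1, 2)}

# The weighted formal sum 1/2 · 0 + 1/2 · 1.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
result = kleisli_d(step)(coin)
```

Here `result` is the formal sum 1/4 · 0 + 1/2 · 1 + 1/4 · 2: the outcome 1 collects probability mass from both branches of `coin`, which is exactly the role of the sum over x in the definition of the Kleisli extension.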

*Example 7 (Cost).* The cost (also known as ticking or improvement [62]) monad **C** = ⟨C, c, −<sup>**C**</sup>⟩ has carrier CX ≜ M(ℕ × X). The unit of **C** is defined as c(x) ≜ just (0, x), whereas Kleisli extension is defined as follows:

$$f^{\mathbf{C}}(\mathsf{x}) \triangleq \begin{cases} \bot & \text{if } \mathsf{x} = \bot, \text{ or } \mathsf{x} = just \ (n, x) \text{ and } f(x) = \bot \\ just \ (n + m, y) & \text{if } \mathsf{x} = just \ (n, x) \text{ and } f(x) = just \ (m, y). \end{cases}$$

The cost monad is used to model the cost of (partial) computations. An element of the form just (n, x) models the result of a computation outputting the value x with cost n (the latter being an abstract notion that can be instantiated to e.g. the number of reduction steps performed). Partiality is modelled as the element ⊥, according to the rationale that we can assume all divergent computations to have the same cost, so that such information need not be explicitly written (for instance, measuring the number of reduction steps performed, we would have that divergent computations all have cost ∞).

*Example 8 (Global states).* Let L be a set of public location names. We assume the content of locations to be encoded as families of values (such as numerals or booleans) and denote the collection of such values as V. A store (or state) is a function σ : L → V. We write S for the set of stores V<sup>L</sup>. The global state monad **G** = ⟨G, g, −<sup>**G**</sup>⟩ has carrier GX ≜ (X × S)<sup>S</sup>, whereas g and −<sup>**G**</sup> are defined by:

$$\mathsf{g}(x)(\sigma) \triangleq (x,\sigma), \qquad \qquad \qquad f^{\mathbf{G}}(\alpha)(\sigma) \triangleq f(x')(\sigma'),$$

where α(σ) = (x′, σ′). It is straightforward to see that we can combine the global state monad with the partiality monad, obtaining the monad **M** ⊗ **G** whose carrier is (M ⊗ G)X ≜ M(X × S)<sup>S</sup>. In a similar fashion, we see that we can combine the global state monad with **DM** and **C**, as we are going to see in Remark 1.

*Remark 1.* The monads **DM** and **M** ⊗ **G** of Example 6 and Example 8, respectively, are instances of two general constructions, namely the *sum* and *tensor* of effects [28]. Although these operations are defined on Lawvere theories [29,40], here we can rephrase them in terms of monads as follows.

<sup>4</sup> For simplicity, we write only those p<sub>i</sub> such that p<sub>i</sub> > 0.

**Proposition 1.** *Given a monad* **T** = ⟨T, t, −<sup>**T**</sup>⟩*, define the* sum **TM** *of* **T** *and* **M** *and the* tensor **T** ⊗ **G** *of* **T** *and* **G** *as the triples* ⟨TM, tm, −<sup>**TM**</sup>⟩ *and* ⟨T ⊗ G, t ⊗ g, −<sup>**T**⊗**G**</sup>⟩*, respectively. The carriers of the triples are defined as* TMX ≜ T(MX) *and* (T ⊗ G)X ≜ T(S × X)<sup>S</sup>*, whereas the maps* tm *and* t ⊗ g *are defined as* tm<sub>X</sub> ≜ t<sub>MX</sub> ∘ m<sub>X</sub> *and* (t ⊗ g)<sub>X</sub> ≜ curry t<sub>S×X</sub>*, respectively. Finally, define:*

$$f^{\mathbf{TM}} \triangleq (f_M)^{\mathbf{T}}, \qquad \qquad f^{\mathbf{T}\otimes\mathbf{G}}(\alpha)(\sigma) \triangleq (\mathsf{uncurry}\ f)^{\mathbf{T}}(\alpha)(\sigma),$$

*where, for a function* f : X → TMY *we define* f<sub>M</sub> : MX → TMY *as* f<sub>M</sub>(⊥) ≜ t<sub>MX</sub>(⊥)*,* f<sub>M</sub>(just x) ≜ f(x)*, and* curry *and* uncurry *are defined as usual. Then* **TM** *and* **T** ⊗ **G** *are monads.*

Proving Proposition 1 is a straightforward exercise (the reader can also consult [28]). We notice that tensoring **G** with **DM** we obtain a monad for probabilistic imperative computations, whereas tensoring **G** with **C** we obtain a monad for imperative computations with cost.
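The sum construction of Proposition 1 can be illustrated with a small Python sketch. It assumes that a monad **T** is passed as a pair (unit, Kleisli extension) of Python functions, and it instantiates **T** with a dictionary-based distribution monad so as to obtain **DM**. All names (`sum_with_partiality`, `risky`, and so on) are hypothetical, and the string `"bottom"` is an assumed stand-in for ⊥.

```python
from fractions import Fraction

BOTTOM = "bottom"   # assumption: a string sentinel standing for ⊥

def just(x):
    return ("just", x)

def sum_with_partiality(unit_t, kleisli_t):
    """Build (tm, -^TM) from a monad T given as (unit, Kleisli extension)."""
    def unit_tm(x):
        # tm_X = t_MX ∘ m_X
        return unit_t(just(x))

    def kleisli_tm(f):
        # f_M(⊥) = t(⊥) and f_M(just x) = f(x); then f^TM = (f_M)^T
        def f_m(mx):
            return unit_t(BOTTOM) if mx == BOTTOM else f(mx[1])
        return kleisli_t(f_m)

    return unit_tm, kleisli_tm

# Instantiate T with a dict-based distribution monad (assumption of this sketch).
def unit_d(x):
    return {x: Fraction(1)}

def kleisli_d(f):
    def f_ext(mu):
        out = {}
        for x, p in mu.items():
            for y, q in f(x).items():
                out[y] = out.get(y, Fraction(0)) + p * q
        return out
    return f_ext

unit_dm, kleisli_dm = sum_with_partiality(unit_d, kleisli_d)

def risky(n):
    """Hypothetical program: returns n + 1 or diverges, with equal probability."""
    return {just(n + 1): Fraction(1, 2), BOTTOM: Fraction(1, 2)}

# Run `risky` twice in sequence, starting from 0.
twice = kleisli_dm(risky)(risky(0))
```

In this sketch `twice` assigns probability 1/4 to `just 2` and 3/4 to ⊥: the mass already on ⊥ after the first run is carried over unchanged, which matches the second summand in the definition of the Kleisli extension of **DM** in Example 6.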

#### **3.1 Algebraic Operations**

Monads provide an elegant way to structure effectful computations. However, they do not offer any actual effect constructor. Following Plotkin and Power [56–58], we use *algebraic operations* as effect producers. From an operational perspective, algebraic operations are those operations whose behaviour is independent of their continuations or, equivalently, of the environment in which they are evaluated. Intuitively, that means that e.g. E[e<sub>1</sub> **or** e<sub>2</sub>] is operationally equivalent to E[e<sub>1</sub>] **or** E[e<sub>2</sub>], for any evaluation context E. Examples of algebraic operations are given by (binary) nondeterministic and probabilistic choices as well as primitives for raising exceptions and output operations.

Syntactically, algebraic operations are given via a signature Σ consisting of a set of operation symbols (uninterpreted operations) together with their arity (i.e. their number of operands). Semantically, operation symbols are interpreted as algebraic operations on monads. To any n-ary operation symbol<sup>5</sup> (**op** : n) ∈ Σ and any set X we associate a map [[**op**]]<sub>X</sub> : (TX)<sup>n</sup> → TX (so that we equip TX with a Σ-algebra structure [12]) such that f<sup>†</sup> is a Σ-algebra morphism, meaning that for any f : X → TY, and elements *x*<sub>1</sub>, ..., *x*<sub>n</sub> ∈ TX we have [[**op**]]<sub>Y</sub>(f<sup>†</sup>(*x*<sub>1</sub>), ..., f<sup>†</sup>(*x*<sub>n</sub>)) = f<sup>†</sup>([[**op**]]<sub>X</sub>(*x*<sub>1</sub>, ..., *x*<sub>n</sub>)).

*Example 9.* The partiality monad **M** usually comes with no operation, as the possibility of divergence is an implicit feature of any Turing complete language. However, it is sometimes useful to add an explicit divergence operation (for instance, in strongly normalising calculi). For that, we consider the signature Σ<sub>**M**</sub> ≜ {Ω : 0}. Having arity zero, the operation Ω acts as a constant, and has semantics [[Ω]] = ⊥. Since f<sup>**M**</sup>(⊥) = ⊥, we see that Ω is indeed an algebraic operation on **M**.

<sup>5</sup> Here **op** denotes the operation symbol, whereas n ≥ 0 denotes its arity.

For the distribution monad **D** we define the signature Σ<sub>**D**</sub> ≜ {**or** : 2}. The intended semantics of a program e<sub>1</sub> **or** e<sub>2</sub> is to evaluate to e<sub>i</sub> (i ∈ {1, 2}) with probability 0.5. The interpretation of **or** is defined by [[**or**]](μ, ν)(x) ≜ 0.5 · μ(x) + 0.5 · ν(x). It is easy to see that **or** is an algebraic operation on **D**, and that it trivially extends to **DM**.
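The algebraicity of **or** on **D** amounts to linearity of the Kleisli extension. As a worked check (a sketch, using only the definitions of Example 6 and the interpretation of **or** given above), for any f : X → DY, distributions μ, ν ∈ DX and y ∈ Y:

$$\begin{aligned} f^{\mathbf{D}}([\![\mathbf{or}]\!](\mu,\nu))(y) &= \sum_{x \in X} \big(0.5\cdot\mu(x) + 0.5\cdot\nu(x)\big)\cdot f(x)(y) \\ &= 0.5\cdot\sum_{x \in X} \mu(x)\cdot f(x)(y) \;+\; 0.5\cdot\sum_{x \in X} \nu(x)\cdot f(x)(y) \\ &= [\![\mathbf{or}]\!]\big(f^{\mathbf{D}}(\mu), f^{\mathbf{D}}(\nu)\big)(y). \end{aligned}$$

Splitting the sum is justified because all terms are non-negative and each of the two sums is bounded by 1.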

Finally, for the cost monad **C** we define the signature Σ<sub>**C**</sub> ≜ {**tick** : 1}. The intended semantics of **tick** is to add a unit to the cost counter:

$$[\![\mathbf{tick}]\!](\bot) \triangleq \bot, \qquad \quad [\![\mathbf{tick}]\!](just \ (n, x)) \triangleq just \ (n+1, x).$$
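The cost monad of Example 7 and the interpretation of **tick** can be sketched as follows, together with a check of the algebraicity equation f<sup>†</sup>([[**tick**]](*x*)) = [[**tick**]](f<sup>†</sup>(*x*)) on a few sample elements. The representation (Python's `None` for ⊥, a pair `(n, x)` for `just (n, x)`) and the function `countdown` are assumptions made for illustration only.

```python
# Sketch of the cost monad C (Example 7) with the tick operation.
# Assumption: None stands for ⊥ and a pair (n, x) stands for `just (n, x)`.
BOTTOM = None

def unit_c(x):
    """Unit: c(x) = just (0, x)."""
    return (0, x)

def kleisli_c(f):
    """Kleisli extension: costs add up, divergence propagates."""
    def f_ext(cx):
        if cx is BOTTOM:
            return BOTTOM
        n, x = cx
        fx = f(x)
        if fx is BOTTOM:
            return BOTTOM
        m, y = fx
        return (n + m, y)
    return f_ext

def tick(cx):
    """[[tick]](⊥) = ⊥ and [[tick]](just (n, x)) = just (n + 1, x)."""
    if cx is BOTTOM:
        return BOTTOM
    n, x = cx
    return (n + 1, x)

def countdown(k):
    """Hypothetical program: costs k ticks; diverges on negative input."""
    if k < 0:
        return BOTTOM
    result = unit_c("done")
    for _ in range(k):
        result = tick(result)
    return result

f_ext = kleisli_c(countdown)
samples = [BOTTOM, (2, 3), (0, -1)]
# Algebraicity: ticking before or after running the continuation agrees.
print(all(f_ext(tick(cx)) == tick(f_ext(cx)) for cx in samples))  # True
```

The three samples cover the three cases of the Kleisli extension: a divergent input, a convergent input with a convergent continuation, and a convergent input whose continuation diverges.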

The framework we have just described works fine for modelling operations with finite arity, but does not allow us to handle operations with infinitary arity. This is witnessed, for instance, by imperative calculi with global stores, where it is natural to have operations of the form **get**<sub>ℓ</sub>(x.k) with the following intended semantics: **get**<sub>ℓ</sub>(x.k) reads the content of the location ℓ, say it is a value v, and continues as k[v/x]. In order to take such operations into account, we follow [58] and work with generalised operations.

A *generalised operation* (operation, for short) on a set X is a function ω : P × X<sup>I</sup> → X. The set P is called the *parameter set* of the operation, whereas the (index) set I is called the *arity* of the operation. A generalised operation ω : P × X<sup>I</sup> → X thus takes as arguments a parameter p (such as a location name) and a map κ : I → X giving for each index i ∈ I the argument κ(i) to pass to ω. Syntactically, generalised operations are given via a signature Σ consisting of a set of elements of the form **op** : P ⇝ I (the latter being nothing but a notation denoting that the operation symbol **op** has parameter set P and index set I). Semantically, an interpretation of an operation symbol **op** : P ⇝ I on a monad **T** associates to any set X a map [[**op**]]<sub>X</sub> : P × (TX)<sup>I</sup> → TX such that for any f : X → TY, p ∈ P, and κ : I → TX:

$$f^\dagger([\![\mathbf{op}]\!]_X(p,\kappa)) = [\![\mathbf{op}]\!]_Y(p, f^\dagger \circ \kappa).$$

If **T** comes with an interpretation for operation symbols in Σ, we say that **T** is Σ-*algebraic*.

It is easy to see that, by taking the one-element set 1 = {∗} as parameter set and a finite set as arity set, generalised operations subsume finitary operations. For simplicity, we use the notation **op** : n in place of **op** : 1 ⇝ n, and write **op**(*x*<sub>1</sub>, ..., *x*<sub>n</sub>) in place of **op**(∗, n ↦ *x*<sub>n</sub>).

*Example 10.* For the global state monad we consider the signature Σ<sub>**G**</sub> ≜ {**set**<sub>ℓ</sub> : V ⇝ 1, **get**<sub>ℓ</sub> : 1 ⇝ V | ℓ ∈ L}. From a computational perspective, such operations are used to build programs of the form **set**<sub>ℓ</sub>(v, e) and **get**<sub>ℓ</sub>(x.e). The former stores the value v in the location ℓ and continues as e, whereas the latter reads the content of the location ℓ, say it is v, and continues as e[v/x]. Here e is used as the description of a function κ<sub>e</sub> from values to terms defined by κ<sub>e</sub>(v) ≜ e[v/x]. The interpretation of the new operations on **G** is standard:

$$[\![\mathbf{set}_{\ell}]\!](v,\alpha)(\sigma) = \alpha(\sigma[\ell := v]), \qquad \qquad [\![\mathbf{get}_{\ell}]\!](\kappa)(\sigma) = \kappa(\sigma(\ell))(\sigma).$$

Straightforward calculations show that indeed **set**<sub>ℓ</sub> and **get**<sub>ℓ</sub> are algebraic operations on **G**. Moreover, such operations can be easily extended to the partial global state monad **M** ⊗ **G** as well as to the probabilistic (partial) global store monad **DM** ⊗ **G**. These extensions share a common pattern, which is nothing but an instance of the tensor of effects. In fact, given a Σ<sub>**T**</sub>-algebraic monad **T** we can define the signature Σ<sub>**T**⊗**G**</sub> as Σ<sub>**T**</sub> ∪ Σ<sub>**G**</sub>, and observe that **T** ⊗ **G** is Σ<sub>**T**⊗**G**</sub>-algebraic. We refer the reader to [28] for details. Here we simply notice that we can define the interpretation [[**op**]]<sup>**T**⊗**G**</sup> of **op** : P ⇝ V on **T** ⊗ **G** as [[**op**]]<sup>**T**⊗**G**</sup><sub>X</sub>(p, κ)(σ) ≜ [[**op**]]<sup>**T**</sup><sub>S×X</sub>(p, v ↦ κ(v)(σ)), where [[**op**]]<sup>**T**</sup> is the interpretation of **op** on **T** (the interpretations of **set**<sub>ℓ</sub> and **get**<sub>ℓ</sub> are straightforward).
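The global state monad of Example 8 and the interpretations of the store operations in Example 10 can be sketched in Python as follows. The sketch assumes that a store is a dictionary from location names to values that is never mutated in place, and that an element of GX is a function from stores to pairs (result, store). The names `unit_g`, `kleisli_g`, `set_loc`, `get_loc`, `incr` and `add_l` are hypothetical.

```python
# Sketch of the global state monad G (Example 8) with set/get (Example 10).
# Assumptions: a store is a dict from location names to values, never
# mutated in place; a computation in GX is a function store -> (x, store).

def unit_g(x):
    """Unit: g(x)(sigma) = (x, sigma)."""
    return lambda sigma: (x, sigma)

def kleisli_g(f):
    """f^G(alpha)(sigma) = f(x')(sigma') where alpha(sigma) = (x', sigma')."""
    def f_ext(alpha):
        def run(sigma):
            x1, sigma1 = alpha(sigma)
            return f(x1)(sigma1)
        return run
    return f_ext

def set_loc(loc, v, alpha):
    """[[set_l]](v, alpha)(sigma) = alpha(sigma[l := v])."""
    return lambda sigma: alpha({**sigma, loc: v})

def get_loc(loc, kappa):
    """[[get_l]](kappa)(sigma) = kappa(sigma(l))(sigma)."""
    return lambda sigma: kappa(sigma[loc])(sigma)

# Hypothetical program: read l, store its successor, return the old value.
incr = get_loc("l", lambda v: set_loc("l", v + 1, unit_g(v)))
print(incr({"l": 41}))   # (41, {'l': 42})

# Algebraicity of set_l on one sample: f†(set_l(v, alpha)) = set_l(v, f†(alpha)).
add_l = lambda x: get_loc("l", lambda v: unit_g(x + v))
lhs = kleisli_g(add_l)(set_loc("l", 5, unit_g(1)))
rhs = set_loc("l", 5, kleisli_g(add_l)(unit_g(1)))
print(lhs({"l": 0}) == rhs({"l": 0}))   # True
```

Building a fresh dictionary in `set_loc` rather than updating the store in place keeps computations pure functions of the store, which is what the carrier (X × S)<sup>S</sup> describes.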

Monads and algebraic operations provide mathematical abstractions to structure and produce effectful computations. However, in order to give operational semantics to, e.g., probabilistic calculi [17] we need monads to account for infinitary computational behaviours. We thus look at Σ*-continuous monads*.

**Definition 1.** *A* Σ*-algebraic monad* **T** = ⟨T, η, −<sup>†</sup>⟩ *is* Σ*-*continuous *(cf. [24]) if to any set* X *is associated an order* ⊑<sub>X</sub> *and an element* ⊥<sub>X</sub> ∈ TX *such that* ⟨TX, ⊑<sub>X</sub>, ⊥<sub>X</sub>⟩ *is an* ω*-cppo, and for all* (**op** : P ⇝ I) ∈ Σ*,* f, f<sub>n</sub>, g : X → TY*,* κ, κ<sub>n</sub>, ν : I → TX*,* *x*, *x*<sub>n</sub>, *y* ∈ TX*, we have* f<sup>†</sup>(⊥) = ⊥ *and:*

$$\begin{array}{ll} \kappa \sqsubseteq \nu \implies [\![\mathbf{op}]\!](p,\kappa) \sqsubseteq [\![\mathbf{op}]\!](p,\nu) & \qquad [\![\mathbf{op}]\!](p,\bigsqcup_{n} \kappa_{n}) = \bigsqcup_{n} [\![\mathbf{op}]\!](p,\kappa_{n}) \\ f \sqsubseteq g \implies f^{\dagger} \sqsubseteq g^{\dagger} & \qquad (\bigsqcup_{n} f_{n})^{\dagger} = \bigsqcup_{n} f_{n}^{\dagger} \\ \mathsf{x} \sqsubseteq \mathsf{y} \implies f^{\dagger}(\mathsf{x}) \sqsubseteq f^{\dagger}(\mathsf{y}) & \qquad f^{\dagger}(\bigsqcup_{n} \mathsf{x}_{n}) = \bigsqcup_{n} f^{\dagger}(\mathsf{x}_{n}). \end{array}$$

When clear from the context, we will omit subscripts in ⊥<sub>X</sub> and ⊑<sub>X</sub>.

*Example 11.* The monads **M**, **DM**, **GM**, and **C** are Σ-continuous. The order on MX and CX is the flat ordering defined by *x* ⊑ *y* ⇐⇒ *x* = ⊥ or *x* = *y*, whereas the order on DMX is defined by μ ⊑ ν ⇐⇒ ∀x ∈ X. μ(just x) ≤ ν(just x). Finally, the order on GMX is defined pointwise from the flat ordering on M(X × S).

Having introduced the notion of a Σ-continuous monad, we can now define our vehicle calculus Λ<sub>Σ</sub> and its monadic operational semantics.

# **4 A Computational Call-by-value Calculus with Algebraic Operations**

In this section we define the calculus Λ<sub>Σ</sub>. Λ<sub>Σ</sub> is an untyped λ-calculus parametrised by a signature of operation symbols, and corresponds to the coarse-grain [44] version of the calculus studied in [15]. Formally, terms of Λ<sub>Σ</sub> are defined by the following grammar, where x ranges over a countably infinite set of variables and **op** is a generalised operation symbol in Σ.

$$e ::= x \quad \mid \quad \lambda x.e \quad \mid \quad ee \quad \mid \quad \mathbf{op}(p, x.e).$$

A value is either a variable or a λ-abstraction. We denote by Λ the collection of terms and by V the collection of values of Λ<sub>Σ</sub>. For an operation symbol **op** : P ⇝ I, we assume the set I to be encoded by some subset of V (using e.g. Church's encoding). In particular, in a term of the form **op**(p, x.e), e acts as a function in the variable x that takes as input a value. Notice also how parameters p ∈ P are part of the syntax. For simplicity, we ignore the specific subset of values used to encode elements of I, and simply write **op** : P ⇝ V for operation symbols in Σ.

We adopt standard syntactical conventions as in [5] (notably the so-called variable convention). The notion of a free (resp. bound) variable is defined as usual (notice that the variable x is bound in **op**(p, x.e)). As it is customary, we identify terms up to renaming of bound variables and say that a term is closed if it has no free variables (and that it is open, otherwise). Finally, we write f[e/x] for the capture-free substitution of the term e for all free occurrences of x in f. In particular, **op**(p, x′.f)[e/x] is defined as **op**(p, x′.f[e/x]).

Before giving Λ<sub>Σ</sub> a call-by-value operational semantics, it is useful to remark on a couple of points. First of all, testing terms according to their (possibly infinitary) normal forms obviously requires working with open terms. Indeed, in order to inspect the *intensional* behaviour of a value λx.e, one has to inspect the intensional behaviour of e, which is an open term. As a consequence, contrary to the usual practice, we give operational semantics to both *open* and *closed* terms. Actually, the very distinction between open and closed terms is not that meaningful in this context, and thus we simply speak of terms. Second, we notice that *values* constitute a syntactic category defined independently of the operational semantics of the calculus: values are just variables and λ-abstractions. However, when giving operational semantics to arbitrary terms we are interested in richer collections of irreducible expressions, i.e. expressions that cannot be simplified any further. Such collections differ according to the operational semantics adopted. For instance, in a call-by-name setting it is natural to regard the term x((λx.x)v) as a terminal expression (it being a head normal form), whereas in a call-by-value setting x((λx.x)v) can be further simplified to xv, which in turn should be regarded as a terminal expression.

We now give Λ<sub>Σ</sub> a monadic *call-by-value* operational semantics [15], postponing the definition of monadic *call-by-name* operational semantics to Sect. 6.4. Recall that a (call-by-value) evaluation context [22] is a term with a single hole [−] defined by the following grammar, where e ∈ Λ and v ∈ V:

$$E ::= [-] \quad \mid \quad Ee \quad \mid \quad vE.$$

We write E[e] for the term obtained by substituting the term e for the hole [−] in E.

Following [38], we define a *stuck term* as a term of the form E[xv]. Intuitively, a stuck term is an expression whose evaluation is stuck. For instance, the term e ≜ y(λx.x) is stuck: e is not a value, but at the same time it cannot be simplified any further, as y is a variable and not a λ-abstraction. Following this intuition, we define the collection ℰ of *eager normal forms* (enfs hereafter) as the collection of values and stuck terms. We let letters s, t, ... range over elements of ℰ.

**Lemma 1.** *Any term* e *is either a value* v*, or can be uniquely decomposed as either* E[vw] *or* E[**op**(p, x.f)]*.*

The operational semantics of Λ<sub>Σ</sub> is defined with respect to a Σ-continuous monad **T** = ⟨T, η, −<sup>†</sup>⟩, relying on Lemma 1. More precisely, we define a *call-by-value* evaluation function ⟦−⟧ mapping each term to an element of Tℰ. For instance, evaluating a probabilistic term e we obtain a distribution over eager normal forms (plus bottom), the latter being either values (meaning that the evaluation of e terminates) or stuck terms (meaning that the evaluation of e got stuck at some point).

**Definition 2.** *Define the* **N***-indexed family of maps* ⟦−⟧<sub>n</sub> : Λ → Tℰ *as follows:*

$$\begin{aligned} [\![e]\!]_0 &\stackrel{\triangle}{=} \bot, \\ [\![v]\!]_{n+1} &\stackrel{\triangle}{=} \eta(v), \\ [\![E[xv]]\!]_{n+1} &\stackrel{\triangle}{=} \eta(E[xv]), \\ [\![E[(\lambda x.e)v]]\!]_{n+1} &\stackrel{\triangle}{=} [\![E[e[v/x]]]\!]_{n}, \\ [\![E[\mathbf{op}(p,x.e)]]\!]_{n+1} &\stackrel{\triangle}{=} [\![\mathbf{op}]\!]_{\mathcal{E}}(p, v \mapsto [\![E[e[v/x]]]\!]_{n}). \end{aligned}$$

The monad **T** being Σ-continuous, we see that the sequence (⟦e⟧<sub>n</sub>)<sub>n</sub> forms an ω-chain in Tℰ, so that we can define ⟦e⟧ as ⨆<sub>n</sub> ⟦e⟧<sub>n</sub>. Moreover, exploiting Σ-continuity of **T** we see that ⟦−⟧ is continuous.
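The approximation scheme of Definition 2 can be sketched in Python for the simplest instance: the partiality monad with an empty signature, so that an element of Tℰ is either an enf or ⊥. This is an illustrative sketch only; the tuple encoding of terms, the frame-list encoding of evaluation contexts, and all function names are assumptions made here, and substitution is naive, so it is sound only under the variable convention.

```python
# Terms of the pure fragment (no operation symbols), encoded as tuples:
#   ("var", x) | ("lam", x, body) | ("app", fun, arg)
BOTTOM = None  # stands for the bottom element of the flat order


def is_value(t):
    return t[0] in ("var", "lam")


def decompose(t):
    """Split a non-value t into (frames, redex), where redex has the form vw.

    Frames are listed outermost first: ("L", e) is the context E e,
    and ("R", v) is the context v E.
    """
    fun, arg = t[1], t[2]
    if not is_value(fun):
        frames, redex = decompose(fun)
        return [("L", arg)] + frames, redex
    if not is_value(arg):
        frames, redex = decompose(arg)
        return [("R", fun)] + frames, redex
    return [], t


def plug(frames, t):
    """Compute E[t] for the evaluation context E described by frames."""
    for kind, other in reversed(frames):
        t = ("app", t, other) if kind == "L" else ("app", other, t)
    return t


def subst(t, x, v):
    """Naive substitution t[v/x]; assumes the variable convention."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    return ("app", subst(t[1], x, v), subst(t[2], x, v))


def eval_n(t, n):
    """The n-th approximation of the evaluation of t."""
    if n == 0:
        return BOTTOM
    if is_value(t):
        return t  # eta(v)
    frames, redex = decompose(t)
    fun, arg = redex[1], redex[2]
    if fun[0] == "var":
        return t  # stuck term E[xv]
    return eval_n(plug(frames, subst(fun[2], fun[1], arg)), n - 1)


ident = ("lam", "a", ("var", "a"))
delta = ("lam", "d", ("app", ("var", "d"), ("var", "d")))
omega = ("app", delta, delta)
term = ("app", ("var", "x"), ("app", ident, ("var", "v")))  # x((λa.a)v)
```

With these definitions, `eval_n(term, 1)` is ⊥ while `eval_n(term, n)` is the stuck term xv for every n ≥ 2, and `eval_n(omega, n)` is ⊥ for every n, so the approximations form a chain in the flat order.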

We compare the behaviour of terms of Λ<sub>Σ</sub> relying on the notion of an *effectful eager normal form (bi)simulation*, the extension of eager normal form (bi)simulation [38] to calculi with algebraic effects. In order to account for effectful behaviours, we follow [15] and parametrise our notions of equivalence and refinement by *relators* [6,71].

#### **5 Relators**

The notion of a *relator* for a functor T (on Set) [71] (also called *lax extension* of T [6]) is a construction lifting a relation R between two sets X and Y to a relation ΓR between TX and TY. Besides their applications in categorical topology [6] and coalgebra [71], relators have been recently used to study notions of applicative bisimulation [15], logic-based equivalence [67], and bisimulation-based distances [23] for λ-calculi extended with algebraic effects. Moreover, several forms of monadic lifting [25,32] resembling relators have been used to study abstract notions of logical relations [55,61].

Before defining relators formally, it is useful to recall some background notions on (binary) relations. The reader is referred to [26] for further details. We denote by Rel the category of sets and relations, and use the notation R : X ⇸ Y for a relation R between sets X and Y. Given relations R : X ⇸ Y and S : Y ⇸ Z, we write S ∘ R : X ⇸ Z for their composition, and I<sub>X</sub> : X ⇸ X for the identity relation on X. Finally, we recall that for all sets X, Y, the hom-set Rel(X, Y) has a complete lattice structure, meaning that we can define relations both inductively and coinductively.

Given a relation R : X ⇸ Y, we denote by R<sup>∘</sup> : Y ⇸ X its dual (or opposite) relation and by −<sub>∘</sub> : Set → Rel the graph functor mapping each function f : X → Y to its graph f<sub>∘</sub> : X ⇸ Y. The functor −<sub>∘</sub> being faithful, we will often write f : X → Y in place of f<sub>∘</sub> : X ⇸ Y. It is useful to keep in mind the pointwise reading of relations of the form g<sup>∘</sup> ∘ S ∘ f, for a relation S : Z ⇸ W and functions f : X → Z, g : Y → W:

$$(g^\circ \circ \mathcal{S} \circ f)(x, y) = \mathcal{S}(f(x), g(y)).$$

Given R : X ⇸ Y, we can thus express a generalised monotonicity condition in a pointfree fashion using the inclusion R ⊆ g<sup>∘</sup> ∘ S ∘ f. Finally, since we are interested in preorder and equivalence relations, we recall that a relation R : X ⇸ X is reflexive if I<sub>X</sub> ⊆ R, transitive if R ∘ R ⊆ R, and symmetric if R ⊆ R<sup>∘</sup>. We can now define relators formally.
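For finite sets, the relational notions recalled above can be checked mechanically. The following Python sketch represents a relation as a set of pairs and compares the pointfree composite with its pointwise reading on a small example. The function names and the sample sets, functions, and relation are assumptions chosen purely for illustration.

```python
def compose(S, R):
    """The composite S ∘ R: x is related to z if x R y and y S z for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def converse(R):
    """The dual (opposite) relation."""
    return {(y, x) for (x, y) in R}


def graph(f, domain):
    """The graph of the function f restricted to a finite domain."""
    return {(x, f(x)) for x in domain}


def identity(X):
    return {(x, x) for x in X}


def is_reflexive(R, X):
    return identity(X) <= R


def is_transitive(R):
    return compose(R, R) <= R


def is_symmetric(R):
    return R <= converse(R)


X, Y = {0, 1, 2, 3}, {0, 1, 2}
f = lambda x: "even" if x % 2 == 0 else "odd"      # f : X -> Z
g = lambda y: "zero" if y == 0 else "positive"     # g : Y -> W
S = {("even", "zero"), ("odd", "positive")}        # S relates Z to W

# Pointfree form: the converse of the graph of g, after S, after the graph of f.
pointfree = compose(converse(graph(g, Y)), compose(S, graph(f, X)))
# Pointwise form: x is related to y exactly when S(f(x), g(y)) holds.
pointwise = {(x, y) for x in X for y in Y if (f(x), g(y)) in S}
```

Both constructions yield the same set of pairs, which is the content of the pointwise reading.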

**Definition 3.** *A* relator *for a functor* T *(on* Set*) is a set-indexed family of maps* (R : X ⇸ Y) ↦ (ΓR : TX ⇸ TY) *satisfying conditions* (rel 1)*–*(rel 4)*. We say that* Γ *is* conversive *if it additionally satisfies condition* (rel 5)*.*

$$\mathsf{I}_{TX} \subseteq \Gamma(\mathsf{I}_X), \tag{rel 1}$$

$$\Gamma\mathcal{S} \circ \Gamma\mathcal{R} \subseteq \Gamma(\mathcal{S} \circ \mathcal{R}), \tag{rel 2}$$

$$Tf \subseteq \Gamma f, \quad (Tf)^{\circ} \subseteq \Gamma f^{\circ}, \tag{rel 3}$$

$$\mathcal{R} \subseteq \mathcal{S} \implies \Gamma\mathcal{R} \subseteq \Gamma\mathcal{S}, \tag{rel 4}$$

$$\Gamma(\mathcal{R}^\circ) = (\Gamma\mathcal{R})^\circ. \tag{rel 5}$$

Conditions (rel 1), (rel 2), and (rel 4) are rather standard<sup>6</sup>. As we will see, condition (rel 4) makes the defining functional of (bi)simulation relations monotone, whereas conditions (rel 1) and (rel 2) make notions of (bi)similarity reflexive and transitive. Similarly, condition (rel 5) makes notions of bisimilarity symmetric. Condition (rel 3), which actually consists of two conditions, states that relators behave as expected when acting on (graphs of) functions. In [15,43] a kernel preservation condition is required in place of (rel 3). Such a condition is also known as *stability* in [27]. Stability requires the equality Γ(g<sup>∘</sup> ∘ R ∘ f) = (Tg)<sup>∘</sup> ∘ ΓR ∘ Tf to hold. It is easy to see that a relator always satisfies stability (see Corollary III.1.4.4 in [26]).

Relators provide a powerful abstraction of notions of 'relation lifting', as witnessed by the numerous examples of relators we are going to discuss. However, before discussing such examples, we introduce the notion of a *relator for a monad* or *lax extension of a monad*. In fact, since we modelled computational effects as monads, it seems natural to define the notion of a relator for a *monad* (and not just for a functor).

<sup>6</sup> Notice that since I = (1)<sub>∘</sub>, we can derive condition (rel 1) from condition (rel 3).

**Definition 4.** *Let* **T** = ⟨T, η, −<sup>†</sup>⟩ *be a monad, and* Γ *be a relator for* T*. We say that* Γ *is a relator for* **T** *if it satisfies the following conditions:*

$$\mathcal{R} \subseteq \eta_Y^\circ \circ \Gamma\mathcal{R} \circ \eta_X, \tag{rel 7}$$

$$\mathcal{R} \subseteq g^{\circ} \circ \Gamma\mathcal{S} \circ f \implies \Gamma\mathcal{R} \subseteq (g^{\dagger})^{\circ} \circ \Gamma\mathcal{S} \circ f^{\dagger}. \tag{rel 8}$$

Finally, we observe that the collection of relators is closed under specific operations (see [43]).

**Proposition 2.** *Let* T, U *be functors, and let* UT *denote their composition. Moreover, let* Γ, Δ *be relators for* T *and* U*, respectively, and* {Γ<sub>i</sub>}<sub>i∈I</sub> *be a family of relators for* T*. Then:*


*Example 12.* For the partiality monad **M** we define the set-indexed family of maps **M̂** : Rel(X, Y) → Rel(MX, MY) as:

$$x \,\hat{\mathbf{M}}\mathcal{R}\, y \stackrel{\triangle}{\iff} (x = \bot) \lor (\exists a \in X.\ \exists b \in Y.\ x = just\ a \land y = just\ b \land a \,\mathcal{R}\, b).$$

The mapping **M̂** describes the structure of the usual *simulation* clause for partial computations, whereas **M̂**<sup>∘</sup> describes the corresponding *co-simulation* clause. It is easy to see that **M̂** is a relator for **M**. By Proposition 2, the map **M̂** ∧ **M̂**<sup>∘</sup> is a conversive relator for **M**. It is immediate to see that the latter relator describes the structure of the usual *bisimulation* clause for partial computations.
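The three clauses of Example 12 can be written down directly for finite relations. The Python sketch below is only an illustration: the names are hypothetical, lifted relations are represented as predicates, and the co-simulation clause is obtained by assuming the usual converse construction, namely lifting the dual of R and then taking the dual of the result.

```python
BOT = None  # divergence


def just(x):
    return ("just", x)


def lift_M(R):
    """Simulation clause: a diverges, or both converge to R-related results."""
    def related(a, b):
        return a is BOT or (b is not BOT and (a[1], b[1]) in R)
    return related


def lift_M_conv(R):
    """Co-simulation clause (assumed here: dual of the lifting of the dual of R)."""
    flipped = lift_M({(y, x) for (x, y) in R})
    return lambda a, b: flipped(b, a)


def lift_M_bisim(R):
    """Meet of the simulation and co-simulation clauses."""
    sim, cosim = lift_M(R), lift_M_conv(R)
    return lambda a, b: sim(a, b) and cosim(a, b)


R = {(1, "one"), (2, "two")}
```

Under these assumptions, the simulation clause relates ⊥ to everything, the co-simulation clause relates everything to ⊥, and their meet relates ⊥ only to ⊥, which is the familiar bisimulation clause for partial computations.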

*Example 13.* For the distribution monad we define the relator **D̂** relying on the notion of a *coupling* and results from optimal transport [72]. Recall that a *coupling* for μ ∈ D(X) and ν ∈ D(Y) is a joint distribution ω ∈ D(X × Y) such that μ = Σ<sub>y∈Y</sub> ω(−, y) and ν = Σ<sub>x∈X</sub> ω(x, −). We denote the set of couplings of μ and ν by Ω(μ, ν). Define the (set-indexed) map **D̂** : Rel(X, Y) → Rel(DX, DY) as follows:

$$
\mu \,\hat{\mathbf{D}}\mathcal{R}\, \nu \stackrel{\triangle}{\iff} (\exists \omega \in \Omega(\mu, \nu).\ \forall x, y.\ \omega(x, y) > 0 \implies x \,\mathcal{R}\, y).
$$

We can show that **D̂** is a relator for **D** relying on *Strassen's Theorem* [69], which shows that **D̂** can be characterised universally (i.e. using a universal quantification).

**Theorem 1 (Strassen's Theorem** [69]**).** *For all* μ ∈ DX*,* ν ∈ DY*, and* R : X ⇸ Y*, we have:* μ **D̂**R ν ⇐⇒ ∀𝒳 ⊆ X. μ(𝒳) ≤ ν(R[𝒳])*.*

As a corollary of Theorem 1, we see that **D̂** describes the defining clause of Larsen-Skou bisimulation for Markov chains (based on full distributions) [34]. Finally, we observe that $\widehat{\mathbf{DM}} \triangleq \hat{\mathbf{D}}\hat{\mathbf{M}}$ is a relator for **DM**.
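For finitely supported distributions, both sides of Strassen's characterisation can be checked by brute force. The Python sketch below does not search for a coupling (that would need a flow or linear-programming argument); it only verifies a candidate coupling and, separately, tests the universal condition by enumerating subsets. Distributions are assumed to be dictionaries from outcomes to exact fractions, and all names and sample data are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def is_coupling(omega, mu, nu):
    """Check that the two marginals of omega are mu and nu."""
    if any(x not in mu or y not in nu for (x, y) in omega):
        return False
    left = all(sum(p for (a, _), p in omega.items() if a == x) == mu[x] for x in mu)
    right = all(sum(p for (_, b), p in omega.items() if b == y) == nu[y] for y in nu)
    return left and right


def witnesses(omega, mu, nu, R):
    """omega is a coupling of mu and nu whose support lies inside R."""
    return is_coupling(omega, mu, nu) and all(
        pair in R for pair, p in omega.items() if p > 0)


def strassen(mu, nu, R):
    """Check mu(A) <= nu(R[A]) for every subset A of the outcomes listed in mu.

    Outcomes not listed in mu have probability zero, so adding them to A can
    only enlarge the right-hand side; the listed outcomes therefore suffice.
    """
    xs = list(mu)
    subsets = chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))
    for A in subsets:
        image = {y for (x, y) in R if x in A}
        if sum(mu[x] for x in A) > sum(nu.get(y, 0) for y in image):
            return False
    return True


half = Fraction(1, 2)
mu = {"a": half, "b": half}
nu = {"c": half, "d": half}
R = {("a", "c"), ("a", "d"), ("b", "d")}
omega = {("a", "c"): half, ("b", "d"): half}
R_bad = {("a", "c"), ("b", "c")}
```

On this sample data the coupling `omega` witnesses the lifting of `R`, and the universal condition holds for `R` but fails for `R_bad` (take the subset containing both `a` and `b`).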

*Example 14.* For relations R : X ⇸ Y, S : X′ ⇸ Y′, let R × S : X × X′ ⇸ Y × Y′ be defined as (R × S)((x, x′), (y, y′)) ⇐⇒ R(x, y) ∧ S(x′, y′). We define the relator **Ĉ** : Rel(X, Y) → Rel(CX, CY) for the cost monad **C** as **Ĉ**R ≜ **M̂**(≥ × R), where ≥ denotes the opposite of the natural ordering on **N**. It is straightforward to see that **Ĉ** is indeed a relator for **C**. The use of the opposite of the natural order in the definition of **Ĉ** captures the idea that we use **Ĉ** to measure complexity. Notice that **Ĉ** describes Sands' simulation clause for program improvement [62].
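As a small illustration of the cost relator, the Python sketch below represents an element of CX either as ⊥ or as a pair of a cost and a result (the *just* wrapper is left implicit). The function name and the sample relation are assumptions made for this example only.

```python
BOT = None  # divergence


def lift_C(R):
    """Cost relator sketch: a diverges, or both converge, the cost of a is at
    least the cost of b, and the results are R-related."""
    def related(a, b):
        if a is BOT:
            return True
        if b is BOT:
            return False
        (n, x), (m, y) = a, b
        return n >= m and (x, y) in R
    return related


same_result = {("done", "done")}
improved_by = lift_C(same_result)
```

Here `improved_by((5, "done"), (3, "done"))` holds, since the right-hand computation yields a related result at no greater cost, whereas swapping the two arguments makes the clause fail.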

*Example 15.* For the global state monad **G** we define the map **Ĝ** : Rel(X, Y) → Rel(GX, GY) as α **Ĝ**R β ⇐⇒ ∀σ ∈ S. α(σ) (I<sub>S</sub> × R) β(σ). It is straightforward to see that **Ĝ** is a relator for **G**.

It is not hard to see that we can extend **Ĝ** to relators for **M** ⊗ **G**, **DM** ⊗ **G**, and **C** ⊗ **G**. In fact, Proposition 1 extends to relators.

**Proposition 3.** *Given a monad* **T** = ⟨T, t, −<sup>**T**</sup>⟩ *and a relator* **T̂** *for* **T***, define the sum* $\widehat{\mathbf{TM}}$ *of* **T̂** *and* **M̂** *as* **T̂M̂***. Additionally, define the tensor* $\widehat{\mathbf{T} \otimes \mathbf{G}}$ *of* **T̂** *and* **Ĝ** *by* α $(\widehat{\mathbf{T} \otimes \mathbf{G}})\mathcal{R}$ β *if and only if* ∀σ. α(σ) **T̂**(I<sub>S</sub> × R) β(σ)*. Then* $\widehat{\mathbf{TM}}$ *is a relator for* **TM***, and* $\widehat{\mathbf{T} \otimes \mathbf{G}}$ *is a relator for* **T** ⊗ **G***.*

Finally, we require relators to properly interact with the Σ-continuous structure of monads.

**Definition 5.** *Let* **T** = ⟨T, η, −<sup>†</sup>⟩ *be a* Σ*-continuous monad and* Γ *be a relator for* **T***. We say that* Γ *is* Σ-continuous *if it satisfies the following clauses, called the* inductive conditions*, for any* ω*-chain* (*x*<sub>n</sub>)<sub>n</sub> *in* TX*, element y* ∈ TY*, elements x, x′* ∈ TX*, and relation* R : X ⇸ Y*.*

$$\bot \,\Gamma\mathcal{R}\, y, \qquad x \sqsubseteq x',\ x' \,\Gamma\mathcal{R}\, y \implies x \,\Gamma\mathcal{R}\, y, \qquad \forall n.\ x_n \,\Gamma\mathcal{R}\, y \implies \bigsqcup_n x_n \,\Gamma\mathcal{R}\, y.$$

The relators **M̂**, $\widehat{\mathbf{DM}}$, **Ĉ**, $\widehat{\mathbf{M} \otimes \mathbf{G}}$, $\widehat{\mathbf{DM} \otimes \mathbf{G}}$, and $\widehat{\mathbf{C} \otimes \mathbf{G}}$ are all Σ-continuous. The reader might have noticed that we have not imposed any condition on how relators should interact with algebraic operations. Nonetheless, it would be quite natural to require a relator Γ to satisfy condition (rel 9) below, for every operation symbol **op** : P ⇝ I ∈ Σ, maps κ, ν : I → TX, parameter p ∈ P, and relation R.

$$\forall i \in I.\ \kappa(i) \,\Gamma\mathcal{R}\, \nu(i) \implies [\![\mathbf{op}]\!](p, \kappa) \,\Gamma\mathcal{R}\, [\![\mathbf{op}]\!](p, \nu) \tag{rel 9}$$

Remarkably, if **T** is Σ-algebraic, then any relator for **T** satisfies (rel 9) (cf. [15]).

**Proposition 4.** *Let* **T** = ⟨T, η, −<sup>†</sup>⟩ *be a* Σ*-algebraic monad, and let* Γ *be a relator for* **T***. Then* Γ *satisfies condition* (rel 9)*.*

Having defined relators and their basic properties, we now introduce the notion of an effectful eager normal form (bi)simulation.

### **6 Effectful Eager Normal Form (Bi)simulation**

In this section we tacitly assume a Σ-continuous monad **T** = ⟨T, η, −<sup>†</sup>⟩ and a Σ-continuous relator Γ for it to be fixed. Σ-continuity of Γ is not required for defining effectful eager normal form (bi)simulation, but it is crucial to prove that the induced notions of similarity and bisimilarity are precongruence and congruence relations, respectively.

Working with effectful calculi, it is important to distinguish between relations over *terms* and relations over *eager normal forms*. For that reason we will work with pairs of relations of the form (R<sub>Λ</sub> : Λ ⇸ Λ, R<sub>ℰ</sub> : ℰ ⇸ ℰ), which we call λ-term relations (or term relations, for short). We use letters R, S, ... to denote term relations. The collection of λ-term relations (i.e. Rel(Λ, Λ) × Rel(ℰ, ℰ)) inherits a complete lattice structure from Rel(Λ, Λ) and Rel(ℰ, ℰ) pointwise, hence allowing λ-term relations to be defined both inductively and coinductively. We use these properties to define our notion of effectful eager normal form similarity.

**Definition 6.** *A term relation* R = (R<sub>Λ</sub> : Λ ⇸ Λ, R<sub>ℰ</sub> : ℰ ⇸ ℰ) *is an* effectful eager normal form simulation *with respect to* Γ *(hereafter enf-simulation, as* Γ *will be clear from the context) if the following conditions hold, where in condition* (enf 4) z ∉ FV(E) ∪ FV(E′)*.*

$$e \,\mathcal{R}_{\Lambda}\, f \implies [\![e]\!] \,\Gamma\mathcal{R}_{\mathcal{E}}\, [\![f]\!], \tag{enf 1}$$

$$x \,\mathcal{R}_{\mathcal{E}}\, s \implies s = x, \tag{enf 2}$$

$$\lambda x.e \,\mathcal{R}_{\mathcal{E}}\, s \implies \exists f.\ s = \lambda x.f \land e \,\mathcal{R}_{\Lambda}\, f, \tag{enf 3}$$

$$E[xv] \,\mathcal{R}_{\mathcal{E}}\, s \implies \exists E', v'.\ s = E'[xv'] \land v \,\mathcal{R}_{\mathcal{E}}\, v' \land \exists z.\ E[z] \,\mathcal{R}_{\Lambda}\, E'[z]. \tag{enf 4}$$

*We say that a relation* R respects enfs *if it satisfies conditions* (enf 2)*–*(enf 4)*.*

Definition 6 is quite standard. Clause (enf 1) is morally the same clause on terms used to define effectful applicative similarity in [15]. Clauses (enf 2) and (enf 3) state that whenever two enfs are related by R<sub>ℰ</sub>, then they must have the same outermost syntactic structure, and their subterms must be pairwise related. For instance, if λx.e R<sub>ℰ</sub> s holds, then s must be a λ-abstraction, i.e. an expression of the form λx.f, and e and f must be related by R<sub>Λ</sub>.

Clause (enf 4) is the most interesting one. It states that whenever E[xv] R<sub>ℰ</sub> s, then s must be a stuck term E′[xv′], for some evaluation context E′ and value v′. Notice that E[xv] and s must have the same 'stuck variable' x. Additionally, v and v′ must be related by R<sub>ℰ</sub>, and E and E′ must be properly related too. The idea is that to see whether E and E′ are related, we replace the stuck expressions xv, xv′ with a fresh variable z, and test E[z] and E′[z] (thus resuming the evaluation process). We require E[z] R<sub>Λ</sub> E′[z] to hold, for *some* fresh variable z. The choice of the variable does not really matter, provided it is fresh. In fact, as we will see, effectful eager normal form similarity ⪯<sup>E</sup> is substitutive and reflexive. In particular, if E[z] ⪯<sup>E</sup><sub>Λ</sub> E′[z] holds, then E[y] ⪯<sup>E</sup><sub>Λ</sub> E′[y] holds as well, for any variable y ∉ FV(E) ∪ FV(E′).

Notice that Definition 6 does not involve any universal quantification. In particular, enfs are tested by inspecting their syntactic structure, thus making the definition of an enf-simulation somewhat 'local': terms are tested in isolation and not via their interaction with the environment. This is a major difference with e.g. applicative (bi)simulation, where the environment interacts with λ-abstractions by passing them arbitrary (closed) values as arguments.

Definition 6 induces a functional R ↦ [R] on the complete lattice Rel(Λ, Λ) × Rel(ℰ, ℰ), where [R] = ([R]<sub>Λ</sub>, [R]<sub>ℰ</sub>) is defined as follows (here I<sub>X</sub> denotes the identity relation on variables, i.e. the set of pairs of the form (x, x)):

$$\begin{aligned}
[\mathcal{R}]_{\Lambda} &\triangleq \{(e, f) \mid [\![e]\!] \,\Gamma\mathcal{R}_{\mathcal{E}}\, [\![f]\!]\}, \\
[\mathcal{R}]_{\mathcal{E}} &\triangleq \mathsf{I}_{\mathcal{X}} \cup \{(\lambda x.e, \lambda x.f) \mid e \,\mathcal{R}_{\Lambda}\, f\} \\
&\quad \cup \{(E[xv], E'[xv']) \mid v \,\mathcal{R}_{\mathcal{E}}\, v' \land \exists z \notin FV(E) \cup FV(E').\ E[z] \,\mathcal{R}_{\Lambda}\, E'[z]\}.
\end{aligned}$$

It is easy to see that a term relation R is an enf-simulation if and only if R ⊆ [R]. Notice also that although [R]<sub>ℰ</sub> always contains the identity relation on variables, R<sub>ℰ</sub> does not have to: the empty relation (∅, ∅) is an enf-simulation. Finally, since relators are monotone (condition (rel 4)), R ↦ [R] is monotone too. As a consequence, by the Knaster-Tarski Theorem [70], it has a greatest fixed point, which we call *effectful eager normal form similarity* with respect to Γ (hereafter enf-similarity) and denote by ⪯<sup>E</sup> = (⪯<sup>E</sup><sub>Λ</sub>, ⪯<sup>E</sup><sub>ℰ</sub>). Enf-similarity is thus the largest enf-simulation with respect to Γ. Moreover, ⪯<sup>E</sup> being defined coinductively, it comes with an associated coinduction proof principle stating that if a term relation R is an enf-simulation, then it is contained in ⪯<sup>E</sup>. Symbolically: R ⊆ [R] ⟹ R ⊆ ⪯<sup>E</sup>.
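The coinduction principle is an instance of a general fact about monotone functionals on complete lattices, which can be observed concretely on a finite powerset lattice. The Python sketch below is a toy illustration and not the functional of Definition 6: it computes the greatest fixed point of a monotone map by iterating from the top element, using a hypothetical three-state unlabelled transition system and its ordinary simulation functional.

```python
def gfp(F, top):
    """Greatest fixed point of a monotone F on the subsets of the finite set top,
    computed by iterating F downwards from top."""
    current = set(top)
    while True:
        nxt = F(current)
        if nxt == current:
            return current
        current = nxt


# A hypothetical transition system: 0 -> 1, 1 has no moves, 2 -> 2.
succ = {0: {1}, 1: set(), 2: {2}}
states = set(succ)


def F(R):
    """Pairs (s, t) such that every move of s is matched by a move of t, up to R."""
    return {(s, t) for s in states for t in states
            if all(any((s2, t2) in R for t2 in succ[t]) for s2 in succ[s])}


similarity = gfp(F, {(s, t) for s in states for t in states})
candidate = {(0, 2), (1, 2)}
```

The relation `candidate` satisfies `candidate <= F(candidate)`, so it is a post-fixed point, and it is indeed contained in `similarity`, mirroring the implication R ⊆ [R] ⟹ R ⊆ ⪯<sup>E</sup>.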

*Example 16.* We use the coinduction proof principle to show that ⪯<sup>E</sup> contains the β-rule, viz. (λx.e)v ⪯<sup>E</sup><sub>Λ</sub> e[v/x]. For that, we simply observe that the term relation ({((λx.e)v, e[v/x])}, I<sub>ℰ</sub>) is an enf-simulation. Indeed, ⟦(λx.e)v⟧ = ⟦e[v/x]⟧, so that by (rel 1) we have ⟦(λx.e)v⟧ ΓI<sub>ℰ</sub> ⟦e[v/x]⟧.
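The equality of the two evaluations used in Example 16 can be unfolded from Definition 2 as follows. Taking the empty context E = [−], the β-clause gives one approximation step, and dropping the first element (which is ⊥) of an ω-chain does not change its least upper bound:

$$[\![(\lambda x.e)v]\!] = \bigsqcup_n [\![(\lambda x.e)v]\!]_{n} = \bigsqcup_n [\![(\lambda x.e)v]\!]_{n+1} = \bigsqcup_n [\![e[v/x]]\!]_{n} = [\![e[v/x]]\!].$$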

Finally, we define effectful eager normal form *bisimilarity*.

**Definition 7.** *A term relation* R *is an* effectful eager normal form bisimulation with respect to Γ *(enf-bisimulation, for short) if it is a* symmetric *enf-simulation.* Eager normal form bisimilarity with respect to Γ *(enf-bisimilarity, for short)* ≃<sup>E</sup> *is the largest symmetric enf-simulation. In particular, enf-bisimilarity (with respect to* Γ*) coincides with enf-similarity with respect to* Γ ∧ Γ<sup>∘</sup>*.*

*Example 17.* We show that the probabilistic call-by-value fixed point combinators Y and Z of Example 2 are enf-bisimilar. In light of Proposition 5, this allows us to conclude that Y and Z are applicatively bisimilar, and thus contextually equivalent [15]. Let us consider the relator $\widehat{\mathbf{DM}}$ for probabilistic partial computations. We show Y ≃<sup>E</sup><sub>Λ</sub> Z by coinduction, proving that the symmetric closure of the term relation R = (R<sub>Λ</sub>, R<sub>ℰ</sub>) defined as follows is an enf-simulation:

$$\begin{aligned} \mathcal{R}_{\Lambda} & \triangleq \{ (Y, Z), (\Delta \Delta z, Zyz), (\Delta \Delta, y(\lambda z. \Delta \Delta z) \mathbin{\mathbf{or}} y(\lambda z. Zyz)) \} \cup \mathsf{I}_{\Lambda}, \\ \mathcal{R}_{\mathcal{E}} & \triangleq \{ (y(\lambda z. \Delta \Delta z), y(\lambda z. Zyz)), (\lambda z. \Delta \Delta z, \lambda z. Zyz), \\ & \qquad (\lambda y. \Delta \Delta, \lambda y. (y(\lambda z. \Delta \Delta z) \mathbin{\mathbf{or}} y(\lambda z. Zyz))), (y(\lambda z. \Delta \Delta z)z, y(\lambda z. Zyz)z) \} \cup \mathsf{I}_{\mathcal{E}}. \end{aligned}$$

The term relation R is obtained from the relation {(Y, Z)} by progressively adding terms and enfs according to clauses (enf 1)–(enf 4) in Definition 6. Checking that R is an enf-simulation is straightforward. As an illustrative example, we prove that ΔΔz R<sub>Λ</sub> Zyz implies ⟦ΔΔz⟧ $\widehat{\mathbf{DM}}$(R<sub>ℰ</sub>) ⟦Zyz⟧. The latter amounts to showing:

$$\left(1 \cdot just\ y(\lambda z.\Delta\Delta z)z\right) \widehat{\mathbf{DM}}(\mathcal{R}_{\mathcal{E}}) \left(\frac{1}{2} \cdot just\ y(\lambda z.\Delta\Delta z)z + \frac{1}{2} \cdot just\ y(\lambda z.Zyz)z\right),$$

where, as usual, we write distributions as weighted formal sums. To prove the latter, it is sufficient to find a suitable coupling of ⟦ΔΔz⟧ and ⟦Zyz⟧. Define the distribution ω ∈ D(Mℰ × Mℰ) as follows:

$$
\begin{aligned}
\omega(just\ y(\lambda z.\Delta\Delta z)z, just\ y(\lambda z.\Delta\Delta z)z) &= \frac{1}{2}, \\
\omega(just\ y(\lambda z.\Delta\Delta z)z, just\ y(\lambda z.Zyz)z) &= \frac{1}{2},
\end{aligned}
$$

and assigning zero to all other pairs in Mℰ × Mℰ. Obviously ω is a coupling of ⟦ΔΔz⟧ and ⟦Zyz⟧. Additionally, we see that ω(*x*, *y*) > 0 implies *x* **M̂**R<sub>ℰ</sub> *y*, since both y(λz.ΔΔz)z R<sub>ℰ</sub> y(λz.ΔΔz)z and y(λz.ΔΔz)z R<sub>ℰ</sub> y(λz.Zyz)z hold.
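The coupling conditions for ω can be checked by summing its two non-zero entries. Writing s<sub>1</sub> for y(λz.ΔΔz)z and s<sub>2</sub> for y(λz.Zyz)z (abbreviations introduced here only for readability), the first marginal recovers the single point mass of the left-hand distribution and the second marginal recovers the two halves of the right-hand one:

$$\begin{aligned}
\sum_{y} \omega(just\ s_1, y) &= \tfrac{1}{2} + \tfrac{1}{2} = 1 = [\![\Delta\Delta z]\!](just\ s_1), \\
\sum_{x} \omega(x, just\ s_1) &= \tfrac{1}{2} = [\![Zyz]\!](just\ s_1), \\
\sum_{x} \omega(x, just\ s_2) &= \tfrac{1}{2} = [\![Zyz]\!](just\ s_2).
\end{aligned}$$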

As already discussed in Example 2, the operational equivalence between Y and Z is an example of an equivalence that cannot be readily established using standard operational methods—such as CIU equivalence or applicative bisimilarity—but whose proof is straightforward using enf-bisimilarity. Additionally, Theorem 3 will allow us to reduce the size of R, thus minimising the task of checking that our relation is indeed an enf-bisimulation. To the best of the authors' knowledge, the probabilistic instance of enf-(bi)similarity is the first example of a *probabilistic eager normal form (bi)similarity* in the literature.

#### **6.1 Congruence and Precongruence Theorems**

In order for ⪯<sup>E</sup> and ≃<sup>E</sup> to qualify as good notions of program refinement and equivalence, respectively, they have to allow for compositional reasoning. Roughly speaking, a term relation R is compositional if the validity of the relationship C[e] R C[e′] between compound terms C[e], C[e′] follows from the validity of the relationship e R e′ between the subterms e, e′. Mathematically, the notion of compositionality is formalised through the notion of *compatibility*, which directly leads to the notions of a precongruence and congruence relation. In this section we prove that ⪯<sup>E</sup> and ≃<sup>E</sup> are substitutive precongruence and congruence relations, respectively, that is, preorder and equivalence relations closed under term constructors of Λ<sub>Σ</sub> and substitution. To prove such results, we generalise Lassen's relational construction for the pure call-by-name λ-calculus [37]. Such a construction has been previously adapted to the *pure* call-by-value λ-calculus (and its extension with delimited and abortive control operators) in [9], whereas Lassen has proved compatibility of pure eager normal form bisimilarity via a CPS translation [38]. Both those proofs rely on syntactical properties of the calculus (mostly expressed using suitable small-step semantics), and thus seem hard to adapt to effectful calculi. By contrast, our proofs rely on the properties of relators, thereby making our results and techniques more modular and thus valid for a large class of effects.

**Fig. 1.** Compatible and substitutive closure construction.

We begin by proving precongruence of enf-similarity. The central tool we use to prove the desired precongruence theorem is the so-called *(substitutive) context closure* [37] R<sup>SC</sup> of a term relation R, which is inductively defined by the rules in Fig. 1, where x ∈ {Λ, ℰ}, i ∈ {1, 2}, and z ∉ FV(E) ∪ FV(E′).

We easily see that R<sup>SC</sup> is the smallest term relation that contains R, is closed under language constructors of Λ<sub>Σ</sub> (a property known as *compatibility* [5]), and is closed under the substitution operation (a property known as *substitutivity* [5]). As a consequence, we say that a term relation R is a *substitutive compatible* relation if R<sup>SC</sup> ⊆ R (and thus R = R<sup>SC</sup>). If, additionally, R is a preorder (resp. equivalence) relation, then we say that R is a *substitutive precongruence* (resp. *substitutive congruence*) relation.

We are now going to prove that if R is an enf-simulation, then so is R<sup>SC</sup>. In particular, we will infer that (⪯<sup>E</sup>)<sup>SC</sup> is an enf-simulation, and thus it is contained in ⪯<sup>E</sup>, by coinduction.

**Lemma 2 (Main Lemma).** *If* R *is an enf-simulation, then so is* R<sup>SC</sup>*.*

*Proof (sketch).* The proof is long and non-trivial. Due to space constraints, here we simply give some intuitions behind it. First, a routine proof by induction shows that since R respects enfs, so does R<sup>SC</sup>. Next, we wish to prove that e R<sup>SC</sup><sub>Λ</sub> f implies ⟦e⟧ ΓR<sup>SC</sup><sub>ℰ</sub> ⟦f⟧. Since Γ is inductive, the latter follows if for any n ≥ 0, e R<sup>SC</sup><sub>Λ</sub> f implies ⟦e⟧<sub>n</sub> ΓR<sup>SC</sup><sub>ℰ</sub> ⟦f⟧. We prove the latter implication by lexicographic induction on (1) the natural number n and (2) the derivation of e R<sup>SC</sup><sub>Λ</sub> f. The case for n = 0 is trivial (since Γ is inductive). The remaining cases are non-trivial, and are handled by observing that ⟦E[e]⟧ = (s ↦ ⟦E[s]⟧)<sup>†</sup>⟦e⟧ and ⟦e[v/x]⟧<sub>n</sub> ⊑ ⟦−[v/x]⟧<sup>†</sup><sub>n</sub>⟦e⟧<sub>n</sub>. Both these properties allow us to apply condition (rel 8) to simplify proof obligations (usually relying on part (2) of the induction hypothesis as well). This scheme is iterated until we reach either an enf (in which case we are done by condition (rel 7)) or a pair of expressions on which we can apply part (1) of the induction hypothesis.

**Theorem 2.** *Enf-similarity (resp. bisimilarity) is a substitutive precongruence (resp. congruence) relation.*

*Proof.* We show that enf-similarity is a substitutive precongruence relation. By Lemma 2, it is sufficient to show that $\preceq^{\mathsf{E}}$ is a preorder. This follows by coinduction, since the term relations $\mathsf{I}$ and $\preceq^{\mathsf{E}} \circ \preceq^{\mathsf{E}}$ are enf-simulations (the proofs make use of conditions (rel 1) and (rel 2), as well as of substitutivity of $\preceq^{\mathsf{E}}$).

Finally, we show that enf-bisimilarity is a substitutive congruence relation. Clearly, enf-bisimilarity is an equivalence relation, so it is sufficient to prove that its substitutive compatible closure is contained in enf-bisimilarity itself. That follows directly by coinduction, relying on Lemma 2, provided that this closure is symmetric. An easy inspection of the rules in Fig. 1 reveals that $\mathcal{R}^{SC}$ is symmetric whenever $\mathcal{R}$ is.

#### **6.2 Soundness for Effectful Applicative (Bi)similarity**

Theorem 2 qualifies enf-bisimilarity and enf-similarity as good candidate notions of program equivalence and refinement for $\Lambda_\Sigma$, at least from a structural perspective. However, we motivated these notions by looking at specific examples where effectful applicative (bi)similarity is ineffective. It is then natural to ask whether enf-(bi)similarity can be used as a proof technique for effectful applicative (bi)similarity.

Here we give a formal comparison between enf-(bi)similarity and effectful applicative (bi)similarity, as defined in [15]. First of all, we rephrase the notion of an effectful applicative (bi)simulation of [15] for our calculus $\Lambda_\Sigma$. For that, we use the following notational convention. Let $\Lambda_0$, $\mathcal{V}_0$ denote the collections of closed terms and closed values, respectively. We notice that if $e \in \Lambda_0$, then $[\![e]\!] \in T\mathcal{V}_0$. As a consequence, $[\![-]\!]$ induces a closed evaluation function $|-| : \Lambda_0 \to T\mathcal{V}_0$ characterised by the identity $[\![-]\!] \circ \iota = T\iota \circ |-|$, where $\iota : \mathcal{V}_0 \to \mathcal{E}$ is the obvious inclusion map. We can thus phrase the definition of effectful applicative similarity (with respect to a relator $\Gamma$) as follows.

**Definition 8.** *A term relation* $\mathcal{R} = (\mathcal{R}_{\Lambda_0} : \Lambda_0 ⇸ \Lambda_0,\ \mathcal{R}_{\mathcal{V}_0} : \mathcal{V}_0 ⇸ \mathcal{V}_0)$ *is an effectful applicative simulation with respect to* $\Gamma$ *(applicative simulation, for short) if:*

$$e \mathrel{\mathcal{R}_{\Lambda_0}} f \implies |e| \mathrel{\Gamma\mathcal{R}_{\mathcal{V}_0}} |f|, \tag{app 1}$$

$$\lambda x.e \mathrel{\mathcal{R}_{\mathcal{V}_0}} \lambda x.f \implies \forall v \in \mathcal{V}_0.\ e[v/x] \mathrel{\mathcal{R}_{\Lambda_0}} f[v/x]. \tag{app 2}$$

As usual, we can define effectful applicative similarity with respect to $\Gamma$ (applicative similarity, for short), denoted by $\preceq^{\mathsf{A}}_0 = (\preceq^{\mathsf{A}}_{\Lambda_0}, \preceq^{\mathsf{A}}_{\mathcal{V}_0})$, coinductively as the largest applicative simulation. Its associated coinduction proof principle states that if a relation is an applicative simulation, then it is contained in applicative similarity. Finally, we extend $\preceq^{\mathsf{A}}_0$ to arbitrary terms by defining the relation $\preceq^{\mathsf{A}} = (\preceq^{\mathsf{A}}_{\Lambda}, \preceq^{\mathsf{A}}_{\mathcal{V}})$ as follows: let $e$, $f$, $w$, $u$ be terms and values with free variables among $\bar{x} = x_1, \ldots, x_n$. We let $\bar{v}$ range over $n$-ary sequences of closed values $v_1, \ldots, v_n$. Define:

$$e \preceq^{\mathsf{A}}_{\Lambda} f \overset{\triangle}{\iff} \forall \bar{v}.\ e[\bar{v}/\bar{x}] \preceq^{\mathsf{A}}_{\Lambda_0} f[\bar{v}/\bar{x}], \qquad w \preceq^{\mathsf{A}}_{\mathcal{V}} u \overset{\triangle}{\iff} \forall \bar{v}.\ w[\bar{v}/\bar{x}] \preceq^{\mathsf{A}}_{\mathcal{V}_0} u[\bar{v}/\bar{x}].$$

The following result states that enf-similarity is a sound proof technique for applicative similarity.

**Proposition 5.** *Enf-similarity* $\preceq^{\mathsf{E}}$ *is included in applicative similarity* $\preceq^{\mathsf{A}}$*.*

*Proof.* Let $\preceq^{c} = (\preceq^{c}_{\Lambda}, \preceq^{c}_{\mathcal{V}})$ denote enf-similarity restricted to closed terms and values. We first show that $\preceq^{c}$ is an applicative simulation, from which it follows, by coinduction, that it is included in $\preceq^{\mathsf{A}}_0$. It is easy to see that $\preceq^{c}$ satisfies condition (app 2). In order to prove that it also satisfies condition (app 1), we have to show that for all $e, f \in \Lambda_0$, $e \preceq^{c}_{\Lambda} f$ implies $|e| \mathrel{\Gamma{\preceq^{c}_{\mathcal{V}}}} |f|$. Since $e \preceq^{c}_{\Lambda} f$ obviously implies $\iota(e) \preceq^{\mathsf{E}}_{\Lambda} \iota(f)$, by (enf 1) we infer $[\![\iota(e)]\!] \mathrel{\Gamma{\preceq^{\mathsf{E}}_{\mathcal{V}}}} [\![\iota(f)]\!]$, and thus $T\iota|e| \mathrel{\Gamma{\preceq^{\mathsf{E}}_{\mathcal{V}}}} T\iota|f|$. By stability of $\Gamma$, the latter implies $|e| \mathrel{\Gamma(\iota^{\circ} \circ \preceq^{\mathsf{E}} \circ \iota)} |f|$, and thus the desired thesis, since $\iota^{\circ} \circ \preceq^{\mathsf{E}} \circ \iota$ is nothing but $\preceq^{c}_{\mathcal{V}}$. Finally, we show that for all terms $e, f$, if $e \preceq^{\mathsf{E}}_{\Lambda} f$, then $e \preceq^{\mathsf{A}}_{\Lambda} f$ (a similar result holds *mutatis mutandis* for values, so that we can conclude $\preceq^{\mathsf{E}} \subseteq \preceq^{\mathsf{A}}$). Indeed, suppose $FV(e) \cup FV(f) \subseteq \bar{x}$; then, by substitutivity of $\preceq^{\mathsf{E}}$, we have that $e \preceq^{\mathsf{E}}_{\Lambda} f$ implies $e[\bar{v}/\bar{x}] \preceq^{\mathsf{E}}_{\Lambda} f[\bar{v}/\bar{x}]$ for all closed values $\bar{v}$ (notice that since we are substituting *closed* values, sequential and simultaneous substitution coincide). That essentially means $e[\bar{v}/\bar{x}] \preceq^{c}_{\Lambda} f[\bar{v}/\bar{x}]$, and thus $e[\bar{v}/\bar{x}] \preceq^{\mathsf{A}}_{\Lambda_0} f[\bar{v}/\bar{x}]$. We thus conclude $e \preceq^{\mathsf{A}}_{\Lambda} f$.

Since in [15] it is shown that effectful applicative similarity (resp. bisimilarity) is contained in effectful contextual approximation (resp. equivalence), Proposition 5 gives the following result.

**Corollary 1.** *Enf-similarity and enf-bisimilarity are sound proof techniques for contextual approximation and equivalence, respectively.*

Although sound, enf-bisimilarity is *not* fully abstract for applicative bisimilarity. In fact, as already observed in [38], in the pure λ-calculus enf-bisimilarity is strictly finer than applicative bisimilarity (and thus strictly finer than contextual equivalence too). For instance, the terms xv and (λy.xv)(xv) are obviously applicatively bisimilar but not enf-bisimilar.

#### **6.3 Eager Normal Form (Bi)simulation Up-to Context**

The up-to context technique [37,60,64] is a refinement of the coinduction proof principle of enf-(bi)similarity that allows for handier proofs of equivalence and refinement between terms. When exhibiting a candidate enf-(bi)simulation relation R, it is desirable for R to be as small as possible, so as to minimise the task of verifying that R is indeed an enf-(bi)simulation.

The motivation behind such a technique can be easily seen by looking at Example 17, where we showed the equivalence between the probabilistic fixed point combinators Y and Z by working with relations containing several administrative pairs of terms. The presence of such pairs was forced by Definition 7, although they appear somewhat unnecessary for convincing oneself that Y and Z exhibit the same operational behaviour.

Enf-(bi)simulation up-to context is a refinement of enf-(bi)simulation that allows one to check that a relation R behaves as an enf-(bi)simulation relation up to its substitutive and compatible closure.

**Definition 9.** *A term relation* $\mathcal{R} = (\mathcal{R}_{\Lambda} : \Lambda ⇸ \Lambda,\ \mathcal{R}_{\mathcal{E}} : \mathcal{E} ⇸ \mathcal{E})$ *is an* effectful eager normal form simulation up-to context with respect to $\Gamma$ *(enf-simulation up-to context, hereafter) if it satisfies the following conditions, where in condition* (up-to 4) $z \notin FV(E) \cup FV(E')$*.*

$$e \mathrel{\mathcal{R}_{\Lambda}} f \implies [\![e]\!] \mathrel{\Gamma\mathcal{R}^{SC}_{\mathcal{E}}} [\![f]\!], \tag{up-to 1}$$

$$x \mathrel{\mathcal{R}_{\mathcal{E}}} s \implies s = x, \tag{up-to 2}$$

$$\lambda x.e \mathrel{\mathcal{R}_{\mathcal{E}}} s \implies \exists f.\ s = \lambda x.f \wedge e \mathrel{\mathcal{R}^{SC}_{\Lambda}} f, \tag{up-to 3}$$

$$E[xv] \mathrel{\mathcal{R}_{\mathcal{E}}} s \implies \exists E', v'.\ s = E'[xv'] \wedge v \mathrel{\mathcal{R}^{SC}_{\mathcal{E}}} v' \wedge \exists z.\ E[z] \mathrel{\mathcal{R}^{SC}_{\Lambda}} E'[z]. \tag{up-to 4}$$

In order for the up-to context technique to be sound, we need to show that every enf-simulation up-to context is contained in enf-similarity. This is a direct consequence of the following variation of Lemma 2.

**Lemma 3.** *If* $\mathcal{R}$ *is an enf-simulation up-to context, then* $\mathcal{R}^{SC}$ *is an enf-simulation.*

*Proof.* The proof is structurally identical to the one of Lemma 2, where we simply observe that wherever we use the assumption that R is an enf-simulation, we can use the weaker assumption that R is an enf-simulation up-to context.

In particular, since by Lemma 2 we have that $\preceq^{\mathsf{E}} = (\preceq^{\mathsf{E}})^{SC}$, we see that enf-similarity is an enf-simulation up-to context. Additionally, by Lemma 3 it is the largest such. Since the same result holds for enf-bisimilarity and enf-bisimulation up-to context, we have the following theorem.

**Theorem 3.** *Enf-similarity is the largest enf-simulation up-to context, and enf-bisimilarity is the largest enf-bisimulation up-to context.*

*Example 18.* We apply Theorem 3 to simplify the proof of the equivalence between Y and Z given in Example 17. In fact, it is sufficient to show that the symmetric closure of term relation R defined below is an enf-bisimulation up-to context.

$$\mathcal{R}_{\Lambda} \triangleq \{ (Y, Z),\ (\Delta \Delta z, Zyz),\ (\Delta \Delta,\ y(\lambda z. \Delta \Delta z) \mathbin{\mathbf{or}} y(\lambda z. Zyz)) \}, \quad \mathcal{R}_{\mathcal{E}} \triangleq \mathsf{I}_{\mathcal{E}}.$$

*Example 19.* Recall the fixed point combinators with ticking operations Y and Z of Example 4. Let us consider the relator $\hat{\mathbf{C}}$. It is not hard to see that Y and Z are not enf-bisimilar (that is because the ticking operation is evaluated at different moments, so to speak). Nonetheless, once we pass them a variable $x_0$ as argument, we have $Zx_0 \preceq^{\mathsf{E}}_{\Lambda} Yx_0$. To see this, observe that the term relation $\mathcal{R}$ defined below is an enf-simulation up-to context.

$$\mathcal{R}_{\Lambda} \triangleq \{ (Yx_0, Zx_0),\ (\mathbf{tick}(\Delta[x_0/y]\Delta[x_0/y]z),\ \mathbf{tick}(\Theta\Theta x_0 z)) \}, \qquad \mathcal{R}_{\mathcal{E}} = \emptyset.$$

Intuitively, Y executes a tick first, and then proceeds by iterating the evaluation of $\Delta[x_0/y]\Delta[x_0/y]$, the latter involving two tickings only. On the contrary, Z proceeds by recursively calling itself, hence involving three tickings at any iteration, so to speak. Since $\preceq^{\mathsf{E}}$ is substitutive, for any value $v$ we have $Zv \preceq^{\mathsf{E}} Yv$.

Theorem 3 makes enf-(bi)similarity an extremely powerful proof technique for program equivalence/refinement, especially because it is yet unknown whether there exist *sound* up-to context techniques for applicative (bi)similarity [35].

#### **6.4 Weak Head Normal Form (Bi)simulation**

So far we have focused on call-by-value calculi, since in the presence of effects the call-by-value evaluation strategy seems the more natural one. Nonetheless, our framework can be easily adapted to deal with call-by-name calculi too. In this last section we spend some words on *effectful weak head normal form (bi)similarity* (whnf-(bi)similarity, for short). The latter is nothing but the call-by-name counterpart of enf-(bi)similarity. The main difference between enf-(bi)similarity and whnf-(bi)similarity lies in the notion of an evaluation context (and thus of a stuck term). In fact, in a call-by-name setting, $\Lambda_\Sigma$ evaluation contexts are expressions of the form $[-]e_1 \cdots e_n$, which are somewhat simpler than their call-by-value counterparts. This simplicity is reflected in the definition of whnf-(bi)similarity, which allows one to prove *mutatis mutandis* all results proved for enf-(bi)similarity (such results are, unsurprisingly, actually easier to prove).

We briefly expand on that. The collection of weak head normal forms (whnfs, for short) $\mathcal{W}$ is defined as the union of $\mathcal{V}$ and the collection of stuck terms, the latter being expressions of the form $xe_1 \cdots e_n$. The evaluation function of Definition 2 now maps terms to elements in $T\mathcal{W}$, and it is essentially obtained by modifying Definition 2, defining $[\![E[xe]]\!]_{n+1} \triangleq \eta(E[xe])$ and $[\![E[(\lambda x.f)e]]\!]_{n+1} \triangleq [\![E[f[e/x]]]\!]_n$. The notion of a whnf-(bi)simulation (and thus the notion of whnf-(bi)similarity) is obtained by modifying Definition 6 accordingly. In particular, clauses (enf 2) and (enf 4) are replaced by the following clause, where we use the notation $\mathcal{R} = (\mathcal{R}_{\Lambda} : \Lambda ⇸ \Lambda,\ \mathcal{R}_{\mathcal{W}} : \mathcal{W} ⇸ \mathcal{W})$ to denote a (call-by-name) $\lambda$-term relation.

$$x e_0 \cdots e_k \mathrel{\mathcal{R}_{\mathcal{W}}} s \implies \exists f_0, \dots, f_k.\ s = x f_0 \cdots f_k \wedge \forall i.\ e_i \mathrel{\mathcal{R}_{\Lambda}} f_i.$$
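The fuel-indexed evaluation described above can be sketched for the pure (effect-free) call-by-name λ-calculus. This is only an illustration under stated assumptions: the monad is replaced by an optional result (`None` stands in for the bottom element when the fuel runs out), and the names `Var`, `Lam`, `App`, `subst` and `eval_whnf` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fun) | free_vars(t.arg)


def fresh_name(base, avoid):
    # Pick the first name base0, base1, ... that is not in `avoid`.
    for i in count():
        candidate = f"{base}{i}"
        if candidate not in avoid:
            return candidate


def subst(t, x, e):
    """Capture-avoiding substitution t[e/x]."""
    if isinstance(t, Var):
        return e if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, e), subst(t.arg, x, e))
    if t.param == x:
        return t  # x is shadowed, nothing to substitute
    if t.param in free_vars(e):
        # Rename the bound variable so that it does not capture variables of e.
        new = fresh_name(t.param, free_vars(e) | free_vars(t.body) | {x})
        renamed = subst(t.body, t.param, Var(new))
        return Lam(new, subst(renamed, x, e))
    return Lam(t.param, subst(t.body, x, e))


def eval_whnf(t, fuel):
    """n-th approximation of call-by-name evaluation; None plays the role of bottom."""
    if fuel == 0:
        return None
    # Unwind the spine: t = head e1 ... en, i.e. t = E[head] with E = [-] e1 ... en.
    head, args = t, []
    while isinstance(head, App):
        args.append(head.arg)
        head = head.fun
    args.reverse()
    if isinstance(head, Var) or not args:
        return t  # stuck term x e1 ... en, or a lambda abstraction
    # E[(\x.f) e] steps to E[f[e/x]]; the argument e is not evaluated.
    reduct = subst(head.body, head.param, args[0])
    for a in args[1:]:
        reduct = App(reduct, a)
    return eval_whnf(reduct, fuel - 1)
```

With `omega = Lam("x", App(Var("x"), Var("x")))`, the call `eval_whnf(App(Lam("x", Var("z")), App(omega, omega)), 2)` returns `Var("z")`, since the diverging argument is discarded without being evaluated, whereas `eval_whnf(App(omega, omega), n)` is `None` for every `n`.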

A straightforward modification of the rules in Fig. 1 allows us to prove an analogue of Lemma 2 for whnf-simulations, and thus to conclude (pre)congruence properties of whnf-(bi)similarity. Additionally, such results generalise to whnf-(bi)simulation up-to context, the latter being defined according to Definition 9, so that we have an analogue of Theorem 3 as well. The latter allows us to infer the equivalence of the argument-switching fixed point combinators of Example 3, simply by noticing that the symmetric closure of the term relation $\mathcal{R} = (\{(P, Q), (Pyz, Qzy), (Pzy, Qyz)\}, \emptyset)$ is a whnf-bisimulation up-to context.

Finally, it is straightforward to observe that whnf-(bi)similarity is included in the call-by-name counterpart of effectful applicative (bi)similarity, but that the inclusion is strict. In fact, the (pure λ-calculus) terms xx and x(λy.xy) are applicatively bisimilar, but not whnf-bisimilar.

### **7 Related Work**

Normal form (bi)similarity was originally introduced for the call-by-name λ-calculus in [65], where it was called *open bisimilarity*. Open bisimilarity provides a coinductive characterisation of Lévy-Longo tree equivalence [42,45,53], and has been shown to coincide with the equivalence (notably weak bisimilarity) induced by Milner's encoding of the λ-calculus into the π-calculus [48].

In [37] normal form bisimilarity relations characterising both Böhm and Lévy-Longo tree equivalences have been studied by purely operational means, providing new congruence proofs of the aforementioned tree equivalences based on suitable relational constructions. Such results have been extended to the call-by-value λ-calculus in [38], where the so-called *eager normal form bisimilarity* is introduced. The latter is shown to coincide with the Lévy-Longo tree equivalence induced by a suitable CPS translation [54], and thus to be a congruence relation. An elementary proof of congruence properties of eager normal form bisimilarity is given in [9], where Lassen's relational construction [37] is extended to the call-by-value λ-calculus, as well as its extensions with delimited and abortive control operators. Finally, following [65], eager normal form bisimilarity has been recently characterised as the equivalence induced by a suitable encoding of the (call-by-value) λ-calculus in the π-calculus [21].

Concerning effectful extensions of normal form bisimilarity, our work seems to be rather new. In fact, normal form bisimilarity has been studied for *deterministic* extensions of the λ-calculus with specific *non*-algebraic effects, notably control operators [9], as well as control and state [68] (where full abstraction of the obtained notion of normal form bisimilarity is proved). The only extension of normal form bisimilarity to an algebraic effect the authors are aware of is given in [39], where normal form bisimilarity is studied for a *nondeterministic call-by-name* λ-calculus. However, we should mention that, contrary to normal form bisimilarity, both nondeterministic [20] and probabilistic [41] extensions of Böhm tree equivalence have been investigated (although none of them employ, to the best of the authors' knowledge, coinductive techniques).

#### **8 Conclusion**

This paper shows that effectful normal form bisimulation is indeed a powerful methodology for program equivalence. Interestingly, the proof of congruence for normal form bisimilarity can be given just once, without the necessity of redoing it for every distinct notion of algebraic effect considered. This relies on the fact that the underlying monad and relator are Σ-continuous, something which has already been proved for many distinct notions of effects [15].

Topics for further work are plentiful. First of all, a natural question is whether the obtained notion of bisimilarity coincides with contextual equivalence. This is known *not* to hold in the deterministic case [37,38], but to hold in presence of control and state [68], which offer the environment the necessary discriminating power. Is there any (sufficient) condition on effects guaranteeing full abstraction of normal form bisimilarity? This is an intriguing question we are currently investigating. In fact, contrary to applicative bisimilarity (which is known to be unsound in presence of non-algebraic effects [33], such as local states), the syntactic nature of normal form bisimilarity seems to be well-suited for languages combining both algebraic and non-algebraic effects.

Another interesting topic for future research is to investigate whether normal form bisimilarity can be extended to languages having both algebraic operations and effect handlers [7,59].

#### **References**



# **On the Multi-Language Construction**

Samuele Buro and Isabella Mastroeni

Department of Computer Science, University of Verona, Strada le Grazie 15, 37134 Verona, Italy {samuele.buro,isabella.mastroeni}@univr.it

**Abstract.** Modern software is no longer developed in a single programming language. Instead, programmers tend to exploit *cross-language interoperability mechanisms* to combine code stemming from different languages, thus yielding fully-fledged *multi-language programs*. Whilst this approach enables developers to benefit from the strengths of each single-language, it also complicates the semantics of such programs. Indeed, the resulting multi-language does not meet any of the semantics of the combined languages. In this paper, we broaden the *boundary functions*-based approach à la Matthews and Findler to propose an algebraic framework that provides a constructive mathematical notion of *multi-language* able to determine its *semantics*. The aim of this work is to overcome the lack of a formal method (resp., model) to design (resp., represent) a multi-language, regardless of the inherent nature of the underlying languages. We show that our construction ensures the uniqueness of the *semantic function* (i.e., the multi-language semantics induced by the combined languages) by proving the *initiality* of the term model (i.e., the abstract syntax of the multi-language) in its category.

**Keywords:** Multi-language design · Program semantics · Interoperability

# **1 Introduction**

Two elementary arguments lie at the heart of the *multi-language paradigm*: the large availability of existing programming languages, along with a very high number of already written libraries, and software that, in general, needs to *interoperate*. Although there is consensus that there is no best programming language regardless of the context [4,8], it is equally true that many languages are conceived and designed in order to excel at specific tasks. Examples include R for statistical and graphical computation, Perl for data wrangling, and Assembly and C for low-level memory management. *"Interoperability between languages has been a problem since the second programming language was invented"* [8], so it is hardly surprising that developers have focused on the design of *cross-language interoperability mechanisms*, enabling programmers to combine code written in different languages. In this sense, we speak of *multi-languages*.

The field of cross-language interoperability has been driven more by practical concerns than by theoretical questions. The current scenario sees several engines and frameworks [13,28,29,44,47] (among others) to mix programming languages, but only [30] discusses the semantic issues related to the multi-language design from a theoretical perspective. Moreover, the existing interoperability mechanisms differ considerably not only from the viewpoint of the combined languages, but also in terms of the approach used to provide the interoperation. For instance, Nashorn [47] is a JavaScript interpreter written in Java to allow embedding JavaScript in Java applications. Such an engineering design works in a fashion similar to *embedded interpreters* [40,41].<sup>1</sup> On the contrary, the Java Native Interface (JNI) framework [29] enables the interoperation of Java with native code written in C, C++, or Assembly through external procedure calls between languages, mirroring the widespread mechanism of *foreign function interfaces (FFI)* [14], whereas theoretical papers follow the more elegant approach of *boundary functions* (or, for short, *boundaries*) in the style of Matthews and Findler's multi-language semantics [30]. Simply put, boundaries act as a gate between single-languages. When a value needs to flow to the other language, they perform a conversion so that it complies with the other language's specifications.

The major issue concerning this new paradigm is that multi-language programs do not obey any of the semantics of the combined languages. As a consequence, any method of formal reasoning (such as static program analysis or verification) is neutralized by the absence of a semantics specification. In this paper, we propose an algebraic framework based on the mechanism of boundary functions [30] that unambiguously yields the syntax and the semantics of the multi-language regardless of the combined languages.

*The Lack of a Multi-Language Framework.* The notion of *multi-language* is employed naively in several works in literature [2,14,21,30,35–37,49] to indicate the embedding of two programming languages into a new one, with its own syntax and semantics.

The most recurring way to design a multi-language is to exploit a mechanism (like embedded interpreters, FFI, or boundary functions) able to regulate both control flow and value conversion between the underlying languages [30], and thus adequate to provide *cross-language interoperability* [8]. The full construction is usually carried out manually by language designers, who define the multi-language by reusing the formal specifications of the single-languages [2,30,36,37] and by applying the selected mechanism for achieving the interoperation. Inevitably, therefore, all these resulting multi-languages notably differ from one another.

These different ways to achieve cross-language interoperation are all attributable to the lack of a formal description of multi-language, which leaves language designers with neither a method to conceive new multi-languages nor any guarantee of the correctness of such constructions.

<sup>1</sup> Other popular engines that obey the embedded interpreters paradigm are Jython [28], JScript [44], and Rhino [13].

*The Proposed Framework: Roadmap and Contributions.* Matthews and Findler [30] propose *boundary functions* as a way to regulate the flow of values between languages. They show their approach on different variants of the same multi-language obtained by mixing ML [33] and Scheme [9], representing two "syntactically sugared" versions of the simply-typed and untyped lambda calculi, respectively.

Rather than showing the embedding of two fixed languages, we extend their approach to the much broader class of *order-sorted algebras* [19] with the aim of providing a framework that works regardless of the inherent nature of the combined languages. There are a number of reasons to choose order-sorted algebras as the underlying framework for generalizing the multi-language construction. Since the first formulation of *initial algebra semantics* [17], the algebraic approach to program semantics [16] has become a cornerstone in the theory of programming languages [27]. Order-sorted algebras provide a mathematical tool for representing formal systems as algebraic structures through a systematic use of the notions of *sort* and *subsort* to model different forms of polymorphism [18,19], a key aspect when dealing with multi-languages sharing operators among the single-languages. They were initially proposed to ensure a rigorous model-theoretic semantics for error handling, multiple inheritance, retracts, selectors for multiple constructors, polymorphism, and overloading. Over the years, several uses [3,6,11,24,25,38,39,52] and different variants [38,43,45,51] have been proposed for order-sorted algebras, making them a solid starting point for the development of a new framework. In particular, results on *rewriting logic* [32] extend easily to the order-sorted case [31], thus facilitating a future extension of this paper towards the *operational semantics* world. Improvements of the order-sorted algebra framework have also been proposed to model languages together with their type systems [10] and to extend order-sorted specification with higher-order functions [38] (see [48] and [18] for detailed surveys).

In this paper, we propose three different multi-language constructions according to the semantic properties of boundary functions. The first one models a general notion of multi-language that does not require any constraints on boundaries (Sect. 3). We argue that when such generality is superfluous, we can achieve a neater approach where boundary functions do not need to be annotated with sorts. Indeed, we show that when the cross-language conversion of a term does not depend on the sort at which the term is considered (i.e., when boundaries are *subsort polymorphic*), the framework is powerful enough to apply the correct conversion (Sect. 4.1). This last construction is an improvement of the original notion of boundaries in [30]. From a practical point of view, it allows programmers to avoid explicitly dealing with sorts when writing code, a non-trivial task that could introduce type cast bugs in real-world languages. Finally, we provide a very specific notion of multi-language where no extra operator is added to the syntax (Sect. 4.2). This approach is particularly useful to extend a language in a modular fashion while ensuring backward compatibility with "old" programs. For each one of these variants we prove an *initiality theorem*, which in turn ensures the uniqueness of the multi-language semantics, thereby legitimating the proposed framework. Moreover, we show that the framework guarantees a fundamental closure property on the construction: the resulting multi-language admits an order-sorted representation, i.e., it falls within the same formal model of the combined languages. Finally, we model the multi-language designed in [30] in order to show an instantiation of the framework (Sect. 6).

# **2 Background**

All the algebraic background of the paper was first stated in [15,17,19]. We briefly introduce here the main definitions and results, and we illustrate them on a simple running example.

Given a *set of sorts* $S$, an $S$-*sorted set* $A$ is a family of sets indexed by $S$, i.e., $A = \{ A_s \mid s \in S \}$. Similarly, an $S$-*sorted function* $f : A \to B$ is a family of functions $f = \{ f_s : A_s \to B_s \mid s \in S \}$. We stick to the convention of using $s$ and $w$ as metavariables for sorts in $S$ and $S^*$, respectively, and we use the **blackboard bold** typeface to indicate a specific sort in $S$. In addition, if $A$ is an $S$-sorted set and $w = s_1 \ldots s_n \in S^+$, we denote by $A_w$ the cartesian product $A_{s_1} \times \cdots \times A_{s_n}$. Likewise, if $f$ is an $S$-sorted function and $a_i \in A_{s_i}$ for $i = 1, \ldots, n$, then the function $f_w : A_w \to B_w$ is such that $f_w(a_1, \ldots, a_n) = (f_{s_1}(a_1), \ldots, f_{s_n}(a_n))$. Given $P \subseteq S$, the restriction of an $S$-sorted function $f$ to $P$ is denoted by $f|_P$ and it is the $P$-sorted function $f|_P = \{ f_s \mid s \in P \}$.
Finally, if $g : A \to B$ is a function, we still use the symbol $g$ to denote the *direct image map of* $g$ (also called the *additive lift* of $g$), i.e., the function $g : \wp(A) \to \wp(B)$ such that $g(X) = \{ g(a) \in B \mid a \in X \}$. Analogously, if $\leq$ is a binary relation on a set $A$ (with elements $a \in A$), we use the same relation symbol to denote its *pointwise extension*, i.e., we write $a_1 \ldots a_n \leq a'_1 \ldots a'_n$ for $a_1 \leq a'_1, \ldots, a_n \leq a'_n$.
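The notational conventions above can be made concrete with a small sketch. It assumes, purely for illustration, that an $S$-sorted set is represented as a dictionary from sort names to Python sets and an $S$-sorted function as a dictionary from sort names to callables; the helper names (`product_map`, `restrict`, `direct_image`, `pointwise_leq`) and the sample sorts are hypothetical.

```python
def product_map(f, w, args):
    """f_w(a_1, ..., a_n) = (f_{s_1}(a_1), ..., f_{s_n}(a_n)) for w = s_1 ... s_n."""
    if len(w) != len(args):
        raise ValueError("the number of arguments must match the length of w")
    return tuple(f[s](a) for s, a in zip(w, args))


def restrict(f, P):
    """f|_P: keep only the components of f indexed by sorts in P."""
    return {s: f[s] for s in P}


def direct_image(g, X):
    """Additive lift of g: sends a set X to { g(a) | a in X }."""
    return {g(a) for a in X}


def pointwise_leq(leq, w1, w2):
    """Pointwise extension of a binary relation to sequences of equal length."""
    return len(w1) == len(w2) and all(leq(a, b) for a, b in zip(w1, w2))


# A sorted function over the (hypothetical) sorts "nat" and "bool".
f = {"nat": lambda n: n + 1, "bool": lambda b: not b}
print(product_map(f, ("nat", "bool"), (1, True)))
print(direct_image(lambda n: n % 2, {1, 2, 3}))
```

Here `product_map` applies each component of `f` at the position dictated by `w`, which is exactly how interpretation functions of operators with arity `w` receive their arguments later on.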

The basic notions underpinning the order-sorted algebra framework are the definitions of *signature*, that models symbols forming terms of the language, and *algebra*, that provides an algebraic meaning to symbols.

**Definition 1 (Order-Sorted Signature).** *An* order-sorted signature *is a triple* $\langle S, \le, \Sigma \rangle$*, where* $S$ *is a set of sorts,* $\le$ *is a binary relation on* $S$*, and* $\Sigma$ *is an* $S^* \times S$*-sorted set* $\Sigma = \{\, \Sigma_{w,s} \mid w \in S^* \wedge s \in S \,\}$*, satisfying the following conditions:*

*(1os)* $\langle S, \le \rangle$ *is a poset; and (2os)* $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ *and* $w_1 \le w_2$ *imply* $s_1 \le s_2$*.*

If $\sigma \in \Sigma_{w,s}$ (or, $\sigma\colon w \to s$ and $\sigma\colon s$ when $w = \varepsilon$, as shorthands), we call $\sigma$ an *operator* (*symbol*) or *function symbol*, $w$ the *arity*, $s$ the *sort*, and $(w, s)$ the *rank* of $\sigma$; if $w = \varepsilon$, we say that $\sigma$ is a *constant* (*symbol*). We name $\le$ the *subsort relation* and $\Sigma$ a *signature* when $\langle S, \le \rangle$ is clear from the context. We abuse notation and write $\sigma \in \Sigma$ when $\sigma \in \bigcup_{w,s} \Sigma_{w,s}$.

**Definition 2 (Order-Sorted Algebra).** *An* order-sorted $\langle S, \le, \Sigma \rangle$*-*algebra $\mathcal{A}$ *over an order-sorted signature* $\langle S, \le, \Sigma \rangle$ *is an* $S$*-sorted set* $A$ *of* interpretation domains *(or,* carrier sets *or* semantic domains*)* $A = \{\, A_s \mid s \in S \,\}$*, together with* interpretation functions $[\![\sigma]\!]^{w,s}_{\mathcal{A}}\colon A_w \to A_s$ *(or, if* $w = \varepsilon$*,* $[\![\sigma]\!]^{\varepsilon,s}_{\mathcal{A}} \in A_s$*)*<sup>2</sup> *for each* $\sigma \in \Sigma_{w,s}$*, such that:*

*(1oa)* $s \le s'$ *implies* $A_s \subseteq A_{s'}$*; and (2oa)* $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ *and* $w_1 \le w_2$ *imply that* $[\![\sigma]\!]^{w_1,s_1}_{\mathcal{A}}(a) = [\![\sigma]\!]^{w_2,s_2}_{\mathcal{A}}(a)$ *for each* $a \in A_{w_1}$*.*

An important property of signatures, related to polymorphism, is *regularity*. Its relevance lies in the possibility of linking each term to a unique least sort (see Proposition 2.10 in [19]).

**Definition 3 (Regularity of an Order-Sorted Signature).** *An order-sorted signature* $\langle S, \le, \Sigma \rangle$ *is* regular *if for each* $\sigma \in \Sigma_{\tilde{w},\tilde{s}}$ *and for each* lower bound $w_0 \le \tilde{w}$ *the set* $\{\, (w, s) \mid \sigma \in \Sigma_{w,s} \wedge w_0 \le w \,\}$ *has a minimum. This minimum is called the* least rank *of* $\sigma$ *with respect to* $w_0$*.*
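To make Definition 3 concrete, the following Python sketch checks regularity by brute force on a finite signature. The encoding is an assumption made purely for illustration and does not come from the paper: sorts are strings, `leq` is a set of pairs that includes the reflexive ones, and `ranks` maps each operator name to its set of `(arity, sort)` pairs. The helper names `leq_word`, `leq_rank` and `is_regular` are hypothetical.

```python
from itertools import product


def leq_word(leq, w1, w2):
    """Pointwise extension of the subsort relation to strings of sorts."""
    return len(w1) == len(w2) and all((a, b) in leq for a, b in zip(w1, w2))


def leq_rank(leq, r1, r2):
    """Order on ranks: arities compared pointwise, result sorts directly."""
    (w1, s1), (w2, s2) = r1, r2
    return leq_word(leq, w1, w2) and (s1, s2) in leq


def is_regular(sorts, leq, ranks):
    """Brute-force regularity check for a finite signature (hypothetical encoding)."""
    sorts = list(sorts)
    for op_ranks in ranks.values():
        for w_tilde, _ in op_ranks:
            # Enumerate every lower bound w0 <= w_tilde.
            for w0 in product(sorts, repeat=len(w_tilde)):
                if not leq_word(leq, w0, w_tilde):
                    continue
                candidates = [r for r in op_ranks if leq_word(leq, w0, r[0])]
                # A minimum is a candidate that lies below every candidate.
                if not any(all(leq_rank(leq, r, other) for other in candidates)
                           for r in candidates):
                    return False
    return True


# A finite fragment of the arithmetic signature used in the running example.
leq1 = {("e", "e"), ("n", "n"), ("n", "e")}
ranks1 = {"+": {(("e", "e"), "e")}, "0": {((), "n")}, "1": {((), "n")}}

# A non-regular signature: f is overloaded on two incomparable supersorts of c.
leq_bad = {("a", "a"), ("b", "b"), ("c", "c"), ("c", "a"), ("c", "b")}
ranks_bad = {"f": {(("a",), "a"), (("b",), "b")}}

print(is_regular({"e", "n"}, leq1, ranks1))             # prints True
print(is_regular({"a", "b", "c"}, leq_bad, ranks_bad))  # prints False
```

In the second signature the lower bound `c` admits both ranks of `f`, and neither is below the other, so no least rank exists.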

The freely generated algebra $\mathcal{T}_\Sigma$ over a given signature $\mathcal{S} = \langle S, \le, \Sigma \rangle$ provides the notion of *term* with respect to $\mathcal{S}$.

**Definition 4 (Order-Sorted Term Algebra).** *Let* $\langle S, \le, \Sigma \rangle$ *be an order-sorted signature. The* order-sorted term $\langle S, \le, \Sigma \rangle$*-*algebra $\mathcal{T}_\Sigma$ *is an order-sorted algebra such that:*


Homomorphisms between algebras capture the *compositional* nature of semantics: the meaning of a term is determined by the meanings of its constituents. They are defined as order-sorted functions that preserve the interpretation of operators.

<sup>2</sup> To be pedantic, we should introduce the *one-point domain* $A_\varepsilon = \{\bullet\}$ and then define $[\![\sigma]\!]^{\varepsilon,s}_{\mathcal{A}}(\bullet) \in A_s$.


**Fig. 1.** The BNF grammars of the running example languages.

**Fig. 2.** The two formal semantics of the running example languages.

**Definition 5 (Order-Sorted Homomorphism).** *Let* $\mathcal{A}$ *and* $\mathcal{B}$ *be* $\langle S, \le, \Sigma \rangle$*-algebras. An* order-sorted $\langle S, \le, \Sigma \rangle$*-*homomorphism *from* $\mathcal{A}$ *to* $\mathcal{B}$*, denoted by* $h\colon \mathcal{A} \to \mathcal{B}$*, is an* $S$*-sorted function* $h\colon A \to B = \{\, h_s\colon A_s \to B_s \mid s \in S \,\}$ *such that:*

*(1oh)* $h_s([\![\sigma]\!]^{w,s}_{\mathcal{A}}(a)) = [\![\sigma]\!]^{w,s}_{\mathcal{B}}(h_w(a))$ *for each* $\sigma \in \Sigma_{w,s}$ *and* $a \in A_w$*; and (2oh)* $s \le s'$ *implies* $h_s(a) = h_{s'}(a)$ *for each* $a \in A_s$*.*

The class of all order-sorted $\langle S, \le, \Sigma \rangle$-algebras and the class of all order-sorted $\langle S, \le, \Sigma \rangle$-homomorphisms form a category denoted by **OSAlg**$(S, \le, \Sigma)$. Furthermore, the homomorphism definition determines the property of the term algebra $\mathcal{T}_\Sigma$ of being an *initial object* in its category whenever the signature is *regular*. Since *initiality* is preserved by isomorphisms, this allows us to identify $\mathcal{T}_\Sigma$ with the *abstract syntax* of the language. If $\mathcal{T}_\Sigma$ is initial, the unique homomorphism from $\mathcal{T}_\Sigma$ to an algebra $\mathcal{A}$ is called the *semantic function* (with respect to $\mathcal{A}$).

**Example.** Let $L_1$ and $L_2$ be two formal languages (see Fig. 1). The former is a language to construct simple mathematical expressions: $n \in \mathbb{N}$ is the metavariable for natural numbers, while $e$ inductively generates all the possible additions (Fig. 1a). The latter is a language to build strings over a finite alphabet of symbols $\mathbb{A} = \{\, \mathtt{a}, \mathtt{b}, \dots, \mathtt{z} \,\}$: $a \in \mathbb{A}$ is the metavariable for atoms (or, characters), whereas $s$ concatenates them into strings (Fig. 1b). A term in $L_1$ and $L_2$ denotes an element in the sets $\mathbb{N}$ and $\mathbb{A}^*$, according to the equations in Fig. 2a and b, respectively.

The syntax of the language $L_1$ can be modeled by an order-sorted signature $\mathcal{S}_1 = \langle S_1, \le_1, \Sigma_1 \rangle$ defined as follows: $S_1 = \{\, \mathbf{e}, \mathbf{n} \,\}$, a set with sorts $\mathbf{e}$ (which stands for *expressions*) and $\mathbf{n}$ (which stands for *natural numbers*); $\le_1$ is the reflexive relation on $S_1$ plus $\mathbf{n} \le_1 \mathbf{e}$ (natural numbers are expressions); and the operators in $\Sigma_1$ are $0, 1, 2, \dots \colon \mathbf{n}$ and $+\colon \mathbf{e}\,\mathbf{e} \to \mathbf{e}$. Similarly, the signature $\mathcal{S}_2 = \langle S_2, \le_2, \Sigma_2 \rangle$ models the syntax of the language $L_2$: the set $S_2 = \{\, \mathbf{s}, \mathbf{a} \,\}$ carries the sort for *strings* $\mathbf{s}$ and the sort for *atomic symbols* (or, characters) $\mathbf{a}$; the subsort relation $\le_2$ is the reflexive relation on $S_2$ plus $\mathbf{a} \le_2 \mathbf{s}$ (characters are one-symbol strings); and the operator symbols in $\Sigma_2$ are $\mathtt{a}, \dots, \mathtt{z}\colon \mathbf{a}$, $\neg\colon \mathbf{s}$, and $+\colon \mathbf{s}\,\mathbf{s} \to \mathbf{s}$. The semantics of $L_1$ and $L_2$ can be embodied by algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ over the signatures $\mathcal{S}_1$ and $\mathcal{S}_2$, respectively.
We set the interpretation domains of $\mathcal{A}_1$ to $A^1_{\mathbf{n}} = A^1_{\mathbf{e}} = \mathbb{N}$ and those of $\mathcal{A}_2$ to $A^2_{\mathbf{a}} = \mathbb{A} \subseteq \mathbb{A}^* = A^2_{\mathbf{s}}$. Moreover, we define the interpretation functions as follows (the juxtaposition of two or more strings denotes their concatenation, and we use $\hat{a}$ as metavariable ranging over $\mathbb{A}^*$):

$$\begin{cases} [\![n]\!]^{\varepsilon,\mathbf{n}}_{\mathcal{A}_1} = n \\ [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}_1}(n_1, n_2) = n_1 + n_2 \end{cases} \qquad\qquad \begin{cases} [\![\neg]\!]^{\varepsilon,\mathbf{s}}_{\mathcal{A}_2} = \varepsilon \\ [\![a]\!]^{\varepsilon,\mathbf{a}}_{\mathcal{A}_2} = a \\ [\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}_2}(\hat{a}_1, \hat{a}_2) = \hat{a}_1 \hat{a}_2 \end{cases}$$

Since $\mathcal{S}_1$ and $\mathcal{S}_2$ are regular, $\mathcal{A}_1$ and $\mathcal{A}_2$ induce the semantic functions $h_1\colon \mathcal{T}_{\Sigma_1} \to \mathcal{A}_1$ and $h_2\colon \mathcal{T}_{\Sigma_2} \to \mathcal{A}_2$, providing semantics to the languages.
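The two semantic functions can be sketched as structural recursion over terms, which is what the homomorphism condition amounts to on a term algebra. The Python encoding below is only an illustration: the tuple constructors (`"num"`, `"add"`, `"empty"`, `"atom"`, `"cat"`) and the function names `h1` and `h2` are hypothetical, and Python integers and strings stand in for $\mathbb{N}$ and $\mathbb{A}^*$.

```python
def h1(term):
    """Semantic function for the arithmetic language: term -> natural number."""
    tag = term[0]
    if tag == "num":    # a constant of sort n, interpreted as itself
        return term[1]
    if tag == "add":    # + : e e -> e, interpreted as addition
        return h1(term[1]) + h1(term[2])
    raise ValueError(f"not a term of the arithmetic language: {term!r}")


def h2(term):
    """Semantic function for the string language: term -> string over a..z."""
    tag = term[0]
    if tag == "empty":  # the constant of sort s denoting the empty string
        return ""
    if tag == "atom":   # a constant of sort a, interpreted as itself
        return term[1]
    if tag == "cat":    # + : s s -> s, interpreted as concatenation
        return h2(term[1]) + h2(term[2])
    raise ValueError(f"not a term of the string language: {term!r}")


expr = ("add", ("num", 10), ("num", 5))
word = ("cat", ("atom", "f"), ("cat", ("atom", "o"), ("atom", "o")))

print(h1(expr))  # prints 15
print(h2(word))  # prints foo
```

Each clause handles one operator of the signature, so the meaning of a term depends only on the meanings of its immediate subterms.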

# **3 Combining Order-Sorted Theories**

The first step towards a multi-language specification is the choice of which terms of one language can be employed in the others [30,35,36]. For instance, a multi-language requirement could demand the use of ML expressions in place of Scheme expressions and, possibly, but not necessarily, vice versa (such a multi-language is designed in [30]). A *multi-language signature* is a suitable formalism to specify the compatibility relation between syntactic categories across two languages.

**Definition 6 (Multi-Language Signature).** *A* multi-language signature *is a triple* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*, where* $\mathcal{S}_1 = \langle S_1, \le_1, \Sigma_1 \rangle$ *and* $\mathcal{S}_2 = \langle S_2, \le_2, \Sigma_2 \rangle$ *are order-sorted signatures, and* $\le$ *is a binary relation on* $S = S_1 \cup S_2$ *that satisfies the following condition:*

*(1s)* $s, s' \in S_i$ *implies* $s \le s'$ *if and only if* $s \le_i s'$*, for* $i = 1, 2$*.*

*To make the notation lighter, we introduce the following binary relations on* $S$*:* $s \ltimes s'$ *if* $s \le s'$ *but neither* $s \le_1 s'$ *nor* $s \le_2 s'$*, and* $s \precsim s'$ *if* $s \le s'$ *but not* $s \ltimes s'$*.*

In the following, we always assume that the sets of sorts $S_1$ and $S_2$ of the order-sorted signatures $\mathcal{S}_1$ and $\mathcal{S}_2$ are disjoint.<sup>3</sup> Condition (1s) requires the *multi-language subsort relation* $\le$ to *preserve* the original subsort relations $\le_1$ and $\le_2$ (i.e., ${\le} \cap (S_i \times S_i) = {\le_i}$). The *join relation* $\ltimes$ provides a compatibility relation between sorts<sup>4</sup> in $S_1$ and $S_2$. More precisely, $S_i \ni s \ltimes s' \in S_j$ suggests that we want to use terms in $T_{\Sigma_i,s}$ in place of terms in $T_{\Sigma_j,s'}$, whereas the *intra-language*

<sup>3</sup> This hypothesis is non-restrictive: We can always perform a renaming of the sorts.

<sup>4</sup> Sorts may be understood as syntactic categories, in the sense of formal grammars. Given a context-free grammar G, it is possible to define a many-sorted signature Σ<sup>G</sup> where non-terminals become sorts and such that each term t in the term algebra T<sup>Σ</sup><sup>G</sup> is isomorphic to the parse tree of t with respect to G (see [15] for details).

*subsort relation* $\precsim$ shifts the standard notion of subsort from the order-sorted to the multi-language world. In a nutshell, the relation ${\le} = {\ltimes} \cup {\precsim}$ can only join (through $\ltimes$) the underlying languages without introducing distortions (indeed, ${\precsim} = {\le_1} \cup {\le_2}$).
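As a small illustration of Definition 6, the following Python sketch splits a multi-language subsort relation into its join part and its intra-language part, and checks condition (1s). Relations are encoded as sets of pairs of sort names; this encoding and the names `split_subsort_relation` and `preserves` are assumptions made for the example. The cross-language pairs are the ones the running example introduces in Sect. 3.1.

```python
def split_subsort_relation(leq, leq1, leq2):
    """Split <= into the join relation and the intra-language subsort relation."""
    join = {pair for pair in leq if pair not in leq1 and pair not in leq2}
    intra = leq - join
    return join, intra


def preserves(leq, sorts_i, leq_i):
    """Condition (1s): <= restricted to S_i x S_i coincides with <=_i."""
    restricted = {(s, t) for (s, t) in leq if s in sorts_i and t in sorts_i}
    return restricted == leq_i


S1, S2 = {"e", "n"}, {"s", "a"}
leq1 = {("e", "e"), ("n", "n"), ("n", "e")}
leq2 = {("s", "s"), ("a", "a"), ("a", "s")}
cross = {("e", "a"), ("n", "a"), ("s", "n"), ("a", "n")}
leq = leq1 | leq2 | cross

join, intra = split_subsort_relation(leq, leq1, leq2)
print(sorted(join))
print(preserves(leq, S1, leq1) and preserves(leq, S2, leq2))  # prints True
```

Because $S_1$ and $S_2$ are disjoint, every pair in `join` relates sorts of different languages, and `intra` is exactly the union of the two original subsort relations.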

The role of an algebra is to provide an interpretation domain for each sort, as well as the meaning of every operator symbol in a given signature. When moving towards the multi-language context, the join relation may add subsort constraints between sorts belonging to different signatures. Consequently, if $s \ltimes s'$, a multi-language algebra has to specify how values of sort $s$ may be interpreted as values of sort $s'$. These specifications are called *boundary functions* [30] and provide an algebraic meaning to the subsort constraints added by $\ltimes$. Henceforth, we define $S = S_1 \cup S_2$, $\Sigma = \Sigma_1 \cup \Sigma_2$, and, given $(w, s) \in S_i^* \times S_i$, we denote by $\Sigma^i_{w,s}$ the $(w, s)$-sorted component in $\Sigma_i$.

**Definition 7 (Multi-Language Algebra).** *Let* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *be a multi-language signature. A* multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*-*algebra $\mathcal{A}$ *is an* $S$*-sorted set* $A$ *of* interpretation domains *(or,* carrier sets *or* semantic domains*)* $A = \{\, A_s \mid s \in S \,\}$*, together with* interpretation functions $[\![\sigma]\!]^{w,s}_{\mathcal{A}}\colon A_w \to A_s$ *for each* $\sigma \in \Sigma_{w,s}$*, and with a* $\ltimes$*-sorted set* $\alpha$ *of* boundary functions $\alpha = \{\, \alpha_{s,s'}\colon A_s \to A_{s'} \mid s \ltimes s' \,\}$*, such that the following constraint holds:*

*(1a) the* projected algebra $\mathcal{A}_i$*, where* $i = 1, 2$*, specified by the carrier set* $A_i = \{\, A^i_s = A_s \mid s \in S_i \,\}$ *and interpretation functions* $[\![\sigma]\!]^{w,s}_{\mathcal{A}_i} = [\![\sigma]\!]^{w,s}_{\mathcal{A}}$ *for each* $\sigma \in \Sigma^i_{w,s}$*, must be an order-sorted* $\mathcal{S}_i$*-algebra.*

If $\mathcal{M}$ is an algebra, we adopt the convention of denoting by $M$ (standard math font) its carrier set and by $\mu$ (Greek math font) its boundary functions whenever possible. Condition (1a) is the semantic counterpart of condition (1s): it requires the multi-language to carry (i.e., preserve) the order-sorted algebras of the underlying languages, whereas the boundary functions model how values can flow between languages.

Given two multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$-algebras $\mathcal{A}$ and $\mathcal{B}$, we can define morphisms between them that preserve the sorted structure of the underlying projected algebras.

**Definition 8 (Multi-Language Homomorphism).** *Let* $\mathcal{A}$ *and* $\mathcal{B}$ *be multi-language* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*-algebras with sets of boundary functions* $\alpha$ *and* $\beta$*, respectively. A* multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*-*homomorphism $h\colon \mathcal{A} \to \mathcal{B}$ *is an* $S$*-sorted function* $h\colon A \to B$ *such that:*

*(1h) the restriction* $h|_{S_i}$ *is an order-sorted* $\mathcal{S}_i$*-homomorphism* $h|_{S_i}\colon \mathcal{A}_i \to \mathcal{B}_i$*, for* $i = 1, 2$*; and (2h)* $s \ltimes s'$ *implies* $h_{s'} \circ \alpha_{s,s'} = \beta_{s,s'} \circ h_s$*.*

Conditions (1h) and (2h) are easily intelligible when the domain algebra is the abstract syntax of the language [15]: Simply put, both conditions require the semantics of a term to be a function of the meaning of its subterms, in the sense of [15,46]. In particular, the second condition demands that boundary functions act as operators.<sup>5</sup>

The identity homomorphism on a multi-language algebra $\mathcal{A}$ is denoted by $\mathrm{id}_{\mathcal{A}}$ and it is the set-theoretic identity on the carrier set $A$ of the algebra $\mathcal{A}$. The composition of two homomorphisms $f\colon \mathcal{A} \to \mathcal{B}$ and $g\colon \mathcal{B} \to \mathcal{C}$ is defined as the sorted function composition $g \circ f\colon A \to C$, thus $\mathrm{id}_{\mathcal{B}} \circ f = f = f \circ \mathrm{id}_{\mathcal{A}}$, and associativity follows easily from the definition of $\circ$.

**Proposition 1.** *Multi-language homomorphisms are closed under composition.*

Hence, as in the many-sorted and order-sorted cases [15,19], we immediately obtain the category of all the multi-language algebras over a multi-language signature:

**Theorem 1.** *Let* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *be a multi-language signature. The class of all* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*-algebras and the class of all* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*-homomorphisms form a category denoted by* **Alg**$(\mathcal{S}_1, \mathcal{S}_2, \le)$*.*

### **3.1 The Initial Term Model**

In this section, we introduce the concepts of *(multi-language) term* and *(multi-language) semantics* in order to show how a multi-language algebra yields a unique interpretation for any *regular* (see Definition 11) multi-language specification.

Multi-language terms should comprise all of the underlying languages' terms, plus those obtained by merging the two languages according to the join relation $\ltimes$. In particular, we aim for a construction where subterms of sort $s'$ may have been replaced by terms of sort $s$, whenever $s \ltimes s'$ (we recall that $s$ and $s'$ are two syntactic categories of different languages due to Definition 6). Nonetheless, we must be careful not to add ambiguities during this process: a term $t$ may belong to both the $\mathcal{S}_1$ and $\mathcal{S}_2$ term algebras but with different meanings $[\![t]\!]_{\mathcal{A}_1}$ and $[\![t]\!]_{\mathcal{A}_2}$ (assuming that $\mathcal{A}_1$ and $\mathcal{A}_2$ are algebras over $\mathcal{S}_1$ and $\mathcal{S}_2$, respectively). When $t$ is included in the multi-language, we lose the information needed to determine which of the two interpretations to choose, thus making the (multi-language) semantics of $t$ ambiguous. The same problem arises whenever an operator $\sigma$ belongs to both languages with different interpretation functions. The simplest solution to avoid such issues is to add syntactical notations that make explicit the language context in which we are operating.

**Definition 9 (Associated Signature).** *The* associated signature *to the multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *is the ordered triple* $\langle S, \precsim, \Pi \rangle$*, where* $S = S_1 \cup S_2$*,* ${\precsim} = {\le_1} \cup {\le_2}$*, and*

$$\Pi = \{\, \sigma_1\colon w \to s \mid \sigma\colon w \to s \in \Sigma_1 \,\} \cup \{\, \sigma_2\colon w \to s \mid \sigma\colon w \to s \in \Sigma_2 \,\} \cup \{\, {\hookrightarrow_{s,s'}}\colon s \to s' \mid s \ltimes s' \,\}$$

<sup>5</sup> This is essential in order to generalize the concept of syntactical boundary functions of [30] to semantic-only functions in Sect. 4.2.

It is trivial to prove that an associated signature is indeed an order-sorted signature, thus admitting a term algebra $\mathcal{T}_\Pi$. All the symbols forming terms in $\mathcal{T}_\Pi$ carry the source language information as a subscript, and all the new operators $\hookrightarrow_{s,s'}$ specify when a term of sort $s$ is used in place of a term of sort $s'$. Although $\mathcal{T}_\Pi$ seems a suitable definition for multi-language terms, it is not a multi-language algebra according to Definition 7. However, we can exploit the construction of $\mathcal{T}_\Pi$ in order to provide a fully-fledged multi-language algebra able to generate multi-language terms.
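The construction of the operator set $\Pi$ in Definition 9 can be sketched in a few lines of Python. The encoding is hypothetical: each signature is a set of `(name, arity, sort)` triples, tagging is done by appending `_1` or `_2` to the operator name, and boundary operators are named `hook_s,s'`. None of these names come from the paper.

```python
def associated_operators(sigma1, sigma2, join):
    """Build the operator set of the associated signature (hypothetical encoding).

    sigma1, sigma2: sets of (name, arity, sort) triples, arity being a tuple of sorts
    join:           set of pairs (s, s') in the join relation
    """
    pi = set()
    for index, sigma in ((1, sigma1), (2, sigma2)):
        for name, arity, sort in sigma:
            # Tag every operator with the language it comes from.
            pi.add((f"{name}_{index}", arity, sort))
    for s, s_prime in join:
        # One fresh unary operator per pair of joined sorts.
        pi.add((f"hook_{s},{s_prime}", (s,), s_prime))
    return pi


sigma1 = {("+", ("e", "e"), "e"), ("10", (), "n"), ("5", (), "n")}
sigma2 = {("+", ("s", "s"), "s"), ("f", (), "a"), ("o", (), "a")}
join = {("e", "a"), ("n", "a"), ("s", "n"), ("a", "n")}

pi = associated_operators(sigma1, sigma2, join)
for operator in sorted(pi):
    print(operator)
```

The symbol `+` occurs in both input signatures, but after tagging it yields two distinct operators, which is what removes the ambiguity discussed before Definition 9.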

**Definition 10 (Multi-Language Term Algebra).** *The* multi-language term algebra $\mathcal{T}$ *over a multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *with boundary functions* $\tau$ *is defined as follows:*

*(1t)* $s \in S$ *implies* $T_s = T_{\Pi,s}$*; (2t)* $\sigma \in \Sigma^i_{w,s}$ *implies* $[\![\sigma]\!]^{w,s}_{\mathcal{T}} = [\![\sigma_i]\!]^{w,s}_{\mathcal{T}_\Pi}$ *for* $i = 1, 2$*; and (3t)* $s \ltimes s'$ *implies* $\tau_{s,s'} = [\![\hookrightarrow_{s,s'}]\!]^{s,s'}_{\mathcal{T}_\Pi}$*.*

Proving that $\mathcal{T}$ satisfies Definition 7 is easy and omitted. $\mathcal{T}$ and $\mathcal{T}_\Pi$ share the same carrier sets (condition (1t)), and each single-language operator $\sigma \in \Sigma^i_{w,s}$ is interpreted as its annotated version $\sigma_i$ in $\mathcal{T}_\Pi$ (condition (2t)). Furthermore, the multi-language operators $\hookrightarrow_{s,s'}$ no longer belong to the signature (they belong neither to $\mathcal{S}_1$ nor to $\mathcal{S}_2$), but their semantics is inherited by the boundary functions $\tau$ (condition (3t)), while their syntactic values are still in the carrier sets of the algebra (this construction is highly technical and very similar to the freely generated $\Sigma(X)$-algebra over a set of variables $X$, see [15]).

Note that this is exactly the formalization of the ad hoc multi-language specifications in [2,30,36,37]: [2,36,37] exploit distinct colors to disambiguate the source language of the operators, whereas [30] uses different font styles for different languages. Moreover, boundary functions in [30] conceptually match the introduced operators $\hookrightarrow_{s,s'}$.

The last step in order to finalize the framework is to provide semantics for each term in $\mathcal{T}$. As with the order-sorted case, we need a notion of *regularity* for proving the initiality of the term algebra in its category, which in turn ensures a single eligible *(initial algebra) semantics*.

**Definition 11 (Regularity).** *A multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *is* regular *if its associated signature* $\langle S, \precsim, \Pi \rangle$ *is regular.*

**Proposition 2.** *The associated signature* $\langle S, \precsim, \Pi \rangle$ *of a multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *is regular if and only if* $\mathcal{S}_1$ *and* $\mathcal{S}_2$ *are regular.*

The last proposition makes it unnecessary to check multi-language regularity whenever the regularity of the order-sorted signatures is known.

**Theorem 2 (Initiality of** $\mathcal{T}$**).** *The multi-language term algebra* $\mathcal{T}$ *over a regular multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *is initial in the category* **Alg**$(\mathcal{S}_1, \mathcal{S}_2, \le)$*.*

Initiality of $\mathcal{T}$ is essential to assign a unique mathematical meaning to each term, as in the order-sorted case: given a multi-language algebra $\mathcal{A}$, there is only one way of interpreting each term $t \in \mathcal{T}$ in $\mathcal{A}$ (satisfying the homomorphism conditions).

**Definition 12 ((Multi-Language) Semantics).** *Let* $\mathcal{A}$ *be a multi-language algebra over a regular multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$*. The* (multi-language) semantics *of a (multi-language) term* $t \in \mathcal{T}$ *induced by* $\mathcal{A}$ *is defined as*

$$[\![t]\!]_{\mathcal{A}} = h_{\mathrm{ls}(t)}(t)$$

The last equation is well-defined since $h$ is the unique multi-language homomorphism $h\colon \mathcal{T} \to \mathcal{A}$ and for each $t \in \mathcal{T}$ there exists a least sort $\mathrm{ls}(t) \in S$ such that $t \in T_{\Pi,\mathrm{ls}(t)}$ (see Prop. 2.10 in [19]).

**Example.** Suppose we are interested in a multi-language over the signatures $\mathcal{S}_1$ and $\mathcal{S}_2$ specified in the example given in the background section that satisfies the following properties:


In order to achieve such a multi-language specification, we can simply provide a join relation $\ltimes$ on $S$ and a boundary function $\alpha_{s,s'}$ for each extra-language subsort relation $s \ltimes s'$ introduced by $\ltimes$. We define the join relation and the boundary functions as follows:

$$\begin{array}{lcl} \mathbf{e} \ltimes \mathbf{a} \;\wedge\; \mathbf{n} \ltimes \mathbf{a} & \longrightarrow & \alpha_{\mathbf{n},\mathbf{a}}(n) = \alpha_{\mathbf{e},\mathbf{a}}(n) = \mathrm{chr}(n) \\[1ex] \mathbf{s} \ltimes \mathbf{n} \;\wedge\; \mathbf{a} \ltimes \mathbf{n} & \longrightarrow & \begin{cases} \alpha_{\mathbf{a},\mathbf{n}}(a) = \mathrm{ord}(a) \\ \alpha_{\mathbf{s},\mathbf{n}}(a_0 \dots a_n) = \sum_{k=0}^{n} \alpha_{\mathbf{a},\mathbf{n}}(a_k) \cdot 10^{\,n-k} \end{cases} \end{array}$$

The multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$-algebra $\mathcal{A}$ can now be obtained by joining the projected algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ with the set of boundary functions $\alpha$. The term algebra $\mathcal{T}$ over $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ provides all the multi-language terms, and Theorem 2 ensures a unique denotation of each $t \in \mathcal{T}$ in $\mathcal{A}$. For instance, the term

$$t = {\hookrightarrow_{\mathbf{s},\mathbf{n}}}\Bigl(\underbrace{+_2\bigl(\mathtt{f}_2, \underbrace{+_2\bigl(\mathtt{o}_2, \underbrace{{\hookrightarrow_{\mathbf{e},\mathbf{a}}}\bigl(\underbrace{+_1(10_1, 5_1)}_{t_4}\bigr)}_{t_3}\bigr)}_{t_2}\bigr)}_{t_1}\Bigr) \tag{1}$$

is syntactically equivalent to the following, written in a less pedantic notation where language subscripts are replaced by colors (red for one, and blue for two) and prefix notation is replaced by infix notation

$${\hookrightarrow_{\mathbf{s},\mathbf{n}}}\bigl(\mathtt{f} + \mathtt{o} + {\hookrightarrow_{\mathbf{e},\mathbf{a}}}(10 + 5)\bigr)$$

and it denotes the natural number 765:

$$\begin{aligned}
[\![t_4]\!]_{\mathcal{A}} &= h_{\mathrm{ls}(t_4)}(t_4) = h_{\mathbf{e}}(t_4) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}([\![10]\!]_{\mathcal{A}}, [\![5]\!]_{\mathcal{A}}) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}(10, 5) = 15 \\
[\![t_3]\!]_{\mathcal{A}} &= h_{\mathrm{ls}(t_3)}(t_3) = h_{\mathbf{a}}(t_3) = [\![\hookrightarrow_{\mathbf{e},\mathbf{a}}]\!]^{\mathbf{e},\mathbf{a}}_{\mathcal{A}}([\![t_4]\!]_{\mathcal{A}}) = [\![\hookrightarrow_{\mathbf{e},\mathbf{a}}]\!]^{\mathbf{e},\mathbf{a}}_{\mathcal{A}}(15) = \mathtt{o} \\
[\![t_2]\!]_{\mathcal{A}} &= h_{\mathrm{ls}(t_2)}(t_2) = h_{\mathbf{s}}(t_2) = [\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}}([\![\mathtt{o}]\!]_{\mathcal{A}}, [\![t_3]\!]_{\mathcal{A}}) = [\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}}(\mathtt{o}, \mathtt{o}) = \mathtt{oo} \\
[\![t_1]\!]_{\mathcal{A}} &= h_{\mathrm{ls}(t_1)}(t_1) = h_{\mathbf{s}}(t_1) = [\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}}([\![\mathtt{f}]\!]_{\mathcal{A}}, [\![t_2]\!]_{\mathcal{A}}) = [\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}}(\mathtt{f}, \mathtt{oo}) = \mathtt{foo} \\
[\![t]\!]_{\mathcal{A}} &= h_{\mathrm{ls}(t)}(t) = h_{\mathbf{n}}(t) = [\![\hookrightarrow_{\mathbf{s},\mathbf{n}}]\!]^{\mathbf{s},\mathbf{n}}_{\mathcal{A}}([\![t_1]\!]_{\mathcal{A}}) = [\![\hookrightarrow_{\mathbf{s},\mathbf{n}}]\!]^{\mathbf{s},\mathbf{n}}_{\mathcal{A}}(\mathtt{foo}) = 765
\end{aligned}$$

(see the proof of Prop. 2.10 in [19] to check how to compute the least sort of a term).
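The evaluation of the term $t$ can be replayed with a short Python sketch. Several choices below are assumptions rather than part of the paper: terms are encoded as nested tuples with hypothetical tags, `chr_` and `ord_` number the letters from 1 (so that 15 maps to `o`, as in the derivation), and the string-to-number boundary weights the leftmost character most heavily, which is the reading under which `foo` yields 765.

```python
import string

ALPHABET = string.ascii_lowercase  # the atoms a..z


def chr_(n):
    """Boundary from numbers to atoms; assumes a = 1, ..., z = 26."""
    if not 1 <= n <= len(ALPHABET):
        raise ValueError(f"no atom for {n}")
    return ALPHABET[n - 1]


def ord_(a):
    """Boundary from atoms to numbers; inverse of chr_ on a..z."""
    return ALPHABET.index(a) + 1


def alpha_s_n(word):
    """Boundary from strings to numbers (leftmost character weighted most)."""
    n = len(word) - 1
    return sum(ord_(c) * 10 ** (n - k) for k, c in enumerate(word))


def evaluate(term):
    """Multi-language semantics by structural recursion (tags are hypothetical)."""
    tag = term[0]
    if tag == "num_1":
        return term[1]
    if tag == "add_1":
        return evaluate(term[1]) + evaluate(term[2])
    if tag == "atom_2":
        return term[1]
    if tag == "cat_2":
        return evaluate(term[1]) + evaluate(term[2])
    if tag == "hook_e_a":   # expression used where an atom is expected
        return chr_(evaluate(term[1]))
    if tag == "hook_s_n":   # string used where a number is expected
        return alpha_s_n(evaluate(term[1]))
    raise ValueError(f"unknown term: {term!r}")


t4 = ("add_1", ("num_1", 10), ("num_1", 5))
t3 = ("hook_e_a", t4)
t2 = ("cat_2", ("atom_2", "o"), t3)
t1 = ("cat_2", ("atom_2", "f"), t2)
t = ("hook_s_n", t1)

for name, term in [("t4", t4), ("t3", t3), ("t2", t2), ("t1", t1), ("t", t)]:
    print(name, "=", evaluate(term))
```

The loop prints the intermediate denotations 15, `o`, `oo`, `foo` and 765, in the same order as the derivation above.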

### **4 Refining the Construction**

The construction in Sect. 3 does not set any constraint on boundary functions, thus giving a great deal of flexibility to language designers. For instance, they can provide boundary functions that act differently with respect to the intra-language subsort relation $\precsim$: in the previous example, it would have been possible to define $\alpha_{\mathbf{n},\mathbf{a}} \ne \alpha_{\mathbf{e},\mathbf{a}}$ to employ different value conversion specifications for terms in $T_{\mathbf{n}}$, based on whether they are used as natural numbers ($\mathbf{n}$) or as expressions ($\mathbf{e}$). However, when this amount of flexibility is not needed, we can refine the previous construction by reducing the amount of syntax introduced by the associated signature. In this section we examine


In both cases, we prove that the introduced refinements do not affect the initiality of the term algebra, thereby providing unambiguous semantics to the multilanguage.

#### **4.1 Subsort Polymorphic Boundary Functions**

In Sect. 3, the join relation constraints $s \ltimes s'$ are turned into syntactical operators $\hookrightarrow_{s,s'}$ in the associated signature $\langle S, \precsim, \Pi \rangle$. We now show how to handle all the syntactical overhead introduced by $\ltimes$ with a single polymorphic operator $\hookrightarrow$ whenever the boundary functions satisfy the monotonicity conditions of the order-sorted algebras [19]. Such conditions require a subsort relation $s_1 \le s_2$ between the sorts of a polymorphic operator $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$, assuming that $w_1 \le w_2$. In our case, $\sigma = {\hookrightarrow}$, and thus we extend Definition 6 with the following ad hoc constraint (2s∗):

**Definition 6**<sup>∗</sup> **(SP Multi-Language Signature).** *A* subsort polymorphic (SP) multi-language signature *is a multi-language signature* $\langle \mathcal{S}_1, \mathcal{S}_2, \le \rangle$ *such that*

$$(2s^*)\quad s_1 \ltimes s_1',\ s_2 \ltimes s_2',\ \text{and}\ s_1 \precsim s_2\ \text{imply}\ s_1' \precsim s_2'.$$

Furthermore, order-sorted algebras demand consistency of the interpretation functions of a subsort polymorphic operator on the smaller domain, which results in the following condition (2a∗) on boundary functions (that extends Definition 7):

**Definition 7∗ (SP Multi-Language Algebra).** *Let* $\langle S_1, S_2, \ltimes \rangle$ *be an SP multi-language signature. A* subsort polymorphic (SP) multi-language $\langle S_1, S_2, \ltimes \rangle$-algebra *is a multi-language* $\langle S_1, S_2, \ltimes \rangle$*-algebra* $\mathcal{A}$ *such that*

$$(2a^*)\quad s_1 \ltimes s_1',\ s_2 \ltimes s_2', \text{ and } s_1 \precsim s_2 \text{ imply that } \alpha_{s_1,s_1'}(a) = \alpha_{s_2,s_2'}(a) \text{ for each } a \in A_{s_1}.$$
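On a finite fragment, conditions (2s∗) and (2a∗) can be checked mechanically. The following is a minimal sketch, not part of the original construction: the sort names follow the running example, while the join pairs, the sample carriers, and the conversion of longer strings (`sum_of_codes`) are hypothetical choices made only for illustration.

```python
from itertools import product

# Hypothetical finite fragment: n <= e (naturals, expressions), a <= s (characters, strings).
SUBSORT = {("n", "n"), ("e", "e"), ("a", "a"), ("s", "s"), ("n", "e"), ("a", "s")}
JOIN = {("a", "n"), ("s", "n")}  # assumed join pairs: characters and strings used as naturals


def satisfies_2s_star(join, subsort):
    """(2s*): s1 join s1', s2 join s2' and s1 <= s2 imply s1' <= s2'."""
    return all(
        (t1, t2) in subsort
        for (s1, t1), (s2, t2) in product(join, repeat=2)
        if (s1, s2) in subsort
    )


def satisfies_2a_star(join, subsort, alpha, carriers):
    """(2a*): the two boundary functions agree on the smaller carrier A_{s1}."""
    return all(
        alpha[(s1, t1)](a) == alpha[(s2, t2)](a)
        for (s1, t1), (s2, t2) in product(join, repeat=2)
        if (s1, s2) in subsort
        for a in carriers[s1]
    )


def sum_of_codes(string):
    """Hypothetical conversion of a string; coincides with ord on single characters."""
    return sum(ord(c) for c in string)


# Finite samples of the carriers (the real carriers are infinite).
CARRIERS = {"a": ["f", "o"], "s": ["f", "o", "fo", ""]}
ALPHA = {("a", "n"): ord, ("s", "n"): sum_of_codes}
# A family that violates (2a*): len("f") differs from ord("f").
ALPHA_BAD = {("a", "n"): ord, ("s", "n"): len}
```

With these definitions, `satisfies_2a_star(JOIN, SUBSORT, ALPHA, CARRIERS)` holds, whereas replacing `ALPHA` by `ALPHA_BAD` makes the check fail, since the two conversions disagree on characters.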

The notion of homomorphism in this new context does not change (a homomorphism between two SP algebras is still an $S$-sorted function decomposable into two order-sorted homomorphisms that commutes with boundaries), whereas the signature associated with an SP multi-language signature merely differs from Definition 9 in having a unique polymorphic operator $\hookrightarrow$ instead of a family of parametrized symbols $\{\, \hookrightarrow_{s,s'} \colon s \to s' \mid s \ltimes s' \,\}$.

**Definition 9∗ (SP Associated Signature).** *The* subsort polymorphic (SP) associated signature *to the SP multi-language signature* $\langle S_1, S_2, \ltimes \rangle$ *is the ordered triple* $\langle S, \precsim, \Pi \rangle$, *where* $S = S_1 \cup S_2$, ${\precsim} = {\leq_1} \cup {\leq_2}$, *and*

$$\begin{aligned} \Pi ={} & \{\, \sigma_1 \colon w \to s \mid \sigma \colon w \to s \in \Sigma_1 \,\} \\ {}\cup{} & \{\, \sigma_2 \colon w \to s \mid \sigma \colon w \to s \in \Sigma_2 \,\} \\ {}\cup{} & \{\, \hookrightarrow \colon s \to s' \mid s \ltimes s' \,\} \end{aligned}$$

Since the associated signature is the basis for the term algebra, we need to modify the condition (3t) in Definition 10:

**Definition 10∗ (SP Multi-Language Term Algebra).** *The* subsort polymorphic (SP) multi-language term algebra $\mathcal{T}$ *over an SP multi-language signature* $\langle S_1, S_2, \ltimes \rangle$ *with boundary functions* $\tau$ *is defined as follows:*

$$\begin{array}{ll} (1t) & s \in S \implies T_s = T_{\Pi,s}; \\ (2t) & \sigma \in \Sigma^i_{w,s} \implies [\![\sigma]\!]^{w,s}_{\mathcal{T}} = [\![\sigma_i]\!]^{w,s}_{\mathcal{T}_\Pi} \text{ for } i = 1,2; \text{ and} \\ (3t^*) & s \ltimes s' \implies \tau_{s,s'} = [\![\hookrightarrow]\!]^{s,s'}_{\mathcal{T}_\Pi}. \end{array}$$

Signature regularity is still defined as in Definition 11 and Proposition 2 still holds for the extended version developed in this section. As a result, the SP multi-language term $\langle S_1, S_2, \ltimes \rangle$-algebra $\mathcal{T}$ is still initial in the category $\mathbf{Alg}^*(S_1, S_2, \ltimes)$ of SP multi-language algebras over the SP multi-language signature $\langle S_1, S_2, \ltimes \rangle$.

**Theorem 3.** *Let* $\langle S_1, S_2, \ltimes \rangle$ *be an SP multi-language signature. The class of all SP* $\langle S_1, S_2, \ltimes \rangle$*-algebras and the class of all* $\langle S_1, S_2, \ltimes \rangle$*-homomorphisms form a category denoted by* $\mathbf{Alg}^*(S_1, S_2, \ltimes)$*.*

**Theorem 4 (Initiality of $\mathcal{T}$).** *The SP multi-language term algebra* $\mathcal{T}$ *over a regular SP multi-language signature* $\langle S_1, S_2, \ltimes \rangle$ *is initial in the category* $\mathbf{Alg}^*(S_1, S_2, \ltimes)$*.*

The semantics of a term $t$ induced by an SP multi-language algebra $\mathcal{A}$ is defined in the same way as in Definition 12, thanks to the initiality result: $[\![t]\!]_{\mathcal{A}} = h_{ls(t)}(t)$. The main advantage of dealing with SP multi-language terms is that the framework is able to determine the correct interpretation function of the operator $\hookrightarrow$, making the subscript notation developed in the previous section superfluous. This also means that programmers are exempted from explicitly annotating multi-language programs with sorts, a non-trivial task in the general case that could introduce type cast bugs.

**Example.** The boundary functions of the previous example are subsort polymorphic: $\alpha_{\mathbf{a},\mathbf{n}}(a) = \mathit{ord}(a) = \alpha_{\mathbf{s},\mathbf{n}}(a)$ for each character $a \in \mathbf{A}$, and $\alpha_{\mathbf{n},\mathbf{a}} = \alpha_{\mathbf{e},\mathbf{a}}$ by definition. Thus, the equivalent of the term $t$ (see Eq. 1) in the SP term algebra is

$$\dot{t} = {\hookrightarrow}\bigl(+_2(\mathtt{f}_2, +_2(\mathtt{o}_2, {\hookrightarrow}(+_1(10_1, 5_1))))\bigr) \tag{2}$$

or, according to the previous notation,

$${\hookrightarrow}\bigl(\,\mathtt{f} + \mathtt{o} + {\hookrightarrow}(10 + 5)\,\bigr)$$

and denoting the same natural number 765.

#### **4.2 Semantic-Only Boundary Functions**

In the previous section, we have shown how to handle the flow of values across different languages with a single polymorphic operator. Now, we present a new multi-language construction where neither extra operators are added to the associated signature, nor single-language operators have to be annotated with subscripts indicating their original language. Thus, the resulting multi-language syntax comprises only symbols in $\Sigma_1 \cup \Sigma_2$. Such a construction is achieved by:


The variant of the framework presented in this section is particularly useful when designing the extension of a language in a modular fashion. For instance, if the signature $S_1$ models the syntax of a simple functional language (for an example, see [15, p. 77]) without an explicit encoding for string values, and $S_2$ is a language for manipulating strings (similar to the language $L_2$ of the running example of this paper), we can exploit the construction presented below in order to embed $S_2$ into $S_1$.

**Signature.** The main issue that can arise at this stage of the multi-language signature construction is the presence of shared operators in $\Sigma_1$ and $\Sigma_2$. Contrary to the previous cases, where such ambiguity is solved by adding subscripts in the associated signature, the trade-off here is requiring ad hoc or subsort polymorphism across signatures.

**Definition 6 (SO Multi-Language Signature).** *A* semantic-only (SO) multi-language signature *is a multi-language signature* $\langle S_1, S_2, \ltimes \rangle$ *such that*

*(2s)* $\langle S, \leq \rangle$ *is a poset*; *and (3s)* $\sigma \in \Sigma^i_{w_1,s_1} \cap \Sigma^j_{w_2,s_2}$ *and* $w_1 \ltimes w_2$ *imply* $s_1 \ltimes s_2$ *with* $i, j = 1, 2$ *and* $i \neq j$.

Condition (2s) forces the subsort relation to be directed, avoiding symmetry between syntactic categories (this is typical when modeling language extensions), while condition (3s) shifts the monotonicity condition of order-sorted signatures to syntactically equal operators in $\Sigma_1 \cap \Sigma_2$.

The associated signature is defined without adding extra symbols in the signature, i.e., $\Pi = \Sigma_1 \cup \Sigma_2$, and deliberately confounding the relations $\ltimes$ and $\precsim$ in $\leq$:

**Definition 9 (SO Associated Signature).** *The* SO associated signature *to the SO multi-language signature* $\langle S_1, S_2, \ltimes \rangle$ *is the ordered triple* $\langle S, \leq, \Pi \rangle$, *where* $S = S_1 \cup S_2$, ${\leq} = {\ltimes} \cup {\precsim}$, *and* $\Pi = \Sigma_1 \cup \Sigma_2$.

The embedding of $\ltimes$ in $\leq$ (i.e., ${\ltimes} \subseteq {\leq}$) in the associated signature enables the order-sorted term algebra construction to automatically build multi-language terms, without the need for an explicit operator $\hookrightarrow$ that acts as a bridge between syntactic categories. It is easy to see that the term algebra over the associated signature is precisely the symbols-free version of multi-language described at the beginning.

Unfortunately, multi-language regularity does not follow anymore from single-languages regularity and vice versa (see Figs. 3 and 4)<sup>6</sup>. More formally, Proposition 2 does not hold in this new context:

<sup>6</sup> A (horizontal) arrow from an arity symbol $w$ to a sort $s$ labelled with an operator symbol $\sigma$ is an alternative shorthand for $\sigma \colon w \to s$. A (vertical) single line between two sorts, $s$ below $s'$, labelled with a binary relation $\leq$ means that $s \leq s'$ (if the binary relation is the join relation the line is doubled). A dotted rectangle around operators is a graphical representation of the set of ranks $(w, s)$ that must have a minimum element (red arrows) in order for the signature to be regular.

**Fig. 3.** A non-regular multi-language signature comprising two regular order-sorted signatures.

**Fig. 4.** A regular multi-language signature comprising a non-regular order-sorted signature.


A positive result can be obtained by recalling that regularity is easier to check when $\langle S, \leq \rangle$ satisfies the descending chain condition (DCC):

**Lemma 1 (Regularity over DCC poset** [19]**).** *An order-sorted signature* $\Sigma$ *over a DCC poset* $\langle S, \leq \rangle$ *is regular if and only if whenever* $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ *and there is some* $w_0 \leq w_1, w_2$*, then there is some* $w \leq w_1, w_2$ *such that* $\sigma \in \Sigma_{w,s}$ *and* $w_0 \leq w$*.*
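For a finite signature (whose sort poset is trivially DCC), the condition of Lemma 1 can be checked by brute force. The sketch below is only illustrative: the encoding of signatures as dictionaries, the helper names, and the two sample signatures are assumptions. The first sample mimics the merged order $\mathbf{a} \leq \mathbf{s} \leq \mathbf{n} \leq \mathbf{e}$ with an overloaded `+`; the second is a hypothetical clash in the spirit of Fig. 3.

```python
from itertools import product


def closure(sorts, pairs):
    """Reflexive-transitive closure of a finite relation on sorts."""
    leq = {(s, s) for s in sorts} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


def arity_leq(leq, w, v):
    """Componentwise extension of the subsort order to arities of equal length."""
    return len(w) == len(v) and all((x, y) in leq for x, y in zip(w, v))


def is_regular(sorts, leq, sigma):
    """Check the condition of Lemma 1; sigma maps operators to sets of ranks (w, s)."""
    for ranks in sigma.values():
        for (w1, _), (w2, _) in product(ranks, repeat=2):
            if len(w1) != len(w2):
                continue
            for w0 in product(sorts, repeat=len(w1)):
                if not (arity_leq(leq, w0, w1) and arity_leq(leq, w0, w2)):
                    continue
                # Look for a rank whose arity w satisfies w0 <= w <= w1, w2.
                if not any(
                    arity_leq(leq, w0, w)
                    and arity_leq(leq, w, w1)
                    and arity_leq(leq, w, w2)
                    for (w, _) in ranks
                ):
                    return False
    return True


SORTS = ["a", "s", "n", "e"]
LEQ = closure(SORTS, [("a", "s"), ("s", "n"), ("n", "e")])
PLUS = {"+": {(("e", "e"), "e"), (("s", "s"), "s")}}

FORK_SORTS = ["s0", "s1", "s2"]
FORK = closure(FORK_SORTS, [("s0", "s1"), ("s0", "s2")])
CLASH = {"f": {(("s1",), "s1"), (("s2",), "s2")}}
```

Here `is_regular(SORTS, LEQ, PLUS)` succeeds, while `is_regular(FORK_SORTS, FORK, CLASH)` fails because the common lower bound `s0` admits no rank below both `s1` and `s2`.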

At this point, we can relate the DCC of the poset $\langle S, \leq \rangle$ in the associated signature of $\langle S_1, S_2, \ltimes \rangle$ to the DCC of $\langle S_1, \leq_1 \rangle$ and $\langle S_2, \leq_2 \rangle$:

**Proposition 3.** *Let* $\langle S, \leq, \Sigma \rangle$ *be the associated signature of* $\langle S_1, S_2, \ltimes \rangle$*. Then,* $\langle S, \leq \rangle$ *is DCC if and only if* $\langle S_1, \leq_1 \rangle$ *and* $\langle S_2, \leq_2 \rangle$ *are DCC.*

As a result, whenever we know that $\langle S_1, \leq_1 \rangle$ and $\langle S_2, \leq_2 \rangle$ are DCC, we can check the regularity of $\langle S_1, S_2, \ltimes \rangle$ by employing Lemma 1 without checking whether $\langle S, \leq \rangle$ is DCC.

**Algebra.** In this multi-language construction, the behaviour of boundary functions is no longer bound to syntactical operators as in the previous sections, but is inherited by homomorphisms. A necessary condition to accomplish this aim is the commutativity of interpretation functions with boundary functions:

**Definition 7 (SO Multi-Language Algebra).** *Let* $\langle S_1, S_2, \ltimes \rangle$ *be an SO multi-language signature. A* semantic-only (SO) multi-language $\langle S_1, S_2, \ltimes \rangle$-algebra *is an SP multi-language* $\langle S_1, S_2, \ltimes \rangle$*-algebra* $\mathcal{A}$ *such that*

*(3a)* $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \ltimes w_2$ imply that $\alpha_{s_1,s_2}([\![\sigma]\!]^{w_1,s_1}_{\mathcal{A}}(a)) = [\![\sigma]\!]^{w_2,s_2}_{\mathcal{A}}(\alpha_{w_1,w_2}(a))$ for each $a \in A_{w_1}$.

Note that $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \ltimes w_2$ imply $s_1 \ltimes s_2$ by condition (3s). The notion of homomorphism remains unchanged from Definition 8 (to understand how the homomorphisms inherit the behaviour of boundary functions, see the proof of Theorem 6).

The term algebra is defined similarly to Definition 10, except for boundary functions:

**Definition 10 (SO Multi-Language Term Algebra).** *The* semantic-only (SO) multi-language term algebra $\mathcal{T}$ *over an SO multi-language signature* $\langle S_1, S_2, \ltimes \rangle$ *with boundary functions* $\tau$ *is defined as follows:*

*(1t)* $s \in S$ *implies* $T_s = T_{\Pi,s}$; *(2t)* $\sigma \in \Sigma_{w,s}$ *implies* $[\![\sigma]\!]^{w,s}_{\mathcal{T}} = [\![\sigma]\!]^{w,s}_{\mathcal{T}_\Pi}$; *and (3t)* $s \ltimes s'$ *implies* $\tau_{s,s'} = \mathit{id}_{T_s}$.

Since the subsort relation $\leq$ includes the join relation $\ltimes$, $s \ltimes s'$ implies $T_{\Pi,s} = T_s \subseteq T_{s'} = T_{\Pi,s'}$. Thus, the boundary function $\tau_{s,s'}$ can be defined as the identity on the smaller domain (note that it trivially satisfies the commutativity condition (3a)).

**Proposition 4.** *Let* $\langle S_1, S_2, \ltimes \rangle$ *be an SO multi-language signature. Then, the SO multi-language term* $\langle S_1, S_2, \ltimes \rangle$*-algebra is a proper SO multi-language algebra.*

**Theorem 5.** *Let* $\langle S_1, S_2, \ltimes \rangle$ *be an SO multi-language signature. The class of all SO* $\langle S_1, S_2, \ltimes \rangle$*-algebras and the class of all* $\langle S_1, S_2, \ltimes \rangle$*-homomorphisms form a category denoted by* $\mathbf{Alg}(S_1, S_2, \ltimes)$*.*

We can now prove the initiality of T in its category.

**Theorem 6 (Initiality of $\mathcal{T}$).** *Let* $\langle S_1, S_2, \ltimes \rangle$ *be a regular multi-language signature. Then, the term algebra* $\mathcal{T}$ *is an initial object in the category* $\mathbf{Alg}(S_1, S_2, \ltimes)$*.*

Thanks to the initiality of the term algebra, the definition of term semantics is the same as in Definition 12.

**Example.** Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two order-sorted algebras over the signatures $S_1$ and $S_2$, respectively, as formalized in the example in Sect. 3. Suppose we are interested in a new multi-language $\mathcal{A}$ over $S_1$ and $S_2$ such that any string expression $t$ of sort $\mathbf{s}$ in $S_2$ can denote the natural number $\mathit{length}([\![t]\!]_{\mathcal{A}_2})$ when embedded in $S_1$ terms. For instance, we require that $[\![10 + 5]\!]_{\mathcal{A}} = [\![10 + 5]\!]_{\mathcal{A}_1} = 15$ and $[\![\mathtt{f} + \mathtt{o}]\!]_{\mathcal{A}} = [\![\mathtt{f} + \mathtt{o}]\!]_{\mathcal{A}_2} = \mathtt{fo}$, but $[\![(\mathtt{f} + \mathtt{o}) + (10 + 5)]\!]_{\mathcal{A}} = \mathit{length}(\mathtt{fo}) + 15 = 17$ (parentheses in the last term have only been used to disambiguate the parsing result).

Since the requirements demand the use of string expressions in place of natural numbers, the join relation shall define $\mathbf{s} \ltimes \mathbf{n}$ and ensure transitivity, hence $\mathbf{s} \ltimes \mathbf{e}$, $\mathbf{a} \ltimes \mathbf{n}$, and $\mathbf{a} \ltimes \mathbf{e}$.

The signatures $S_1$ and $S_2$ are trivially regular. However, by merging $S_1$ and $S_2$, we are causing subsort polymorphism on the symbol $+$, which is used as sum operator in $\mathcal{A}_1$ and as concatenation operator in $\mathcal{A}_2$, and therefore we have to check the regularity: Let $w_1 = \mathbf{e}\,\mathbf{e}$, $w_2 = \mathbf{s}\,\mathbf{s}$, $s_1 = \mathbf{e}$, and $s_2 = \mathbf{s}$. Given $+ \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and the lower bound $w_0 = \mathbf{a}\,\mathbf{a} \leq w_1, w_2$, then there exists $w = \mathbf{s}\,\mathbf{s}$ such that $w \leq w_1, w_2$ and $+ \in \Sigma_{w,s}$, where $s = \mathbf{s} \leq s_1, s_2$ (we have employed Lemma 1 thanks to Proposition 3). Analogously, when $w_0 = w_1, w_2$ the relative least rank is $(\mathbf{s}\,\mathbf{s}, \mathbf{s})$.

The multi-language $\langle S_1, S_2, \ltimes \rangle$-algebra $\mathcal{A}$ is now defined by joining the projected algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ and by defining boundary functions $\alpha_{s,s'}$ for each $s \ltimes s'$ that convert strings into naturals (their length) when strings are used in place of naturals:

$$\alpha_{\mathbf{a},\mathbf{n}}(a) = \alpha_{\mathbf{a},\mathbf{e}}(a) = 1 \qquad\qquad \alpha_{\mathbf{s},\mathbf{n}}(\hat{a}) = \alpha_{\mathbf{s},\mathbf{e}}(\hat{a}) = \mathit{length}(\hat{a})$$

The above definition of boundary functions satisfies both conditions (2a∗) and (3a).

The initiality theorem yields the semantic homomorphism from T to A. For instance, suppose we want to compute the semantics of the term

$$t = +(\underbrace{+(\mathtt{f}, \mathtt{o})}_{t_1}, \overbrace{+(10, 5)}^{t_2})$$

The least sorts of $t$, $t_1$, and $t_2$ are $\mathbf{e}$, $\mathbf{s}$, and $\mathbf{e}$, respectively. The operator $+$ belongs to both $\Sigma_{\mathbf{e}\,\mathbf{e},\mathbf{e}}$ and $\Sigma_{\mathbf{s}\,\mathbf{s},\mathbf{s}}$, and its least rank w.r.t. the lower bound $ls(t_1)\, ls(t_2) = \mathbf{s}\,\mathbf{e}$ is $(\mathbf{e}\,\mathbf{e}, \mathbf{e})$. By Definition 12 we have

$$[\![t]\!]_{\mathcal{A}} = h_{\mathbf{e}}(t) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}(h_{\mathbf{e}}(t_1), h_{\mathbf{e}}(t_2)).$$

At this point, since $ls(t_1) = \mathbf{s}$ and $ls(\mathtt{f}) = ls(\mathtt{o}) = \mathbf{a}$, the least rank of the root symbol $+$ of $t_1$ w.r.t. the lower bound $ls(\mathtt{f})\, ls(\mathtt{o}) = \mathbf{a}\,\mathbf{a}$ is $(\mathbf{s}\,\mathbf{s}, \mathbf{s})$, thus

$$h_{\mathbf{e}}(t_1) = \alpha_{\mathbf{s},\mathbf{e}}(h_{\mathbf{s}}(t_1)) = \alpha_{\mathbf{s},\mathbf{e}}([\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}}(h_{\mathbf{s}}(\mathtt{f}), h_{\mathbf{s}}(\mathtt{o}))) = \alpha_{\mathbf{s},\mathbf{e}}([\![+]\!]^{\mathbf{s}\,\mathbf{s},\mathbf{s}}_{\mathcal{A}}(\mathtt{f}, \mathtt{o})) = \alpha_{\mathbf{s},\mathbf{e}}(\mathtt{fo}) = 2.$$

Similarly, $ls(t_2) = \mathbf{e}$ and $ls(10) = ls(5) = \mathbf{n}$. Then, the least rank of the root symbol $+$ of $t_2$ w.r.t. the lower bound $\mathbf{n}\,\mathbf{n}$ is $(\mathbf{e}\,\mathbf{e}, \mathbf{e})$ and therefore we have

$$h_{\mathbf{e}}(t_2) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}(h_{\mathbf{e}}(10), h_{\mathbf{e}}(5)) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}(10, 5) = 15.$$

Finally,

$$[\![t]\!]_{\mathcal{A}} = h_{\mathbf{e}}(t) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}(h_{\mathbf{e}}(t_1), h_{\mathbf{e}}(t_2)) = [\![+]\!]^{\mathbf{e}\,\mathbf{e},\mathbf{e}}_{\mathcal{A}}(2, 15) = 17,$$

as desired.
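The computation of this example can be replayed with a small evaluator. This is a minimal sketch under stated assumptions, not the general construction: terms are encoded as nested tuples, single-character Python strings stand for constants of sort $\mathbf{a}$, integers for constants of sort $\mathbf{n}$, the merged order is hard-coded as the chain $\mathbf{a} \leq \mathbf{s} \leq \mathbf{n} \leq \mathbf{e}$, and all function names are hypothetical.

```python
CHAIN = ["a", "s", "n", "e"]  # merged order a <= s <= n <= e of the example

# Ranks of the overloaded operator +, listed from the least to the greatest.
RANKS = [(("s", "s"), "s"), (("e", "e"), "e")]


def leq(x, y):
    """Subsort test on the chain above."""
    return CHAIN.index(x) <= CHAIN.index(y)


def least_rank(arg_sorts):
    """Least rank of + whose arity lies above the least sorts of the arguments."""
    for arity, result in RANKS:
        if all(leq(s, w) for s, w in zip(arg_sorts, arity)):
            return arity, result
    raise ValueError("no rank for argument sorts %r" % (arg_sorts,))


def boundary(src, dst, value):
    """Boundary functions of the example: strings (and characters) become their length."""
    if src in ("a", "s") and dst in ("n", "e"):
        return len(value)
    return value  # intralanguage subsorting: carriers are included, nothing to convert


def evaluate(term):
    """Return the pair (least sort, denotation) of a term."""
    if isinstance(term, int):
        return "n", term
    if isinstance(term, str):
        return "a", term
    _, left, right = term
    (s1, v1), (s2, v2) = evaluate(left), evaluate(right)
    arity, result = least_rank((s1, s2))
    x1 = boundary(s1, arity[0], v1)
    x2 = boundary(s2, arity[1], v2)
    # At rank (s s, s) this is concatenation, at rank (e e, e) it is addition.
    return result, x1 + x2


T1 = ("+", "f", "o")
T2 = ("+", 10, 5)
T = ("+", T1, T2)
```

Evaluating `T` yields sort `e` and the value 17: the subterm `T1` is first interpreted as the string `fo` at rank `(s s, s)`, and only then converted to its length when it is used as an argument at rank `(e e, e)`.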

We can observe that without any syntactical operator the framework is still able to apply the correct boundary functions to move values across languages.

#### **5 Reduction to Order-Sorted Algebra**

The constructions in the previous sections raise the question of whether a multi-language algebra admits an equivalent order-sorted representation. Conceptually, it would mean that being a multi-language is essentially a matter of perspective: By forgetting how the multi-language has been constructed, what is left is simply an ordinary language. Mathematically speaking, it requires us to exhibit a *reduction functor* $F$ from the multi-language category to an order-sorted one, such that there is an isomorphism $\varphi$ between the carrier sets of the multi-language term $\langle S_1, S_2, \ltimes \rangle$-algebra $\mathcal{T}$ and $F(\mathcal{T})$, and such that $[\![t]\!]_{\mathcal{A}} = [\![\varphi(t)]\!]_{F(\mathcal{A})}$ for each $t \in \mathcal{T}$ and for each multi-language $\langle S_1, S_2, \ltimes \rangle$-algebra $\mathcal{A}$.

In the following, we denote the reduction functor by F, F<sup>∗</sup>, and <sup>F</sup> accordingly whether its domain is the category **Alg**(S<sup>1</sup>, <sup>S</sup><sup>2</sup>, <sup>≤</sup>), **Alg**∗(S<sup>1</sup>, <sup>S</sup><sup>2</sup>, <sup>≤</sup>), and **Alg**(S<sup>1</sup>, <sup>S</sup><sup>2</sup>, <sup>≤</sup>), respectively.

In the case of the $\mathbf{Alg}(S_1, S_2, \ltimes)$ and $\mathbf{Alg}^*(S_1, S_2, \ltimes)$ categories, the construction of $F$ and $F^*$ is very simple, and we illustrate it only for the plain multi-language algebras of Sect. 3: Let $\mathcal{A}$ be a multi-language $\langle S_1, S_2, \ltimes \rangle$-algebra. Then, we define the order-sorted $\langle S, \precsim, \Pi \rangle$-algebra $\mathcal{A}_\Pi$ (called the *associated order-sorted algebra* of $\mathcal{A}$) by setting

(1π) $A_{\Pi,s} = A_s$ for each $s \in S$; (2π) $[\![\sigma_i]\!]^{w,s}_{\mathcal{A}_\Pi} = [\![\sigma]\!]^{w,s}_{\mathcal{A}}$ for each $\sigma \in \Sigma^i_{w,s}$ and $i = 1, 2$; and (3π) $[\![\hookrightarrow_{s,s'}]\!]^{s,s'}_{\mathcal{A}_\Pi} = \alpha_{s,s'}$ for each $s \ltimes s'$.

If $\mathcal{A}$ and $\mathcal{B}$ are multi-language $\langle S_1, S_2, \ltimes \rangle$-algebras, and $h$ is a multi-language $\langle S_1, S_2, \ltimes \rangle$-homomorphism from $\mathcal{A}$ to $\mathcal{B}$, the functor $F$ maps $\mathcal{A}$ and $\mathcal{B}$ to their associated order-sorted algebras $\mathcal{A}_\Pi$ and $\mathcal{B}_\Pi$ and the homomorphism $h$ to itself. Since $A_\Pi = A$, the isomorphism $\varphi$ is the identity function.

**Theorem 7.** $F \colon \mathbf{Alg}(S_1, S_2, \ltimes) \to \mathbf{OSAlg}(S_1, S_2, \ltimes)$ *is a functor for every multi-language signature* $\langle S_1, S_2, \ltimes \rangle$*. Moreover,* $[\![t]\!]_{\mathcal{A}} = [\![t]\!]_{F(\mathcal{A})}$ *for each* $t \in \mathcal{T}$ *and for each multi-language* $\langle S_1, S_2, \ltimes \rangle$*-algebra* $\mathcal{A}$*.*

If $\mathcal{A}$ is an SP multi-language $\langle S_1, S_2, \ltimes \rangle$-algebra, the construction of the reduction functor $F^*$ is similar to the definition of $F$. The only difference is the equation in the condition (3π) that turns into

(3π∗) $[\![\hookrightarrow]\!]^{s,s'}_{\mathcal{A}_\Pi} = \alpha_{s,s'}$ for each $s \ltimes s'$.

Finally, the definition of F starting from the category **Alg**(S1, <sup>S</sup>2, <sup>≤</sup>) of SO multi-language algebras is slightly different. We define F as a map from the multi-language category **Alg**(S1, <sup>S</sup>2, <sup>≤</sup>) to the order-sorted category **OSAlg**(S, -, Σ). We denote the reduction of a multi-language algebra <sup>A</sup> and a homomorphism h: A→B as F(A) = <sup>A</sup> and <sup>F</sup>(h) = <sup>h</sup>- : A- → B-. The order-sorted algebra A has the same carrier sets of the multi-language algebra <sup>A</sup>, i.e., A- <sup>=</sup> <sup>A</sup>, and interpretation functions σ w,s A- = σ w,s <sup>A</sup> . Furthermore, we define h- <sup>=</sup> <sup>h</sup>. Intuitively, the algebra <sup>A</sup> is formally defined simply by forgetting about the boundary functions, while the homomorphism h- : A- → B inherits their semantics from h. Again, the isomorphism φ is the identity.

**Theorem 8.** F : **Alg**(S1, <sup>S</sup>2, <sup>≤</sup>) <sup>→</sup> **OSAlg**(S, -, Σ) *is a functor for every SO multi-language signature* S<sup>1</sup>, <sup>S</sup><sup>2</sup>, ≤*. Moreover,* <sup>t</sup><sup>A</sup> <sup>=</sup> <sup>t</sup><sup>F</sup> (A) *for each* t ∈ T *and for each SO multi-language* S<sup>1</sup>, <sup>S</sup><sup>2</sup>, ≤*-algebra* <sup>A</sup>*.*

Unfortunately, even though <sup>T</sup> is an initial algebra in its category, F(<sup>T</sup> ) = <sup>T</sup> is not: Given two multi-language algebras A and A that differ only in the boundary functions (we denote by α and α the families of boundary functions of <sup>A</sup> and A , respectively) they both get mapped by F to the same order-sorted algebra A-. Thus, if h: T →A and h : T →A are the unique homomorphisms going from T to A and A , the functor F maps them to two different order-sorted homomorphisms h- : T- → A and <sup>h</sup> - : T- → A both leaving T and going to A-, hence losing the uniqueness property. However, this does not pose a problem once fixed a family of boundary functions:

**Theorem 9.** *Let* <sup>T</sup> *be the multi-language term* S<sup>1</sup>, <sup>S</sup><sup>2</sup>, ≤*-algebra and* <sup>A</sup> *be an order-sorted* S, -, Σ*-algebra. Given a family of boundary functions* α <sup>=</sup> { αs,s- <sup>|</sup> s s } *such that satisfies condition (3a), there exists a unique ordersorted* S, -, Σ*-homomorphism* h<sup>α</sup> : <sup>T</sup>- → A *commuting with* <sup>α</sup>*, i.e., if* <sup>s</sup> s *, then* h<sup>α</sup> s- (t) = <sup>α</sup>s,s- (hα <sup>s</sup> (t)) *for each* <sup>t</sup> <sup>∈</sup> <sup>T</sup><sup>s</sup>*.*

The reduction theorems presented in this section have a strong consequence: all the already known results for the order-sorted algebras can be lifted to the multi-language world.

### **6 An Example of Multi-Language Construction**

The first theoretical paper addressing the problem of multi-language construction is [30]. The authors study the so-called *natural embedding* (a more realistic improvement of the *lump embedding* [7,30,34,40]), in which Scheme terms can be converted to equivalent ML terms, and vice versa.<sup>7</sup> The novelty in their approach is how they succeed in defining boundaries in order to translate values from Scheme to ML. Indeed, the latter does not admit an equivalent representation for each Scheme function. Their solution is to *"represent a Scheme procedure in ML at type* $\tau_1 \to \tau_2$ *by a new procedure that takes an argument of type* $\tau_1$*, converts it to a Scheme equivalent, runs the original Scheme procedure on that value, and then converts the result back to ML at type* $\tau_2$*"*.
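The quoted idea can be pictured with a type-directed pair of conversions. The sketch below only illustrates that idea and is not the formal system of [30]: types are encoded as the string `"n"` for naturals or a pair `(domain, codomain)` for arrows, Python callables stand in for both Scheme and ML procedures, and the names `to_ml` and `to_scheme` are hypothetical.

```python
def to_ml(ty, scheme_value):
    """Convert a Scheme value to an ML value at simple type ty."""
    if ty == "n":
        if not isinstance(scheme_value, int):
            raise TypeError("Scheme value is not a number")
        return scheme_value
    dom, cod = ty

    # New ML procedure: convert the argument to Scheme, run the original
    # Scheme procedure, then convert the result back to ML at type cod.
    def ml_procedure(ml_arg):
        return to_ml(cod, scheme_value(to_scheme(dom, ml_arg)))

    return ml_procedure


def to_scheme(ty, ml_value):
    """Convert an ML value of simple type ty to a Scheme value."""
    if ty == "n":
        return ml_value
    dom, cod = ty

    def scheme_procedure(scheme_arg):
        return to_scheme(cod, ml_value(to_ml(dom, scheme_arg)))

    return scheme_procedure


def succ(x):
    """A Scheme-side successor procedure (stand-in)."""
    return x + 1


def twice(f):
    """A Scheme-side higher-order procedure (stand-in)."""
    return lambda x: f(f(x))


NAT_TO_NAT = ("n", "n")
ml_succ = to_ml(NAT_TO_NAT, succ)
ml_twice = to_ml((NAT_TO_NAT, NAT_TO_NAT), twice)
```

The two conversions are mutually recursive because arguments flow in the opposite direction to results: wrapping a Scheme procedure for ML requires converting ML arguments to Scheme, and vice versa.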

Our goal here is not to give a fully detailed presentation of the ML and Scheme languages in the form of order-sorted algebras, but rather to show how we can model the natural embedding construction in our framework. In doing so, we provide a sketch of the formalization of Scheme and ML syntax and semantics, and we refer the reader to [30] for all the language details.

To provide the semantics of Scheme, we follow the same approach as Goguen et al. [15], where the denotational semantics of the *simple applicative language* (SAL) introduced by Reynolds [42] is given by means of an algebra, exploiting the initiality theorem. Such a language is a "syntactically sugared" version of the untyped lambda calculus with the fixpoint operator, which in turn is very similar to Scheme.

Let $X = \{\, x_1, x_2, \ldots \,\}$ be a set of variables and $\mathbf{N}$ be the naturals lattice with $\top$ and $\bot$ adjoined. From [46], there exists a complete lattice $V$ that satisfies the isomorphism $\varphi \colon V \cong \mathbf{N} + [V \to V]$, where $+$ is the disjoint union with minimum and maximum elements identified, and $[V \to V]$ is the complete lattice of Scott-continuous functions from $V$ to $V$. Given $\xi \in \{\, \mathbf{N}, [V \to V] \,\}$, we define the injections $j_\xi \colon \xi \to \mathbf{N} + [V \to V]$ and $i_\xi = \varphi^{-1} \circ j_\xi$, and the projection $\pi_\xi \colon V \to \xi$ such that $\pi_\xi(v) = \mathbf{(}\, \varphi(v) \in \xi \mathrel{\mathbf{?}} \varphi(v) \mathrel{\mathbf{:}} \bot \,\mathbf{)}$. The set of all Scheme environments is the lattice of all total functions $\mathrm{P} = X \to V$ with componentwise ordering $\rho \sqsubseteq \rho'$ if and only if $\rho(x) \sqsubseteq \rho'(x)$ in $V$ for all $x \in X$. Furthermore, we define auxiliary functions (see [15] for a more detailed explanation) in order to provide the semantics of the language (in the following, $x \in X$ and $n \in \mathbf{N}$):


<sup>7</sup> To be specific, the authors combine *"an extended model of the untyped call-by-value lambda calculus, which is used as a stand-in for Scheme, and an extended model of the simply-typed lambda calculus, which is used as a stand-in for ML"*.


$$\mathit{choice}(v_1, v_2, v_3) = \begin{cases} \top & \text{if } v_1 = \top \\ v_2 & \text{if } v_1 = 0 \\ v_3 & \text{if } v_1 \neq 0 \\ \bot & \text{otherwise} \end{cases} \qquad \mathit{add}(v_1, v_2) = \begin{cases} \top & \text{if } v_1, v_2 = \top \\ v_1 + v_2 & \text{if } v_1, v_2 \in \mathbf{N} \\ \bot & \text{otherwise} \end{cases}$$

The definition of *sub* is analogous to the function *add*, with the only difference that, in the second case, $\mathit{sub}(v_1, v_2) = v_1 -_{\mathbf{N}} v_2$, where $v_1 -_{\mathbf{N}} v_2 = \max \{\, v_1 - v_2, 0 \,\}$ for each $v_1, v_2 \in \mathbf{N}$.
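The three auxiliary functions can be sketched executably as follows. This is a hedged illustration rather than the definition from [15]: `TOP` and `BOT` are hypothetical sentinels for the adjoined elements, naturals are Python non-negative integers, the case "$v_1, v_2 = \top$" is read here as both arguments being $\top$, and the case "$v_1 \neq 0$" is read as $v_1$ being a nonzero natural. Both readings are assumptions.

```python
TOP = object()  # hypothetical sentinel for the adjoined top element
BOT = object()  # hypothetical sentinel for the adjoined bottom element


def is_nat(v):
    """True for the Python values used here to model natural numbers."""
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0


def choice(v1, v2, v3):
    """Conditional on zero; v1 is assumed to be a natural in the second and third case."""
    if v1 is TOP:
        return TOP
    if is_nat(v1):
        return v2 if v1 == 0 else v3
    return BOT


def add(v1, v2):
    """Addition on naturals; the top case is read as both arguments being top."""
    if v1 is TOP and v2 is TOP:
        return TOP
    if is_nat(v1) and is_nat(v2):
        return v1 + v2
    return BOT


def sub(v1, v2):
    """Truncated subtraction on naturals: max(v1 - v2, 0)."""
    if v1 is TOP and v2 is TOP:
        return TOP
    if is_nat(v1) and is_nat(v2):
        return max(v1 - v2, 0)
    return BOT
```

Ill-typed arguments, such as a Python function passed to `add`, fall through to `BOT`, which matches the convention of letting ill-typed terms denote $\bot$.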

The semantics of the language is obtained by defining an algebra $\mathcal{H}$ over a signature $H$,<sup>8</sup> then the initiality yields the unique homomorphism from the term algebra. A Scheme term denotes a continuous function in the semantic domain $H_{\mathbf{e}} = [\mathrm{P} \to V]$. The interpretation functions of the operators are defined by the following equations:

$$\begin{array}{ll} \left[x\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}} = get\_{x} & \left[\lambda x\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}}(\hat{e}) = i\_{V\diamondsuit \cdot V} \circ abs\_{\mathbf{P},V,V}(\hat{e}\circ put\_{x}) \\ \left[\mathsf{T}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}}(\hat{e}\_{1},\hat{e}\_{2}) = app \diamond \left\{\hat{e}\_{1},\hat{e}\_{2}\right\} & \left[\mathsf{proc}\mathsf{P}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}}(\hat{e}) = proc\mathcal{I} \diamond \hat{e} \\ \left[\overline{n}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}} = val\_{n} & \left[\mathsf{if}\,\mathbf{0}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e},\mathbf{e}}(\hat{e}\_{1},\hat{e}\_{2},\hat{e}\_{3}) = choice \diamondsuit\left\{\hat{e}\_{1},\hat{e}\_{2},\hat{e}\_{3}\right\} \\ \left[\mathsf{T}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}}(\hat{e}\_{1},\hat{e}\_{2}) = add \diamondsuit\left\{\hat{e}\_{1},\hat{e}\_{2}\right\} & \left[\mathsf{nat}\,\mathbf{?}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}}(\hat{e}) = nat\mathcal{I} \diamond\hat{e} \\ \left[\mathsf{T}\right]\_{\mathcal{H}}^{\mathbf{e},\mathbf{e}}(\hat{e}\_{1},\hat{e}\_{2}) = sub \diamondsuit\left\{\hat{e}\_{1},\hat{e}\_{2}\right\} \end{array}$$

For the sake of simplicity, we make a minor change to the language presented in [30]. Because it lacks a type system, that language has an extra operator wrong that prints an error message when an illegal operation occurs. For instance, the sum of two functions produces the error wrong "non-number". To avoid adding cases almost everywhere in the definition of the interpretation functions, we let ill-typed terms denote the value ⊥, without an explicit encoding of the error message. Furthermore, we denote by ' the function application.

<sup>8</sup> We do not define H explicitly since it can be inferred by the algebra equations below.

The ML-like language defined in [30] is an extended version of the simply-typed lambda calculus. As before, we provide its semantics by defining an algebra $\mathcal{M}$ over an order-sorted signature $\mathrm{M} = \langle S_2, \leq_2, \Sigma_2 \rangle$.

Let $\mathrm{I}$ (should read 'iota') be a set of *base types* and $K$ an $\mathrm{I}$-sorted set of *base values* $K = \{ K_\iota \mid \iota \in \mathrm{I} \}$. We inductively define the set of *simple types* $\mathrm{T}$: if $\iota$ is a base type, then it is a simple type; if $\tau, \tau'$ are simple types, then $(\tau) \to (\tau')$ is a simple type (henceforth we omit the parentheses). We abuse notation and extend $K$ to the $\mathrm{T}$-sorted set of *simple values* $K = \{ K_\tau \mid \tau \in \mathrm{T} \}$, where $K_{\tau \to \tau'} = K_\tau \to K_{\tau'}$.

The set of all ML environments is defined as the set of all total functions $\Delta = Y \to K$, where $Y = \{ y_1, y_2, \ldots \}$ is a set of variables disjoint from $X$ (this assumption comes from [30]) and $K = \bigcup_{\tau \in \mathrm{T}} K_\tau$. We instantiate $\mathrm{I} = \{ \mathbf{n} \}$ and $K_{\mathbf{n}} = \mathbb{N}$. The poset $\langle S_2, \leq_2 \rangle$ carries all the simple types (i.e., $\mathrm{T} \subseteq S_2$) and the sort $\mathbf{t}$; $\leq_2$ is the reflexive relation on $S_2$ plus $\tau \leq_2 \mathbf{t}$ for each $\tau \in \mathrm{T}$. An ML term of type $\tau$ denotes a total function in $\mathcal{M}_\tau = \Delta \to K_\tau$, and we define $\mathcal{M}_{\mathbf{t}} = \Delta \to K$. Since such a language is not Turing-complete, we do not need all the mathematical machinery of [15,46] to formalize its semantics.

$$\begin{split} \left[\mathbbm{1}\right]\_{\mathcal{M}}^{\varepsilon,\texttt{f}} &\delta \mapsto \delta(\mathsf{y}) \quad \quad \quad \quad \quad \left[\mathbbm{1}\right]\_{\mathcal{M}}^{\tau',\tau-\tau'}(\hat{t}) = \delta \mapsto k\_{\tau} \mapsto \hat{t}(\delta[\mathsf{f}\_{\mathsf{f}}/\mathsf{y}]) \\ \left[\mathbbm{1}\right]\_{\mathcal{M}}^{\varepsilon,\texttt{n}} &\delta \mapsto n \quad \quad \quad \quad \quad \quad \left[\mathbbm{1}\right]\_{\mathcal{M}}^{\tau-\tau',\tau-\tau'}(\hat{t}\_{1},\hat{t}\_{2}) = \delta \mapsto (\hat{t}\_{1}(\delta))(\hat{t}\_{2}(\delta)) \\ \left[\mathbbm{1}\right]\_{\mathcal{M}}^{\mathsf{n},\texttt{n}}(\hat{n}\_{1},\hat{n}\_{2}) = \delta \mapsto \hat{n}\_{1}(\delta) + \hat{n}\_{2}(\delta) \quad \quad \quad \quad \left[\mathbbm{1}\right]\_{\mathcal{M}}^{\mathsf{n},\texttt{n},\texttt{n}}(\hat{n}\_{1},\hat{n}\_{2}) = \delta \mapsto \hat{n}\_{1}(\delta) - \mathsf{y}\_{1}\hat{n}\_{2}(\delta) \\ \left[\mathbbm{1}\mathbf{1}\mathbf{0}\right]\_{\mathcal{M}}^{\mathsf{n},\texttt{\tau},\texttt{\tau}}(\hat{n}\_{1},\hat{t}\_{2}) = \delta \mapsto \\ \left[\hat{n}(\delta) = 0 \ \narrow \hat{t}\_{1}(\delta) \circ \hat{t}\_{2}(\delta)\right] \end{split}$$
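The interpretation functions of the ML-like language can be read as a small combinator library. The following Python sketch is only an illustration under simplifying assumptions: environments $\delta$ are dictionaries, types are not tracked, and the names `var`, `num`, `lam`, `app`, `plus`, `monus`, and `if0` are hypothetical, not taken from [30].

```python
# Each denotation is a function from an environment (a dict) to a value.

def var(y):
    # delta |-> delta(y)
    return lambda env: env[y]

def num(n):
    # delta |-> n
    return lambda env: n

def lam(y, body):
    # delta |-> k |-> body(delta[k / y])
    return lambda env: lambda k: body({**env, y: k})

def app(t1, t2):
    # delta |-> (t1(delta))(t2(delta))
    return lambda env: t1(env)(t2(env))

def plus(n1, n2):
    return lambda env: n1(env) + n2(env)

def monus(n1, n2):
    # Truncated subtraction on the naturals: max(n1 - n2, 0).
    return lambda env: max(n1(env) - n2(env), 0)

def if0(n, t1, t2):
    # Select t1 when the guard denotes 0, and t2 otherwise.
    return lambda env: t1(env) if n(env) == 0 else t2(env)

successor = lam("y", plus(var("y"), num(1)))
term = if0(monus(num(2), num(5)), app(successor, num(2)), num(0))
print(term({}))  # 3
```

Because 2 minus 5 truncates to 0, the conditional selects its first branch, which applies the successor function to 2.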

Until now, we have only formalized the single languages. The multi-language A that combines Scheme and ML is obtained by requiring $\mathbf{e} \ltimes \tau$ and $\tau \ltimes \mathbf{e}$ in order to use ML terms in place of Scheme terms and vice versa. However, in the simplest version of the natural embedding, *"the system has stuck states, since a boundary might receive a value of an inappropriate shape"* [30]. The authors of [30] restore type soundness by first employing dynamic checks, and then by decoupling error handling from value conversion through the use of higher-order contracts [12]. We limit ourselves here to describing the first version; the subsequent refinements can be embodied by further complicating the semantics of the boundary functions (we have not imposed any constraints on them).

Since we need a value representing the notion of *stuck state* in ML, we have to extend the algebra $\mathcal{M}$. This is particularly easy by exploiting the underlying framework: we make $\mathcal{M}^\bot$ into an order-sorted $\mathrm{M}$-algebra by defining $\mathcal{M}^\bot_\tau = \Delta^\bot \to K^\bot_\tau$, where $\Delta^\bot = Y \to K^\bot$, $K^\bot = \bigcup_{\tau \in \mathrm{T}} K^\bot_\tau$, and $K^\bot_\tau = K_\tau \cup \{\bot\}$, together with the $\mathrm{T}$-sorted injection $\varphi$ from $\mathcal{M}_\tau$ to $\mathcal{M}^\bot_\tau$ such that $\varphi(\hat{t}) = \hat{t}$. Now, $\mathcal{M}^\bot$ becomes an algebra by letting $\varphi$ be an order-sorted $\mathrm{M}$-homomorphism (this in turn forces $[\![-]\!]^{w,s}_{\mathcal{M}^\bot} = [\![-]\!]^{w,s}_{\mathcal{M}}$) and letting the interpretation functions denote the value $\bot$ in the remaining, not yet defined cases (namely, they compute the value $\bot$ whenever one of their arguments is $\bot$).
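The strict extension described above, where an operation returns ⊥ as soon as one argument is ⊥, can be sketched as a higher-order function. This is a minimal illustration on plain values rather than on denotations; the sentinel `BOTTOM` and the helper `strict` are hypothetical names.

```python
BOTTOM = None  # hypothetical stand-in for the value ⊥

def strict(op):
    """Extend op so that it returns BOTTOM whenever some argument is BOTTOM."""
    def lifted(*args):
        if any(arg is BOTTOM for arg in args):
            return BOTTOM
        return op(*args)
    return lifted

# Addition and truncated subtraction on the naturals, extended with ⊥.
plus = strict(lambda a, b: a + b)
monus = strict(lambda a, b: max(a - b, 0))
```

On defined arguments the lifted operations agree with the original ones, which mirrors the requirement that the injection be a homomorphism.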

The boundary function $\alpha_{\mathbf{e},\tau}(\hat{e})$ moves the Scheme value $\hat{e} \colon \mathrm{P} \to V$ into $\mathcal{M}_\tau$:

$$\alpha\_{\mathfrak{e},\tau}(\hat{e}) = \begin{cases} \alpha\_{\mathfrak{e},\tau}^{\mathbb{N}^{\phi}}(\hat{e}) & \text{if } \hat{e} = val\_n \text{ for some } n \in \mathbb{N}^{\phi} \\ \alpha\_{\mathfrak{e},\tau}^{V\diamond \longleftrightarrow V}(\hat{e}) & \text{otherwise} \end{cases}$$

where $\alpha^{\mathbb{N}}_{\mathbf{e},\tau}(val_n) = (\, \tau = \mathbf{n} \wedge n \in \mathbb{N} \;?\; \delta \mapsto n \;:\; \bot \,)$ and

$$\alpha^{V\diamondsuit \to V}\_{\mathbf{e},\tau}(\hat{e}) = \begin{cases} \delta \mapsto k'\_{\tau} \mapsto \lceil \lambda y^{\tau'} \rceil\_{\mathcal{M}\downarrow}^{\tau'', \tau' \to \tau''} (\alpha\_{\mathbf{e},\tau\prime}(\hat{e}' \diamond put\_{x}(\perp, \alpha\_{\tau', \mathbf{e}}(k\_{\tau')}))) \\ \qquad\text{if } \tau = \tau' \to \tau'' \text{ and } \hat{e} = i\_{V\diamondsuit \to V} \diamond abs\_{\mathbf{P},V,V}(\hat{e}' \diamond put\_{x}) \\ \qquad\text{for some } x \in X \text{ and } \hat{e}' \in V \diamond \longleftrightarrow V \\ \bot \\ \text{otherwise} \end{cases}$$

Vice versa, $\alpha_{\tau,\mathbf{e}}(\hat{t})$ moves values from ML to Scheme. Its definition is analogous to the previous case: $\alpha_{\mathbf{n},\mathbf{e}}(\hat{n}) = val_n$ where $\hat{n} = \delta \mapsto n$, and

$$\alpha_{\tau \to \tau', \mathbf{e}}(\hat{t}) = \rho \mapsto v \mapsto \lceil \lambda x \rceil_{\mathcal{H}}^{\mathbf{e}, \mathbf{e}} (\alpha_{\tau', \mathbf{e}} (\hat{t} (\bot [\alpha_{\mathbf{e}, \tau} (v)/y])))$$

These definitions adhere to the conversion approach of the natural embedding in [30]: if $\hat{e}$ is the value denoted by a natural number in Scheme, then it is converted (aside from cases deriving from ill-typed terms) by $\alpha^{\mathbb{N}}_{\mathbf{e},\mathbf{n}}$ to the corresponding constant function denoting the same natural value in ML. Otherwise, if $\hat{e}$ is the value denoted by a Scheme function, then it is mapped by $\alpha^{V \to V}_{\mathbf{e},\tau \to \tau'}$ to the ML function of type $\tau \to \tau'$ that converts its argument of type $\tau$ to the Scheme equivalent through $\alpha_{\tau,\mathbf{e}}$ and binds the result to the variable $x$. It then runs the original procedure $\hat{e}$ on it and converts the result back by $\alpha_{\mathbf{e},\tau'}$.
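The type-directed conversion just described can be sketched in Python. This is a simplified illustration, not the construction of the paper: environments are dropped, Scheme values are Python integers or callables, `BOTTOM` stands for ⊥, and the names `scheme_to_ml`, `ml_to_scheme`, `arrow`, and `is_nat` are hypothetical.

```python
BOTTOM = None   # stand-in for ⊥ (a stuck state)
NAT = "n"       # the base type n

def arrow(dom, cod):
    """Build the simple type dom -> cod."""
    return ("->", dom, cod)

def is_nat(v):
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0

def scheme_to_ml(v, tau):
    """Move a Scheme value to ML at type tau, or return BOTTOM on a shape mismatch."""
    if tau == NAT:
        return v if is_nat(v) else BOTTOM
    _, dom, cod = tau
    if not callable(v):
        return BOTTOM
    def ml_function(k):
        arg = ml_to_scheme(k, dom)   # convert the ML argument to Scheme
        if arg is BOTTOM:
            return BOTTOM
        return scheme_to_ml(v(arg), cod)   # run the procedure, convert back
    return ml_function

def ml_to_scheme(k, tau):
    """Move an ML value of type tau to Scheme."""
    if k is BOTTOM:
        return BOTTOM
    if tau == NAT:
        return k
    _, dom, cod = tau
    def scheme_function(v):
        arg = scheme_to_ml(v, dom)   # dynamic check on the Scheme argument
        if arg is BOTTOM:
            return BOTTOM
        return ml_to_scheme(k(arg), cod)
    return scheme_function

succ = lambda n: n + 1
ml_succ = scheme_to_ml(succ, arrow(NAT, NAT))
```

Here `ml_succ(3)` evaluates to 4, whereas converting `succ` at type `NAT` yields `BOTTOM`, matching the idea that a boundary may receive a value of an inappropriate shape.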

Since the given boundary functions are subsort polymorphic, we can improve the construction and handle all the value conversions with a single polymorphic operator as explained in Sect. 4.1.

### **7 Concluding Remarks**

In this paper, we have addressed the problem of providing a formal semantics to the combination of programming languages, the so-called *multi-languages*. We have introduced a new algebraic framework for modeling this new paradigm, and we have constructively shown how to attain a multi-language specification by only stipulating (1) how the syntactic categories of the single languages have to be combined and (2) how the values may flow from one language to the other. We have proved the suitability of the framework to unambiguously yield the algebraic semantics of each multi-language term, while simultaneously preserving the semantics of the single languages. We have also proved that combining languages is a closed operation, i.e., that every multi-language admits an equivalent order-sorted representation. In particular, we have focused our study on the semantic properties of boundary functions in order to provide three different notions of multi-language designed to suit both general and specific cases.

To the best of our knowledge, this is the first attempt to provide a formal semantics of a multi-language independently from the combined languages.

*Related Works.* Cross-language interoperability is a well-researched area from both theoretical and practical points of view. The work most closely related to our approach is undoubtedly [30], which provides operational semantics to a combined language obtained by embedding a Scheme-like language into an ML-like language. Such an outcome is achieved by introducing *boundaries*, syntactic constructs that model the flow of values from one language to the other. Our *boundary functions* draw heavily from their work. Nonetheless, we shift them to a semantic level, in order to obtain several variants of multi-language constructions.

[7,21,36,40,53] take a similar line and combine typed and untyped languages (Lua and ML [40], Java and PLT Scheme [21], or Assembly and a typed functional language [36]), focusing on typing issues and value-exchange techniques. Instead of focusing on a particular problem, we adopt a rather general framework to model languages. This choice abstracts away many low-level details, allowing us to reason about semantic concerns in more general terms, without having to fix any particular pair of languages.

A lot of work has been done on multi-language runtime mechanisms: [20] provides a type system for a fragment of Microsoft Intermediate Language (IL) used by the .NET framework, which allows programmers to write components in several languages (C#, Visual Basic, VBScript, . . . ) that are then translated to IL. [22] proposes a virtual machine that can execute the composition of dynamically typed programming languages (Ruby and JavaScript) and a statically typed one (C). [4,5] describe a multi-language runtime mechanism achieved by combining single-language interpreters of (different versions of) Python and Prolog.

*Future Works.* From our perspective, the research presented in this paper opens up three directions. Firstly, future work should aim to provide an operational semantics for the formalization of multi-languages. Rewriting logic seems the most reasonable approach to unifying the denotational world presented in this paper with the operational one [31]. This line of research is particularly useful for moving towards an implementation of an automatic tool able to combine languages such that the resulting multi-language guarantees the results proved in the paper.

Secondly, future research should use the multi-language model to study the problem of analyzing multi-language programs. In particular, we aim at investigating how it is possible to obtain analyses of multi-language programs by merging already existing analyses of the single combined languages.

Finally, further studies should investigate the problem of compiling multi-languages. Current compilers are closed tools, non-parametric on language constructs (for instance, we cannot compile a single if-then-else term of a standard language like C or Java unless it is plugged into a valid program). Several works on typing [1,20,26], compiling [2,37], and running [23,50] multi-language programs already exist, but without providing a formal notion of multi-language. It would be beneficial to study how their approaches can be applied to the formal framework developed in this paper.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Probabilistic Programming Inference via Intensional Semantics**

Simon Castellan<sup>1</sup> and Hugo Paquet<sup>2</sup>

<sup>1</sup> Imperial College London, London, UK, simon.castellan@phis.me

<sup>2</sup> University of Cambridge, Cambridge, UK, hugo.paquet@cl.cam.ac.uk

**Abstract.** We define a new denotational semantics for a first-order probabilistic programming language in terms of *probabilistic event structures*. This semantics is *intensional*, meaning that the interpretation of a program contains information about its behaviour throughout execution, rather than a simple distribution on return values. In particular, occurrences of sampling and conditioning are recorded as explicit events, partially ordered according to the data dependencies between the corresponding statements in the program.

This interpretation is *adequate*: we show that the usual measure-theoretic semantics of a program can be recovered from its event structure representation. Moreover it can be leveraged for MCMC inference: we prove correct a version of single-site Metropolis-Hastings with *incremental recomputation*, in which the proposal kernel takes into account the semantic information in order to avoid performing some of the redundant sampling.

**Keywords:** Probabilistic programming · Denotational semantics · Event structures · Bayesian inference

# **1 Introduction**

Probabilistic programming languages [8] were put forward as promising tools for practitioners of Bayesian statistics. By extending traditional programming languages with primitives for sampling and conditioning, they allow the user to express a wide class of statistical models, and provide a simple interface for encoding inference problems. Although the subject of active research, it is still notoriously difficult to design inference methods for probabilistic programs which perform well for the full class of expressible models.

One popular inference technique, proposed by Wingate et al. [21], involves adapting well-known *Monte-Carlo Markov chain* methods from statistics to probabilistic programs, by manipulating *program traces*. One such method is the Metropolis-Hastings algorithm, which relies on a key *proposal* step: given a program trace $x$ (a sequence $x_1, \ldots, x_n$ of random choices with their likelihood), a proposal for the *next* trace sample is generated by choosing $i \in \{1, \ldots, n\}$ uniformly, resampling $x_i$, and then continuing to execute the program, only performing additional sampling for those random choices not appearing in $x$. The variables already present in $x$ are not resampled: only their likelihood is updated according to the new value of $x_i$. Likewise, some conditioning statements must be re-evaluated in case the corresponding weight is affected by the change to $x_i$.
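The proposal step can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [21]: a program is a Python function that receives a `sample` callback, a trace is a non-empty list of drawn values, and the bookkeeping of likelihoods, scores, and the acceptance test is omitted. The names `propose` and `model` are hypothetical.

```python
import random

def propose(program, trace, rng):
    """Rerun `program`, resampling one uniformly chosen site of `trace`."""
    site = rng.randrange(len(trace))   # choose i uniformly
    new_trace = []

    def sample(draw):
        index = len(new_trace)
        if index < len(trace) and index != site:
            value = trace[index]       # reuse the old random choice
        else:
            value = draw(rng)          # resample x_i, or draw a fresh choice
        new_trace.append(value)
        return value

    result = program(sample)
    return new_trace, result

def model(sample):
    # Assumption: the second Gaussian parameter is a standard deviation.
    x = sample(lambda rng: rng.uniform(0.0, 1.0))
    y = sample(lambda rng: rng.gauss(x, 2.0))
    return x + y
```

Calling `propose(model, [0.25, -1.0], random.Random(0))` returns a trace that differs from the old one in at most one position; a full implementation would also recompute the likelihood of the reused choices.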

Observe that there is some redundancy in this process, since the updating process above will only affect variables and observations when their density directly depends on the value of $x_i$. This may significantly affect performance: to solve an inference problem one must usually perform a large number of proposal steps. To overcome this problem, some recent implementations, notably [12,25], make use of *incremental recomputation*, whereby some of the redundancy can be avoided via a form of static analysis. However, as pointed out by Kiselyov [13], establishing the correctness of such implementations is tricky.
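To make the redundancy concrete, suppose the direct data dependencies of a program are available as a table. The sketch below is purely illustrative: the dictionary `depends_on` and the function `sites_to_update` are hypothetical, and the implementations cited above obtain this information differently.

```python
def sites_to_update(depends_on, resampled):
    """Return the sites whose density directly mentions the resampled site."""
    return {site for site, parents in depends_on.items() if resampled in parents}

# Dependencies of the running example: y is drawn around x, and the
# observation scores the value of y.
depends_on = {"x": set(), "y": {"x"}, "obs": {"y"}}
```

With this table, resampling `x` only requires updating the likelihood of `y`, while resampling `y` only requires re-evaluating the observation; nothing else needs to be recomputed.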

Here we address this by introducing a theoretical framework in which to reason about data dependencies in probabilistic programs. Specifically, our first contribution is to define a *denotational semantics* for a first-order probabilistic language, in terms of graph-like structures called *event structures* [22]. In event structures, computational events are partially ordered according to the dependencies between them; additionally they can be equipped with quantitative information to represent probabilistic processes [16,23]. This semantics is *intensional*, unlike most existing semantics for probabilistic programs, in which the interpretation of a program resembles a probability distribution on output values. We relate our approach to a measure-theoretic semantics [18] through an *adequacy* result.

Our second contribution is the design of a Metropolis-Hastings algorithm which exploits the event structure representation of the program at hand. Some of the redundancy in the proposal step of the algorithm is avoided by taking into account the extra dependency information given by the semantics. We provide a proof of correctness for this algorithm, and argue that an implementation is realistically achievable: we show in particular that all graph structures involved and the associated quantitative information admit a finite, concrete representation.

*Outline of the Paper.* In Sect. 2 we give a short introduction to probabilistic programming. We define our main language of study and its measure-theoretic semantics. In Sect. 3.1, we introduce MCMC methods and the Metropolis-Hastings algorithm in the context of probabilistic programming. We then motivate the need for intensional semantics in order to capture data dependency. In Sect. 4 we define our interpretation of programs and prove adequacy. In Sect. 5 we define an updated version of the algorithm, and prove its correctness. We conclude in Sect. 6.

The proofs of the statements are detailed in the technical report [4].

# **2 Probabilistic Programming**

In this section we motivate the need for capturing data dependency in probabilistic programs. Let us start with a brief introduction to probabilistic programming – a more comprehensive account can be found in [8].

# **2.1 Conditioning and Posterior Distribution**

Let us introduce the problem of inference in probabilistic programming from the point of view of programming language theory.

We consider a first-order programming language enriched with a real number type R and a primitive sample for drawing random values from a given family of standard probability distributions. The language is idealised—but it is assumed that an implementation of the language comprises built-in sampling procedures for those standard distributions. Thus, repeatedly running the program sample Uniform (0, 1) returns a sequence of values approaching the true uniform distribution on [0, 1].

Via other constructs in the language, standard distributions can be combined, as shown in the following example program of type R:

```
let x = sample Uniform(0, 1) in
let y = sample Gaussian(x, 2) in
x+y
```
Here the output will follow a probability distribution built out of the usual uniform and Gaussian distributions. Many probabilistic programming languages will offer more general programming constructs: conditionals, recursion, higher-order functions, data types, *etc.*, enabling a wide range of distributions to be expressed in this way. Such a program is sometimes called a *generative model*.

*Conditioning.* The process of conditioning involves rescaling the distribution associated with a generative model, so as to reflect some bias. Going back to the example above, say we have made some external measurement indicating that y = 0, but we would like to account for possible noise in the measurement using another Gaussian. To express this we modify the program as follows:

```
let x = sample Uniform (0, 1) in
let y = sample Gaussian (x, 2) in
observe y (Gaussian (0, 0.01));
x + y;
```
The purpose of the observe statement is to increase the occurrence of executions in which y is close to 0; the original distribution, known as the **prior**, must be updated accordingly. The probabilistic weight of each execution is multiplied by an appropriate *score*, namely the **likelihood** of the current value of y in the Gaussian distribution with parameters (0, 0.01). (This is known as a *soft constraint*. Conditioning via *hard constraints*, *i.e.* only giving a nonzero score to executions where y is exactly 0, is not practically feasible.)

The language studied here does not have an observe construct, but instead an explicit score primitive; this appears already in [18,19]. So the third line in the program above would instead be score(pdf-Gaussian (0, 0.01) (y)) where pdf-Gaussian (0, 0.01) is the *density* function of the Gaussian distribution. The resulting distribution is not necessarily normalised. We obtain the **posterior** distribution by computing the normalising constant, following Bayes' rule:

$$\text{posterior} \propto \text{likelihood} \times \text{prior}.$$

This process is known as Bayesian inference and has ubiquitous applications. The difficulty lies in computing the normalising constant, which is usually obtained as an integral. Below we discuss *approximate* methods for sampling from the posterior distribution; they do not rely on this normalising step.
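As a point of comparison, the scored program can be run with likelihood weighting (self-normalised importance sampling), a simpler technique than the MCMC methods studied in this paper. The sketch assumes that the second parameter of `Gaussian` is a standard deviation, which the text does not specify; `gaussian_pdf`, `run_once`, and `posterior_mean` are hypothetical names.

```python
import math
import random

def gaussian_pdf(mean, std, value):
    """Density of the Gaussian with the given mean and standard deviation."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def run_once(rng):
    """One weighted execution: sample from the prior, record the score."""
    x = rng.uniform(0.0, 1.0)
    y = rng.gauss(x, 2.0)
    weight = gaussian_pdf(0.0, 0.01, y)   # score(pdf-Gaussian (0, 0.01) (y))
    return x + y, weight

def posterior_mean(num_runs, seed=0):
    """Estimate the posterior mean of x + y from weighted executions."""
    rng = random.Random(seed)
    runs = [run_once(rng) for _ in range(num_runs)]
    evidence = sum(weight for _, weight in runs)
    if evidence == 0.0:
        raise ValueError("all weights are zero; increase num_runs")
    return sum(value * weight for value, weight in runs) / evidence
```

Most executions receive a negligible weight because `y` rarely lands near 0, which is one reason such naive schemes need many runs and why more refined samplers are of interest.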

*Measure Theory.* Because this work makes heavy use of probability theory, we start with a brief account of measure theory. A standard textbook for this is [1]. Recall that a **measurable space** is a set $X$ equipped with a $\sigma$**-algebra** $\Sigma_X$: a set of subsets of $X$ containing $\emptyset$ and closed under complements and countable unions. Elements of $\Sigma_X$ are called **measurable sets**. A **measure** on $X$ is a function $\mu : \Sigma_X \to [0, \infty]$ such that $\mu(\emptyset) = 0$ and, for any countable family $\{U_i\}_{i \in I}$ of pairwise disjoint measurable sets, $\mu(\bigcup_{i \in I} U_i) = \sum_{i \in I} \mu(U_i)$.

An important example is that of the set $\mathbb{R}$ of real numbers, whose $\sigma$-algebra $\Sigma_{\mathbb{R}}$ is generated by the intervals $[a, b)$, for $a, b \in \mathbb{R}$ (in other words, it is the smallest $\sigma$-algebra containing those intervals). The **Lebesgue measure** on $(\mathbb{R}, \Sigma_{\mathbb{R}})$ is the (unique) measure $\lambda$ assigning $b - a$ to every interval $[a, b)$ (with $a \leq b$).

Given measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, a function $f : X \to Y$ is **measurable** if for every $U \in \Sigma_Y$, $f^{-1}U \in \Sigma_X$. A measurable function $f : X \to [0, \infty]$ can be *integrated*: given $U \in \Sigma_X$ the **integral** $\int_U f \, \mathrm{d}\lambda$ is a well-defined element of $[0, \infty]$; indeed the map $\mu : U \mapsto \int_U f \, \mathrm{d}\lambda$ is a measure on $X$, and $f$ is said to be a **density** for $\mu$. The precise definition of the integral is standard but slightly more involved; we omit it.

We identify the following important classes of measures: a measure $\mu$ on $(X, \Sigma_X)$ is a **probability measure** if $\mu(X) = 1$. It is **finite** if $\mu(X) < \infty$, and it is **s-finite** if $\mu = \sum_{i \in I} \mu_i$, a pointwise, countable sum of finite measures.

We recall the usual product and coproduct constructions for measurable spaces and measures. If $\{X_i\}_{i \in I}$ is a countable family of measurable spaces, their **product** $\prod_{i \in I} X_i$ and **coproduct** $\coprod_{i \in I} X_i = \bigcup_{i \in I} \{i\} \times X_i$ as sets can be turned into measurable spaces, where:

– $\Sigma_{\prod_{i \in I} X_i}$ is generated by the sets $\prod_{i \in I} U_i$ with $U_i \in \Sigma_{X_i}$ for each $i \in I$;

– $\Sigma_{\coprod_{i \in I} X_i}$ is generated by the sets $\{i\} \times U$ with $i \in I$ and $U \in \Sigma_{X_i}$.

The measurable spaces in this paper all belong to a well-behaved subclass: call $(X, \Sigma_X)$ a **standard Borel space** if it is either countable and discrete (*i.e.* all $U \subseteq X$ are in $\Sigma_X$), or measurably isomorphic to $(\mathbb{R}, \Sigma_{\mathbb{R}})$. Note that standard Borel spaces are closed under countable products and coproducts, and that in a standard Borel space all singletons are measurable.

#### **2.2 A First-Order Probabilistic Programming Language**

We consider a first-order, call-by-value language L with types

$$A, B ::= 1 \mid \mathbb{R} \mid \coprod_{i \in I} A_i \mid \prod_{i \in I} A_i$$

where $I$ ranges over nonempty countable sets. The types denote measurable spaces in a natural way: $[\![1]\!]$ is the singleton space, and $[\![\mathbb{R}]\!] = (\mathbb{R}, \Sigma_{\mathbb{R}})$. Products and coproducts are interpreted via the corresponding measure-theoretic constructions: $[\![\prod_{i \in I} A_i]\!] = \prod_{i \in I} [\![A_i]\!]$ and $[\![\coprod_{i \in I} A_i]\!] = \coprod_{i \in I} [\![A_i]\!] = \bigcup_{i \in I} \{i\} \times [\![A_i]\!]$. Moreover, each measurable space $[\![A]\!]$ has a canonical measure $\mu_{[\![A]\!]} : \Sigma_{[\![A]\!]} \to \mathbb{R}$, induced from the Lebesgue measure on $\mathbb{R}$ and the Dirac measure on $[\![1]\!]$ via standard product and coproduct measure constructions.

The terms of L are given by the following grammar:

$$\begin{aligned} M, N ::=\ & () \mid M; N \mid f\,M \mid \mathtt{let}\ a = M\ \mathtt{in}\ N \mid x \\ \mid\ & (M_i)_{i \in I} \mid \mathtt{case}\ M\ \mathtt{of}\ \{ (i, x) \Rightarrow N_i \}_{i \in I} \\ \mid\ & \mathtt{sample}\ d\ (M) \mid \mathtt{score}\ M \end{aligned}$$

and we use standard syntactic sugar to manipulate integers and booleans: $\mathbb{B} = 1 + 1$, $\mathbb{N} = \coprod_{i \in \omega} 1$, and constants are given by the appropriate injections. Conditionals and sequencing can be expressed in the usual way: $\mathtt{if}\ M\ \mathtt{then}\ N_1\ \mathtt{else}\ N_2 = \mathtt{case}\ M\ \mathtt{of}\ \{(i, \_) \Rightarrow N_i\}_{i \in \{1,2\}}$, and $M; N = \mathtt{let}\ a = M\ \mathtt{in}\ N$, where $a$ does not occur in $N$. In the grammar above:


The typing rules are as follows:

$$\begin{array}{c} \begin{array}{c} \Gamma \vdash M : A \\ \hline \Gamma \vdash \mathtt{1et} \; a = M \ \mathtt{in} \; N : B \end{array} \qquad \qquad \begin{array}{c} \Gamma \vdash M : \mathbb{R}^{n} \qquad d : \mathbb{R}^{n} \times \mathbb{R} \to \mathbb{R} \\ \hline \Gamma \vdash \mathtt{sample} \; d \; \{M\} : \mathbb{R} \\ \hline \end{array} \\\\ \begin{array}{c} \Gamma \vdash M : \mathbb{R} \\ \hline \Gamma \vdash \mathtt{case} \; M : 1 \\ \hline \end{array} \qquad \begin{array}{c} \Gamma \vdash M : A \vdash \mathbb{R}^{n} \qquad d : \mathbb{R}^{n} \times \mathbb{R} \to \mathbb{R} \\ \hline \Gamma \vdash \mathtt{sample} \; d \; \{M\} : \mathbb{R} \\ \hline \end{array} \\\\ \begin{array}{c} \Gamma \vdash M : A \vdash \mathbb{R} : A \qquad \overline{\Gamma \vdash \langle \rangle \, A} \qquad \overline{\Gamma \vdash \langle \rangle \, A} : \mathbb{R} \\ \hline \end{array} \qquad \begin{array}{c} \Gamma \vdash M : A \; \overline{\Gamma \vdash \langle \rangle} \\ \hline \Gamma \vdash \langle M\_{i} : A\_{i} \; \overline{\Gamma \vdash \langle \rangle} \\ \hline \end{array} \\\\ \begin{array}{c} \Gamma \vdash M : A \; \overline{\Gamma \vdash \langle M\_{i} : A \rangle} \\ \hline \Gamma \vdash M : A \\ \hline \end{array} \end{array}$$

Among the measurable functions f, we point out the following of interest:


Examples for $d$ include $\mathtt{uniform} : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$, $\mathtt{gaussian} : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$, ...

#### **2.3 Measure-Theoretic Semantics of Programs**

We now define a semantics of probabilistic programs using the measure-theoretic concept of *kernel*, which we define shortly. The content of this section is not new: using kernels as semantics for probabilistic programs was originally proposed in [14], while the (more recent) treatment of conditioning (score) via *s-finite* kernels is due to Staton [18]. Intuitively, kernels provide a semantics of open terms $\Gamma \vdash M : A$ as measures on $[\![A]\!]$ varying according to the values of variables in $\Gamma$.

Formally, a **kernel** from $(X, \Sigma_X)$ to $(Y, \Sigma_Y)$ is a function $k : X \times \Sigma_Y \to [0, \infty]$ such that for each $x \in X$, $k(x, -)$ is a measure, and for each $U \in \Sigma_Y$, $k(-, U)$ is measurable. (Here the $\sigma$-algebra $\Sigma_{[0,\infty]}$ is the restriction of that of $\mathbb{R} + \{\infty\}$.) We say $k$ is **finite** (resp. **probabilistic**) if each $k(x, -)$ is a finite (resp. probability) measure, and it is **s-finite** if it is a countable pointwise sum $\sum_{i \in I} k_i$ of finite kernels. We write $k : X \rightsquigarrow Y$ when $k$ is an s-finite kernel from $X$ to $Y$.

A term $\Gamma \vdash M : A$ will denote an s-finite kernel $[\![M]\!] : [\![\Gamma]\!] \rightsquigarrow [\![A]\!]$, where the context $\Gamma = x_1 : A_1, \ldots, x_n : A_n$ denotes the product of its components: $[\![\Gamma]\!] = [\![A_1]\!] \times \cdots \times [\![A_n]\!]$.

Notice that any measurable function $f : X \to Y$ can be seen as a *deterministic* kernel $f^\dagger : X \rightsquigarrow Y$. Given two s-finite kernels $k : A \rightsquigarrow B$ and $l : A \times B \rightsquigarrow C$, we define their composition $l \circ k : A \rightsquigarrow C$:

$$(l \circ k)(a, X) = \int_{b \in B} l((a, b), X) \times k(a, \mathrm{d}b).$$

Staton [18] proved that $l \circ k$ is an s-finite kernel.
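On finite spaces the integral in the composition becomes a sum, which makes the definition easy to experiment with. The sketch below is only an illustration for discrete kernels, represented (by assumption) as Python functions returning a dictionary from outcomes to masses; `compose`, `k`, and `l` are hypothetical names.

```python
def compose(l, k):
    """Discrete analogue of l ∘ k: sum over b of l((a, b), c) * k(a, b)."""
    def composed(a):
        out = {}
        for b, mass_b in k(a).items():
            for c, mass_c in l((a, b)).items():
                out[c] = out.get(c, 0.0) + mass_c * mass_b
        return out
    return composed

def k(a):
    # A fair coin, independent of the input a.
    return {0: 0.5, 1: 0.5}

def l(pair):
    # Return a + b, scaling the mass by 0.5 when b is 1 (as a score would).
    a, b = pair
    return {a + b: 1.0 if b == 0 else 0.5}

print(compose(l, k)(10))  # {10: 0.5, 11: 0.25}
```

The total mass of the result is 0.75 rather than 1, illustrating that kernels involving scores are finite but not necessarily probabilistic.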

The interpretation of terms is defined by induction:


– $[\![\mathtt{case}\ M\ \mathtt{of}\ \{(i, x) \Rightarrow N_i\}_{i \in I}]\!] = \mathrm{coprod} \circ [\![M]\!]$ where $\mathrm{coprod} : [\![\Gamma]\!] \times [\![\coprod_{i \in I} A_i]\!] \rightsquigarrow [\![B]\!]$ maps $(\gamma, \{i\} \times X)$ to $[\![N_i]\!](\gamma, X)$.

We observe that when $M$ is a program making no use of conditioning (*i.e.* a generative model), the kernel $[\![M]\!]$ is probabilistic:

**Lemma 1.** *For* $\Gamma \vdash M : A$ *without scores,* $[\![M]\!](\gamma, [\![A]\!]) = 1$ *for each* $\gamma \in [\![\Gamma]\!]$*.*

#### **2.4 Exact Inference**

Note that a kernel $1 \rightsquigarrow [\![A]\!]$ is the same as a measure on $[\![A]\!]$. Given a closed program $\vdash M : A$, the measure $[\![M]\!]$ is a combination of the prior (occurrences of sample) and the likelihood (score). Because score can be called on arbitrary arguments, it may be the case that the measure of the total space (that is, the coefficient $[\![M]\!]([\![A]\!])$, often called the *model evidence*) is $0$ or $\infty$.

Whenever this is *not* the case, -M can be normalised to a probability measure, the posterior distribution. For every <sup>U</sup> <sup>∈</sup> <sup>Σ</sup>-<sup>A</sup>,

$$\operatorname{norm}[M](U) = \frac{[M](U)}{[M]([A])}.$$

However, in many cases, this computation is intractable. Thus the goal of *approximate inference* is to approach norm-M, the *true posterior*, using a well-chosen sequence of samples.
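On a finite space the normalisation step is a one-liner, which makes the two failure cases easy to see. The sketch below assumes a measure is given as a dictionary of point weights; the function name `normalise` is illustrative.

```python
import math
from typing import Dict, Hashable, Optional


def normalise(m: Dict[Hashable, float]) -> Optional[Dict[Hashable, float]]:
    """Return the posterior m(U) / m(total space), or None when the
    model evidence is 0 or infinite and no posterior exists."""
    evidence = sum(m.values())
    if evidence == 0 or math.isinf(evidence):
        return None
    return {a: w / evidence for a, w in m.items()}


# An unnormalised measure with evidence 1.5.
posterior = normalise({"heads": 0.5, "tails": 1.0})
```

In this example the posterior gives `heads` probability 1/3 and `tails` probability 2/3. The intractability mentioned above comes from the evidence, which in general is an integral over all traces rather than a finite sum.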

# **3 Approximate Inference via Intensional Semantics**

#### **3.1 An Introduction to Approximate Inference**

In this section we describe the Metropolis-Hastings (MH) algorithm for approximate inference in the context of probabilistic programming. Metropolis-Hastings is a generic algorithm to sample from a probability distribution $D$ on a measurable state space $X$, of which we know the density $d : X \to \mathbb{R}$ up to some normalising constant.

MH is part of a family of inference algorithms called *Monte-Carlo Markov chain*, in which the posterior distribution is approximated by a series of samples generated using a Markov chain.

Formally, the MH algorithm defines a Markov chain $M$ on the state space $X$, that is a probabilistic kernel $M : X \rightsquigarrow X$. The correctness of the MH algorithm is expressed in terms of convergence. It says that for almost all $x \in X$, the distribution $M^n(x, \cdot)$ converges to $D$ as $n$ goes to infinity, where $M^n$ is the $n$-fold iteration of $M$: $M \circ \ldots \circ M$. Intuitively, this means that iterated sampling from $M$ gets closer to $D$ with the number of iterations.

The MH algorithm is itself parametrised by a Markov chain, referred to as the **proposal kernel** $P : X \rightsquigarrow X$: for each sampled value $x \in X$, a proposed value for the next sample is drawn according to $P(x, \cdot)$. Note that correctness only holds under certain assumptions on $P$.

The MH algorithm assumes that we know how to sample from $P$, and that its density is known, i.e. there is a function $p : X^2 \to \mathbb{R}$ such that $p(x, \cdot)$ is the density of the distribution $P(x, \cdot)$.

*The MH Algorithm.* On an input state $x$, the MH algorithm samples from $P(x, \cdot)$ and gets a new sample $x'$. It then compares the likelihood of $x$ and $x'$ by computing an acceptance ratio $\alpha(x, x')$ which says whether the return state is $x'$ or $x$. In pseudo-code, for an input state $x \in X$:


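1. Sample a new state $x'$ from $P(x, \cdot)$.
2. Compute the acceptance ratio of $x'$ with respect to $x$:

$$\alpha(x, x') = \min\left(1, \frac{d(x') \times p(x', x)}{d(x) \times p(x, x')}\right)$$

3. With probability $\alpha(x, x')$, return the new sample $x'$, otherwise return the input state $x$.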

The formula for $\alpha(x, x')$ is known as the Hastings acceptance ratio and is key to the correctness of the algorithm.
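The three steps can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation. It assumes the convention stated above, that `p(x, y)` is the density of proposing `y` from `x`, so the reverse move `p(x_new, x)` sits in the numerator as detailed balance requires. The target (a standard normal known up to its constant) and the Gaussian random-walk proposal are hypothetical choices.

```python
import math
import random


def mh_step(x, d, sample_p, p, rng):
    """One Metropolis-Hastings step from state x.

    d: unnormalised target density; sample_p(x, rng): draw from P(x, .);
    p(x, y): density of proposing y from x.
    """
    x_new = sample_p(x, rng)                      # step 1: propose
    forward = d(x) * p(x, x_new)
    backward = d(x_new) * p(x_new, x)
    # Step 2: Hastings ratio. Convention chosen here: accept if the
    # denominator vanishes (the current state has zero density).
    alpha = 1.0 if forward == 0 else min(1.0, backward / forward)
    return x_new if rng.random() < alpha else x   # step 3: accept or reject


def target(x):
    # Standard normal density, up to the normalising constant.
    return math.exp(-0.5 * x * x)


def walk_sample(x, rng):
    return rng.gauss(x, 1.0)


def walk_density(x, y):
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2 * math.pi)


rng = random.Random(0)
chain = [0.0]
for _ in range(5000):
    chain.append(mh_step(chain[-1], target, walk_sample, walk_density, rng))
```

Because the random-walk proposal is symmetric, the two proposal densities cancel and the ratio reduces to `d(x_new) / d(x)`; the general form matters for asymmetric proposals such as the single-site kernel.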

Very little is assumed of P, which makes the algorithm very flexible; but of course the convergence rate may vary depending on the choice of P. We give a more formal description of MH in Sect. 5.2.

*Single-Site MH and Incremental Recomputation.* To apply this algorithm to probabilistic programming, we need a proposal kernel. Given a program $M$, the execution traces of $M$ form a measurable set $X_M$. In this setting the proposal is given by a kernel $X_M \rightsquigarrow X_M$.

A widely adopted choice of proposal is the *single-site proposal kernel* which, given a trace $x \in X_M$, generates a new trace $x'$ as follows:


Observe that there is some redundancy in this process: in the final step, the entire program has to be explored even though only a subset of the random choices will be re-evaluated. Some implementations of Trace MH for probabilistic programming make use of *incremental recomputation*.

We propose in this paper to statically compile a program $M$ to an *event structure* $G_M$ which makes explicit the probabilistic dependences between events, thus avoiding unnecessary sampling.

#### **3.2 Capturing Probabilistic Dependencies Using Event Structures**

Consider the program depicted in Fig. 1 in which we are interested in learning the parameters $\mu$ and $\sigma$ of a Gaussian distribution from which we have observed two data points, say $v_1$ and $v_2$. For $i = 1, 2$ the function $f_i : \mathbb{R} \to \mathbb{R}$ expresses a soft constraint; it can be understood as indicating how much the sampled value of $x_i$ matches the observed value $v_i$.

A *trace* of this program will be of the form

```
Sam μ · Sam σ · Sam x1 · Sam x2 · Sco (f1 x1) · Sco (f2 x2) · Rtn (μ, σ),
```
for some $\mu, \sigma, x_1, x_2 \in \mathbb{R}$ corresponding to the sampled values of the variables mu, sigma, x1 and x2.

```
let mu = sample uniform (150, 200) in
let sigma = sample uniform (1, 50) in
let x1 = sample gaussian (mu, sigma) in
let x2 = sample gaussian (mu, sigma) in
score (f1 x1); score (f2 x2);
(mu, sigma)
```
**Fig. 1.** A simple probabilistic program

A proposal step following the single-site kernel may choose to resample $\mu$; then it must run through the entire trace, checking for potential dependencies on $\mu$, though in this case none of the other variables need to be resampled.
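To make the linear trace of Fig. 1 concrete, the sketch below runs the program once in Python and records each event in order. Several details are assumptions made for illustration only: the observed values `V1` and `V2`, the Gaussian-shaped soft constraints returned by `soft_constraint`, and the reading of the second argument of `gaussian` as a standard deviation.

```python
import math
import random

V1, V2 = 170.0, 180.0  # hypothetical observed data points v1 and v2


def soft_constraint(v):
    # Hypothetical f_i: close to 1 when x is near the observation v.
    return lambda x: math.exp(-0.5 * (x - v) ** 2)


def run_fig1(rng):
    """Run the program of Fig. 1 once and return its trace as a list."""
    trace = []
    mu = rng.uniform(150, 200)
    trace.append(("Sam", mu))
    sigma = rng.uniform(1, 50)
    trace.append(("Sam", sigma))
    x1 = rng.gauss(mu, sigma)
    trace.append(("Sam", x1))
    x2 = rng.gauss(mu, sigma)
    trace.append(("Sam", x2))
    trace.append(("Sco", soft_constraint(V1)(x1)))
    trace.append(("Sco", soft_constraint(V2)(x2)))
    trace.append(("Rtn", (mu, sigma)))
    return trace


def likelihood(trace):
    """Product of all scores recorded in the trace."""
    return math.prod(value for tag, value in trace if tag == "Sco")


trace = run_fig1(random.Random(0))
```

The list fixes a total order on the seven events, even though the sample for `sigma` does not depend on the one for `mu`, and the two scores do not depend on each other.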

So we argue that viewing a program as a tree of traces is not the most appropriate in this context: we propose instead to compile a program into a partially ordered structure reflecting the probabilistic dependencies.

With our approach, the example above would yield the partial order displayed below on the right-hand side. The nodes on the first line correspond to the samples for $\mu$ and $\sigma$, and those on the second line to $x_1$ and $x_2$. This provides an accurate account of the probabilistic dependencies: whenever $e \leq e'$ (where $\leq$ is the reflexive, transitive closure of $\rightarrowtriangle$), it is the case that $e'$ depends on $e$.

According to this representation of the program, a trace is no longer a linear order, but instead another partial order, similar to the previous one only annotated with a specific value for each variable. This is displayed below, on the left-hand side; note that the order ≤ is drawn top to bottom. There is an obvious erasure map from the trace (left) to the graph (right); this will be important later on.

*Conflict and Control Flow.* We have seen that a partial order can be used to faithfully represent the data dependency in the program; it is however not sufficient to accurately describe the control flow. In particular, computational events may live in different *branches* of a conditional statement, as in the following example:

```
let x = sample uniform (0, 5) in
if x ≥ 2 then sample gaussian (3, 1)
else sample uniform (2, 4)
```
The last two samples are independent, but also *incompatible*: in any given trace only one of them will occur. An example of a trace for this program is Sam 1 · Sam 3 · Rtn 3.

We represent this information by enriching the partial order with a conflict relation, indicating when two actions are in different branches of a conditional statement. The resulting structure is depicted on the right. Combining partial order and conflict in this way can be conveniently formalised using **event structures** [22]:

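**Definition 1.** *An event structure is a tuple* $(E, \leq, \#)$ *where* $(E, \leq)$ *is a partially ordered set and* $\#$ *is an irreflexive, binary relation on* $E$ *such that*

*– for every* $e \in E$*, the set* $[e] := \{e' \in E \mid e' \leq e\}$ *is finite, and*
*– if* $e \mathrel{\#} e'$ *and* $e' \leq e''$*, then* $e \mathrel{\#} e''$*.*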


From the partial order $\leq$, we extract **immediate causality** $\rightarrowtriangle$: $e \rightarrowtriangle e'$ when $e < e'$ with no events in between; and from the conflict relation, we extract **minimal conflict** $\sim$: $e \sim e'$ when $e \mathrel{\#} e'$ and there are no other conflicts in $[e] \cup [e']$. In pictures we draw $\rightarrowtriangle$ and $\sim$ rather than $\leq$ and $\#$.

A subset $x \subseteq E$ is a **configuration** of $E$ if it is down-closed (if $e \leq e' \in x$ then $e \in x$) and conflict-free (if $e, e' \in x$ then $\neg(e \mathrel{\#} e')$). So in this framework, configurations correspond exactly to *partial* execution traces of $E$.

The configuration $[e]$ is the **causal history** of $e$; we also write $[e)$ for $[e] \setminus \{e\}$. We write $\mathscr{C}(E)$ for the set of all finite configurations of $E$, a partial order under inclusion. A configuration $x$ is **maximal** if it is maximal in $\mathscr{C}(E)$: for every $x' \in \mathscr{C}(E)$, if $x \subseteq x'$ then $x = x'$. We use the notation $x \stackrel{e}{-\!\!\!\subset} x'$ when $x' = x \cup \{e\}$, and in that case we say $x'$ **covers** $x$.

An event structure is **confusion-free** if minimal conflict is transitive, and if any two events $e, e'$ in minimal conflict satisfy $[e) = [e')$.
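These notions are easy to experiment with on finite examples. The sketch below encodes the conditional program above as an event structure with five events, assuming one return event per branch. The class `EventStructure` and the event names are hypothetical. Causality is given by immediate predecessors, and only the minimal conflict is listed, with inherited conflicts computed from causal histories.

```python
from itertools import combinations


class EventStructure:
    def __init__(self, preds, conflicts):
        self.preds = preds  # event -> set of its immediate causes
        self.conflicts = {frozenset(c) for c in conflicts}  # listed conflicts

    def history(self, e):
        """Causal history [e]: e together with everything below it."""
        seen, todo = set(), [e]
        while todo:
            cur = todo.pop()
            if cur not in seen:
                seen.add(cur)
                todo.extend(self.preds[cur])
        return seen

    def in_conflict(self, e1, e2):
        """e1 # e2 holds when some listed conflict lies below the pair."""
        return any(frozenset((a, b)) in self.conflicts
                   for a in self.history(e1) for b in self.history(e2))

    def is_configuration(self, x):
        x = set(x)
        down_closed = all(self.history(e) <= x for e in x)
        conflict_free = not any(self.in_conflict(a, b)
                                for a, b in combinations(x, 2))
        return down_closed and conflict_free


# The program: let x = sample uniform (0, 5) in if x >= 2 then ... else ...
cond = EventStructure(
    preds={
        "sam_x": set(),
        "sam_gauss": {"sam_x"}, "sam_unif": {"sam_x"},
        "rtn_gauss": {"sam_gauss"}, "rtn_unif": {"sam_unif"},
    },
    conflicts=[("sam_gauss", "sam_unif")],
)
```

With this encoding, `{"sam_x", "sam_gauss"}` is a configuration, `{"sam_gauss"}` alone is not (it is not down-closed), and no configuration contains events from both branches.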

*Compositionality.* In order to give semantics to the language in a compositional manner, we must consider arbitrary *open* programs, *i.e.* with free parameters. Therefore we also represent each call to a parameter $a$ as a *read* event, marked $\mathsf{Rd}\,a$. For instance the program $x + y$ with two real parameters will become the event structure

Note that the read actions on x and y are independent in the program (no order is specified), and the event structure respects this independence.

Our dependency graphs are event structures where each event carries information about the syntactic operation it comes from, a **label**, which depends on the typing context of the program:

$$\mathcal{L}^{\mathsf{static}}_{\Gamma \vdash B} ::= \mathsf{Rd}\,a \mid \mathsf{Rtn} \mid \mathsf{Sam} \mid \mathsf{Sco}$$

where a ranges over variables a : A in Γ.

**Definition 2.** *A dependency graph over* $\Gamma \vdash B$ *is an event structure* $G$ *along with a labelling map* $\mathit{lbl} : G \to \mathcal{L}^{\mathsf{static}}_{\Gamma \vdash B}$ *where any two events* $s, s' \in G$ *labelled* $\mathsf{Rtn}$ *are in conflict, and all maximal configurations of* $G$ *are of the form* $[r]$ *for* $r \in G$ *a return event.*

The condition on return events ensures that in any configuration of G there is at most one return event. Events of G are called static events.

We use dependency graphs as a causal representation of programs, reflecting the dependency between different parts of the program. In what follows we enrich this representation with runtime information in order to keep track of the dataflow of the program (in Sect. 3.3), and the associated distributions (in Sect. 3.4).

#### **3.3 Runtime Values and Dataflow Graphs**

We have seen how data dependency can be captured by representing a program $P$ as a dependency graph $G_P$. But observe that this graph does not give any runtime information about the data in $P$; every event $s \in G_P$ only carries a label $\mathit{lbl}(s)$ indicating the class of action it belongs to. (For an event labelled $\mathsf{Rd}\,a$, $G_P$ does not specify the value at $a$; whereas at runtime this will be filled by an element of $\llbracket A \rrbracket$ where $A$ is the type of $a$.)

To each label, we can associate a measurable space of possible runtime values:

$$\mathcal{Q}(\mathsf{Rd}\,b) = \llbracket \Gamma(b) \rrbracket \qquad \mathcal{Q}(\mathsf{Rtn}) = \llbracket B \rrbracket \qquad \mathcal{Q}(\mathsf{Sam}) = (\mathbb{R}, \Sigma_{\mathbb{R}}) \qquad \mathcal{Q}(\mathsf{Sco}) = (\mathbb{R}, \Sigma_{\mathbb{R}}).$$

Then, in a particular execution, an event $s \in G_P$ has a value in $\mathcal{Q}(\mathit{lbl}(s))$, and can be instead labelled by the following expanded set:

$$\mathcal{L}^{\mathsf{run}}_{\Gamma \vdash B} ::= \mathsf{Rd}\,a\,v \mid \mathsf{Rtn}\,v \mid \mathsf{Sam}\,r \mid \mathsf{Sco}\,r$$

where $r$ ranges over real numbers; in $\mathsf{Rd}\,a\,v$, $a : A \in \Gamma$ and $v \in \llbracket A \rrbracket$; and in $\mathsf{Rtn}\,v$, $v$ ranges over elements of $\llbracket B \rrbracket$. Notice that there is an obvious forgetful map $\alpha : \mathcal{L}^{\mathsf{run}}_{\Gamma \vdash B} \to \mathcal{L}^{\mathsf{static}}_{\Gamma \vdash B}$, discarding the runtime value. This runtime value can be extracted from a label in $\mathcal{L}^{\mathsf{run}}_{\Gamma \vdash B}$ as follows:

$$\mathbf{q}(\mathsf{Rd}\,b\,v) = v \qquad \mathbf{q}(\mathsf{Rtn}\,v) = v \qquad \mathbf{q}(\mathsf{Sam}\,r) = r \qquad \mathbf{q}(\mathsf{Sco}\,r) = r.$$

In particular, we have $\mathbf{q}(\ell) \in \mathcal{Q}(\alpha(\ell))$.
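The relationship between static labels, runtime labels, the forgetful map and the value extraction can be mimicked with plain tuples. The encoding below is an assumption made for illustration: a runtime label is a static label with the runtime value appended, and value spaces are approximated by Python types. The function `value_space` is a hypothetical stand-in for $\mathcal{Q}$.

```python
def alpha(label):
    """Forgetful map: drop the runtime value (the last component)."""
    return label[:-1]


def q(label):
    """Extract the runtime value carried by a runtime label."""
    return label[-1]


def value_space(static_label, context, return_type):
    """Stand-in for Q: the Python type of values a static label may carry."""
    tag = static_label[0]
    if tag == "Rd":
        return context[static_label[1]]  # type of the variable being read
    if tag == "Rtn":
        return return_type
    return float  # Sam and Sco carry real numbers


context = {"a": bool}  # hypothetical typing context: a is a boolean
runtime_labels = [("Rd", "a", True), ("Sam", 0.25), ("Sco", 1.5), ("Rtn", 2.0)]
well_typed = all(
    isinstance(q(l), value_space(alpha(l), context, float))
    for l in runtime_labels
)
```

The final check is the tuple-level counterpart of the property that the value extracted from a runtime label lies in the space associated with its static label.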

Such runtime events organise themselves in an event structure $E_P$, labelled over $\mathcal{L}^{\mathsf{run}}_{\Gamma \vdash B}$, the **runtime graph** of $P$. Runtime graphs are in general uncountable, and so difficult to represent pictorially. It can be done in some simple, finite cases: the graph for $\mathtt{if}\ a\ \mathtt{then}\ 2\ \mathtt{else}\ 3$ is depicted on the right. Recall that in dependency graphs conflict was used to represent conditional branches; here instead conflict is used to keep disjoint the possible outcomes of the same static event. (Necessarily, this static event must be a sample or a read, since other actions (return, score) are deterministic.)

Intuitively one can project runtime events to static events by erasing the runtime information; this suggests the existence of a function $\pi_P : E_P \to G_P$. This function will turn out to satisfy the axioms of a *rigid map of event structures*:

**Definition 3.** *Given event structures* $(E, \leq_E, \#_E)$ *and* $(G, \leq_G, \#_G)$*, a function* $\pi : E \to G$ *is a rigid map if*


In general π is not injective, since many runtime events may correspond to the same static event – in that case however the axioms will require them to be in conflict. The last condition in the definition ensures that all causal dependencies come from G.

Given $x \in \mathscr{C}(G_P)$ we define the possible runtime values for $x$ as the set $\mathcal{Q}(x)$ of functions mapping $s \in x$ to a runtime value in $\mathcal{Q}(\mathit{lbl}(s))$; in other words $\mathcal{Q}(x) = \prod_{s \in x} \mathcal{Q}(\mathit{lbl}(s))$. A configuration $x'$ of $E_P$ can be viewed as a trace over $\pi_P x'$; hence $\pi_P^{-1}\{x\} := \{x' \in \mathscr{C}(E_P) \mid \pi_P x' = x\}$ is the set of traces of $P$ over $x$. We can now define dataflow graphs:

**Definition 4.** *A dataflow graph on* $\Gamma \vdash B$ *is a triple* $\mathbf{S} = (E_S, G_S, \pi_S : E_S \to G_S)$ *with* $G_S$ *a dependency graph and* $E_S$ *a runtime graph, such that:*

*–* $\pi_S$ *is a rigid map and* $\mathit{lbl} \circ \pi_S = \alpha \circ \mathit{lbl} : E_S \to \mathcal{L}^{\mathsf{static}}_{\Gamma \vdash B}$*;*

*– for each* $x \in \mathscr{C}(G_S)$*, the following function is injective, where* $s'$ *denotes the unique event of* $x'$ *with* $\pi_S s' = s$*:*

$$\begin{array}{rcl} q_x : \pi_S^{-1}\{x\} & \to & \mathcal{Q}(x) \\ x' & \mapsto & (s \mapsto \mathbf{q}(\mathit{lbl}(s'))) \end{array}$$

*– if* $e, e' \in E_S$ *with* $e \sim e'$ *then* $\pi e = \pi e'$*, and moreover* $e$ *and* $e'$ *are either both sample or both read events.*

As mentioned above, maximal configurations of $E_P$ correspond to total traces of $P$, and will be the states of the Markov chain in Sect. 5. By the second axiom, they can be seen as pairs $(x \in \mathscr{C}(G_S), q \in \mathcal{Q}(x))$. Because of the third axiom, $E_S$ is always confusion-free.

*Measurable Fibres.* Rigid maps are convenient in this context because they allow for reasoning about program traces by organising them as *fibres*. The key property we rely on is the following:

**Lemma 2.** *If* $\pi : E \to G$ *is a rigid map of event structures, then the induced map* $\pi : \mathscr{C}(E) \to \mathscr{C}(G)$ *is a* discrete fibration*: that is, for every* $y \in \mathscr{C}(E)$*, if* $x \subseteq \pi y$ *for some* $x \in \mathscr{C}(G)$*, then there is a unique* $y' \in \mathscr{C}(E)$ *such that* $y' \subseteq y$ *and* $\pi y' = x$*.*

This enables an essential feature of our approach: given a configuration $x$ of the dependency graph $G$, the fibre $\pi^{-1}\{x\}$ over it contains all the (possibly partial) program traces over $x$, *i.e.* those whose path through the program corresponds to that of $x$. Additionally the lemma implies that every pair of configurations $x, x' \in \mathscr{C}(G)$ such that $x \subseteq x'$ induces a **restriction map** $r_{x,x'} : \pi^{-1}\{x'\} \to \pi^{-1}\{x\}$, whose action on a program trace over $x'$ is to return its *prefix* over $x$.

Although there is no measure-theoretic structure in the definition of dataflow graphs, we can recover it: for every $x \in \mathscr{C}(G_S)$, the fibre $\pi_S^{-1}\{x\}$ can be equipped with the $\sigma$-algebra induced from $\Sigma_{\mathcal{Q}(x)}$ via $q_x$; it is generated by sets $q_x^{-1} U$ for $U \in \Sigma_{\mathcal{Q}(x)}$.

It is easy to check that this makes the restriction map $r_{x,x'} : \pi_S^{-1}\{x'\} \to \pi_S^{-1}\{x\}$ measurable for each pair $x, x'$ of configurations with $x \subseteq x'$. (Note that this makes $\mathbf{S}$ a *measurable event structure* in the sense of [16].) Moreover, the map $q_{x,s} : \pi_S^{-1}\{x\} \to \mathcal{Q}(\mathit{lbl}(s))$ for $s \in x \in \mathscr{C}(G_S)$, mapping $x' \in \pi_S^{-1}\{x\}$ to $\mathbf{q}(\mathit{lbl}(s'))$ for $s'$ the unique antecedent by $\pi_S$ of $s$ in $x'$, is also measurable.

We will also make use of the following result:

**Lemma 3.** *Consider a dataflow graph* $\mathbf{S}$ *and* $x, y, z \in \mathscr{C}(G_S)$ *with* $x \subseteq y$*,* $x \subseteq z$*, and* $y \cup z \in \mathscr{C}(G_S)$*. If* $y \cap z = x$*, then the space* $\pi_S^{-1}\{y \cup z\}$ *is isomorphic to the set*

$$\{(u\_y, u\_z) \in \pi\_S^{-1}\{y\} \times \pi\_S^{-1}\{z\} \mid r\_{x,y}(u\_y) = r\_{x,z}(u\_z)\},$$

*with* $\sigma$*-algebra generated by sets of the form* $\{(u_y, u_z) \in X_y \times X_z \mid r_{x,y}(u_y) = r_{x,z}(u_z)\}$ *for* $X_y \in \Sigma_{\pi_S^{-1}\{y\}}$ *and* $X_z \in \Sigma_{\pi_S^{-1}\{z\}}$*.*

*(For the reader with knowledge of category theory, this says exactly that the diagram*

$$\begin{aligned} \pi\_S^{-1} \{ y \cup z \} &\xrightarrow{r\_{y, y \cup z}} \pi\_S^{-1} \{ y \} \\ r\_{z, y \cup z} \downarrow & & \downarrow r\_{x, y} \\ \pi\_S^{-1} \{ z \} &\xrightarrow{r\_{x, z}} \pi\_S^{-1} \{ x \} \end{aligned}$$

*is a pullback in the category of measurable spaces.)*

#### **3.4 Quantitative Dataflow Graphs**

We can finally introduce the last bit of information we need about programs in order to perform inference: the probabilistic information. So far, in a dataflow graph, we know when the program is sampling, but not from which distribution. This is resolved by adding for each sample event $s$ in the dependency graph a kernel $k_s : \pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\}$. Given a trace $x$ over $[s)$, $k_s$ specifies a probability distribution according to which $x$ will be extended to a trace over $[s]$. This distribution must of course have support contained in the set $r_{[s),[s]}^{-1}\{x\}$ of traces over $[s]$ of which $x$ is a prefix; this is the meaning of the technical condition in the definition below.

**Definition 5.** *A quantitative dataflow graph is a tuple* $\mathbf{S} = (E_S, G_S, \pi : E_S \to G_S, (k^S_s))$ *where for each sample event* $s \in G_S$*,* $k^S_s$ *is a kernel* $\pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\}$ *satisfying for all* $x \in \pi^{-1}\{[s)\}$*,*

$$k\_s^S(x, \pi^{-1}\{ [s] \} \backslash r\_{[s), [s]}^{-1} \{ x \}) = 0.$$

This axiom stipulates that any extension $x' \in \pi_S^{-1}\{[s]\}$ of $x \in \pi_S^{-1}\{[s)\}$ drawn by $k_s$ must contain $x$; in effect $k_s$ only samples the runtime value for $s$.

*From Graphs to Kernels.* We show how to collapse a quantitative dataflow graph $\mathbf{S}$ on $\Gamma \vdash B$ to a kernel $\llbracket \Gamma \rrbracket \rightsquigarrow \llbracket B \rrbracket$. First, we extend the kernel family on sampling events $(k^S_s : \pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\})$ to a family $(k^{S[\gamma]}_s : \pi^{-1}\{[s)\} \rightsquigarrow \pi^{-1}\{[s]\})$ defined on *all* events $s \in S$, parametrised by the value of the environment $\gamma \in \llbracket \Gamma \rrbracket$. To define $k^{S[\gamma]}_s(x, \cdot)$ it is enough to specify its value on the generating set for $\Sigma_{\pi^{-1}\{[s]\}}$. As we have seen this contains elements of the form $q_{[s]}^{-1}(U)$ with $U \in \Sigma_{\mathcal{Q}([s])}$. We distinguish the following cases corresponding to the nature of $s$:


$$k\_s^{S[\gamma]}(x, q\_{[s]}^{-1}U) = \delta\_{q\_{[s]}(x)[s:=\gamma(a)]}(U)$$

– If $s$ is a return or a score event: any $x \in \pi^{-1}\{[s)\}$ has at most one extension $o(x) \in \pi^{-1}\{[s]\}$ (because return and score events cannot be involved in a minimal conflict): $k_s^{S[\gamma]}(x, q_{[s]}^{-1}(U)) = \delta_{q_{[s]}(o(x))}(U)$. If $o(x)$ does not exist, we let $k_s^{S[\gamma]}(x, X) = 0$.

We can now define a kernel $k_{x,s}^{S[\gamma]} : \pi^{-1}\{x\} \rightsquigarrow \pi^{-1}\{x'\}$ for every atomic extension $x \stackrel{s}{-\!\!\!\subset} x'$ in $G_S$, i.e. when $x' \setminus x = \{s\}$, as follows:

$$k_{x,s}^{S[\gamma]}(y,U) = k_s^{S[\gamma]}(r_{[s),x}(y), \{w \in \pi_S^{-1}\{[s]\} \mid (y,w) \in U\}).$$

The second argument to $k_s^{S[\gamma]}$ above is always measurable, by a standard measure-theoretic argument based on Lemma 3, as $x \cap [s] = [s)$.

From this definition we derive:

**Lemma 4.** *If* $x \stackrel{s_1}{-\!\!\!\subset} x_1$ *and* $x \stackrel{s_2}{-\!\!\!\subset} x_2$ *are concurrent extensions of* $x$ *(i.e.* $s_1$ *and* $s_2$ *are not in conflict), then* $k_{x_1,s_2}^{S[\gamma]} \circ k_{x,s_1}^{S[\gamma]} = k_{x_2,s_1}^{S[\gamma]} \circ k_{x,s_2}^{S[\gamma]}$*.*

Given a configuration $x \in \mathscr{C}(G_S)$ and a covering chain $\emptyset \stackrel{s_1}{-\!\!\!\subset} x_1 \cdots \stackrel{s_n}{-\!\!\!\subset} x_n = x$, we can finally define a measure on $\pi^{-1}\{x\}$:

$$
\mu_x^{S[\gamma]} = k_{x_{n-1}, s_n}^{S[\gamma]} \circ \dots \circ k_{\emptyset, s_1}^{S[\gamma]} (*, \cdot),
$$

where ∗ is the only trace over ∅. The particular covering chain used does not matter by the previous lemma. Using this, we can define the kernel of a quantitative dataflow graph **S** as follows:

$$\mathsf{kernel}(\mathbf{S})(\gamma, X) = \sum_{r \in G_S,\ \mathit{lbl}(r) = \mathsf{Rtn}} \mu_{[r]}^{S[\gamma]} (q_{[r], r}^{-1}(X)),$$

where the measurable map $q_{[r],r} : \pi^{-1}\{[r]\} \to \llbracket B \rrbracket$ looks up the runtime value of $r$ in an element of the fibre over $[r]$ (defined in Sect. 3.3).

**Lemma 5.** $\mathsf{kernel}(\mathbf{S})$ *is an s-finite kernel* $\llbracket \Gamma \rrbracket \rightsquigarrow \llbracket B \rrbracket$*.*

# **4 Programs as Labelled Event Structures**

We now detail our interpretation of programs as quantitative dataflow graphs. Our interpretation is given by induction, similarly to the measure-theoretic interpretation given in Sect. 2.3, in which composition of kernels plays a central role. In Sect. 4.1, we discuss how to compose quantitative dataflow graphs, and in Sect. 4.2, we define our interpretation.

#### **4.1 Composition of Probabilistic Event Structures**

Consider two quantitative dataflow graphs, $S$ on $\Gamma \vdash A$, and $T$ on $\Gamma, a : A \vdash B$ where $a$ does not occur in $\Gamma$. In what follows we show how they can be composed to form a quantitative dataflow graph $T \odot^a S$ on $\Gamma \vdash B$.

Unlike in the kernel model of Sect. 2.3, we will need two notions of composition. The first one is akin to the usual sequential composition: actions in $T$ must wait on $S$ to return before they can proceed. The second is closer to parallel composition: actions of $T$ which do not depend on a read of the variable $a$ can be executed in parallel with $S$. The latter composition is used to interpret the let construct. In $\mathtt{let}\ a = M\ \mathtt{in}\ N$, we want all the probabilistic actions or reads on other variables which do not depend on the value of $a$ to be in parallel with $M$. However, in a program such as $\mathtt{case}\ M\ \mathtt{of}\ \{(i, x) \Rightarrow N_i\}_{i \in I}$ we do not want any actions of $N_i$ to start before the selected branch is known, *i.e.* before the return value of $M$ is known.

By way of illustration, consider the following simple example, in which we only consider runtime graphs, ignoring the rest of the structure for now. Suppose S and T are given by

$$S = \begin{array}{ccc} \mathsf{Rd}\,b\,\mathsf{tt} & \sim & \mathsf{Rd}\,b\,\mathsf{ff} \\ \downarrow & & \downarrow \\ \mathsf{Rtn}\,\mathsf{ff} & & \mathsf{Rtn}\,\mathsf{tt} \end{array} \qquad T = \begin{array}{ccc} & \mathsf{Sam}\,r & \\ \mathsf{Rd}\,a\,\mathsf{tt} & \sim & \mathsf{Rd}\,a\,\mathsf{ff} \\ \downarrow & & \downarrow \\ \mathsf{Rtn}\,(r,\mathsf{tt}) & & \mathsf{Rtn}\,(r,\mathsf{ff}) \end{array}$$

The graph $S$ can be seen to correspond to the program $\mathtt{if}\ b\ \mathtt{then}\ \mathtt{ff}\ \mathtt{else}\ \mathtt{tt}$ and $T$ to the pairing $(\mathtt{sample}\ d\ (0), a)$ for any $d$. Here $S$ is a runtime graph on $b : \mathbb{B} \vdash \mathbb{B}$ and $T$ on $a : \mathbb{B}, b : \mathbb{B} \vdash \mathbb{B}$.

Both notions of composition are displayed in the diagram below. The sequential composition (left) corresponds to

$$\mathtt{if}\ b\ \mathtt{then}\ (\mathtt{sample}\ d\ (0), \mathtt{ff})\ \mathtt{else}\ (\mathtt{sample}\ d\ (0), \mathtt{tt})$$

and the parallel composition (right) to $(\mathtt{sample}\ d\ (0), \mathtt{if}\ b\ \mathtt{then}\ \mathtt{ff}\ \mathtt{else}\ \mathtt{tt})$:

$$T \odot_{\text{seq}}^{a} S = \begin{array}{ccc} \mathsf{Rd}\,b\,\mathsf{tt} & \sim & \mathsf{Rd}\,b\,\mathsf{ff} \\ \downarrow & & \downarrow \\ \mathsf{Sam}\,r & & \mathsf{Sam}\,r \\ \downarrow & & \downarrow \\ \mathsf{Rtn}\,(r,\mathsf{ff}) & & \mathsf{Rtn}\,(r,\mathsf{tt}) \end{array} \qquad T \odot_{\text{par}}^{a} S = \begin{array}{cccc} \mathsf{Sam}\,r & \mathsf{Rd}\,b\,\mathsf{tt} & \sim & \mathsf{Rd}\,b\,\mathsf{ff} \\ & \downarrow & & \downarrow \\ & \mathsf{Rtn}\,(r,\mathsf{ff}) & & \mathsf{Rtn}\,(r,\mathsf{tt}) \end{array}$$

*Composition of Runtime and Dependency Graphs.* Let us now define both composition operators at the level of the event structures. Through the bijection $\mathcal{L}^{\mathsf{static}}_{\Gamma \vdash B} \cong \mathcal{L}^{\mathsf{run}}_{\Gamma' \vdash 1}$ where $\Gamma'(a) = 1$ for all $a \in \mathrm{dom}(\Gamma)$, we will see dependency graphs and runtime graphs as the same kind of objects, event structures labelled over $\mathcal{L}^{\mathsf{run}}_{\Gamma \vdash A}$.

The two compositions $S \odot^a_{\mathrm{par}} T$ and $S \odot^a_{\mathrm{seq}} T$ are two instances of the same construction, parametrised by a set of labels $D \subseteq \mathcal{L}^{\mathsf{run}}_{\Gamma, a : A \vdash B}$. Informally, $D$ specifies which events of $T$ are to depend on the return value of $S$ in the resulting composition graph. It is natural to assume in particular that $D$ contains all reads on $a$, and all return events.

Sequential and parallel composition are instances of this construction where D is set to one of the following:

$$D\_{\text{seq}}^{\Gamma, a:A\vdash B} = \mathcal{L}\_{\Gamma, a:A\vdash B}^{\mathsf{run}} \qquad \qquad D\_{\text{par}}^{\Gamma, a:A\vdash B} = \{\mathsf{Rd}\, a\, v, \mathsf{Rtn}\, v \in \mathcal{L}\_{\Gamma, a:A\vdash B}^{\mathsf{run}}\}.$$

We proceed to describe the construction for an abstract $D$. Let $T$ be an event structure labelled by $\mathcal{L}^{\mathsf{run}}_{\Gamma, a : A \vdash B}$ and $S$ labelled by $\mathcal{L}^{\mathsf{run}}_{\Gamma \vdash A}$. A configuration $x \in \mathscr{C}(S)$ is a **justification** of $y \in \mathscr{C}(T)$ when

1. if $\mathit{lbl}(y)$ intersects $D$, then $x$ contains a return event;

2. for all $t \in y$ with label $\mathsf{Rd}\,a\,v$, there exists an event $s \in x$ labelled $\mathsf{Rtn}\,v$.

In particular if $\mathit{lbl}(y)$ does not intersect $D$, then any configuration of $S$ is a justification of $y$. A **minimal justification** of $y$ is a justification that admits no proper subset which is also a justification of $y$. We now define the event structure $S \cdot_D T$ as follows:


$$\begin{aligned} \#_S &\cup \{ (x,t), (x',t') \mid x \cup x' \notin \mathscr{C}(S) \lor t \mathrel{\#_T} t' \} \\ &\cup \{ s, (x,t) \mid \{ s \} \cup x \notin \mathscr{C}(S) \} .\end{aligned}$$

**Lemma 6.** $S \cdot_D T$ *is an event structure, and the following is an order-isomorphism:*

$$\langle \cdot, \cdot \rangle : \{(x, y) \in \mathscr{C}(S) \times \mathscr{C}(T) \mid x \text{ is a justification of } y\} \cong \mathscr{C}(S \cdot_D T).$$

This event structure is not quite what we want, since it still contains return events from $S$ and reads on $a$ from $T$. To remove them, we use the following general construction. Given a $\Sigma$-labelled event structure $E$ and $V \subseteq E$ a set of visible events, its **projection** $E \downarrow V$ has events $V$ and causality, conflict and labelling inherited from $E$. Thus the composition of $S$ and $T$ is:

$$S \odot^a_D T := S \cdot_D T \downarrow (\{s \in S \mid s \text{ not a return}\} \cup \{(x, t) \mid t \text{ not a read on } a\}).$$

As a result $S \odot^a_D T$ is labelled over $\mathcal{L}^{\mathsf{run}}_{\Gamma \vdash B}$ as needed.
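The two justification conditions used in this construction can be checked mechanically on the running example. The sketch below makes simplifying assumptions: a configuration is represented just by the set of its runtime labels (tuples such as `("Rd", "b", True)`), and the label set $D$ is given as a predicate because it is infinite in general. The names `is_justification`, `d_seq` and `d_par` are illustrative.

```python
def returned_values(x):
    """Values v such that x contains an event labelled Rtn v."""
    return {label[1] for label in x if label[0] == "Rtn"}


def is_justification(x, y, in_d, a):
    """Check whether x (labels of a configuration of S) justifies
    y (labels of a configuration of T) for the label predicate in_d."""
    rets = returned_values(x)
    # Condition 1: if lbl(y) meets D, then x must contain a return event.
    if any(in_d(label) for label in y) and not rets:
        return False
    # Condition 2: every read of a in y is matched by a return in x.
    return all(label[2] in rets
               for label in y if label[0] == "Rd" and label[1] == a)


def d_seq(label):
    # Sequential composition: every event of T depends on S returning.
    return True


def d_par(a):
    # Parallel composition: only reads on a and returns depend on S.
    return lambda label: label[0] == "Rtn" or (label[0] == "Rd" and label[1] == a)


# S is "if b then ff else tt"; T is "(sample d (0), a)".
x_tt = {("Rd", "b", True), ("Rtn", False)}
x_ff = {("Rd", "b", False), ("Rtn", True)}
y_sample = {("Sam", 0.3)}
y_read = {("Sam", 0.3), ("Rd", "a", False)}
```

Under `d_par("a")` the empty configuration already justifies `y_sample`, so the sample can run in parallel with $S$; under `d_seq` it does not, which forces the sample to wait for a return.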

*Dataflow Information.* We now explain how this construction lifts to dataflow graphs. Consider dataflow graphs $S = (E_S, G_S, \pi_S : E_S \to G_S)$ on $\Gamma \vdash A$ and $T = (E_T, G_T, \pi_T : E_T \to G_T)$ on $\Gamma, a : A \vdash B$. Given $D \subseteq \mathcal{L}^{\mathsf{static}}_{\Gamma, a : A \vdash B}$ we define

$$\begin{aligned} E\_{S \cdot\_D T} &= E\_S \cdot\_{\alpha^{-1} D} E\_T & G\_{S \cdot\_D T} &= G\_S \cdot\_D G\_T \\ E\_{S \odot\_D^a T} &= E\_S \odot\_{\alpha^{-1} D}^a E\_T & G\_{S \odot\_D^a T} &= G\_S \odot\_D^a G\_T \end{aligned}$$

**Lemma 7.** *The maps* π<sup>S</sup> *and* π<sup>T</sup> *extend to rigid maps*

$$\begin{aligned} \pi\_{S \cdot\_D T} &: E\_{S \cdot\_{\alpha^{-1} D} T} \to G\_{S \cdot\_D T} \\ \pi\_{S \odot\_D^a T} &: E\_{S \odot\_{\alpha^{-1} D} T} \to G\_{S \odot\_D^a T} \end{aligned}$$

*Moreover, if* $\langle x, y \rangle \in \mathcal{C}(E\_{S \cdot\_D T})$*,* $\langle \pi\_S\, x, \pi\_T\, y \rangle$ *is a well-defined configuration of* $G\_{S \cdot\_D T}$*. As a result, for* $\langle x, y \rangle \in \mathcal{C}(G\_{S \cdot\_D T})$*, we have an injection* $\varphi\_{x,y} : \pi^{-1}\{\langle x, y \rangle\} \to \pi^{-1}\{x\} \times \pi^{-1}\{y\}$ *making the following diagram commute:*

$$\begin{array}{c} \pi^{-1}\{\langle x,y\rangle\} \xrightarrow{\varphi\_{x,y}} \pi^{-1}\{x\} \times \pi^{-1}\{y\} \\ q\_{\langle x,y\rangle} \downarrow \\ \mathcal{Q}(\langle x,y\rangle) \xrightarrow[\cong]{} \mathcal{Q}(x) \times \mathcal{Q}(y) \end{array}$$

*In particular,* $\varphi\_{x,y}$ *is measurable and induces the* $\sigma$*-algebra on* $\pi^{-1}\{\langle x, y \rangle\}$*. We write* $\varphi\_x$ *for the map* $\varphi\_{x,\emptyset}$*, an isomorphism.*

*Adding Probability.* At this point we have defined all the components of the dataflow graphs $S \odot\_D^a T$ and $S \cdot\_D T$. We proceed to make them quantitative.

Observe first that each sampling event of $G\_{S \cdot\_D T}$ (or equivalently of $G\_{S \odot\_D^a T}$, since sampling events are never hidden) corresponds either to a sampling event of $G\_S$, or to an event $(x, t)$ where $t$ is a sampling event of $G\_T$. We consider both cases to define a family of kernels $(k\_s^{S \cdot\_D T})$ between the fibres of $S \cdot\_D T$. This will in turn induce a family $(k\_s^{S \odot\_D^a T})$ on $S \odot\_D^a T$.

– If $s$ is a sample event of $G\_S$, we use the isomorphisms $\varphi\_{[s)}$ and $\varphi\_{[s]}$ of Lemma 7 to define:

$$k\_s^{S \odot\_D^a T}(v, X) = k\_s^S(\varphi\_{[s)}^{-1} \, v, \varphi\_{[s]}^{-1} X).$$

– If $s$ corresponds to $(x, t)$ for $t$ a sample event of $G\_T$, then for every $X\_x \in \Sigma\_{\pi\_S^{-1}\{x\}}$ and $X\_t \in \Sigma\_{\pi\_T^{-1}\{[t)\}}$ we define

$$k\_{\langle x,t\rangle}^{S\odot\_D^a T}(\langle x',y'\rangle,\varphi\_{x,[t]}^{-1}(X\_x\times X\_t))=\delta\_{x'}(X\_x)\times k\_t^T(y',X\_t).$$

By Lemma 7, the sets $\varphi\_{x,[t]}^{-1}(X\_x \times X\_t)$ form a basis for $\Sigma\_{\pi^{-1}\{\langle x, [t) \rangle\}}$, so that this definition determines the entire kernel.

So we have defined a kernel $k\_s^{S \cdot\_D T}$ for each sample event $s$ of $G\_{S \cdot\_D T}$. We move to the composition $S \odot\_D^a T$. Recall that the *causal history* of a configuration $z \in \mathcal{C}(G\_{S \odot\_D^a T})$ is the set $[z]$, a configuration of $G\_{S \cdot\_D T}$. We see that hiding does not affect the fibre structure:

**Lemma 8.** *For any* $z \in \mathcal{C}(G\_{S \odot\_D^a T})$*, there is a measurable isomorphism* $\psi\_z : \pi^{-1}\_{S \odot\_D^a T}\{z\} \cong \pi^{-1}\_{S \cdot\_D T}\{[z]\}$*.*

Using this result and the fact that $G\_{S \odot\_D^a T} \subseteq G\_{S \cdot\_D T}$, we may define for each $s$:

$$k\_s^{S \odot\_D^a T}(v, X) = k\_s^{S \cdot\_D T}(\psi\_{[s)}(v), \psi\_{[s]} X).$$

We conclude:

**Lemma 9.** $S \odot\_D^a T := (G\_{S \odot\_D^a T}, E\_{S \odot\_D^a T}, \pi\_{S \odot\_D^a T}, (k\_s^{S \odot\_D^a T}))$ *is a quantitative dataflow graph on* $\Gamma \vdash B$*.*

*Multicomposition.* By chaining this composition, we can compose on several variables at once. Given quantitative dataflow graphs $S\_i$ on $\Gamma \vdash A\_i$ and $T$ on $\Gamma, a\_1 : A\_1, \ldots, a\_n : A\_n \vdash A$ we define

$$\begin{aligned} (S\_i) \odot\_{\text{par}}^{(a\_i)} T &:= S\_1 \odot\_{\text{par}}^{a\_1} (\dots \odot\_{\text{par}}^{a\_n} T) \\ (S\_i) \odot\_{\text{seq}}^{(a\_i)} T &:= S\_1 \odot\_{\text{seq}}^{a\_1} (\dots \odot\_{\text{seq}}^{a\_n} T) \end{aligned}$$

#### **4.2 Interpretation of Programs**

We now describe how to interpret programs of our language using quantitative dataflow graphs. To do so we follow the same pattern as for the measure-theoretical interpretation given in Sect. 2.3.

*Interpretation of Functions.* Given a measurable function $f : \llbracket A \rrbracket \to \llbracket B \rrbracket$, we define the quantitative dataflow graph

$$S\_f^a = \left(\sum\_{v \in \{A\}} \begin{array}{c} \mathsf{Rd}\, a \, v \\ \Downarrow \\ \mathsf{Rtn}\,(f \, v) \end{array} \rightarrow \begin{array}{c} \mathsf{Rd}\, a \\ \spadesuit \\ \mathsf{Rtn} \end{array} \right) .$$

We then define $\llbracket f\, M \rrbracket\_{\mathcal{G}}$ as $\llbracket M \rrbracket\_{\mathcal{G}} \odot\_{\text{par}}^{a} S\_f^a$, where $a$ is chosen so as not to occur free in $M$.

*Probabilistic Actions.* In order to interpret scoring and sampling primitives, we need the following two quantitative dataflow graphs:

$$\mathbf{score} = \begin{pmatrix} \mathbf{Rd}\,a\,r & & \mathbf{Rd}\,a \\ \mathbf{\forall} & \mathbf{\forall} & \mathbf{\forall} \\ \mathbf{\forall}\mathbf{\mathcal{S}co}\,r & \rightarrow & \mathbf{\mathbf{\mathcal{S}co}} \\ & \mathbf{\mathcal{R}tn}\,(\,) & & \mathbf{Rtn} \end{pmatrix} \quad \mathbf{sample}\_{d} = \begin{pmatrix} \mathbf{Rd}\,a\,r & & \mathbf{Rd}\,a \\ \mathbf{\forall} & \mathbf{\forall} & \mathbf{\forall} \\ \mathbf{\forall} & \mathbf{\forall} & \mathbf{\forall} \\ & \mathbf{\forall}\mathbf{\mathcal{n}n}\,s & \rightarrow & \mathbf{\textit{\mathcal{S}am}} \\ & \mathbf{\textit{Rtn}}\,(\,) & & \mathbf{\textit{Rtn}} \end{pmatrix}.$$

and we define $k\_{\mathsf{Sam}}$ by integrating the density function $d$; here we identify $\mathcal{Q}(\{\mathsf{Rd}\,a, \mathsf{Sam}\})$ and $\pi^{-1}\{\{\mathsf{Rd}\,a, \mathsf{Sam}\}\}$:

$$k\_{\mathsf{Sam}}(\{\mathsf{Rd}\,a\,\mathbf{r}\},U) = \int\_{q\in U,q\,(\mathsf{Rd}\,a)=\mathbf{r}} d(\mathbf{r},q(\mathsf{Sam})) \mathrm{d}\lambda.$$

We can now interpret scoring and sampling constructs:

$$\llbracket \mathtt{score}\ M \rrbracket\_{\mathcal{G}} = \llbracket M \rrbracket\_{\mathcal{G}} \odot\_{\text{par}}^{a} \mathbf{score} \qquad \llbracket \mathtt{sample}\ d\ (M) \rrbracket\_{\mathcal{G}} = \llbracket M \rrbracket\_{\mathcal{G}} \odot\_{\text{par}}^{a} \mathbf{sample}\_{d}.$$

*Interpretation of Tuples and Variables.* Given a family $(a\_i)\_{i \in I}$, we define the dataflow graph $\mathtt{tuple}\_{(a\_i : A\_i)}$ on $a\_1 : A\_1, \ldots, a\_n : A\_n \vdash A\_1 \times \ldots \times A\_n$ as follows. Its set of events is the disjoint union

$$\bigcup\_{i \in I, v \in \llbracket A\_i \rrbracket} \mathsf{Rd}\ a\_i \, v + \bigcup\_{\mathbf{v} \in \llbracket A\_1 \times \ldots \times A\_n \rrbracket} \mathsf{Rtn} \, \mathbf{v}$$

where the conflict is induced by $\mathsf{Rd}\ a\_i\, v \mathrel{\#} \mathsf{Rd}\ a\_i\, v'$ for $v \neq v'$; and causality contains all the pairs $\mathsf{Rd}\ a\_i\, v \leq \mathsf{Rtn}\,(v\_1, \ldots, v\_n)$ where $v\_i = v$. Then we form a quantitative dataflow graph $\mathtt{Tuple}\_{(a\_i : A\_i)}$, whose dependency graph is $\mathtt{tuple}\_{(a\_i : 1)}$ (up to the bijection $\mathcal{L}^{\mathrm{run}}\_{\Gamma' \vdash 1} \cong \mathcal{L}^{\mathrm{static}}\_{\Gamma \vdash A}$, where $\Gamma'(a) = 1$ for $a \in \mathrm{dom}(\Gamma)$); and the runtime graph is $\mathtt{tuple}\_{(a\_i : A\_i)}$, along with the obvious rigid map between them.

We then define the semantics of (M1,...,Mn):

$$\llbracket (M\_1, \ldots, M\_n) \rrbracket\_{\mathcal{G}} = (\llbracket M\_i \rrbracket\_{\mathcal{G}})\_i \odot\_{\text{par}}^{(a\_i)} \mathtt{Tuple}\_{(a\_i : A\_i)},$$

where the $a\_i$ are chosen so as not to occur free in any of the $M\_j$. This construction is also useful to interpret variables:

$$\llbracket a \rrbracket\_{\mathcal{G}} = \mathtt{Tuple}\_{a : A} \qquad \text{where } \Gamma \vdash a : A.$$

*Interpretation of Pattern Matching.* Consider now a term of the form $\mathtt{case}\ M\ \mathtt{of}\ \{(i, a) \Rightarrow N\_i\}\_{i \in I}$. By induction, we have that $\llbracket N\_i \rrbracket\_{\mathcal{G}}$ is a quantitative dataflow graph on $\Gamma, a : A\_i \vdash B$. Let us write $\llbracket N\_i \rrbracket\_{\mathcal{G}}^{\*}$ for the quantitative dataflow graph on $\Gamma, a : (\sum\_{i \in I} A\_i) \vdash B$ obtained by relabelling events of the form $\mathsf{Rd}\ a\ v$ to $\mathsf{Rd}\ a\ (i, v)$, and sequentially precomposing with $\mathtt{Tuple}\_{a : \sum\_{i \in I} A\_i}$. This ensures that minimal events in $\llbracket N\_i \rrbracket\_{\mathcal{G}}^{\*}$ are reads on $a$. We then build the quantitative dataflow graph $\sum\_{i \in I} \llbracket N\_i \rrbracket\_{\mathcal{G}}^{\*}$ on $\Gamma, a : \sum\_{i \in I} A\_i \vdash B$. This can be composed with $\llbracket M \rrbracket\_{\mathcal{G}}$:

$$\llbracket \mathtt{case}\ M\ \mathtt{of}\ \{(i,a) \Rightarrow N\_{i}\}\_{i\in I} \rrbracket\_{\mathcal{G}} = \llbracket M \rrbracket\_{\mathcal{G}} \odot\_{\text{seq}}^{a} \left(\sum\_{i\in I} \llbracket N\_{i} \rrbracket\_{\mathcal{G}}^{\*}\right).$$

It is crucial here that one uses *sequential* composition: none of the branches must be evaluated until the outcome of M is known.

*Adequacy of Composition.* We now prove that our interpretation is adequate with respect to the measure-theoretic semantics described in Sect. 2.3. Given any subset $D \subseteq \mathcal{L}^{\mathrm{static}}\_{\Gamma, a : A \vdash B}$ containing returns and reads on $a$, we show that the composition $S \odot\_D^a T$ does implement the composition of kernels:

**Theorem 1.** *For* $S$ *a quantitative dataflow graph on* $\Gamma \vdash A$ *and* $T$ *on* $\Gamma, a : A \vdash B$*, we have*

$$\mathit{kernel}(S \odot\_D^a T) = \mathit{kernel}(T) \circ \mathit{kernel}(S) : \llbracket \Gamma \rrbracket \to \llbracket B \rrbracket.$$

From this result, we can deduce that the semantics in terms of quantitative dataflow graphs is adequate with respect to the measure-theoretic semantics:

**Theorem 2.** *For every term* $\Gamma \vdash M : A$*,* $\mathit{kernel}(\llbracket M \rrbracket\_{\mathcal{G}}) = \llbracket M \rrbracket$*.*

# **5 An Inference Algorithm**

In this section, we exploit the intensional semantics defined above and define a Metropolis-Hastings inference algorithm. We start, in Sect. 5.1, by giving a concrete presentation of those quantitative dataflow graphs arising as the interpretation of probabilistic programs; we argue this makes them well-suited for manipulation by an algorithm. Then, in Sect. 5.2, we give a more formal introduction to the Metropolis-Hastings sampling methods than that given in Sect. 3. Finally, in Sect. 5.3, we build the proposal kernel on which our implementation relies, and conclude.

#### **5.1 A Concrete Presentation of Probabilistic Dataflow Graphs**

Quantitative dataflow graphs as presented in the previous sections are not easy to handle inside of an algorithm: among other things, the runtime graph has an uncountable set of events. In this section we show that some dataflow graphs, in particular those needed for modelling programs, admit a finite representation.

*Recovering Fibres.* Consider a dataflow graph $\mathbf{S} = (E\_S, G\_S, \pi\_S)$ on $\Gamma \vdash B$. It follows from Lemma 3 that the fibre structure of $\mathbf{S}$ is completely determined by the spaces $\pi\_S^{-1}\{[s]\}$, for $s \in G\_S$, so we focus on trying to give a simplified representation for those spaces.

First, let us notice that if $s$ is a return or score event, then given $x' \in \pi^{-1}\{[s]\}$, the value $q\_{[s]}(x')(s)$ is determined by $q\_{[s]}(x')|\_{[s)}$. In other words, the map $\pi^{-1}\{[s]\} \to \mathcal{Q}([s))$ is an injection. This is due to the fact that minimal conflict in $E\_S$ cannot involve return or score events. As a result, $E\_S$ induces a partial function $o\_s^S : \mathcal{Q}([s)) \rightharpoonup \mathcal{Q}(\mathrm{lbl}(s))$, called the **outcome function**. It is defined as follows:

$$o\_s^S(q) = \begin{cases} q\_{[s]}(x')(s) & \text{if there exists } x' \in \pi^{-1}\{[s]\} \text{ such that } q\_{[s]}(x')|\_{[s)} = q, \\ \text{undefined} & \text{otherwise.} \end{cases}$$

Note that $x'$ must be unique by the remark above, since its projection to $\mathcal{Q}([s))$ is determined by $q$. The function $o\_s^S$ is partial, because it might be the case that the event $s$ occurs conditionally on the runtime value on $[s)$.

In fact this structure is all we need in order to describe a dataflow graph:

**Lemma 10.** *Given* $G\_S$ *a dependency graph on* $\Gamma \vdash B$*, and partial functions* $(o\_s) : \mathcal{Q}([s)) \rightharpoonup \mathcal{Q}(\mathit{lbl}(s))$ *for score and return events of* $S$*, there exists a dataflow graph* $(E\_S, G\_S, \pi\_S : E\_S \to G\_S)$ *whose outcome functions coincide with the* $o\_s$*. Moreover, there is an order-isomorphism*

$$\mathcal{C}(E\_S) \cong \{(x, q) \mid x \in \mathcal{C}(G\_S), q \in \mathcal{Q}(x), \forall s \in x, o\_s(q|\_{[s)}) = q(s)\}.$$

*Adding Probabilities.* To add probabilities, we simply equip each sample event $s$ of $G\_S$ with a density function $d\_s : \mathcal{Q}([s)) \times \mathbb{R} \rightharpoonup \mathbb{R}$.

**Definition 6.** *A concrete quantitative dataflow graph is a tuple* $(G\_S, (o\_s : \mathcal{Q}([s)) \rightharpoonup \mathcal{Q}(\mathit{lbl}(s))), (d\_s : \mathcal{Q}([s)) \times \mathbb{R} \rightharpoonup \mathbb{R})\_{s \in \mathit{sample}(G\_S)})$ *where* $d\_s(x, \cdot)$ *is normalised.*

**Lemma 11.** *Any concrete quantitative dataflow graph* $S$ *unfolds to a quantitative dataflow graph* $\mathit{unfold}\ S$*.*

We see now that the quantitative dataflow graphs arising as the interpretation of a program must be the unfolding of a concrete quantitative dataflow graph:

**Lemma 12.** *For any concrete quantitative dataflow graphs* $S$ *on* $\Gamma \vdash A$ *and* $T$ *on* $\Gamma, a : A \vdash B$*,* $\mathit{unfold}\ S \odot\_D^a \mathit{unfold}\ T$ *is the unfolding of a concrete quantitative dataflow graph. It follows that for any program* $\Gamma \vdash M : B$*,* $\llbracket M \rrbracket\_{\mathcal{G}}$ *is the unfolding of a concrete quantitative dataflow graph.*

#### **5.2 Metropolis-Hastings**

Recall that the Metropolis-Hastings algorithm is used to sample from a density function $d : \mathbb{A} \to \mathbb{R}$ which may not be normalised. Here $\mathbb{A}$ is a measurable *state space*, equipped with a measure $\lambda$. The algorithm works by building a Markov chain whose stationary distribution is $D$, the probability distribution obtained from $d$ after normalisation:

$$\forall X \in \Sigma\_{\mathbb{A}}, D(X) = \frac{\int\_{x \in X} d(x)}{\int\_{x \in \mathbb{A}} d(x)}.$$

Our presentation and reasoning in the rest of this section are inspired by the work of Borgström et al. [2].

*Preliminaries on Markov Chains.* A Markov chain on a measurable state space $\mathbb{A}$ is a probability kernel $k : \mathbb{A} \to \mathbb{A}$, viewed as a transition function: given a state $x \in \mathbb{A}$, the distribution $k(x, \cdot)$ is the distribution from which a next sample state will be drawn. Usually, each $k(x, \cdot)$ comes with a procedure for sampling: we will treat this as a probabilistic program $M(x)$ whose output is the next state. Given an initial state $x \in \mathbb{A}$ and a natural number $n \in \mathbb{N}$, we have a distribution $k^n(x, \cdot)$ on $\mathbb{A}$ obtained by iterating $k$ $n$ times. We say that the Markov chain $k$ has **limit** the distribution $\mu$ on $\mathbb{A}$ when

$$\lim\_{n \to \infty} ||k^n(x, \cdot) - \mu|| = 0 \qquad \text{where } ||\mu\_1 - \mu\_2|| = \sup\_{A \in \Sigma\_{\mathbb{A}}} \mu\_1(A) - \mu\_2(A).$$
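On a finite state space the definition above can be checked numerically. The following is a minimal sketch, assuming a two-state chain given as a row-stochastic matrix; the names `step`, `iterate`, `distance`, `KERNEL` and `LIMIT` are hypothetical and chosen only for illustration. On a finite space, the supremum of $\mu\_1(A) - \mu\_2(A)$ is attained by the set of states where $\mu\_1$ exceeds $\mu\_2$.

```python
def step(dist, kernel):
    """Push a distribution through the kernel once: sum_i dist[i] * kernel[i][j]."""
    n = len(kernel)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def iterate(kernel, start, n):
    """Distribution k^n(start, .) obtained by iterating the kernel n times."""
    dist = [1.0 if i == start else 0.0 for i in range(len(kernel))]
    for _ in range(n):
        dist = step(dist, kernel)
    return dist


def distance(mu1, mu2):
    """sup over subsets A of mu1(A) - mu2(A), attained where mu1 > mu2."""
    return sum(a - b for a, b in zip(mu1, mu2) if a > b)


# Illustrative two-state chain (an assumption, not taken from the paper).
KERNEL = [[0.9, 0.1],
          [0.5, 0.5]]
# Its stationary distribution: LIMIT = LIMIT . KERNEL.
LIMIT = [5.0 / 6.0, 1.0 / 6.0]

if __name__ == "__main__":
    for n in (1, 5, 20, 60):
        print(n, distance(iterate(KERNEL, 0, n), LIMIT))
```

For this chain the printed distances shrink geometrically, which is the behaviour the definition of a limit asks for.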

For the purposes of this paper, we call a Markov chain $k : \mathbb{A} \to \mathbb{A}$ **computable** when there exists a type $A$ such that $\llbracket A \rrbracket = \mathbb{A}$ (up to iso) and an expression *without scores* $x : A \vdash K : A$ such that $\llbracket K \rrbracket = k$. (Recall that programs without conditioning denote probabilistic kernels, and are easily sampled from, since all standard distributions in the language are assumed to come with a built-in sampler.)

We will use terms of our language to describe computable Markov chains, taking mild liberties with syntax. We assume in particular that programs may call each other as subroutines (this can be done via substitutions), and that manipulating finite structures is computable and thus representable in the language.

*The Metropolis-Hastings Algorithm.* Recall that we wish to sample from a distribution with un-normalised density $d : \mathbb{A} \to \mathbb{R}$; $d$ is assumed to be computable. The Markov chain defined by the Metropolis-Hastings algorithm has two parameters: a computable Markov chain $x : A \vdash P : A$, the *proposal kernel*, and a measurable, computable function $p : \mathbb{A}^2 \to \mathbb{R}$ representing the kernel $\llbracket P \rrbracket$, *i.e.*

$$\llbracket P \rrbracket(x, X') = \int\_{x' \in X'} p(x, x') \, d\lambda(x').$$

The Markov-chain MH(P, p, d) is defined as

$$\begin{aligned} \mathsf{MH}(P, p, d)(x) := {} & \mathtt{let}\ x' = P(x)\ \mathtt{in} \\ & \mathtt{let}\ \alpha = \min\left(1, \frac{d(x') \times p(x', x)}{d(x) \times p(x, x')}\right)\ \mathtt{in} \\ & \mathtt{let}\ u = \mathtt{sample}\ \mathtt{uniform}\ (0, 1)\ \mathtt{in} \\ & \mathtt{if}\ u < \alpha\ \mathtt{then}\ x'\ \mathtt{else}\ x \end{aligned}$$

In words, the Markov chain works as follows: given a start state $x$, it generates a proposal for the next state $x'$ using $P$. It then computes an *acceptance ratio* $\alpha$, which is the probability with which the new sample will be *accepted*: the return state will then be either $x'$ or the original $x$, accordingly.
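The step just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function names (`mh_step`, `run_chain`) and the three-state toy target are hypothetical, the current state is assumed to satisfy $d(x) > 0$, and the acceptance probability uses the usual Hastings ratio $d(x')\,p(x', x) / (d(x)\,p(x, x'))$, where $p(x, x')$ is the density of proposing $x'$ from $x$.

```python
import random


def mh_step(x, propose, p, d, rng):
    """One Metropolis-Hastings transition from state x (assumes d(x) > 0)."""
    x_new = propose(x, rng)
    # Target density times density of the reverse move, over the forward move.
    ratio = (d(x_new) * p(x_new, x)) / (d(x) * p(x, x_new))
    alpha = min(1.0, ratio)
    u = rng.random()
    return x_new if u < alpha else x


def run_chain(x0, propose, p, d, steps, seed=0):
    """Iterate mh_step and record the visited states."""
    rng = random.Random(seed)
    x, visited = x0, []
    for _ in range(steps):
        x = mh_step(x, propose, p, d, rng)
        visited.append(x)
    return visited


# Toy target on the states {0, 1, 2} with unnormalised weights 1 : 2 : 3.
WEIGHTS = [1.0, 2.0, 3.0]


def toy_d(x):
    return WEIGHTS[x]


def toy_propose(x, rng):
    # Proposal ignores the current state and picks a state uniformly.
    return rng.randrange(len(WEIGHTS))


def toy_p(x, x_new):
    # Density of toy_propose with respect to the counting measure.
    return 1.0 / len(WEIGHTS)


if __name__ == "__main__":
    states = run_chain(0, toy_propose, toy_p, toy_d, steps=50000)
    print([states.count(i) / len(states) for i in range(3)])
```

The empirical frequencies should approach the normalised weights (1/6, 2/6, 3/6) even though `toy_d` is never normalised.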

Assuming P and p satisfy a number of conditions, the algorithm is correct:

**Theorem 3.** *Assume that* P *and* p *satisfy the following properties:*


*Then, the limit of MH*(P, p, d) *for any initial state* $x \in \mathbb{A}$ *with* $d(x) > 0$ *is equal to* D*, the distribution obtained after normalising* d*.*

#### **5.3 Our Proposal Kernel**

Consider a closed program $\vdash M : A$ in which every measurable function is a computable one. Then, its interpretation as a concrete quantitative dataflow graph is computable, and we write $S$ for the concrete quantitative dataflow graph whose unfolding is $\llbracket M \rrbracket\_{\mathcal{G}}$. Moreover, because $M$ is closed, its measure-theoretic semantics gives a measure $\llbracket M \rrbracket$ on $\llbracket A \rrbracket$. Assume that $\mathrm{norm}(\llbracket M \rrbracket)$ is well-defined: it is a probability distribution on $\llbracket A \rrbracket$. We describe how a Metropolis-Hastings algorithm may be used to sample from it, by reducing this problem to that of sampling from configurations of $E\_S$ according to the following density:

$$d\_S(x,q) := \left(\prod\_{s \in \text{sample}(x)} d\_s(q(s))\right) \left(\prod\_{s \in \text{score}(x)} q(s)\right).$$
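As a rough illustration of how cheap this density is to evaluate, the sketch below computes the product for a configuration stored as a dictionary from events to runtime values. The encoding is an assumption made for this example only: `kinds` tags each event as a sample, score or return event, and `densities[s]` is a hypothetical callable receiving the whole valuation together with the value drawn at `s`.

```python
def density(kinds, q, densities):
    """Unnormalised density of a configuration given as a dict event -> value."""
    total = 1.0
    for s, value in q.items():
        if kinds[s] == "sample":
            # Density of the value drawn at the sample event s.
            total *= densities[s](q, value)
        elif kinds[s] == "score":
            # A score event contributes its runtime value directly.
            total *= value
        # Return events do not contribute to the density.
    return total


# Toy configuration: one uniform(0, 1) draw, one score of 0.5, one return.
KINDS = {"x": "sample", "w": "score", "r": "return"}
DENSITIES = {"x": lambda q, v: 1.0 if 0.0 <= v <= 1.0 else 0.0}

if __name__ == "__main__":
    print(density(KINDS, {"x": 0.25, "w": 0.5, "r": 0.25}, DENSITIES))
```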

Lemma 10 induces a natural measure on $\mathcal{C}(E\_S)$. We have:

**Lemma 13.** *For all* $X \in \Sigma\_{\mathcal{C}(E\_S)}$*,* $\mu^S(X) = \int\_{y \in X} d\_S(y)\, \mathrm{d}y$*.*

Note that $d\_S(x, q)$ is easy to compute, but it is not normalised. Computing the normalising factor is in general intractable, but the Metropolis-Hastings algorithm does not require the density to be normalised.

Let us write $\mu^S\_{\mathrm{norm}}(X) = \frac{\mu^S(X)}{\mu^S(\mathcal{C}(E\_S))}$ for the normalised distribution. By adequacy, we have for all $X \in \Sigma\_{\llbracket A \rrbracket}$:

$$\mathrm{norm}(\llbracket M \rrbracket)(X) = \mu^{S}\_{\mathrm{norm}}(\mathsf{result}^{-1}(X)),$$

where $\mathsf{result} : \max \mathcal{C}(E\_S) \rightharpoonup \llbracket A \rrbracket$ maps a maximal configuration of $E\_S$ to its return value, if any. This says that sampling from $\mathrm{norm}(\llbracket M \rrbracket)$ amounts to sampling from $\mu^S\_{\mathrm{norm}}$ and only keeping the return value.

Accordingly, we focus on designing a Metropolis-Hastings algorithm for sampling values in $\mathcal{C}(E\_S)$ following the (unnormalised) density $d\_S$. We start by defining a proposal kernel for this algorithm.

To avoid overburdening the notation, we will no longer distinguish between a type and its denotation. Since $G\_S$ is finite, it can be represented by a type, and so can $\mathcal{C}(G\_S)$. Moreover, $\mathcal{C}(E\_S)$ is a subset of $\sum\_{x \in \mathcal{C}(G\_S)} \mathcal{Q}(x)$, which is also representable as the type of pairs $(x \in \mathcal{C}(G\_S), q \in \mathcal{Q}(x))$. Operations on $G\_S$ and related objects are all computable and measurable so we can directly use them in the syntax. In particular, we will make use of the function $\mathsf{ext} : \mathcal{C}(E\_S) \to G\_S + 1$ which for each configuration $(x, q) \in \mathcal{C}(E\_S)$ returns $(1, s)$ if there exists an extension $x \stackrel{s}{-\!\!\subset}$ with $o\_s(q|\_{[s)})$ defined, and $(2, \ast)$ if $(x, q)$ is maximal.

Informally, for $(x, q) \in \mathcal{C}(E\_S)$, the algorithm is:


The last step follows the single-site MH principle: sample events in $x \cap x'$ have already been evaluated in $x$, and are not updated. However, events which are in $x' \setminus x$ belong to conditional branches not explored in $x$; they must be sampled.

We start by formalising the last step of the algorithm. We give a probabilistic program complete which takes two parameters, the original configuration (x, q) and the current modification (x0, q0), and returns a possible maximal extension of the latter:

```
complete(x, q, x0, q0) = case ext(x0, q0) of
   (2, ()) ⇒ (x0, q0)
   (1, s) ⇒
    if s is a return or a score event then
      complete(x, q, x0 ∪ {s}, q0[s := o_s(q0)])
    else if s ∈ x then
      complete(x, q, x0 ∪ {s}, q0[s := q(s)])
    else
      complete(x, q, x0 ∪ {s}, q0[s := sample d_s (q0)])
```
The program starts by trying to extend (x0, q0) by calling ext. If (x0, q0) is already maximal, we directly return it. Otherwise, we get an event s. To extend the quantitative information, there are three cases:


This program is recursive, but because $G\_S$ is finite, there is a static bound on the number of recursive calls; thus this program can be unfolded to a program expressible in our language. We can now define the proposal kernel:

$$\begin{aligned} &P\_S(x,q) = \\ &\quad \mathtt{let}\ s = \mathtt{sample}\ \text{uniformly over sample events in } x\ \mathtt{in} \\ &\quad \mathtt{let}\ r = \mathtt{sample}\ d\_s\ (q|\_{[s)})\ \mathtt{in} \\ &\quad \mathtt{let}\ x\_0 = x \setminus \{s' \ge s \mid s' \in x\}\ \mathtt{in} \\ &\quad \mathtt{complete}(x, q, x\_0 \cup \{s\}, q|\_{x\_0}[s := r]) \end{aligned}$$
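The completion procedure and the proposal step can be sketched on a toy graph as follows. Everything about the encoding is an assumption made for illustration: a configuration $(x, q)$ is stored as a dictionary whose keys form $x$, each event lists its direct causal predecessors, score and return events carry an outcome function returning `None` where it is undefined, and sample events carry an `enabled` test and a `sampler` (both hypothetical). The toy graph stands for a program that draws `coin`, and draws `extra` only when `coin` is below 0.5.

```python
import random

# Hypothetical toy graph; the field names are assumptions of this sketch.
GRAPH = {
    "coin": {"kind": "sample", "preds": [],
             "enabled": lambda q: True,
             "sampler": lambda q, rng: rng.random()},
    "extra": {"kind": "sample", "preds": ["coin"],
              "enabled": lambda q: q["coin"] < 0.5,
              "sampler": lambda q, rng: rng.random()},
    "ret_low": {"kind": "return", "preds": ["coin", "extra"],
                "outcome": lambda q: q["coin"] + q["extra"]},
    "ret_high": {"kind": "return", "preds": ["coin"],
                 "outcome": lambda q: q["coin"] if q["coin"] >= 0.5 else None},
}


def ext(graph, q0):
    """Return an event that can extend q0, or None if q0 is maximal."""
    for s, node in graph.items():
        if s in q0 or any(p not in q0 for p in node["preds"]):
            continue
        if node["kind"] == "sample":
            if node["enabled"](q0):
                return s
        elif node["outcome"](q0) is not None:
            return s
    return None


def complete(graph, q, q0, rng):
    """Extend q0 to a maximal configuration, reusing values of q when possible."""
    q0 = dict(q0)
    while True:
        s = ext(graph, q0)
        if s is None:
            return q0
        node = graph[s]
        if node["kind"] in ("score", "return"):
            q0[s] = node["outcome"](q0)        # value is determined by the history
        elif s in q:
            q0[s] = q[s]                       # already evaluated: keep old value
        else:
            q0[s] = node["sampler"](q0, rng)   # unexplored branch: sample afresh


def above(graph, s):
    """Events causally at or above s, including s itself."""
    found, changed = {s}, True
    while changed:
        changed = False
        for name, node in graph.items():
            if name not in found and any(p in found for p in node["preds"]):
                found.add(name)
                changed = True
    return found


def propose(graph, q, rng):
    """Single-site proposal: resample one sample event, then complete."""
    s = rng.choice([e for e in q if graph[e]["kind"] == "sample"])
    dropped = above(graph, s)
    q0 = {e: v for e, v in q.items() if e not in dropped}
    q0[s] = graph[s]["sampler"](q0, rng)
    return complete(graph, q, q0, rng)
```

Only the events above the resampled site are recomputed; everything else in the configuration is carried over unchanged, which is the point of tracking data dependencies.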

We now need to compute the density for $P\_S$ to be able to apply Metropolis-Hastings. Given $(x, q), (x', q') \in \mathcal{C}(E\_S)$, we define:

$$p\_S((x,q),(x',q')) = \sum\_{s \in \text{sample}(x)} \left( \frac{q\_s(v'|\_{[s]})}{|\text{sample}(x)|} \times \prod\_{s' \in \text{sample}(x'|\_{s})} q\_{s'}(v|\_{[s']}) \right).$$

**Theorem 4.** *The Markov chain* $P\_S$ *and density* $p\_S$ *satisfy the hypotheses of Theorem 3; as a result, for any* $(x, q) \in \mathcal{C}(E\_S)$ *the distribution* $\llbracket \mathit{MH}(P\_S, p\_S, d\_S) \rrbracket^n((x, q), \cdot)$ *tends to* $\mu^S\_{\mathrm{norm}}$ *as* $n$ *goes to infinity.*

One can thus sample from $\mathrm{norm}(\llbracket M \rrbracket)$ using the algorithm above, keeping only the return value of the obtained configuration.

Let us re-state the key advantage of our approach: having access to the data dependency information, complete requires fewer steps in general, because at each proposal step only a portion of the graph needs exploring.

# **6 Conclusion**

*Related Work.* There are numerous approaches to the semantics of programs with random choice. Among those concerned with statistical applications of probabilistic programming are Staton et al. [18,19], Ehrhard et al. [7], and Dahlqvist et al. [6]. A game semantics model was announced in [15].

The work of Scibior et al. [17] was influential in suggesting a denotational approach for proving correctness of inference, in the framework of quasi-Borel spaces [9]. It is not clear however how one could reason about data dependencies in this framework, because of the absence of explicit causal information.

Hur et al. [11] give a proof of correctness for Trace MCMC using new forms of operational semantics for probabilistic programs. This method is extended to higher-order programs with *soft constraints* in Borgström et al. [2]. However, these approaches do not consider incremental recomputation.

To the best of our knowledge, this is the first work addressing formal correctness of incremental recomputation in MCMC. However, methods exist which take advantage of data dependency information to improve the performance of each proposal step in "naive" Trace MCMC. We mention in particular the work on *slicing* by Hur et al. [10]; other approaches include [5,24]. In the present work we claim no immediate improvement in performance over these techniques, but only a mathematical framework for reasoning about the structures involved.

It is worth remarking that our event structure representation is reminiscent of *graphical model* representation made explicit in some languages. Indeed, for a first-order language such as the one of this paper, Bayesian networks can directly be used as a semantics, see [20]. We claim that the alternative view offered by event structures will allow for an easier extension to higher-order programs, using ideas from game semantics.

*Perspectives.* This is the start of an investigation into intensional semantics for probabilistic programs. Note that the framework of event structures is very flexible and the semantics presented here is by no means the only possible one. Additionally, though the present work only treats the case of a first-order language, we believe that building on recent advances in probabilistic concurrent game semantics [3,16] (from which the present work draws much inspiration), we can extend the techniques of this paper to arbitrary higher-order probabilistic programs with recursion.

**Acknowledgements.** We thank the anonymous referees for helpful comments and suggestions. We also thank Ohad Kammar for suggesting the idea of using causal structures for reasoning about data dependency in this context. This work has been partially sponsored by: EPSRC EP/K034413/1, EP/K011715/1, EP/L00058X/1, EP/N027833/1, EP/N028201/1, and an EPSRC PhD studentship.

# **References**



# Types

# **Handling Polymorphic Algebraic Effects**

Taro Sekiyama<sup>1</sup> and Atsushi Igarashi<sup>2</sup>

<sup>1</sup> National Institute of Informatics, Tokyo, Japan tsekiyama@acm.org <sup>2</sup> Kyoto University, Kyoto, Japan igarashi@kuis.kyoto-u.ac.jp

**Abstract.** Algebraic effects and handlers are a powerful abstraction mechanism to represent and implement control effects. In this work, we study their extension with parametric polymorphism that allows abstracting not only expressions but also effects and handlers. Although polymorphism makes it possible to reuse and reason about effect implementations more effectively, it has long been known that a naive combination of polymorphic effects and let-polymorphism breaks type safety. Although type safety can often be gained by restricting let-bound expressions (e.g., by adopting value restriction or weak polymorphism), we propose a complementary approach that restricts handlers instead of let-bound expressions. Our key observation is that, informally speaking, a handler is safe if resumptions from the handler do not interfere with each other. To formalize our idea, we define a call-by-value lambda calculus $\lambda\_{\text{eff}}^{\text{let}}$ that supports let-polymorphism and polymorphic algebraic effects and handlers, design a type system that rejects interfering handlers, and prove type safety of our calculus.

# **1 Introduction**

Algebraic effects [20] and handlers [21] are a powerful abstraction mechanism to represent and implement control effects, such as exceptions, interactive I/O, mutable states, and nondeterminism. They are growing in popularity, thanks to their success in achieving modularity of effects, especially the clear separation between their interfaces and their implementations. An interface of effects is given as a set of *operations*—e.g., an interface of mutable states consists of two operations, namely, put and get—with their signatures. An implementation is given by a *handler H* , which provides a set of interpretations of the operations (called *operation clauses*), and a handle–with expression handle *M* with *H* associates effects invoked during the computation of *M* with handler *H* . Algebraic effects and handlers work as *resumable exceptions*: when an effect operation is invoked, the run-time system tries to find the nearest handler that handles the invoked operation; if it is found, the corresponding operation clause is evaluated by using the argument to the operation invocation and the continuation up to the handler. The continuation gives the ability to resume the computation from the point where the operation was invoked, using the result from the operation clause. Another modularity that algebraic effects provide is flexible composition: multiple algebraic effects can be combined freely [13].
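The reading of handlers as resumable exceptions can be illustrated with a small Python sketch of the mutable-state interface mentioned above. This is only an emulation under strong assumptions: Python has no native effect handlers, generators are used to stand in for the continuation, each operation call can be resumed at most once, and the names `counter` and `handle_state` are hypothetical.

```python
def counter():
    """A computation that invokes the state operations 'get' and 'put'."""
    n = yield ("get", None)      # invoke get; resumed with the current state
    yield ("put", n + 1)         # invoke put; resumed with no result
    m = yield ("get", None)
    return m * 10


def handle_state(computation, initial):
    """Handler giving 'get' and 'put' their meaning over a local state."""
    state = initial
    gen = computation()
    try:
        request = next(gen)                   # run until the first operation call
        while True:
            op, arg = request
            if op == "get":
                request = gen.send(state)     # resume with the operation's result
            elif op == "put":
                state = arg
                request = gen.send(None)
            else:
                raise ValueError("unhandled operation: " + op)
    except StopIteration as done:
        # The computation returned normally; pair its result with the final state.
        return done.value, state


if __name__ == "__main__":
    print(handle_state(counter, 4))
```

The computation `counter` only names the operations; swapping `handle_state` for another handler would change their interpretation without touching `counter`, which is the separation of interface and implementation described above.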

In this work, we study an extension of algebraic effects and handlers with another type-based abstraction mechanism—parametric polymorphism [22]. In general, parametric polymorphism is a basis of generic programming and enhances code reusability by abstracting expressions over types. This work allows abstracting not only expressions but also effect operations and handlers, which makes it possible to reuse and reason about effect implementations that are independent of concrete type representations. Like in many functional languages, we introduce polymorphism in the form of *let-polymorphism* for its practically desirable properties such as decidable typechecking and type inference.

As is well known, however, a naive combination of polymorphic effects and let-polymorphism breaks type safety [11,23]. Many researchers have attacked this classical problem [1,2,10,12,14,17,23,24], and their common idea is to restrict the form of let-bound expressions. For example, value restriction [23,24], which is the standard way to make ML-like languages with imperative features and let-polymorphism type safe, allows only syntactic values to be polymorphic.

In this work, we propose a new approach to achieving type safety in a language with let-polymorphism and polymorphic effects and handlers: the idea is to restrict handlers instead of let-bound expressions. Since a handler gives an implementation of an effect, our work can be viewed as giving a criterion that suggests what effects can cooperate safely with (unrestricted) let-polymorphism and what effects cannot. Our key observation for type safety is that, informally speaking, an invocation of a polymorphic effect in a let-bound expression is safe if resumptions in the corresponding operation clause do not interfere with each other. We formalize this discipline into a type system and show that typeable programs do not get stuck.

Our contributions are summarized as follows.


We believe that our approach is complementary to the usual approach of restricting let-bound expressions: for handlers that are considered unsafe by our criterion, the value restriction can still be used.

The rest of this paper is organized as follows. Section 2 provides an overview of our work, giving motivating examples of polymorphic effects and handlers, a problem in the naive combination of polymorphic effects and let-polymorphism, and our solution to gain type safety with those features. Section 3 defines the surface language λ<sub>eff</sub><sup>let</sup>, and Sect. 4 defines the intermediate language λ<sub>eff</sub><sup>Λ</sup> and the elaboration from λ<sub>eff</sub><sup>let</sup> to λ<sub>eff</sub><sup>Λ</sup>. We also state that the elaboration is type-preserving and that λ<sub>eff</sub><sup>Λ</sup> is type sound in Sect. 4. Finally, we discuss related work in Sect. 5 and conclude in Sect. 6. The proofs of the stated properties and the full definition of the elaboration are given in the full version at https://arxiv.org/abs/1811.07332.

# **2 Overview**

We start with reviewing how monomorphic algebraic effects and handlers work through examples and then extend them to a polymorphic version. We also explain why polymorphic effects are inconsistent with let-polymorphism, if naively combined, and how we resolve it.

### **2.1 Monomorphic Algebraic Effects and Handlers**

*Exception.* Our first example is exception handling, shown in an ML-like language below.

```
1 effect fail : unit → unit
2
3 let div100 (x:int) : int =
4 if x = 0 then (#fail(); -1)
5 else 100 / x
6
7 let f (y:int) : int option =
8 handle (div100 y) with
9 return z → Some z
10 fail z → None
```
Some and None are constructors of datatype α option. Line 1 declares an effect operation fail, which signals that an anomaly happens, with its signature unit → unit, which means that the operation is invoked with the unit value (), causes some effect, and may return the unit value. The function div100, defined in Lines 3–5, is an example that uses fail; it returns the number obtained by dividing 100 by argument x if x is not zero; otherwise, if x is zero, it raises an exception by calling effect operation fail.<sup>1</sup> In general, we write #op(*M*) for invoking effect operation op with argument *M*. The function f (Lines 7–10) calls div100 inside a handle–with expression, which returns Some n if div100 returns integer n normally and returns None if it invokes fail.

<sup>1</sup> Here, "; -1" is necessary to make the types of both branches the same; it becomes unnecessary when we introduce polymorphic effects.

An expression of the form handle *M* with *H* handles effect operations invoked in *M* (which we call the *handled expression*) according to the effect interpretations given by handler *H*. A handler *H* consists of two parts: a single *return clause* and zero or more *operation clauses*. A return clause return x → *M*′ will be executed if the evaluation of *M* results in a value *v*. Then, the value of *M*′ (where x is bound to *v*) will be the value of the entire handle–with expression. For example, in the program above, if a nonzero number n is passed to f, the handle–with expression would return Some (100/n) because div100 n returns 100/n. An operation clause op x → *M*′ defines an implementation of effect op: if the evaluation of handled expression *M* invokes effect op with argument *v*, expression *M*′ will be evaluated after substituting *v* for x and the value of *M*′ will be the value of the entire handle–with expression. In the program example above, if zero is given to f, then None will be returned because div100 0 invokes fail.
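
The behavior of the two clauses can be imitated in an ordinary language. The following is a minimal Python sketch, not the calculus of this paper: it assumes that an effect invocation is modelled by a generator `yield` of a pair (operation name, argument), and it encodes Some z as the tuple `("Some", z)` and None as Python's `None`. These encodings are illustrative assumptions.

```python
def div100(x):
    """Generator version of div100: `yield` plays the role of #fail()."""
    if x == 0:
        yield ("fail", ())   # invoke the effect operation fail with ()
        return -1            # dummy value, mirroring "; -1" in the source
    return 100 // x          # integer division, as in the ML-like source


def f(y):
    """Sketch of: handle (div100 y) with return z -> Some z | fail z -> None."""
    computation = div100(y)
    try:
        op, _arg = next(computation)      # run until the first effect, if any
    except StopIteration as finished:
        return ("Some", finished.value)   # return clause: wrap the final value
    if op == "fail":
        return None                       # operation clause: never resumes
    raise RuntimeError(f"unhandled effect: {op}")
```

Because the clause for fail never resumes the suspended generator, the dummy value -1 is never produced, which matches the reading of fail as an exception.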

As shown above, algebraic effect handling is similar to exception handling. However, a distinctive feature of algebraic effect handling is that it allows *resumption* of the computation from the point where an effect operation was invoked. The next example demonstrates such an ability of algebraic effect handlers.

*Choice.* The next example is effect choose, which returns one of the given two arguments.

```
1 effect choose : int × int → int
2
3 handle (#choose(1,2) + #choose(10,20)) with
4 return x → x
5 choose x → resume (fst x)
```
As usual, *A*<sup>1</sup> × *A*<sup>2</sup> is a product type, (*M*1, *M*2) is a pair expression, and fst is the first projection function. The first line declares that effect choose is for choosing integers. The handled expression #choose(1,2) + #choose(10,20) intuitively suggests that there would be four possible results—11, 21, 12, and 22—depending on which value each invocation of choose returns. The handler in this example always chooses the first element of a given pair<sup>2</sup> and returns it by using a resume expression, and, as a result, the expression in Lines 3–5 evaluates to 11.

<sup>2</sup> We can think of more practical implementations, which choose one of the two arguments by other means, say, random values.

A resumption expression resume *M* in an operation clause makes it possible to return the value of *M* to the point where an effect operation was invoked. This behavior is realized by constructing a *delimited continuation* from the point of the effect invocation up to the handle–with expression that deals with the effect and passing the value of *M* to the continuation. We illustrate it by using the program above. When the handled expression #choose(1,2) + #choose(10,20) is evaluated, continuation c ≝ [ ] + #choose(10,20) is constructed. Then, the body resume (fst x) of the operation clause is evaluated after binding x to the invocation argument (1,2). Receiving the value 1 of fst (1,2), the resumption expression passes it to the continuation c and c[1] = 1 + #choose(10,20) is evaluated under the same handler. Next, choose is invoked with argument (10,20). Similarly, continuation c′ ≝ 1 + [ ] is constructed and the operation clause for choose is executed again. Since fst (10,20) evaluates to 10, c′[10] = 1 + 10 is evaluated under the same handler. Since the return clause returns what it receives, the entire expression evaluates to 11.
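
The evaluation steps above can be replayed with a small Python sketch. It is only an approximation under stated assumptions: a generator stands in for the handled expression, `send` stands in for resume, and the helper name `handle_choose` is hypothetical. A generator can be resumed at most once per suspension, so this models only clauses that resume exactly once, which is the case here.

```python
def handled_expression():
    """Sketch of: #choose(1,2) + #choose(10,20)."""
    a = yield ("choose", (1, 2))     # suspended here: continuation is [] + #choose(10,20)
    b = yield ("choose", (10, 20))   # suspended here: continuation is a + []
    return a + b


def handle_choose(computation, clause):
    """Deep handler: the same clause serves every invocation of choose."""
    try:
        op, arg = next(computation)
        while True:
            if op != "choose":
                raise RuntimeError(f"unhandled effect: {op}")
            # resume (clause arg): pass the chosen value back to the continuation
            op, arg = computation.send(clause(arg))
    except StopIteration as finished:
        return finished.value        # return x -> x
```

With the clause `lambda pair: pair[0]`, which corresponds to resume (fst x), `handle_choose(handled_expression(), lambda pair: pair[0])` evaluates to 11; always picking the second component gives 22.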

Finally, we briefly review how an operation clause involving resumption expressions is typechecked [3,13,16]. Let us consider operation clause op(x) → *M* for op of type signature *A* → *B*. The typechecking is performed as follows. First, argument x is assigned the domain type *A* of the signature as it will be bound to an argument of an effect invocation. Second, for resumption expression resume *M*′ in *M*, (1) *M*′ is required to have the codomain type *B* of the signature because its value will be passed to the continuation as the result of the invocation and (2) the resumption expression is assigned the same type as the return clause. Third, the type of the body *M* has to be the same as that of the return clause because the value of *M* is the result of the entire handle–with expression. For example, the above operation clause for choose is typechecked as follows: first, argument x is assigned type int × int; second, it is checked whether the argument fst x of the resumption expression has int, the codomain type of choose; third, it is checked whether the body resume (fst x) of the clause has the same type as the return clause, i.e., int. If all the requirements are satisfied, the clause is well typed.

#### **2.2 Polymorphic Algebraic Effects and Handlers**

This section discusses motivation for polymorphism in algebraic effects and handlers. There are two ways to introduce polymorphism: by *parameterized effects* and by *polymorphic effects*.

The former is used to parameterize the declaration of an effect by types. For example, one might declare:

effect α choose : α × α → α

An invocation #choose involves a parameterized effect of the form *A* choose (where *A* denotes a type), according to the type of its arguments: for example, #choose(true,false) has the effect bool choose and #choose(1,-1) has int choose. A handler is required for each effect *A* choose.

The latter is used to give a polymorphic type to an effect. For example, one may declare

effect choose : ∀α. α × α → α

In this case, the effect can be invoked with different types, but all invocations have the same effect choose. One can implement a single operation clause that can handle all invocations of choose, regardless of argument types. Koka supports both styles [16] (with the value restriction); we focus, however, on the latter in this paper. A type system for parameterized effects lifting the value restriction is studied by Kammar and Pretnar [14] (see Sect. 5 for comparison).

In what follows, we show a polymorphic version of the examples we have seen, along with brief discussions on how polymorphic effects help with reasoning about effect implementations. Other practical examples of polymorphic effects can be found in Leijen's work [16].

*Polymorphic Exception.* First, we extend the exception effect fail with polymorphism.

```
1 effect fail∀ : ∀α. unit → α
2
3 let div100∀ (x:int) : int =
4 if x = 0 then #fail∀()
5 else 100 / x
```
The polymorphic type signature of effect fail<sup>∀</sup>, given in Line 1, means that the codomain type α can be any type. Thus, we no longer need to append the dummy value -1 to the invocation of fail<sup>∀</sup>: in Line 4, the bound type variable α is instantiated with int, so the invocation itself has the same type as the other branch.

*Choice.* Next, let us make choose polymorphic.

```
1 effect choose∀ : ∀α. α × α → α
2
3 let rec random_walk (x:int) : int =
4 let b = #choose∀(true,false) in
5 if b then random_walk (x + #choose∀(1,-1))
6 else x
7
8 let f (s:int) =
9 handle random_walk s with
10 return x → x
11 choose∀ y → if rand() < 0.0 then resume (fst y)
12 else resume (snd y)
```
The function random\_walk implements a random walk; it takes the current coordinate x, chooses whether it stops, and, if it decides to continue, recursively calls itself with a new coordinate. In the definition, choose<sup>∀</sup> is used twice with different types: bool and int. Lines 11–12 give choose<sup>∀</sup> an interpretation, which calls rand to obtain a random float,<sup>3</sup> and returns either the first or the second element of y.
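
To make the reuse of one clause at two types concrete, here is a hedged Python sketch of the same program. The assumptions are ours: effects are modelled with generators, the random source is passed in as a parameter `rand` so that runs can be scripted, and the threshold 0.5 is an illustrative choice rather than the one in the listing above.

```python
def random_walk(x):
    b = yield ("choose", (True, False))      # choose used at type bool
    if b:
        step = yield ("choose", (1, -1))     # choose used at type int
        return (yield from random_walk(x + step))
    return x


def f(s, rand):
    """Sketch of: handle random_walk s with return x -> x | choose y -> ..."""
    walk = random_walk(s)
    try:
        op, pair = next(walk)
        while True:
            if op != "choose":
                raise RuntimeError(f"unhandled effect: {op}")
            # One clause for every instantiation: it never inspects the
            # components of the pair, it only selects one of them.
            chosen = pair[0] if rand() < 0.5 else pair[1]
            op, pair = walk.send(chosen)      # resume with the chosen element
    except StopIteration as finished:
        return finished.value                 # return x -> x
```

For example, `f(0, random.random)` (after `import random`) performs an actual random walk, while a scripted `rand` makes the outcome reproducible.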

<sup>3</sup> One might implement rand as another effect operation.

Typechecking of operation clauses could be extended in a straightforward manner. That is, an operation clause op(x) → *M* for an effect operation of signature ∀α.*A* → *B* would be typechecked as follows: first, α is locally bound in the clause and x is assigned type *A*; second, an argument of a resumption expression must have type *B* (which may contain type variable α); third, *M* must have the same type as that of the return clause (its type cannot contain α as α is local) under the assumption that resumption expressions have the same type as the return clause. For example, let us consider typechecking of the above operation clause for choose<sup>∀</sup>. First, the typechecking algorithm allocates a local type variable α and assigns type α × α to y. The body has two resumption expressions, and it is checked whether the arguments fst y and snd y have the codomain type α of the signature. Finally, it is checked whether the body is typed at int assuming that the resumption expressions have type int. The operation clause meets all the requirements, and, therefore, it would be well typed.

An obvious advantage of polymorphic effects is reusability. Without polymorphism, one has to declare many versions of choose for different types.

Another pleasant effect of polymorphic effects is that, thanks to parametricity, inappropriate implementations for an effect operation can be excluded. For example, it is not possible for an implementation of choose<sup>∀</sup> to resume with values other than the first or second element of y. In the monomorphic version, however, it is possible to resume with any integer, as opposed to what the name of the operation suggests. A similar argument applies to fail∀; since the codomain type is α, which does not appear in the domain type, it is not possible to resume! In other words, the signature ∀α. unit → α enforces that no invocation of fail<sup>∀</sup> will return.

#### **2.3 Problem in Naive Combination with Let-Polymorphism**

Although polymorphic effects and handlers provide an ability to abstract and restrict effect implementations, one may easily expect that their unrestricted use with naive *let-polymorphism*, which allows any let-bound expressions to be polymorphic, breaks type safety. Indeed, it does.

We develop a counterexample, inspired by Harper and Lillibridge [11], below.

```
effect get_id : ∀α. unit → (α → α)

let f () : int =
 let g = #get_id() in (* g : ∀α. α → α *)
 if (g true) then ((g 0) + 1) else 2
```
The function f first binds g to the invocation result of get\_id. The expression #get\_id() is given type α → α and the naive let-polymorphism would assign type scheme ∀α.α → α to g, which makes both g true and g 0 (and thus the definition of f) well typed.

An intended use of f is as follows:

```
handle f () with
 return x → x
 get_id y → resume (λz. z)
```
The operation clause for get\_id resumes with the identity function λz.z. It would be well typed under the typechecking procedure described in Sect. 2.2 and it safely returns 1.

However, the following strange expression

```
handle f () with
 return x → x
 get_id y → resume (λz1. (resume (λz2. z1)); z1)
```
will get stuck, although this expression would be well typed: both λz1. ··· ; z1 and λz2. z1 could be given type α → α by assigning both z1 and z2 type α, which is the type variable local to this clause. Let us see how the evaluation gets stuck in detail. When the handled expression f () invokes effect get\_id, the following continuation will be constructed:

```
c ≝ let g = [ ] in if (g true) then ((g 0) + 1) else 2 .
```
Next, the body of the operation clause get\_id is evaluated. It immediately resumes and reduces to

$$c'[(\lambda \mathbf{z1} . c'[(\lambda \mathbf{z2} . \mathbf{z1})]; \ \mathbf{z1})]$$

where

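$$c' \stackrel{\text{def}}{=} \begin{array}[t]{l} \mathtt{handle}\ c\ \mathtt{with} \\ \quad \mathtt{return\ x} \rightarrow \mathtt{x} \\ \quad \mathtt{get\_id\ y} \rightarrow \mathtt{resume}\ (\lambda \mathtt{z1}.\ (\mathtt{resume}\ (\lambda \mathtt{z2}.\mathtt{z1}));\ \mathtt{z1}) \end{array}$$

which is the continuation c under the same handler. The evaluation proceeds as follows (here, k ≝ λz1. c′[(λz2.z1)]; z1):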

```
  c′[(λz1. c′[(λz2.z1)]; z1)]
= handle let g = k in if (g true) then ((g 0) + 1) else 2 with ...
−→ handle if (k true) then ((k 0) + 1) else 2 with ...
−→ handle if c′[(λz2.true)]; true then ((k 0) + 1) else 2 with ...
```
Here, the hole in c′ is filled by function (λz2.true), which returns a Boolean value, *though the hole is supposed to be filled by a function of* ∀α. α → α. This weird gap triggers a run-time error:

```
  c′[(λz2.true)]
= handle let g = λz2.true in if (g true) then ((g 0) + 1) else 2 with ...
−→∗ handle if true then (((λz2.true) 0) + 1) else 2 with ...
−→ handle ((λz2.true) 0) + 1 with ...
−→ handle true + 1 with ...
```
We stop here because true + 1 cannot reduce.
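
The stuck state can be reproduced outside the calculus. The sketch below is an illustration under our own assumptions: the continuation c′ is written directly as a Python function `resume` (this is adequate here because f invokes get\_id only once and the return clause is the identity), and `strict_add` is a hypothetical stand-in for a typed addition, needed because Python itself would accept `True + 1`.

```python
def strict_add(a, b):
    """Integer addition that rejects booleans, standing in for a typed `+`."""
    for operand in (a, b):
        if isinstance(operand, bool) or not isinstance(operand, int):
            raise TypeError(f"stuck: cannot reduce {a!r} + {b!r}")
    return a + b


def resume(g):
    """c': let g = [ ] in if (g true) then ((g 0) + 1) else 2, under the handler."""
    return strict_add(g(0), 1) if g(True) else 2


def identity_clause():
    # get_id y -> resume (λz. z)
    return resume(lambda z: z)


def interfering_clause():
    # get_id y -> resume (λz1. (resume (λz2. z1)); z1)
    def k(z1):
        resume(lambda z2: z1)   # the second resumption captures z1
        return z1
    return resume(k)
```

Here `identity_clause()` returns 1, whereas `interfering_clause()` raises the `TypeError` from `strict_add`: the inner resumption runs the continuation with λz2.true, which is then applied to 0 and added to 1.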

#### **2.4 Our Solution**

A standard approach to this problem is to restrict the form of let-bound expressions by some means such as the (relaxed) value restriction [10,23,24] or weak polymorphism [1,12]. This approach amounts to restricting how effect operations can be *used*.

In this paper, we seek a complementary approach, which is to restrict how effect operations can be *implemented*.<sup>4</sup> More concretely, we develop a type system such that let-bound expressions are polymorphic as long as they invoke only "safe" polymorphic effects and the notion of safe polymorphic effects is formalized in terms of typing rules (for handlers).

To see what are "safe" effects, let us examine the above counterexample to type safety. The crux of the counterexample is that


The last point is crucial—if λz2.z1 were, say, λz2.z2, there would be no influence from the first invocation of c and the evaluation would succeed. The problem we see here is that the naive type system mistakenly allows *interference* between the arguments to the two resumptions by assuming that z1 and z2 share the same type.

Based on this observation, the typing rule for resumption is revised to disallow interference between different resumptions by separating their types: for each resume *M* in the operation clause for op : ∀α<sub>1</sub> ··· α<sub>n</sub>.*A* → *B*, *M* has to have type *B*′ obtained by renaming all type variables α<sub>i</sub> in *B* with *fresh* type variables α′<sub>i</sub>. In the case of get\_id, the two resumptions should be called with β → β and γ → γ for fresh β and γ; for the first resume to be well typed, z1 has to be of type β, although it means that the return type of λz2.z1 (given to the second resumption) is β, making the entire clause ill typed, as we expect. If a clause has no interfering resumptions, as in

$$\mathtt{get\_id}\ \mathtt{y} \rightarrow \mathtt{resume}\ (\lambda \mathtt{z1}.\ \mathtt{z1})$$

or

$$\mathtt{get\_id}\ \mathtt{y} \rightarrow \mathtt{resume}\ (\lambda \mathtt{z1}.\ (\mathtt{resume}\ (\lambda \mathtt{z2}.\ \mathtt{z2}));\ \mathtt{z1})$$

it will be well typed.
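
As a sanity check on the run-time behavior of these two clauses, the following Python sketch evaluates them against the continuation of f from Sect. 2.3. It is a simulation under our own assumptions (the continuation is written directly as the function `resume`, and `strict_add` is a hypothetical typed addition), not an implementation of the type system.

```python
def strict_add(a, b):
    """Integer addition that rejects booleans, standing in for a typed `+`."""
    for operand in (a, b):
        if isinstance(operand, bool) or not isinstance(operand, int):
            raise TypeError(f"stuck: cannot reduce {a!r} + {b!r}")
    return a + b


def resume(g):
    """Continuation of f: let g = [ ] in if (g true) then ((g 0) + 1) else 2."""
    return strict_add(g(0), 1) if g(True) else 2


def single_resumption_clause():
    # get_id y -> resume (λz1. z1)
    return resume(lambda z1: z1)


def independent_resumptions_clause():
    # get_id y -> resume (λz1. (resume (λz2. z2)); z1)
    def k(z1):
        resume(lambda z2: z2)   # inner resumption does not mention z1
        return z1
    return resume(k)
```

Both functions return 1 without reaching a stuck state: the inner resumption in the second clause receives a genuine identity function, so no value from one run of the continuation leaks into another.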

<sup>4</sup> We compare our approach with the standard approaches in Sect. 5 in detail.

# **3 Surface Language: λ<sub>eff</sub><sup>let</sup>**

We define a lambda calculus λ<sub>eff</sub><sup>let</sup> that supports let-polymorphism, polymorphic algebraic effects, and handlers without interfering resumptions. This section introduces the syntax and the type system of λ<sub>eff</sub><sup>let</sup>. The semantics is given by a formal elaboration to intermediate calculus λ<sub>eff</sub><sup>Λ</sup>, which will be introduced in Sect. 4.


**Fig. 1.** Syntax of λ<sub>eff</sub><sup>let</sup>.

#### **3.1 Syntax**

The syntax of λ<sub>eff</sub><sup>let</sup> is given in Fig. 1. Effect operations are denoted by op and type variables by α, β, and γ. An effect, denoted by ε, is a finite set of effect operations. We write ⟨⟩ for the empty effect set. A type, denoted by *A*, *B*, *C*, and *D*, is a type variable; a base type ι, which includes, e.g., bool and int; or a function type *A* → ε *B*, which is given to functions that take an argument of type *A* and compute a value of type *B* possibly with effect ε. A type scheme σ is obtained by abstracting type variables. Terms, denoted by *M*, consist of variables; constants (including primitive operations); lambda abstractions λ*x*.*M*, which bind *x* in *M*; function applications; let-expressions let *x* = *M*<sub>1</sub> in *M*<sub>2</sub>, which bind *x* in *M*<sub>2</sub>; effect invocations #op(*M*); handle–with expressions handle *M* with *H*; and resumption expressions resume *M*. All type information in λ<sub>eff</sub><sup>let</sup> is implicit; thus the terms have no type annotations. A handler *H* has a single return clause return *x* → *M*, where *x* is bound in *M*, and zero or more operation clauses of the form op(*x*) → *M*, where *x* is bound in *M*. A typing context Γ is a sequence of variable declarations *x* : σ and type variable declarations α.

We introduce the following notations used throughout this paper. We write ∀***α***<sup>*i*∈*I*</sup>.*A* for ∀α<sub>1</sub>. ... ∀α<sub>n</sub>.*A* where *I* = {1, ..., n}. We often omit indices (*i* and *j*) and index sets (*I* and *J*) if they are not important: e.g., we often abbreviate ∀***α***<sup>*i*∈*I*</sup>.*A* to ∀***α***<sup>*I*</sup>.*A* or even to ∀***α***.*A*. Similarly, we use a bold font for other sequences (***A***<sup>*i*∈*I*</sup> for a sequence of types, ***v***<sup>*i*∈*I*</sup> for a sequence of values, etc.). We sometimes write {***α***} to view the sequence ***α*** as a set by ignoring the order. Free type variables *ftv*(σ) in a type scheme σ and type substitution *B*[***A***/***α***] of ***A*** for type variables ***α*** in *B* are defined as usual (with the understanding that the omitted index sets for ***A*** and ***α*** are the same).

We suppose that each constant *c* is assigned a first-order closed type *ty*(*c*) of the form ι<sub>1</sub> → ··· → ι<sub>n</sub> and that each effect operation op is assigned a signature of the form ∀***α***.*A* → *B*, which means that an invocation of op with type instantiation ***C*** takes an argument of *A*[***C***/***α***] and returns a value of *B*[***C***/***α***]. We also assume that, for *ty*(op) = ∀***α***.*A* → *B*, *ftv*(*A*) ⊆ {***α***} and *ftv*(*B*) ⊆ {***α***}.
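
As a small worked instance of this convention, consider the signature of choose<sup>∀</sup> from Sect. 2.2. This is only an illustration: it assumes an extension of the calculus with product types, and it picks the instantiation *C* = int, as in the invocation #choose<sup>∀</sup>(1,-1).

$$\begin{aligned} ty(\mathsf{choose}^{\forall}) &= \forall \alpha.\ \alpha \times \alpha \to \alpha \\ (\alpha \times \alpha)[\mathsf{int}/\alpha] &= \mathsf{int} \times \mathsf{int} && \text{(type of the argument)} \\ \alpha[\mathsf{int}/\alpha] &= \mathsf{int} && \text{(type of the result)} \\ ftv(\alpha \times \alpha) = ftv(\alpha) &= \{\alpha\} \subseteq \{\alpha\} && \text{(side condition on signatures)} \end{aligned}$$

The same signature instantiated with bool gives the argument type bool × bool and the result type bool, which is the instantiation used by #choose<sup>∀</sup>(true,false).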

#### **3.2 Type System**

The type system of λ<sub>eff</sub><sup>let</sup> consists of four judgments: well-formedness of typing contexts ⊢ Γ; well-formedness of type schemes Γ ⊢ σ; term typing judgment Γ; *R* ⊢ *M* : *A* | ε, which means that *M* computes a value of *A* possibly with effect ε under typing context Γ and resumption type *R* (discussed below); and handler typing judgment Γ; *R* ⊢ *H* : *A* | ε ⇒ *B* | ε′, which means that *H* handles a computation that produces a value of *A* with effect ε and that the clauses in *H* compute a value of *B* possibly with effect ε′ under Γ and *R*.

A resumption type *R* contains type information for resumption.

**Definition 1 (Resumption type).** *Resumption types in* λ<sub>eff</sub><sup>let</sup>*, denoted by R, are defined as follows:*

$$R ::= \mathsf{none} \mid (\boldsymbol{\alpha},\, x : A,\, B \to \epsilon\ C)$$

$$(\text{if } ftv(A) \cup ftv(B) \subseteq \{\boldsymbol{\alpha}\} \text{ and } ftv(C) \cap \{\boldsymbol{\alpha}\} = \emptyset)$$

If *M* is not a subterm of an operation clause, it is typechecked under *R* = none, which means that *M* cannot contain resumption expressions. Otherwise, suppose that *M* is a subterm of an operation clause op(*x*) → *M*′ that handles effect op of signature ∀***α***.*A* → *B* and computes a value of *C* possibly with effect ε. Then, *M* is typechecked under *R* = (***α***, *x* : *A*, *B* → ε *C*), which means that argument *x* to the operation clause has type *A* and that resumptions in *M* are effectful functions from *B* to *C* with effect ε. Note that type variables ***α*** occur free only in *A* and *B* but not in *C*.

Figure 2 shows the inference rules of the judgments (except for Γ ⊢ σ, which is defined by: Γ ⊢ σ if and only if all free type variables in σ are bound by Γ). For a sequence of type schemes ***σ***, we write Γ ⊢ ***σ*** if and only if every type scheme in ***σ*** is well formed under Γ.

Well-formedness rules for typing contexts, shown at the top of Fig. 2, are standard. A typing context is well formed if it is empty (WF\_Empty), if it extends a well-formed context with a variable that is not yet declared and whose type scheme is well formed in that context (WF\_Var), or if it extends a well-formed context with a type variable that is not yet declared (WF\_TVar). For typing context Γ, *dom*(Γ) denotes the set of type and term variables declared in Γ.

$$\dfrac{}{\vdash \emptyset}\ \text{WF\_Empty} \qquad \dfrac{\vdash \Gamma \quad x \notin dom(\Gamma) \quad \Gamma \vdash \sigma}{\vdash \Gamma, x : \sigma}\ \text{WF\_Var} \qquad \dfrac{\vdash \Gamma \quad \alpha \notin dom(\Gamma)}{\vdash \Gamma, \alpha}\ \text{WF\_TVar}$$

**Fig. 2.** Typing rules.

Typing rules for terms are given in the middle of Fig. 2. The first six rules are standard for the lambda calculus with let-polymorphism and a type-and-effect system. If a variable *x* is introduced by a let-expression and has type scheme ∀***α***.*A* in Γ, it is given type *A*[***B***/***α***], obtained by instantiating type variables ***α*** with well-formed types ***B***. If *x* is bound by other constructors (e.g., a lambda abstraction), *x* is always bound to a monomorphic type and both ***α*** and ***B*** are the empty sequence. Note that (TS\_Var) gives any effect ε to the typing judgment for *x*. In general, ε in judgment Γ; *R* ⊢ *M* : *A* | ε means that the evaluation of *M* *may* invoke effect operations in ε. Since a reference to a variable involves no effect, it is given any effect; for the same reason, value constructors are also given any effect. The rule (TS\_Const) means that the type of a constant is given by (meta-level) function *ty*. The typing rules for lambda abstractions and function applications are standard in the lambda calculus equipped with a type-and-effect system. The rule (TS\_Abs) gives lambda abstraction λ*x*.*M* function type *A* → ε′ *B* if *M* computes a value of *B* possibly with effect ε′ by using *x* of type *A*. The rule (TS\_App) requires that (1) the argument type of function part *M*<sub>1</sub> be equivalent to the type of actual argument *M*<sub>2</sub> and (2) effect ε′ invoked by function *M*<sub>1</sub> be contained in the whole effect ε. The rule (TS\_Weak) allows weakening of effects.

The next two rules are mostly standard for algebraic effects and handlers. The rule (TS\_Op) is applied to effect invocations. Since λ<sub>eff</sub><sup>let</sup> supports implicit polymorphism, an invocation #op(*M*) of polymorphic effect op of signature ∀***α***.*A* → *B* also accompanies implicit type substitution of well-formed types ***C*** for ***α***. Thus, the type of argument *M* has to be *A*[***C***/***α***] and the result of the invocation is given type *B*[***C***/***α***]. In addition, effect ε contains op. The typeability of handle–with expressions depends on the typing of handlers (TS\_Handle), which will be explained shortly.

The last typing rule (TS\_Resume) is the key to gaining type safety in this work. Suppose that we are given resumption type (***α***, *x* : *A*, *B* → ε *C*). Intuitively, *B* → ε *C* is the type of the continuation for resumption and, therefore, argument *M* to resume is required to have type *B*. As we have discussed in Sect. 2, we avoid interference between different resumptions by renaming ***α***, the type parameters to the effect operation, to fresh type variables ***β***, in typechecking *M*. Freshness of ***β*** will be ensured when well-formedness of typing contexts Γ<sub>1</sub>, Γ<sub>2</sub>, ***β***, ... is checked at the leaves of the type derivation. The type variables ***α*** in the type of *x*, the parameter to the operation, are also renamed for *x* to be useful in *M*. To see why this renaming is useful, let us consider an extension of the calculus with pairs and typechecking of an operation clause for choose<sup>∀</sup> of signature ∀α.α × α → α:

$$\mathsf{choose}^{\forall}(x) \to \mathsf{resume}\,(\mathsf{fst}\,x)$$

Variable *x* is assigned product type α × α for fresh type variable α and the body resume (fst *x*) is typechecked under the resumption type (α, *x* : α × α, α → ε *A*) for some ε and *A* (see the typing rules for handlers for details). To typecheck resume (fst *x*), the argument fst *x* is required to have type β, freshly generated for this resume. Without applying renaming also to *x*, the clause would not typecheck. Finally, (TS\_Resume) also requires that (1) the typing context contains ***α***, which should have been declared at an application of the typing rule for the operation clause that surrounds this resume and (2) effect ε, which may be invoked by resumption of a continuation, be contained in the whole effect of the resumption expression. The binding *x* : *D* in the conclusion means that parameter *x* to the operation clause is declared outside the resumption expression.

The typing rules for handlers are standard [3,13,16]. The rule (THS\_Return) for a return clause return *x* → *M* checks that the body *M* is given a type under the assumption that argument *x* has type *A*, which is the type of the handled expression. The effect ε stands for effects that are not handled by the operation clauses that follow the return clause and it must be a subset of the effect ε′ that *M* may cause.<sup>5</sup> A handler having operation clauses is typechecked by (THS\_Op), which checks that the body of the operation clause op(*x*) → *M* for op of signature ∀***α***.*C* → *D* is typed at the result type *B*, which is the same as the type of the return clause, under the typing context extended with freshly assigned type variables ***α*** and argument *x* of type *C*, together with the resumption type (***α***, *x* : *C*, *D* → ε′ *B*). The effect ε ⊎ {op} in the conclusion means that the effect operation op is handled by this clause and no other clauses (in the present handler) handle it. Our semantics adopts deep handlers [13], i.e., when a handled expression invokes an effect operation, the continuation, which is passed to the operation clause, is wrapped by the same handler. Thus, resumption may invoke the same effect ε′ as the one possibly invoked by the clauses of the handler, hence *D* → ε′ *B* in the resumption type.

Finally, we show how the type system rejects the counterexample given in Sect. 2. The problem is in the following operation clause.

$$\mathsf{op}(y) \to \mathsf{resume}\,\lambda z\_1.(\mathsf{resume}\,\lambda z\_2.z\_1); z\_1$$

where op has effect signature ∀α.unit → (α → α). This clause is typechecked under resumption type (α, *y* : unit, α → α) for some ε. By (TS\_Resume), the two resumption expressions are assigned two different type variables γ<sub>1</sub> and γ<sub>2</sub>, and the arguments λ*z*<sub>1</sub>.(resume λ*z*<sub>2</sub>.*z*<sub>1</sub>); *z*<sub>1</sub> and λ*z*<sub>2</sub>.*z*<sub>1</sub> are required to have γ<sub>1</sub> → γ<sub>1</sub> and γ<sub>2</sub> → γ<sub>2</sub>, respectively. However, λ*z*<sub>2</sub>.*z*<sub>1</sub> cannot have the required type because *z*<sub>1</sub> is associated with γ<sub>1</sub> but not with γ<sub>2</sub>.

*Remark.* The rule (TS\_Resume) allows only the type of the argument to an operation clause to be renamed. Thus, other variables bound by, e.g., lambda abstractions and let-expressions outside the resumption expression cannot be used at such a renamed type. As a result, more care may be required as to where to introduce a new variable. For example, let us consider the following operation clause (which is a variant of the example of choose<sup>∀</sup> above).

$$\mathsf{choose}^{\forall}(x) \to \mathsf{let}\,y = \mathsf{fst}\,x \,\mathsf{in}\,\mathsf{resume}\,y$$

The variable *x* is assigned α × α first and the resumption requires *y* to be typed at fresh type variable β. This clause would be rejected in the current type system because fst *x* appears outside resume and, therefore, *y* is given type α, not β. This inconvenience may be addressed by moving down the let-binding in some cases: e.g., resume (let *y* = fst *x* in *y*) is well typed.

<sup>5</sup> Thus, handlers in λ<sub>eff</sub><sup>let</sup> are open [13] in the sense that a handle–with expression does not have to handle *all* effects caused by the handled expression.

# **4 Intermediate Language: λ<sub>eff</sub><sup>Λ</sup>**

The semantics of λ<sub>eff</sub><sup>let</sup> is given by a formal elaboration to an intermediate language λ<sub>eff</sub><sup>Λ</sup>, wherein type abstraction and type application appear explicitly. We define the syntax, operational semantics, and type system of λ<sub>eff</sub><sup>Λ</sup> and the formal elaboration from λ<sub>eff</sub><sup>let</sup> to λ<sub>eff</sub><sup>Λ</sup>. Finally, we show type safety of λ<sub>eff</sub><sup>let</sup> via type preservation of the elaboration and type soundness of λ<sub>eff</sub><sup>Λ</sup>.


**Fig. 3.** Syntax of λ<sup>Λ</sup><sub>eff</sub>.

#### **4.1 Syntax**

The syntax of λ<sup>Λ</sup><sub>eff</sub> is shown in Fig. 3. Values, denoted by *v*, consist of constants and lambda abstractions. Polymorphic values, denoted by *w*, are values abstracted over types. Terms, denoted by *e*, and handlers, denoted by *h*, are the same as those of λ<sup>let</sup><sub>eff</sub> except for the following three points. First, type abstraction and type arguments are explicit in λ<sup>Λ</sup><sub>eff</sub>: variables and effect invocations are accompanied by a sequence of types, and let-bound expressions, resumption expressions, and operation clauses bind type variables. Second, a new term constructor of the form #op(*σ*, *w*, *E*) is added. It represents an intermediate state in which an effect invocation is capturing the continuation up to the closest handler for op. Here, *E* is an evaluation context [6] and denotes a continuation to be resumed by an operation clause handling op. In the operational semantics, an operation invocation #op(*A*, *v*) is first transformed to #op(*A*, *v*, [ ]) (where [ ] denotes the empty context or the identity continuation) and then it bubbles up by capturing its context and pushing it onto the third argument. Note that *σ* and *w* of #op(*σ*, *w*, *E*) become polymorphic when it bubbles up from the body of a type abstraction. Third, each resumption expression resume *α* *x*.*e* declares distinct (type) variables *α* and *x* to denote the (type) argument to an operation clause, whereas a single variable declared at op(*x*) → *M* and implicit type variables are used for the same purpose in λ<sup>let</sup><sub>eff</sub>. For example, the λ<sup>let</sup><sub>eff</sub> operation clause choose<sup>∀</sup>(*x*) → resume (fst *x*) is translated to Λα.choose<sup>∀</sup>(*x*) → resume β *y*.(fst *y*). This change simplifies the semantics.

**Reduction rules** *e*<sub>1</sub> ⇝ *e*<sub>2</sub>

$$\begin{array}{rcll}
c_1\,c_2 & \rightsquigarrow & \zeta(c_1, c_2) & \text{(R Const)} \\
(\lambda x.e)\,v & \rightsquigarrow & e[v/x] & \text{(R Beta)} \\
\mathtt{let}\,x = \Lambda\boldsymbol{\alpha}.v\,\mathtt{in}\,e & \rightsquigarrow & e[\Lambda\boldsymbol{\alpha}.v/x] & \text{(R Let)} \\
\mathtt{handle}\,v\,\mathtt{with}\,h & \rightsquigarrow & e[v/x] \quad (\text{where } h^{\mathtt{return}} = \mathtt{return}\,x \to e) & \text{(R Return)} \\
\#\mathsf{op}(\boldsymbol{A}, v) & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{A}, v, [\,]) & \text{(R Op)} \\
\#\mathsf{op}(\boldsymbol{\sigma}, w, E)\,e_2 & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{\sigma}, w, E\,e_2) & \text{(R OpApp1)} \\
v_1\,\#\mathsf{op}(\boldsymbol{\sigma}, w, E) & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{\sigma}, w, v_1\,E) & \text{(R OpApp2)} \\
\#\mathsf{op}'(\boldsymbol{A}^{I}, \#\mathsf{op}(\boldsymbol{\sigma}^{J}, w, E)) & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{\sigma}^{J}, w, \#\mathsf{op}'(\boldsymbol{A}^{I}, E)) & \text{(R OpOp)} \\
\mathtt{handle}\,\#\mathsf{op}(\boldsymbol{\sigma}, w, E)\,\mathtt{with}\,h & \rightsquigarrow & \#\mathsf{op}(\boldsymbol{\sigma}, w, \mathtt{handle}\,E\,\mathtt{with}\,h) \quad (\text{where } \mathsf{op} \notin \mathit{ops}(h)) & \text{(R OpHandle)} \\
\mathtt{let}\,x = \Lambda\boldsymbol{\alpha}^{I}.\#\mathsf{op}(\boldsymbol{\sigma}^{J}, w, E)\,\mathtt{in}\,e_2 & \rightsquigarrow & \#\mathsf{op}(\forall\boldsymbol{\alpha}^{I}.\boldsymbol{\sigma}^{J},\ \Lambda\boldsymbol{\alpha}^{I}.w,\ \mathtt{let}\,x = \Lambda\boldsymbol{\alpha}^{I}.E\,\mathtt{in}\,e_2) & \text{(R OpLet)}
\end{array}$$

$$\begin{array}{l}
\mathtt{handle}\,\#\mathsf{op}(\forall\boldsymbol{\beta}^{J}.\boldsymbol{A}^{I},\ \Lambda\boldsymbol{\beta}^{J}.v,\ E^{\boldsymbol{\beta}^{J}})\,\mathtt{with}\,h \ \rightsquigarrow \\
\qquad e[\mathtt{handle}\,E^{\boldsymbol{\beta}^{J}}\,\mathtt{with}\,h/\mathtt{resume}]^{\forall\boldsymbol{\beta}^{J}.\boldsymbol{A}^{I}}_{\Lambda\boldsymbol{\beta}^{J}.v}\,[\boldsymbol{A}^{I}[\boldsymbol{\bot}/\boldsymbol{\beta}^{J}]/\boldsymbol{\alpha}^{I}]\,[v[\boldsymbol{\bot}/\boldsymbol{\beta}^{J}]/x] \qquad \text{(R Handle)} \\
\qquad (\text{where } h^{\mathsf{op}} = \Lambda\boldsymbol{\alpha}^{I}.\mathsf{op}(x) \to e)
\end{array}$$

**Evaluation rules** *e*<sub>1</sub> −→ *e*<sub>2</sub>

$$\dfrac{e_1 \rightsquigarrow e_2}{E[e_1] \longrightarrow E[e_2]}\ \text{(E Eval)}$$

**Fig. 4.** Semantics of λ<sup>Λ</sup><sub>eff</sub>.

Evaluation contexts, denoted by *E*<sup>*α*</sup>, are standard for the lambda calculus with call-by-value, left-to-right evaluation except for two points. First, they contain the form let *x* = Λ*α*.*E*<sup>*β*</sup> in *e*<sub>2</sub>, which allows the body of a type abstraction to be evaluated. Second, the metavariable *E* for evaluation contexts is indexed by type variables *α*, meaning that the hole in the context appears under type abstractions binding *α*. For example, let *x* = Λα.let *y* = Λβ.[ ] in *e*<sub>2</sub> in *e*<sub>1</sub> is denoted by *E*<sup>α,β</sup> and, more generally, let *x* = Λ*β*<sup>J<sub>1</sub></sup>.*E*<sup>*γ*<sup>J<sub>2</sub></sup></sup> in *e* is denoted by *E*<sup>*β*<sup>J<sub>1</sub></sup>,*γ*<sup>J<sub>2</sub></sup></sup>. (Here, *β*<sup>J<sub>1</sub></sup>, *γ*<sup>J<sub>2</sub></sup> stands for the concatenation of the two sequences *β*<sup>J<sub>1</sub></sup> and *γ*<sup>J<sub>2</sub></sup>.) If *α* is not important, we simply write *E* for *E*<sup>*α*</sup>. We often use the term "continuation" to mean "evaluation context," especially when it is expected to be resumed.

As usual, substitution *e*[*w*/*x* ] of *w* for *x* in *e* is defined in a capture-avoiding manner. Since variables come along with type arguments, the case for variables is defined as follows:

$$(x\,\boldsymbol{A})[\Lambda\boldsymbol{\alpha}.v/x] \stackrel{\text{def}}{=} v[\boldsymbol{A}/\boldsymbol{\alpha}]$$

Application of substitution [Λ*α*<sup>I</sup>.*v*/*x*] to *x* *A*<sup>J</sup>, where *I* ≠ *J*, is undefined. We define free type variables *ftv*(*e*) and *ftv*(*E*) in *e* and *E*, respectively, as usual.

#### **4.2 Semantics**

The semantics of λ<sup>Λ</sup><sub>eff</sub> is given in the small-step style and consists of two relations: the reduction relation ⇝, which is for basic computation, and the evaluation relation −→, which is for top-level execution. Figure 4 shows the rules for these relations. In what follows, we write *h*<sup>return</sup> for the return clause of handler *h*, *ops*(*h*) for the set of effect operations handled by *h*, and *h*<sup>op</sup> for the operation clause for op in *h*.

Most of the reduction rules are standard [13,16]. A constant application *c*<sub>1</sub> *c*<sub>2</sub> reduces to ζ(*c*<sub>1</sub>, *c*<sub>2</sub>) (R Const), where function ζ maps a pair of constants to another constant. A function application (λ*x*.*e*) *v* and a let-expression let *x* = Λ*α*.*v* in *e* reduce to *e*[*v*/*x*] (R Beta) and *e*[Λ*α*.*v*/*x*] (R Let), respectively. If a handled expression is a value *v*, the handle–with expression reduces to the body of the return clause where *v* is substituted for the parameter *x* (R Return). An effect invocation #op(*A*, *v*) reduces to #op(*A*, *v*, [ ]) with the identity continuation, as explained above (R Op); the process of capturing its evaluation context is expressed by the rules (R OpApp1), (R OpApp2), (R OpOp), (R OpHandle), and (R OpLet). The rule (R OpHandle) can be applied only if the handler *h* does *not* handle op. The rule (R OpLet) is applied to a let-expression where #op(*σ*<sup>J</sup>, *w*, *E*) appears under a type abstraction with bound type variables *α*<sup>I</sup>. Since *σ*<sup>J</sup> and *w* may refer to *α*<sup>I</sup>, the reduction result binds *α*<sup>I</sup> in both *σ*<sup>J</sup> and *w*. We write ∀*α*<sup>I</sup>.*σ*<sup>J</sup> for a sequence ∀*α*<sup>I</sup>.σ<sub>j<sub>1</sub></sub>, ..., ∀*α*<sup>I</sup>.σ<sub>j<sub>n</sub></sub> of type schemes (where *J* = {j<sub>1</sub>, ..., j<sub>n</sub>}).
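As a small worked instance of (R Op) followed by (R OpLet), consider an invocation of choose<sup>∀</sup> under a single type abstraction. The value *v* (of type α × α) and the body *e*<sub>2</sub> are placeholders chosen for illustration, not terms from the paper; the first step is taken under the evaluation context let *x* = Λα.[ ] in *e*<sub>2</sub>.

$$\begin{array}{ll}
 & \mathtt{let}\,x = \Lambda\alpha.\#\mathsf{choose}^{\forall}(\alpha, v)\,\mathtt{in}\,e_2 \\
\longrightarrow & \mathtt{let}\,x = \Lambda\alpha.\#\mathsf{choose}^{\forall}(\alpha, v, [\,])\,\mathtt{in}\,e_2 \qquad \text{by (R Op)} \\
\longrightarrow & \#\mathsf{choose}^{\forall}(\forall\alpha.\alpha,\ \Lambda\alpha.v,\ \mathtt{let}\,x = \Lambda\alpha.[\,]\,\mathtt{in}\,e_2) \qquad \text{by (R OpLet)}
\end{array}$$

After the second step, the type argument and the value argument are both closed under α, and the captured continuation records that its hole sits under the abstraction Λα.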

The crux of the semantics is (R Handle): it is applied when #op(*σ*<sup>I</sup>, *w*, *E*) reaches the handler *h* that handles op. Since the handled term #op(*σ*<sup>I</sup>, *w*, *E*) is constructed from an effect invocation #op(*A*<sup>I</sup>, *v*), if the captured continuation *E* binds type variables *β*<sup>J</sup>, the same type variables *β*<sup>J</sup> should have been added to *A*<sup>I</sup> and *v* along the capture. Thus, the handled expression on the left-hand side of the rule takes the form #op(∀*β*<sup>J</sup>.*A*<sup>I</sup>, Λ*β*<sup>J</sup>.*v*, *E*<sup>*β*<sup>J</sup></sup>) (with the same type variables *β*<sup>J</sup>).

The right-hand side of (R Handle) involves three types of substitution: continuation substitution [handle *E*<sup>*β*<sup>J</sup></sup> with *h*/resume]<sup>∀*β*<sup>J</sup>.*A*<sup>I</sup></sup><sub>Λ*β*<sup>J</sup>.*v*</sub> for resumptions, type substitution for *α*<sup>I</sup>, and value substitution for *x*. We explain them one by one below. In the following, let *h*<sup>op</sup> = Λ*α*<sup>I</sup>.op(*x*) → *e* and *E*′<sup>*β*<sup>J</sup></sup> = handle *E*<sup>*β*<sup>J</sup></sup> with *h*.

*Continuation Substitution.* Let us start with a simple case where the sequence *β*<sup>J</sup> is empty. Intuitively, continuation substitution [*E*′/resume]<sup>*A*<sup>I</sup></sup><sub>*v*</sub> replaces a resumption expression resume *γ*<sup>I</sup> *z*.*e*′ in the body *e* with *E*′[*v*′], where *v*′ is the value of *e*′, and substitutes *A*<sup>I</sup> and *v* (arguments to the invocation of op) for *γ*<sup>I</sup> and *z*, respectively. Therefore, assuming resume does not appear in *e*′, we define (resume *γ*<sup>I</sup> *z*.*e*′)[*E*′/resume]<sup>*A*<sup>I</sup></sup><sub>*v*</sub> to be let *y* = *e*′[*A*<sup>I</sup>/*γ*<sup>I</sup>][*v*/*z*] in *E*′[*y*] (for fresh *y*). Note that the evaluation of *e*′ takes place outside of *E* so that an invocation of an effect in *e*′ is *not* handled by handlers in *E*. When *β*<sup>J</sup> is not empty,

$$(\mathtt{resume}\,\boldsymbol{\gamma}^{I}\,z.e')[{E'}^{\boldsymbol{\beta}^{J}}/\mathtt{resume}]^{\forall\boldsymbol{\beta}^{J}.\boldsymbol{A}^{I}}_{\Lambda\boldsymbol{\beta}^{J}.v} \stackrel{\text{def}}{=} \mathtt{let}\,y = \Lambda\boldsymbol{\beta}^{J}.e'[\boldsymbol{A}^{I}/\boldsymbol{\gamma}^{I}][v/z]\,\mathtt{in}\,{E'}^{\boldsymbol{\beta}^{J}}[y\,\boldsymbol{\beta}^{J}].$$

(Compared with the simple case, the additions are the type abstraction Λ*β*<sup>J</sup> at the let-binding, the type application *y* *β*<sup>J</sup>, and the binders ∀*β*<sup>J</sup> and Λ*β*<sup>J</sup> in the annotations of the substitution.) The idea is to bind *β*<sup>J</sup> that appear free in *A*<sup>I</sup> and *v* by type abstraction at let and to instantiate with the same variables at *y* *β*<sup>J</sup>, where *β*<sup>J</sup> are bound by type abstractions in *E*′<sup>*β*<sup>J</sup></sup>.

Continuation substitution is formally defined as follows:

**Definition 2 (Continuation substitution).** *Substitution of continuation E*<sup>*β*<sup>J</sup></sup> *for resumptions in e, written e*[*E*<sup>*β*<sup>J</sup></sup>/resume]<sup>∀*β*<sup>J</sup>.*A*<sup>I</sup></sup><sub>Λ*β*<sup>J</sup>.*v*</sub>*, is defined in a capture-avoiding manner, as follows (we describe only the important cases):*

$$\begin{array}{l}
(\mathtt{resume}\,\boldsymbol{\gamma}^{I}\,z.e)[E^{\boldsymbol{\beta}^{J}}/\mathtt{resume}]^{\forall\boldsymbol{\beta}^{J}.\boldsymbol{A}^{I}}_{\Lambda\boldsymbol{\beta}^{J}.v} \stackrel{\text{def}}{=} \\
\qquad \mathtt{let}\,y = \Lambda\boldsymbol{\beta}^{J}.e[E^{\boldsymbol{\beta}^{J}}/\mathtt{resume}]^{\forall\boldsymbol{\beta}^{J}.\boldsymbol{A}^{I}}_{\Lambda\boldsymbol{\beta}^{J}.v}[\boldsymbol{A}^{I}/\boldsymbol{\gamma}^{I}][v/z]\,\mathtt{in}\,E^{\boldsymbol{\beta}^{J}}[y\,\boldsymbol{\beta}^{J}] \\
\qquad (\text{if } (\mathit{ftv}(e) \cup \mathit{ftv}(E^{\boldsymbol{\beta}^{J}})) \cap \{\boldsymbol{\beta}^{J}\} = \emptyset \text{ and } y \text{ is fresh}) \\[6pt]
(\mathtt{return}\,x \to e)[E/\mathtt{resume}]^{\boldsymbol{\sigma}}_{w} \stackrel{\text{def}}{=} \mathtt{return}\,x \to e[E/\mathtt{resume}]^{\boldsymbol{\sigma}}_{w} \\[6pt]
(h'; \Lambda\boldsymbol{\gamma}^{J}.\mathsf{op}(x) \to e)[E/\mathtt{resume}]^{\boldsymbol{\sigma}^{I}}_{w} \stackrel{\text{def}}{=} h'[E/\mathtt{resume}]^{\boldsymbol{\sigma}^{I}}_{w};\ \Lambda\boldsymbol{\gamma}^{J}.\mathsf{op}(x) \to e
\end{array}$$

The second and third clauses (for a handler) mean that continuation substitution is applied only to return clauses.
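To make the first clause of Definition 2 concrete, the following worked instance applies it in the simple case where *β*<sup>J</sup> is empty. It uses the resumption expression resume γ *z*.(fst *z*), a renamed copy of the translated choose<sup>∀</sup> clause above; the type argument *A*, the value *v*, and the continuation *E*′ = handle *E* with *h* are placeholders for illustration.

$$\begin{array}{rcl}
(\mathtt{resume}\,\gamma\,z.\,\mathsf{fst}\,z)[E'/\mathtt{resume}]^{A}_{v}
 & = & \mathtt{let}\,y = (\mathsf{fst}\,z)[A/\gamma][v/z]\,\mathtt{in}\,E'[y] \\
 & = & \mathtt{let}\,y = \mathsf{fst}\,v\,\mathtt{in}\,E'[y] \\
 & = & \mathtt{let}\,y = \mathsf{fst}\,v\,\mathtt{in}\,\mathtt{handle}\,E[y]\,\mathtt{with}\,h
\end{array}$$

The argument fst *v* is evaluated before the continuation is re-entered, and the handler *h* is reinstalled around the resumed context *E*[*y*].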

*Type and Value Substitution.* The type and value substitutions *A*<sup>I</sup>[⊥<sup>J</sup>/*β*<sup>J</sup>] and *v*[⊥<sup>J</sup>/*β*<sup>J</sup>], respectively, in (R Handle) are for (type) parameters in *h*<sup>op</sup> = Λ*α*<sup>I</sup>.op(*x*) → *e*. The basic idea is to substitute *A*<sup>I</sup> for *α*<sup>I</sup> and *v* for *x*, similarly to continuation substitution. We erase free type variables *β*<sup>J</sup> in *A*<sup>I</sup> and *v* by substituting the designated base type ⊥ for all of them. (We write *A*<sup>I</sup>[⊥<sup>J</sup>/*β*<sup>J</sup>] and *v*[⊥<sup>J</sup>/*β*<sup>J</sup>] for the types and value, respectively, after the erasure.)

The evaluation rule is ordinary: Evaluation of a term proceeds by reducing a subterm under an evaluation context.
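The bubbling semantics can be sketched as a small interpreter. The code below is a minimal illustration, not the paper's formalization: it is untyped and omits type abstraction (so (R Let) and (R OpLet) are absent), it represents handlers as Python dictionaries of closures, and it models continuation substitution by passing a hypothetical `resume` callable to the operation clause. All names (`step`, `plug`, `evaluate`, the tuple encoding of terms) are assumptions made for this sketch.

```python
import itertools

_fresh = itertools.count()  # supply of fresh variable names


def is_value(t):
    return t[0] in ("const", "lam")


def subst(t, x, v):
    """Replace variable x by the closed value v in term t."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    if tag == "app":
        return ("app", subst(t[1], x, v), subst(t[2], x, v))
    if tag == "op":
        return ("op", t[1], subst(t[2], x, v))
    if tag == "handle":  # handlers are closed Python objects in this sketch
        return ("handle", subst(t[1], x, v), t[2])
    return t  # constants


def plug(frames, t):
    """Fill the hole of a captured context; the innermost frame comes first."""
    for kind, payload in frames:
        if kind == "appL":
            t = ("app", t, payload)
        elif kind == "appR":
            t = ("app", payload, t)
        elif kind == "op":
            t = ("op", payload, t)
        else:  # "handle"
            t = ("handle", t, payload)
    return t


def step(t):
    """One evaluation step, or None for values and bubbled-up operations."""
    tag = t[0]
    if tag == "app":
        f, a = t[1], t[2]
        if f[0] == "opk":                                   # (R OpApp1)
            return ("opk", f[1], f[2], f[3] + [("appL", a)])
        if not is_value(f):
            return ("app", step(f), a)
        if a[0] == "opk":                                   # (R OpApp2)
            return ("opk", a[1], a[2], a[3] + [("appR", f)])
        if not is_value(a):
            return ("app", f, step(a))
        if f[0] == "lam":                                   # (R Beta)
            return subst(f[2], f[1], a)
        return ("const", f[1](a[1]))                        # (R Const)
    if tag == "op":
        name, a = t[1], t[2]
        if a[0] == "opk":                                   # (R OpOp)
            return ("opk", a[1], a[2], a[3] + [("op", name)])
        if not is_value(a):
            return ("op", name, step(a))
        return ("opk", name, a, [])                         # (R Op)
    if tag == "handle":
        e, h = t[1], t[2]
        if is_value(e):                                     # (R Return)
            return h["return"](e)
        if e[0] != "opk":
            return ("handle", step(e), h)
        name, v, frames = e[1], e[2], e[3]
        if name not in h["ops"]:                            # (R OpHandle)
            return ("opk", name, v, frames + [("handle", h)])

        def resume(arg):                                    # (R Handle)
            # The argument is evaluated outside the continuation, and the
            # handler is reinstalled around the resumed context.
            y = f"y{next(_fresh)}"
            body = ("handle", plug(frames, ("var", y)), h)
            return ("app", ("lam", y, body), arg)

        return h["ops"][name](v, resume)
    return None


def evaluate(t):
    while (nxt := step(t)) is not None:
        t = nxt
    return t


# Curried addition as a constant; zeta is modeled by Python application.
add = ("const", lambda a: lambda b: a + b)
h = {
    "return": lambda v: v,
    # This clause resumes twice and adds the two results.
    "ops": {"choose": lambda v, resume: ("app", ("app", add, resume(("const", 1))),
                                         resume(("const", 2)))},
}
prog = ("handle",
        ("app", ("app", add, ("const", 10)), ("op", "choose", ("const", 0))), h)
print(evaluate(prog))  # ('const', 23), i.e. (10 + 1) + (10 + 2)
```

The captured context is kept as a list of frames, which plays the role of the third argument of #op(*σ*, *w*, *E*); an operation that no enclosing handler handles simply stays in bubbled-up form, matching case (3) of the progress property.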

#### **4.3 Type System**

The type system of λ<sup>Λ</sup><sub>eff</sub> is similar to that of λ<sup>let</sup><sub>eff</sub> and has five judgments: well-formedness of typing contexts ⊢ Γ; well-formedness of type schemes Γ ⊢ σ; term typing judgment Γ; *r* ⊢ *e* : *A* | ε; handler typing judgment Γ; *r* ⊢ *h* : *A* | ε ⇒ *B* | ε′; and continuation typing judgment Γ ⊢ *E* : ∀*α*.*A* ⊸ *B* | ε. The first two are defined in the same way as those of λ<sup>let</sup><sub>eff</sub>. The last judgment means that a term obtained by filling the hole of *E* with a term having *A* under Γ, *α* is typed at *B* under Γ and possibly involves effect ε. A resumption type *r* is similar to *R* but does not contain an argument variable.

**Definition 3 (Resumption type).** *Resumption types in* λ<sup>Λ</sup><sub>eff</sub>*, denoted by r, are defined as follows:*

$$\begin{array}{c} r ::= \mathsf{none} \mid (\boldsymbol{\alpha}, A, B \to_{\epsilon} C) \\ (\text{if } \mathit{ftv}(A) \cup \mathit{ftv}(B) \subseteq \{\boldsymbol{\alpha}\} \text{ and } \mathit{ftv}(C) \cap \{\boldsymbol{\alpha}\} = \emptyset) \end{array}$$

**Typing rules** Γ; *r* ⊢ *e* : *A* | ε

$$\begin{array}{c}
\dfrac{\vdash \Gamma \quad x : \forall\boldsymbol{\alpha}.A \in \Gamma \quad \Gamma \vdash \boldsymbol{B}}{\Gamma; r \vdash x\,\boldsymbol{B} : A[\boldsymbol{B}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(T Var)}
\qquad
\dfrac{\vdash \Gamma}{\Gamma; r \vdash c : \mathit{ty}(c) \mid \epsilon}\ \text{(T Const)} \\[12pt]
\dfrac{\Gamma, x : A; r \vdash e : B \mid \epsilon'}{\Gamma; r \vdash \lambda x.e : A \to_{\epsilon'} B \mid \epsilon}\ \text{(T Abs)}
\qquad
\dfrac{\Gamma; r \vdash e_1 : A \to_{\epsilon'} B \mid \epsilon \quad \Gamma; r \vdash e_2 : A \mid \epsilon \quad \epsilon' \subseteq \epsilon}{\Gamma; r \vdash e_1\,e_2 : B \mid \epsilon}\ \text{(T App)} \\[12pt]
\dfrac{\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.A \to B \quad \mathsf{op} \in \epsilon \quad \Gamma; r \vdash e : A[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon \quad \Gamma \vdash \boldsymbol{C}}{\Gamma; r \vdash \#\mathsf{op}(\boldsymbol{C}, e) : B[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(T Op)} \\[12pt]
\dfrac{\begin{array}{c}\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}^{I}.A \to B \quad \mathsf{op} \in \epsilon \quad \Gamma \vdash \forall\boldsymbol{\beta}^{J}.\boldsymbol{C}^{I} \\ \Gamma, \boldsymbol{\beta}^{J}; r \vdash v : A[\boldsymbol{C}^{I}/\boldsymbol{\alpha}^{I}] \mid \epsilon \quad \Gamma \vdash E^{\boldsymbol{\beta}^{J}} : \forall\boldsymbol{\beta}^{J}.(B[\boldsymbol{C}^{I}/\boldsymbol{\alpha}^{I}]) \multimap D \mid \epsilon\end{array}}{\Gamma; r \vdash \#\mathsf{op}(\forall\boldsymbol{\beta}^{J}.\boldsymbol{C}^{I}, \Lambda\boldsymbol{\beta}^{J}.v, E^{\boldsymbol{\beta}^{J}}) : D \mid \epsilon}\ \text{(T OpCont)} \\[12pt]
\dfrac{\Gamma; r \vdash e : A \mid \epsilon' \quad \epsilon' \subseteq \epsilon}{\Gamma; r \vdash e : A \mid \epsilon}\ \text{(T Weak)}
\qquad
\dfrac{\Gamma; r \vdash e : A \mid \epsilon \quad \Gamma; r \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon'}{\Gamma; r \vdash \mathtt{handle}\,e\,\mathtt{with}\,h : B \mid \epsilon'}\ \text{(T Handle)} \\[12pt]
\dfrac{\Gamma, \boldsymbol{\alpha}; r \vdash e_1 : A \mid \epsilon \quad \Gamma, x : \forall\boldsymbol{\alpha}.A; r \vdash e_2 : B \mid \epsilon}{\Gamma; r \vdash \mathtt{let}\,x = \Lambda\boldsymbol{\alpha}.e_1\,\mathtt{in}\,e_2 : B \mid \epsilon}\ \text{(T Let)} \\[12pt]
\dfrac{\boldsymbol{\alpha} \in \Gamma \quad \Gamma, \boldsymbol{\beta}, x : A[\boldsymbol{\beta}/\boldsymbol{\alpha}]; (\boldsymbol{\alpha}, A, B \to_{\epsilon} C) \vdash e : B[\boldsymbol{\beta}/\boldsymbol{\alpha}] \mid \epsilon' \quad \epsilon \subseteq \epsilon'}{\Gamma; (\boldsymbol{\alpha}, A, B \to_{\epsilon} C) \vdash \mathtt{resume}\,\boldsymbol{\beta}\,x.e : C \mid \epsilon'}\ \text{(T Resume)}
\end{array}$$

**Fig. 5.** Typing rules for terms in λ<sup>Λ</sup><sub>eff</sub>.

The typing rules for terms, shown in Fig. 5, and handlers, shown in the upper half of Fig. 6, are similar to those of λ<sup>let</sup><sub>eff</sub> except for a new rule (T OpCont), which is applied to an effect invocation #op(∀*β*<sup>J</sup>.*C*<sup>I</sup>, Λ*β*<sup>J</sup>.*v*, *E*<sup>*β*<sup>J</sup></sup>) with a continuation. Let *ty*(op) = ∀*α*<sup>I</sup>.*A* → *B*. Since op should have been invoked with *C*<sup>I</sup> and *v* under type abstractions with bound type variables *β*<sup>J</sup>, the argument *v* has type *A*[*C*<sup>I</sup>/*α*<sup>I</sup>] under the typing context extended with *β*<sup>J</sup>. Similarly, the hole of *E*<sup>*β*<sup>J</sup></sup> expects to be filled with the result of the invocation, i.e., a value of *B*[*C*<sup>I</sup>/*α*<sup>I</sup>]. Since the continuation denotes the context before the evaluation, its result type matches with the type of the whole term.

The typing rules for continuations are shown in the lower half of Fig. 6. They are similar to the corresponding typing rules for terms except that a subterm is replaced with a continuation. In (TE Let), the continuation let *x* = Λ*α*.*E* in *e* has type ∀*α*.σ ⊸ *B* because the hole of *E* appears inside the scope of *α*.

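**Handler typing** Γ; *r* ⊢ *h* : *A* | ε ⇒ *B* | ε′

$$\begin{array}{c}
\dfrac{\Gamma, x : A; r \vdash e : B \mid \epsilon' \quad \epsilon \subseteq \epsilon'}{\Gamma; r \vdash \mathtt{return}\,x \to e : A \mid \epsilon \Rightarrow B \mid \epsilon'}\ \text{(TH Return)} \\[12pt]
\dfrac{\Gamma; r \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon' \quad \mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.C \to D \quad \Gamma, \boldsymbol{\alpha}, x : C; (\boldsymbol{\alpha}, C, D \to_{\epsilon'} B) \vdash e : B \mid \epsilon'}{\Gamma; r \vdash h; \Lambda\boldsymbol{\alpha}.\mathsf{op}(x) \to e : A \mid \epsilon \uplus \{\mathsf{op}\} \Rightarrow B \mid \epsilon'}\ \text{(TH Op)}
\end{array}$$

**Continuation typing** Γ ⊢ *E* : σ ⊸ *A* | ε

$$\begin{array}{c}
\dfrac{\vdash \Gamma}{\Gamma \vdash [\,] : A \multimap A \mid \epsilon}\ \text{(TE Hole)} \\[12pt]
\dfrac{\Gamma \vdash E : \sigma \multimap (A \to_{\epsilon'} B) \mid \epsilon \quad \Gamma; \mathsf{none} \vdash e_2 : A \mid \epsilon \quad \epsilon' \subseteq \epsilon}{\Gamma \vdash E\,e_2 : \sigma \multimap B \mid \epsilon}\ \text{(TE App1)} \\[12pt]
\dfrac{\Gamma; \mathsf{none} \vdash v_1 : (A \to_{\epsilon'} B) \mid \epsilon \quad \Gamma \vdash E : \sigma \multimap A \mid \epsilon \quad \epsilon' \subseteq \epsilon}{\Gamma \vdash v_1\,E : \sigma \multimap B \mid \epsilon}\ \text{(TE App2)} \\[12pt]
\dfrac{\mathit{ty}(\mathsf{op}) = \forall\boldsymbol{\alpha}.A \to B \quad \mathsf{op} \in \epsilon \quad \Gamma \vdash E : \sigma \multimap A[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon \quad \Gamma \vdash \boldsymbol{C}}{\Gamma \vdash \#\mathsf{op}(\boldsymbol{C}, E) : \sigma \multimap B[\boldsymbol{C}/\boldsymbol{\alpha}] \mid \epsilon}\ \text{(TE Op)} \\[12pt]
\dfrac{\Gamma \vdash E : \sigma \multimap A \mid \epsilon \quad \Gamma; \mathsf{none} \vdash h : A \mid \epsilon \Rightarrow B \mid \epsilon'}{\Gamma \vdash \mathtt{handle}\,E\,\mathtt{with}\,h : \sigma \multimap B \mid \epsilon'}\ \text{(TE Handle)}
\qquad
\dfrac{\Gamma \vdash E : \sigma \multimap A \mid \epsilon' \quad \epsilon' \subseteq \epsilon}{\Gamma \vdash E : \sigma \multimap A \mid \epsilon}\ \text{(TE Weak)} \\[12pt]
\dfrac{\Gamma, \boldsymbol{\alpha} \vdash E : \sigma \multimap A \mid \epsilon \quad \Gamma, x : \forall\boldsymbol{\alpha}.A; \mathsf{none} \vdash e : B \mid \epsilon}{\Gamma \vdash \mathtt{let}\,x = \Lambda\boldsymbol{\alpha}.E\,\mathtt{in}\,e : \forall\boldsymbol{\alpha}.\sigma \multimap B \mid \epsilon}\ \text{(TE Let)}
\end{array}$$

**Fig. 6.** Typing rules for handlers and continuations in λ<sup>Λ</sup><sub>eff</sub>.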

#### **4.4 Elaboration**

This section defines the elaboration from λ<sup>let</sup><sub>eff</sub> to λ<sup>Λ</sup><sub>eff</sub>. The important difference between the two languages from the viewpoint of elaboration is that, whereas the parameter of an operation clause is referred to by a single variable in λ<sup>let</sup><sub>eff</sub>, it is done by one or more variables in λ<sup>Λ</sup><sub>eff</sub>. Therefore, one variable in λ<sup>let</sup><sub>eff</sub> is represented by multiple variables (required for each resume) in λ<sup>Λ</sup><sub>eff</sub>. We use *S*, a mapping from variables to variables, to make the correspondence between variable names. We write *S* ◦ {*x* ↦ *y*} for the same mapping as *S* except that *x* is mapped to *y*.
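The role of *S* can be sketched as follows. This is only an illustration of the variable bookkeeping, under the assumption that each resume introduces a fresh target variable for the clause parameter, as in the choose<sup>∀</sup> translation shown in Sect. 4.1; types and type variables are ignored, and the function `elaborate`, the tuple encoding of terms, and the fresh-name scheme are hypothetical.

```python
import itertools


def elaborate(term, S, param, fresh):
    """Rename variables of a source term according to the mapping S.

    term  : ("var", x) | ("app", t1, t2) | ("lam", x, body) | ("resume", arg)
    S     : dict from source variable names to target variable names
    param : parameter of the enclosing operation clause (None if shadowed)
    fresh : iterator producing unused target variable names
    """
    tag = term[0]
    if tag == "var":
        return ("var", S.get(term[1], term[1]))
    if tag == "app":
        return ("app",
                elaborate(term[1], S, param, fresh),
                elaborate(term[2], S, param, fresh))
    if tag == "lam":
        x = term[1]
        # A lambda binder maps its variable to itself; if it shadows the
        # clause parameter, inner resumes no longer rename that name.
        inner_param = None if x == param else param
        return ("lam", x, elaborate(term[2], {**S, x: x}, inner_param, fresh))
    if tag == "resume":
        y = next(fresh)
        # S o {param -> y}: same mapping as S except for the clause parameter.
        S2 = S if param is None else {**S, param: y}
        return ("resume", y, elaborate(term[1], S2, param, fresh))
    raise ValueError(f"unknown term: {tag}")


fresh_names = (f"y{i}" for i in itertools.count())
# Body of the clause  choose(x) -> resume (fst x)
body = ("resume", ("app", ("var", "fst"), ("var", "x")))
print(elaborate(body, {}, "x", fresh_names))
# ('resume', 'y0', ('app', ('var', 'fst'), ('var', 'y0')))
```

Each resume gets its own variable, so a clause with two resumptions refers to the operation argument through two distinct names in the target term.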

Elaboration is defined by two judgments: term elaboration judgment Γ; *R* ⊢ *M* : *A* | ε ▷<sub>S</sub> *e*, which denotes elaboration from a typing derivation of judgment Γ; *R* ⊢ *M* : *A* | ε to *e* with *S*, and handler elaboration judgment Γ; *R* ⊢ *H* : *A* | ε ⇒ *B* | ε′ ▷<sub>S</sub> *h*, which denotes elaboration from a typing derivation of judgment Γ; *R* ⊢ *H* : *A* | ε ⇒ *B* | ε′ to *h* with *S*.

[Figure: term elaboration rules for the judgment Γ; *R* ⊢ *M* : *A* | ε ▷<sub>S</sub> *e*, including (Elab Var) and (Elab Abs).]

**Fig. 7.** Elaboration rules (excerpt).

Selected elaboration rules are shown in Fig. 7; the complete set of the rules is found in the full version of the paper. The elaboration rules are straightforward except for the use of *S*. A variable *x* is translated to *S*(*x* ) (Elab Var) and, every time a new variable is introduced, *S* is extended: see the rules other than (Elab Var) and (Elab Handle).

#### **4.5 Properties**

We show type safety of λ<sup>let</sup><sub>eff</sub>, i.e., a well-typed program in λ<sup>let</sup><sub>eff</sub> does not get stuck, by proving (1) type preservation of the elaboration from λ<sup>let</sup><sub>eff</sub> to λ<sup>Λ</sup><sub>eff</sub> and (2) type soundness of λ<sup>Λ</sup><sub>eff</sub>. Term *M* is a well-typed program of *A* if and only if ∅; none ⊢ *M* : *A* | ε.

The first can be shown easily. We write ∅ also for the identity mapping for variables.

**Theorem 1 (Elaboration is type-preserving).** *If M is a well-typed program of A, then* ∅; none ⊢ *M* : *A* | ε ▷<sub>∅</sub> *e and* ∅; none ⊢ *e* : *A* | ε *for some e.*

We show the second, type soundness of λ<sup>Λ</sup><sub>eff</sub>, via progress and subject reduction [25]. We write Δ for a typing context that consists only of type variables. Progress can be shown as usual.

**Lemma 1 (Progress).** *If* Δ; none ⊢ *e* : *A* | ε*, then (1) e* −→ *e*′ *for some e*′*, (2) e is a value, or (3) e* = #op(*σ*, *w*, *E*) *for some* op ∈ ε*, σ, w, and E.*

A key lemma to show subject reduction is type preservation of continuation substitution.

**Lemma 2 (Continuation substitution).** *Suppose that* Γ ⊢ ∀*β*<sup>J</sup>.*C*<sup>I</sup> *and* Γ ⊢ *E*<sup>*β*<sup>J</sup></sup> : ∀*β*<sup>J</sup>.(*B*[*C*<sup>I</sup>/*α*<sup>I</sup>]) ⊸ *D* | ε *and* Γ, *β*<sup>J</sup> ⊢ *v* : *A*[*C*<sup>I</sup>/*α*<sup>I</sup>]*.*


Using the continuation substitution lemma as well as other lemmas, we show subject reduction.

#### **Lemma 3 (Subject reduction)**


We write *e* −̸→ if and only if *e* cannot evaluate further. Moreover, −→<sup>∗</sup> denotes the reflexive and transitive closure of the evaluation relation −→.

**Theorem 2 (Type soundness of λ<sup>Λ</sup><sub>eff</sub>).** *If* Δ; none ⊢ *e* : *A* | ε *and e* −→<sup>∗</sup> *e*′ *and e*′ −̸→*, then (1) e*′ *is a value or (2) e*′ = #op(*σ*, *w*, *E*) *for some* op ∈ ε*, σ, w, and E.*

Now, type safety of λ<sup>let</sup><sub>eff</sub> is obtained as a corollary of Theorems 1 and 2.

**Corollary 1 (Type safety of λ<sup>let</sup><sub>eff</sub>).** *If M is a well-typed program of A, there exists some e such that* ∅; none ⊢ *M* : *A* | ε ▷<sub>∅</sub> *e and e does not get stuck.*

# **5 Related Work**

#### **5.1 Polymorphic Effects and Let-Polymorphism**

Many researchers have attacked the problem of combining effects—not necessarily algebraic—and let-polymorphism so far [1,2,10,12,14,17,23,24]. In particular, most of them have focused on ML-style polymorphic references. The algebraic effect handlers dealt with in this paper seem to be unable to implement general ML-style references—i.e., give an appropriate implementation to a set of effect operations new with the signature ∀α.α → α ref, get with ∀α.α ref → α, and put with ∀α.α × α ref → unit for abstract datatype α ref—even without the restriction on handlers because each operation clause in a handler assigns type variables locally and it is impossible to share such type variables between operation clauses.<sup>6</sup> Nevertheless, their approaches would be applicable to algebraic effects and handlers.

A common idea in the literature is to restrict the form of expressions bound by polymorphic let. Thus, they are complementary to our approach in that they restrict how effect operations are used whereas we restrict how effect operations are implemented.

Value restriction [23,24], a standard way adopted in ML-like languages, restricts polymorphic let-bound expressions to syntactic values. Garrigue [10] relaxes the value restriction so that, if a let-bound expression is not a syntactic value, type variables that appear only at positive positions in the type of the expression can be generalized. Although the (relaxed) value restriction is a quite clear criterion that indicates what let-bound expressions can be polymorphic safely and it even accepts interfering handlers, it is too restrictive in some cases. We give an example for such a case below.

```
effect choose∀ : ∀α. α × α → α
let f1 () =
 let g = #choose∀(fst, snd) in
 if g (true,false) then g (-1,1) else g (1,-1)
```
<sup>6</sup> One possible approach to dealing with ML-style references is to extend algebraic effects and handlers so that a handler for *parameterized* effects can be connected with dynamic resources [3].

In the definition of function f1, variable g is used polymorphically. Execution of this function under an appropriate handler would succeed, and in fact our calculus accepts it. By contrast, the (relaxed) value restriction rejects it because the let-bound expression #choose∀(fst,snd) is not a syntactic value and the type variable appears in both positive and negative positions, and so g is assigned a monomorphic type. A workaround for this problem is to make a function wrapper that calls either fst or snd depending on the Boolean value chosen by choose∀:

```
let f2 () =
 let b = #choose∀(true,false) in
 let g = λx. if b then (fst x) else (snd x) in
 if g (true,false) then g (-1,1) else g (1,-1)
```
However, this workaround makes the program complicated and incurs additional run-time cost for the branching and an extra call to the wrapper function.
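The behavior of f1 and f2 can be compared with a small untyped simulation. This is a sketch under a strong assumption: a handler whose clause resumes exactly once with one component of the pair is modeled as a plain Python function passed in as `choose`. The names `pick_first` and `pick_second` are hypothetical, and Python's dynamic typing does not reflect the polymorphism issue itself, only the run-time behavior.

```python
def fst(p):
    return p[0]


def snd(p):
    return p[1]


def f1(choose):
    # g is used at two different types: on a pair of Booleans and on a pair of ints.
    g = choose((fst, snd))
    return g((-1, 1)) if g((True, False)) else g((1, -1))


def f2(choose):
    # Workaround: choose a Boolean and branch inside a wrapper function.
    b = choose((True, False))

    def g(x):
        return fst(x) if b else snd(x)

    return g((-1, 1)) if g((True, False)) else g((1, -1))


def pick_first(pair):
    # Models the clause  choose(x) -> resume (fst x)
    return pair[0]


def pick_second(pair):
    # Models the clause  choose(x) -> resume (snd x)
    return pair[1]


for handler in (pick_first, pick_second):
    print(handler.__name__, f1(handler), f2(handler))
```

Under both handlers the two functions return the same result, which illustrates that f2 is a behavior-preserving but more roundabout encoding of f1.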

Asai and Kameyama [2] study a combination of let-polymorphism with delimited control operators shift/reset [4]. They allow a let-bound expression to be polymorphic if it invokes no control operation. Thus, the function f1 above would be rejected in their approach.

Another research line to restrict the use of effects is to allow only type variables unrelated to effect invocations to be generalized. Tofte [23] distinguishes between applicative type variables, which cannot be used for effect invocations, and imperative ones, which can be used, and proposes a type system that enforces restrictions that (1) type variables of imperative operations can be instantiated only with types wherein all type variables are imperative and (2) if a let-bound expression is not a syntactic value, only applicative type variables can be generalized. Leroy and Weis [17] allow generalization only of type variables that do not appear in a parameter type to the reference type in the type of a let-expression. To detect the hidden use of references, their type system gives a term not only a type but also the types of free variables used in the term. Standard ML of New Jersey (before ML97) adopted weak polymorphism [1], which was later formalized and investigated deeply by Hoang et al. [12]. Weak polymorphism equips a type variable with the number of function calls after which a value of a type containing the type variable will be passed to an imperative operation. The type system ensures that type variables with positive numbers are not related to imperative constructs, and so such type variables can be generalized safely. In this line of research, the function f1 above would not typecheck because generalized type variables are used to instantiate those of the effect signature, although it could be rewritten to an acceptable one by taking care not to involve type variables in effect invocation.

```
let f3 () =
 let g = if #choose∀(true,false) then fst else snd in
 if g (true,false) then g (-1,1) else g (1,-1)
```
More recently, Kammar and Pretnar [14] show that *parameterized* algebraic effects and handlers do not need the value restriction *if* the type variables used in an effect invocation are not generalized. Thus, like the other work that restricts generalized type variables, their approach would reject function f1 but would accept f3.

#### **5.2 Algebraic Effects and Handlers**

Algebraic effects [20] are a way to represent the denotation of an effect by giving a set of operations and an equational theory that capture their properties. Algebraic effect handlers, introduced by Plotkin and Pretnar [21], make it possible to provide user-defined effects. Algebraic effect handlers have been gaining popularity owing to their flexibility and have been made available as libraries [13,15,26] or as primitive features of languages, such as Eff [3], Koka [16], Frank [18], and Multicore OCaml [5]. In these languages, let-bound expressions that can be polymorphic are restricted to values or pure expressions.

Recently, Forster et al. [9] investigate the relationships between algebraic effect handlers and other mechanisms for user-defined effects—delimited control shift0 [19] and monadic reflection [7,8]—conjecturing that there would be no type-preserving translation from a language with delimited control or monadic reflection to one with algebraic effect handlers. It would be an interesting direction to export our idea to delimited control and monadic reflection.

### **6 Conclusion**

There has been a long history of collaboration between effects and let-polymorphism. This work focuses on polymorphic algebraic effects and handlers, wherein the type signature of an effect operation can be polymorphic and an operation clause has a type binder, and shows that a naive combination of polymorphic effects and let-polymorphism breaks type safety. Our novel observation to address this problem is that any let-bound expression can be polymorphic safely if resumptions from a handler do not interfere with each other. We formalized this idea by developing a type system that requires the argument of each resumption expression to have a type obtained by renaming the type variables assigned in the operation clause to those assigned in the resumption. We have proven that a well-typed program in our type system does not get stuck via elaboration to an intermediate language wherein type information appears explicitly.

There are many directions for future work. The first is to address the problem, described at the end of Sect. 3, that renaming the type variables assigned in an operation clause to those assigned in a resumption expression is allowed for the argument of the clause but not for variables bound by lambda abstractions and let-expressions outside the resumption expression. Second, we are interested in incorporating other features from the literature on algebraic effect handlers, such as dynamic resources [3] and parameterized algebraic effects, and restriction techniques that have been developed for type-safe imperative programming with let-polymorphism such as (relaxed) value restriction [10,23,24]. For example, we would like to develop a type system that enforces the non-interfering restriction only to handlers implementing effect operations invoked in polymorphic computation. We also expect that it is possible to determine whether implementations of an effect operation have no interfering resumption from the type signature of the operation, as relaxed value restriction makes it possible to find safely generalizable type variables from the type of a let-bound expression [10]. Finally, we are also interested in implementing our idea for a language with effect handlers such as Koka [16] and in applying the idea of analyzing handlers to other settings such as dependent typing.

**Acknowledgments.** We would like to thank the anonymous reviewers for their valuable comments. This work was supported in part by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST (Sekiyama), and JSPS KAKENHI Grant Number JP15H05706 (Igarashi).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Distributive Disjoint Polymorphism for Compositional Programming**

Xuan Bi<sup>1(✉)</sup>, Ningning Xie<sup>1</sup>, Bruno C. d. S. Oliveira<sup>1</sup>, and Tom Schrijvers<sup>2</sup>

<sup>1</sup> The University of Hong Kong, Hong Kong, China, {xbi,nnxie,bruno}@cs.hku.hk

<sup>2</sup> KU Leuven, Leuven, Belgium, tom.schrijvers@cs.kuleuven.be

**Abstract.** Popular programming techniques such as *shallow embeddings* of Domain Specific Languages (DSLs), *finally tagless* or *object algebras* are built on the principle of *compositionality*. However, existing programming languages only support simple compositional designs well, and have limited support for more sophisticated ones.

This paper presents the F<sub>i</sub><sup>+</sup> calculus, which supports highly modular and compositional designs that improve on existing techniques. These improvements are due to the combination of three features: *disjoint intersection types* with a *merge operator*; *parametric (disjoint) polymorphism*; and *BCD-style distributive subtyping*. The main technical challenge is F<sub>i</sub><sup>+</sup>'s proof of coherence. A naive adaptation of ideas used in System F's *parametricity* to *canonicity* (the logical relation used by F<sub>i</sub><sup>+</sup> to prove coherence) results in an ill-founded logical relation. To solve the problem our canonicity relation employs a different technique based on immediate substitutions and a restriction to predicative instantiations. Besides coherence, we show several other important meta-theoretical results, such as type-safety, sound and complete algorithmic subtyping, and decidability of the type system. Remarkably, unlike F<sub>&lt;:</sub>'s *bounded polymorphism*, disjoint polymorphism in F<sub>i</sub><sup>+</sup> supports decidable type-checking.

# **1 Introduction**

Compositionality is a desirable property in programming designs. Broadly defined, it is the principle that a system should be built by composing smaller subsystems. For instance, in the area of programming languages, compositionality is a key aspect of *denotational semantics* [48,49], where the denotation of a program is constructed from the denotations of its parts. Compositional definitions have many benefits. One is ease of reasoning: since compositional definitions are recursively defined over smaller elements they can typically be reasoned about using induction. Another benefit is that compositional definitions are easy to extend, without modifying previous definitions.

Programming techniques that support compositional definitions include: *shallow embeddings* of Domain Specific Languages (DSLs) [20], *finally tagless* [11], *polymorphic embeddings* [26] or *object algebras* [35]. These techniques allow us to create compositional definitions, which are easy to extend without modifications. Moreover, when modeling semantics, both finally tagless and object algebras support *multiple interpretations* (or denotations) of syntax, thus offering a solution to the well-known *Expression Problem* [53]. Because of these benefits these techniques have become popular both in the functional and object-oriented programming communities.

However, programming languages often only support simple compositional designs well, while support for more sophisticated compositional designs is lacking. For instance, once we have multiple interpretations of syntax, we may wish to compose them. Particularly useful is a *merge* combinator, which composes two interpretations [35,37,42] to form a new interpretation that, when executed, returns the results of both interpretations.

The merge combinator can be manually defined in existing programming languages, and be used in combination with techniques such as finally tagless or object algebras. Moreover, variants of the merge combinator are useful to model more complex combinations of interpretations. Good examples are so-called *dependent* interpretations, where an interpretation does not depend *only* on itself, but also on a different interpretation. These definitions with dependencies are quite common in practice, and, although they are not orthogonal to the interpretation they depend on, we would like to model them (and also mutually dependent interpretations) in a modular and compositional style.

Defining the merge combinator in existing programming languages is verbose and cumbersome, requiring code for every new kind of syntax. Yet, that code is essentially mechanical and ought to be automated. While using advanced meta-programming techniques enables automating the merge combinator to a large extent in existing programming languages [37,42], those techniques have several problems: error messages can be problematic, type-unsafe reflection is needed in some approaches [37] and advanced type-level features are required in others [42]. An alternative to the merge combinator that supports modular multiple interpretations and works in OO languages with support for some form of multiple inheritance and covariant type-refinement of fields has also been recently proposed [55]. While this approach is relatively simple, it still requires a lot of manual boilerplate code for composition of interpretations.

This paper presents a calculus and polymorphic type system with *(disjoint) intersection types* [36], called F<sub>i</sub><sup>+</sup>. F<sub>i</sub><sup>+</sup> supports our broader notion of compositional designs, and enables the development of highly modular and reusable programs. F<sub>i</sub><sup>+</sup> has a built-in merge operator and a powerful subtyping relation that are used to automate the composition of multiple (possibly dependent) interpretations. In F<sub>i</sub><sup>+</sup> subtyping is coercive and enables the automatic generation of coercions in a *type-directed* fashion. This process is similar to that of other type-directed code generation mechanisms such as *type classes* [52], which eliminate boilerplate code associated with the *dictionary translation* [52].

F<sub>i</sub><sup>+</sup> continues a line of research on disjoint intersection types. Previous work on *disjoint polymorphism* (the F<sub>i</sub> calculus) [2] studied the combination of parametric polymorphism and disjoint intersection types, but its subtyping relation does not support BCD-style distributivity rules [3] and the type system also prevents unrestricted intersections [16]. More recently the NeColus calculus (or λ<sub>i</sub><sup>+</sup>) [5] introduced a system with *disjoint intersection types* and BCD-style distributivity rules, but did not account for parametric polymorphism. F<sub>i</sub><sup>+</sup> is unique in that it combines all three features in a single calculus: *disjoint intersection types* and a *merge operator*; *parametric (disjoint) polymorphism*; and a BCD-style subtyping relation with *distributivity rules*. The three features together allow us to improve upon the finally tagless and object algebra approaches and support advanced compositional designs. Moreover, previous work on disjoint intersection types has shown various other applications that are also possible in F<sub>i</sub><sup>+</sup>, including: *first-class traits* and *dynamic inheritance* [4], *extensible records* and *dynamic mixins* [2], and *nested composition* and *family polymorphism* [5].

Unfortunately the combination of the three features has non-trivial complications. The main technical challenge (like for most other calculi with disjoint intersection types) is the proof of coherence for F<sub>i</sub><sup>+</sup>. Because of the presence of BCD-style distributivity rules, our coherence proof is based on the recent approach employed in λ<sub>i</sub><sup>+</sup> [5], which uses a *heterogeneous* logical relation called *canonicity*. To account for polymorphism, which λ<sub>i</sub><sup>+</sup>'s canonicity does not support, we originally wanted to incorporate the relevant parts of System F's logical relation [43]. However, due to a mismatch between the two relations, this did not work. The parametricity relation has been carefully set up with a delayed type substitution to avoid ill-foundedness due to its impredicative polymorphism. Unfortunately, canonicity is a heterogeneous relation and needs to account for cases that cannot be expressed with the delayed substitution setup of the homogeneous parametricity relation. Therefore, to handle those heterogeneous cases, we resorted to immediate substitutions and *predicative instantiations*. We do not believe that predicativity is a severe restriction in practice, since many source languages (e.g., those based on the Hindley-Milner type system like Haskell and OCaml) are themselves predicative and do not require the full generality of an impredicative core language. Should impredicative instantiation be required, we expect that step-indexing [1] can be used to recover well-foundedness, though at the cost of a much more complicated coherence proof.

The formalization and metatheory of F<sub>i</sub><sup>+</sup> are a significant advance over that of F<sub>i</sub>. Besides the support for distributive subtyping, F<sub>i</sub><sup>+</sup> removes several restrictions imposed by the syntactic coherence proof in F<sub>i</sub>. In particular F<sub>i</sub><sup>+</sup> supports unrestricted intersections, which are forbidden in F<sub>i</sub>. Unrestricted intersections enable, for example, encoding certain forms of bounded quantification [39]. Moreover the new proof method is more robust with respect to language extensions. For instance, F<sub>i</sub><sup>+</sup> supports the bottom type without significant complications in the proofs, while it was a challenging open problem in F<sub>i</sub>. A final interesting aspect is that F<sub>i</sub><sup>+</sup>'s type-checking is decidable. In the design space of languages with polymorphism and subtyping, similar mechanisms have been known to lead to undecidability. Pierce's seminal paper "*Bounded quantification is undecidable*" [40] shows that the contravariant subtyping rule for bounded quantification in F<sub>&lt;:</sub> leads to undecidability of subtyping. In F<sub>i</sub><sup>+</sup> the contravariant rule for disjoint quantification retains decidability. Since with unrestricted intersections F<sub>i</sub><sup>+</sup> can express several use cases of bounded quantification, F<sub>i</sub><sup>+</sup> could be an interesting and decidable alternative to F<sub>&lt;:</sub>.

In summary the contributions of this paper are:


# **2 Compositional Programming**

To demonstrate the compositional properties of F<sub>i</sub><sup>+</sup> we use Gibbons and Wu's shallow embeddings of parallel prefix circuits [20]. By means of several different shallow embeddings, we first illustrate the shortcomings of a state-of-the-art compositional approach, popularly known as a *finally tagless* encoding [11], in Haskell. Next we show how parametric polymorphism and distributive intersection types provide a more elegant and compact solution in SEDEL [4], a source language built on top of our F<sub>i</sub><sup>+</sup> calculus.

### **2.1 A Finally Tagless Encoding in Haskell**

The circuit DSL represents networks that map a number of inputs (known as the width) of some type A onto the same number of outputs of the same type. The outputs combine (with repetitions) one or more inputs using a binary associative operator ⊕ : A × A → A. A particularly interesting class of circuits that can be expressed in the DSL are *parallel prefix circuits*. These represent computations that take n > 0 inputs x<sub>1</sub>, ..., x<sub>n</sub> and produce n outputs y<sub>1</sub>, ..., y<sub>n</sub>, where y<sub>i</sub> = x<sub>1</sub> ⊕ x<sub>2</sub> ⊕ ... ⊕ x<sub>i</sub>.

The DSL features 5 language primitives: two basic circuit constructors and three circuit combinators. These are captured in the Haskell type class Circuit:
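The class declaration itself does not appear in this text. A minimal reconstruction, assuming method signatures that mirror the instances of Fig. 1 and the SEDEL record type Circuit[C] of Sect. 2.2 (the comments are ours), might read:

```haskell
-- Reconstructed sketch: one method per DSL primitive; the class
-- parameter c is the type of the chosen interpretation.
class Circuit c where
  identity :: Int -> c          -- basic constructor: identity circuit of a given width
  fan      :: Int -> c          -- basic constructor: fan circuit of a given width
  beside   :: c -> c -> c       -- combinator: two circuits in parallel
  above    :: c -> c -> c       -- combinator: two circuits in sequence
  stretch  :: [Int] -> c -> c   -- combinator: interleave bundles of extra wires
```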

```
-- (a) Width embedding
data Width = W { width :: Int }
instance Circuit Width where
  identity n   = W n
  fan n        = W n
  beside c1 c2 = W (width c1 + width c2)
  above c1 c2  = c1
  stretch ws c = W (sum ws)

-- (b) Depth embedding
data Depth = D { depth :: Int }
instance Circuit Depth where
  identity n   = D 0
  fan n        = D 1
  beside c1 c2 = D (max (depth c1) (depth c2))
  above c1 c2  = D (depth c1 + depth c2)
  stretch ws c = c
```
**Fig. 1.** Two finally tagless embeddings of circuits.


An identity circuit with n inputs x<sub>i</sub> has n outputs y<sub>i</sub> = x<sub>i</sub>. A fan circuit has n inputs x<sub>i</sub> and n outputs y<sub>i</sub>, where y<sub>1</sub> = x<sub>1</sub> and y<sub>j</sub> = x<sub>1</sub> ⊕ x<sub>j</sub> (j > 1). The binary beside combinator puts two circuits in parallel; the combined circuit maps the inputs of both circuits to the outputs of both circuits. The binary above combinator connects the outputs of the first circuit to the inputs of the second; the widths of both circuits have to be the same. Finally, stretch ws c interleaves the wires of circuit c with bundles of additional wires that map their input straight to their output. The ws parameter specifies the widths of the consecutive bundles; the i-th wire of c is preceded by a bundle of width ws<sub>i</sub> − 1.
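To make the informal description of the five primitives concrete, the following sketch executes circuits on lists of integers, with (+) standing in for ⊕. This execution embedding is not one of the paper's examples: the names Run, rWidth, run and the R-suffixed functions are hypothetical, and the code assumes positive bundle widths and inputs whose length matches the circuit width.

```haskell
-- Hypothetical execution embedding: a circuit is its width together with
-- a function from input wires to output wires (as lists of Ints).
data Run = Run { rWidth :: Int, run :: [Int] -> [Int] }

identityR :: Int -> Run
identityR n = Run n id

-- y1 = x1 and yj = x1 + xj for j > 1
fanR :: Int -> Run
fanR n = Run n go
  where
    go []       = []
    go (x : xs) = x : map (x +) xs

-- Split the inputs at the width of the left circuit and run both parts.
besideR :: Run -> Run -> Run
besideR c1 c2 = Run (rWidth c1 + rWidth c2) go
  where
    go xs = let (l, r) = splitAt (rWidth c1) xs
            in run c1 l ++ run c2 r

-- Feed the outputs of the first circuit into the second.
aboveR :: Run -> Run -> Run
aboveR c1 c2 = Run (rWidth c1) (run c2 . run c1)

-- Cut the inputs into bundles of widths ws; only the last wire of each
-- bundle goes through c, the other wires pass straight through.
stretchR :: [Int] -> Run -> Run
stretchR ws c = Run (sum ws) go
  where
    go xs = concat (zipWith replaceLast groups (run c (map last groups)))
      where groups = chunks ws xs
    replaceLast g y = init g ++ [y]
    chunks [] _          = []
    chunks (w : rest) ys = let (g, ys') = splitAt w ys
                           in g : chunks rest ys'

-- The Brent-Kung circuit of width 4 used later in this section.
brentKungR :: Run
brentKungR =
  aboveR (besideR (fanR 2) (fanR 2))
    (aboveR (stretchR [2, 2] (fanR 2))
      (besideR (besideR (identityR 1) (fanR 2)) (identityR 1)))
```

Under these assumptions, run brentKungR computes the running sums of a four-element list, which is the parallel prefix computation for (+).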

*Basic width and depth embeddings.* Figure 1 shows two simple shallow embeddings, which represent a circuit respectively in terms of its width and its depth. The former denotes the number of inputs/outputs of a circuit, while the latter is the maximal number of ⊕ operators between any input and output. Both definitions follow the same setup: a new Haskell datatype (Width/Depth) wraps the primitive result value and provides an instance of the Circuit type class that interprets the 5 DSL primitives accordingly. The following code creates a so-called Brent-Kung parallel prefix circuit [9]:

```
e1 :: Width
e1 = above (beside (fan 2) (fan 2))
       (above (stretch [2, 2] (fan 2))
         (beside (beside (identity 1) (fan 2)) (identity 1)))
```
Here e1 evaluates to W {width = 4}. If we want to know the depth of the circuit, we have to change the type signature to Depth.

*Interpreting multiple ways.* Fortunately, with the help of polymorphism we can define a type of circuits that support multiple interpretations at once.

**type** DCircuit = forall c. Circuit c ⇒ c

This way we can provide a single Brent-Kung parallel prefix circuit definition that can be reused for different interpretations.

```
brentKung :: DCircuit
brentKung = above (beside (fan 2) (fan 2))
              (above (stretch [2, 2] (fan 2))
                (beside (beside (identity 1) (fan 2)) (identity 1)))
```
A type annotation then selects the desired interpretation. For instance, brentKung :: Width yields the width and brentKung :: Depth the depth.

*Composition of embeddings.* What is not ideal in the above code is that the same brentKung circuit is processed twice, if we want to execute both interpretations. We can do better by processing the circuit only once, computing both interpretations simultaneously. The finally tagless encoding achieves this with a boilerplate instance for tuples of interpretations.

```
instance (Circuit c1, Circuit c2) => Circuit (c1, c2) where
  identity n   = (identity n, identity n)
  fan n        = (fan n, fan n)
  beside c1 c2 = (beside (fst c1) (fst c2), beside (snd c1) (snd c2))
  above c1 c2  = (above (fst c1) (fst c2), above (snd c1) (snd c2))
  stretch ws c = (stretch ws (fst c), stretch ws (snd c))
```
Now we can get both embeddings simultaneously as follows:

```
e12 :: (Width, Depth)
e12 = brentKung
```
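This evaluates to (W {width = 4}, D {depth = 3}): the Depth instance of Fig. 1 adds the depths of circuits composed with above, and each of the three stacked layers has depth 1.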
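For readers who want to run the example, the fragments of this subsection can be assembled into a single module. The sketch below makes a few assumptions of its own: the Circuit class is reconstructed from the instances, the deriving clauses are added so that results can be compared and printed, and the constraint Circuit c is written inline in the signature of brentKung so that no language extension is needed for the DCircuit synonym. The names both and separately are hypothetical.

```haskell
class Circuit c where
  identity :: Int -> c
  fan      :: Int -> c
  beside   :: c -> c -> c
  above    :: c -> c -> c
  stretch  :: [Int] -> c -> c

data Width = W { width :: Int } deriving (Eq, Show)
data Depth = D { depth :: Int } deriving (Eq, Show)

instance Circuit Width where
  identity n   = W n
  fan n        = W n
  beside c1 c2 = W (width c1 + width c2)
  above c1 _   = c1
  stretch ws _ = W (sum ws)

instance Circuit Depth where
  identity _   = D 0
  fan _        = D 1
  beside c1 c2 = D (max (depth c1) (depth c2))
  above c1 c2  = D (depth c1 + depth c2)
  stretch _ c  = c

-- Boilerplate instance: interpret both components in lockstep.
instance (Circuit a, Circuit b) => Circuit (a, b) where
  identity n   = (identity n, identity n)
  fan n        = (fan n, fan n)
  beside x y   = (beside (fst x) (fst y), beside (snd x) (snd y))
  above x y    = (above (fst x) (fst y), above (snd x) (snd y))
  stretch ws x = (stretch ws (fst x), stretch ws (snd x))

-- One definition, usable at every interpretation type.
brentKung :: Circuit c => c
brentKung =
  above (beside (fan 2) (fan 2))
    (above (stretch [2, 2] (fan 2))
      (beside (beside (identity 1) (fan 2)) (identity 1)))

-- Single traversal computing both results ...
both :: (Width, Depth)
both = brentKung

-- ... versus two separate traversals of the same circuit.
separately :: (Width, Depth)
separately = (brentKung, brentKung)
```

The two values both and separately are equal; the difference lies only in how many times the circuit definition is traversed.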

*Composition of dependent interpretations.* The composition above is easy because the two embeddings are orthogonal. In contrast, the composition of dependent interpretations is rather cumbersome in the standard finally tagless setup. An example of the latter is the interpretation of circuits as their well-sizedness, which captures whether circuits are well-formed. This interpretation depends on the interpretation of circuits as their width.<sup>1</sup>

```
data WellSized = WS { wS :: Bool, ox :: Width }
instance Circuit WellSized where
  identity n   = WS True (identity n)
  fan n        = WS True (fan n)
  beside c1 c2 = WS (wS c1 && wS c2) (beside (ox c1) (ox c2))
  above c1 c2  = WS (wS c1 && wS c2 && width (ox c1) == width (ox c2))
                    (above (ox c1) (ox c2))
  stretch ws c = WS (wS c && length ws == width (ox c)) (stretch ws (ox c))
```

<sup>1</sup> Dependent recursion schemes are also known as *zygomorphism* [18] after the ancient Greek word for yoke. We have labeled the Width field with ox because it is pulling the yoke.

The WellSized datatype represents the well-sizedness of a circuit with a Boolean, and also keeps track of the circuit's width. The 5 primitives compute the well-sizedness in terms of both the width and well-sizedness of the subcomponents. What makes the code cumbersome is that it has to explicitly delegate to the Width interpretation to collect this additional information.
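As a small check of how this interpretation behaves, the sketch below puts the WellSized instance next to the Width instance it delegates to and applies it to one well-sized and one ill-sized circuit. The Circuit class is our reconstruction from the instances, and the circuits good and bad are hypothetical examples that do not come from the paper.

```haskell
class Circuit c where
  identity :: Int -> c
  fan      :: Int -> c
  beside   :: c -> c -> c
  above    :: c -> c -> c
  stretch  :: [Int] -> c -> c

data Width = W { width :: Int }

instance Circuit Width where
  identity n   = W n
  fan n        = W n
  beside c1 c2 = W (width c1 + width c2)
  above c1 _   = c1
  stretch ws _ = W (sum ws)

-- The Boolean result is paired with the Width result it depends on.
data WellSized = WS { wS :: Bool, ox :: Width }

instance Circuit WellSized where
  identity n   = WS True (identity n)
  fan n        = WS True (fan n)
  beside c1 c2 = WS (wS c1 && wS c2) (beside (ox c1) (ox c2))
  -- Sequential composition requires equal widths.
  above c1 c2  = WS (wS c1 && wS c2 && width (ox c1) == width (ox c2))
                    (above (ox c1) (ox c2))
  -- One bundle width per wire of the stretched circuit.
  stretch ws c = WS (wS c && length ws == width (ox c)) (stretch ws (ox c))

-- Hypothetical test circuits: widths 2 over 2, and widths 2 over 3.
good, bad :: WellSized
good = above (fan 2) (beside (identity 1) (identity 1))
bad  = above (fan 2) (fan 3)
```

Here wS good holds, while wS bad fails because the two circuits stacked by above have different widths.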

With the help of a substantially more complicated setup that features a dozen Haskell language extensions, and advanced programming techniques, we can make the explicit delegation implicit (see the appendix). Nevertheless, that approach still requires *a lot of boilerplate* that needs to be repeated for each DSL, as well as explicit projections that need to be written in each interpretation. Another alternative Haskell encoding that also enables multiple dependent interpretations is proposed by Zhang and Oliveira [55], but it does not eliminate the explicit delegation and still requires substantial amounts of boilerplate. A final remark is that adding new primitives (e.g., a "right stretch" rstretch combinator [25]) can also be easily achieved [46].

### **2.2 The SEDEL Encoding**

SEDEL is a source language that elaborates to F<sub>i</sub><sup>+</sup>, adding a few convenient source-level constructs. The SEDEL setup of the circuit DSL is similar to the finally tagless approach. Instead of a Circuit c type class, there is a Circuit[C] type that gathers the 5 circuit primitives in a record. Like in Haskell, the type parameter C expresses that the interpretation of circuits is a parameter.

```
type Circuit[C] = {
 identity : Int → C, fan : Int → C, beside : C → C → C,
 above : C → C → C, stretch : List[Int] → C → C };
```
As a side note if a new constructor (e.g., rstretch) is needed, then this is done by means of intersection types (& creates an intersection type) in SEDEL:

```
type NCircuit[C] = Circuit[C] & { rstretch : List[Int] → C → C };
```
Figure 2 shows the two basic shallow embeddings for width and depth. In both cases, a named SEDEL definition replaces the corresponding unnamed Haskell type class instance in providing the implementations of the 5 language primitives for a particular interpretation.

```
type Width = { width : Int };
language1 : Circuit[Width] = {
 identity (n : Int) = { width = n },
 fan (n : Int) = { width = n },
 beside (c1 : Width) (c2 : Width) = { width = c1.width + c2.width },
 above (c1 : Width) (c2 : Width) = { width = c1.width },
 stretch (ws : List[Int]) (c : Width) = { width = sum ws } };
type Depth = { depth : Int };
language2 : Circuit[Depth] = {
 identity (n : Int) = { depth = 0 },
 fan (n : Int) = { depth = 1 },
 beside (c1 : Depth) (c2 : Depth) = { depth = max c1.depth c2.depth},
 above (c1 : Depth) (c2 : Depth) = { depth = c1.depth + c2.depth},
 stretch (ws : List[Int]) (c : Depth) = { depth = c.depth } };
```
**Fig. 2.** Two SEDEL embeddings of circuits.

The use of the SEDEL embeddings is different from that of their Haskell counterparts. Where Haskell implicitly selects the appropriate type class instance based on the available type information, in SEDEL the programmer explicitly selects the implementation following the style used by object algebras. The following code does this by building a circuit with l1 (short for language1).

```
l1 = language1;
e1 = l1.above (l1.beside (l1.fan 2) (l1.fan 2))
       (l1.above (l1.stretch (cons 2 (cons 2 nil)) (l1.fan 2))
         (l1.beside (l1.beside (l1.identity 1) (l1.fan 2)) (l1.identity 1)));
```
Here e1 evaluates to {width = 4}. If we want to know the depth of the circuit, we have to replicate the code with language2.

*Dynamically reusable circuits.* Just like in Haskell, we can use polymorphism to define a type of circuits that can be interpreted with different languages.

**type** DCircuit = { accept : forall C. Circuit[C] → C };

In contrast to the Haskell solution, the SEDEL definition takes the language implementation as an explicit argument.

```
brentKung : DCircuit = {
  accept C l = l.above (l.beside (l.fan 2) (l.fan 2))
    (l.above (l.stretch (cons 2 (cons 2 nil)) (l.fan 2))
      (l.beside (l.beside (l.identity 1) (l.fan 2)) (l.identity 1))) };
e1 = brentKung.accept Width language1;
e2 = brentKung.accept Depth language2;
```
*Automatic composition of languages.* Of course, like in Haskell we can also compute both results simultaneously. However, unlike in Haskell, the composition of the two interpretations requires no boilerplate whatsoever; in particular, there is no SEDEL counterpart of the Circuit (c1, c2) instance. Instead, we can just compose the two interpretations with the term-level merge operator (,,) and specify the desired type Circuit[Width & Depth].

```
language3 : Circuit[Width & Depth] = language1 ,, language2;
e3 = brentKung.accept (Width & Depth) language3;
```
Here the use of the merge operator creates a term with the intersection type Circuit[Width] & Circuit[Depth]. Implicitly, the SEDEL type system takes care of the details, turning this intersection type into Circuit[Width & Depth]. This is possible because intersection (&) distributes over function and record types (a distinctive feature of BCD-style subtyping).
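To illustrate how distributivity justifies this step, consider a single field such as beside. The derivation sketch below abbreviates Width as W and Depth as D. It uses only the usual contravariance and covariance of arrows, covariance of records, and the distributivity rules S-distRcd and S-distArr presented in Sect. 3; reading the record type field by field is our simplification.

$$
\begin{aligned}
&\{\mathit{beside} : W \to W \to W\} \;\&\; \{\mathit{beside} : D \to D \to D\} \\
&\quad <:\ \{\mathit{beside} : (W \to W \to W) \;\&\; (D \to D \to D)\} && \text{(S-distRcd)} \\
&\quad <:\ \{\mathit{beside} : ((W \& D) \to (W \& D) \to W) \;\&\; ((W \& D) \to (W \& D) \to D)\} && (W \& D <: W,\ W \& D <: D) \\
&\quad <:\ \{\mathit{beside} : (W \& D) \to (W \& D) \to (W \& D)\} && \text{(S-distArr, twice)}
\end{aligned}
$$

The second step weakens the argument types by contravariance; the last step applies S-distArr once at the outer arrow and once, under covariance of the result type, at the inner arrow.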

*Composition of dependent interpretations.* In SEDEL the composition scales nicely to dependent interpretations. For instance, the well-sizedness interpretation can be expressed without explicit projections.

```
type WellSized = { wS : Bool };
language4 = {
  identity (n : Int) = { wS = true },
  fan (n : Int) = { wS = true },
  above (c1 : WellSized & Width) (c2 : WellSized & Width) =
    { wS = c1.wS && c2.wS && c1.width == c2.width },
  beside (c1 : WellSized) (c2 : WellSized) = { wS = c1.wS && c2.wS },
  stretch (ws : List[Int]) (c : WellSized & Width) =
    { wS = c.wS && length ws == c.width } };
```
Here the WellSized & Width type in the above and stretch cases expresses that both the well-sizedness and width of subcircuits must be given, and that the width implementation is left as a dependency: when language4 is used, the width implementation must be provided. Again, the distributive properties of & in the type system take care of merging the two interpretations.

```
e4 = brentKung.accept (WellSized & Width) (language1 ,, language4);
main = e4.wS -- Output: true
```
*Disjoint polymorphism and dynamic merges.* While it may seem from the above examples that definitions have to be merged statically, SEDEL in fact supports dynamic merges. For instance, we can encapsulate the merge operator in the combine function while abstracting over the two components x and y that are merged as well as over their types A and B.

combine A [B \* A] (x : A) (y : B) = x ,, y;

This way the components x and y are only known at runtime and thus the merge can only happen at that time. The types A and B cannot be chosen entirely freely. For instance, if both components contributed an implementation for the same method, it would be ambiguous which implementation the combination provides. To avoid this problem the two types A and B have to be *disjoint*. This is expressed in the disjointness constraint \* A on the quantifier of the type variable B. If a quantifier mentions no disjointness constraint, like that of A, it defaults to the trivial \* ⊤ constraint, which implies no restriction.

# **3 Semantics of the F<sub>i</sub><sup>+</sup> Calculus**

This section gives a formal account of F<sub>i</sub><sup>+</sup>, the first typed calculus combining disjoint polymorphism [2] (and disjoint intersection types) with BCD subtyping [3].


**Fig. 3.** Syntax of F<sub>i</sub><sup>+</sup>

The main differences to F<sub>i</sub> are in the subtyping, well-formedness and disjointness relations. F<sub>i</sub><sup>+</sup> adds BCD subtyping and unrestricted intersections, and also closes an open problem of F<sub>i</sub> by including the bottom type. The dynamic semantics of F<sub>i</sub><sup>+</sup> is given by elaboration to the target calculus F<sub>co</sub>, a variant of System F extended with products and explicit coercions.

### **3.1 Syntax and Semantics**

Figure 3 shows the syntax of F<sub>i</sub><sup>+</sup>. Metavariables *A*, *B*, *C* range over types. Types include standard constructs from prior work [2,36]: integers Int, the top type ⊤, arrows *A* → *B*, intersections *A* & *B*, single-field record types {*l* : *A*} and disjoint quantification ∀(α ∗ *A*). *B*. One novelty in F<sub>i</sub><sup>+</sup> is the addition of the uninhabited bottom type ⊥. Metavariable *E* ranges over expressions. Expressions are integer literals i, the top value ⊤, lambda abstractions λ*x*.*E*, applications *E*<sub>1</sub> *E*<sub>2</sub>, merges *E*<sub>1</sub> ,, *E*<sub>2</sub>, annotated terms *E* : *A*, single-field records {*l* = *E*}, record projections *E*.*l*, type abstractions Λ(α ∗ *A*).*E* and type applications *E A*.
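The grammar just described can be transcribed into algebraic data types. The following is a sketch only: the constructor names are hypothetical, and type variables and term variables, which the prose above does not list explicitly, are assumed to be part of the syntax. The example term combine is the core-calculus counterpart of the SEDEL function of the same name from Sect. 2.2, with the default constraint ⊤ on A written out.

```haskell
type Label = String
type TyVar = String

-- Types A, B, C
data Type
  = TInt                     -- Int
  | TTop                     -- top type
  | TBot                     -- bottom type (uninhabited)
  | TArr Type Type           -- A -> B
  | TAnd Type Type           -- A & B
  | TRcd Label Type          -- {l : A}
  | TVar TyVar               -- type variable (assumed)
  | TForall TyVar Type Type  -- forall (a * A). B
  deriving (Eq, Show)

-- Expressions E
data Expr
  = Lit Integer              -- integer literal
  | Top                      -- top value
  | Var String               -- term variable (assumed)
  | Lam String Expr          -- \x. E
  | App Expr Expr            -- E1 E2
  | Merge Expr Expr          -- E1 ,, E2
  | Anno Expr Type           -- E : A
  | Rcd Label Expr           -- {l = E}
  | Proj Expr Label          -- E.l
  | TLam TyVar Type Expr     -- /\(a * A). E
  | TApp Expr Type           -- E A
  deriving (Eq, Show)

-- /\(A * Top). /\(B * A). (\x. \y. x ,, y) : A -> B -> A & B
combine :: Expr
combine =
  TLam "A" TTop
    (TLam "B" (TVar "A")
      (Anno (Lam "x" (Lam "y" (Merge (Var "x") (Var "y"))))
            (TArr (TVar "A") (TArr (TVar "B") (TAnd (TVar "A") (TVar "B"))))))
```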

*Well-formedness and unrestricted intersections.* F<sub>i</sub><sup>+</sup>'s well-formedness judgment of types Δ ⊢ *A* is standard, and only enforces well-scoping. This is one of the key differences from F<sub>i</sub>, which uses well-formedness to also ensure that all intersection types are disjoint. In other words, while in F<sub>i</sub> all valid intersection types must be disjoint, in F<sub>i</sub><sup>+</sup> unrestricted intersection types such as Int & Int are allowed. More specifically, the well-formedness of intersection types in F<sub>i</sub><sup>+</sup> and F<sub>i</sub> is:

$$\frac{\Delta \vdash A \qquad \Delta \vdash B}{\Delta \vdash A \,\&\, B}\ \text{wf-}\mathsf{F}_{i}^{+} \qquad\qquad \frac{\Delta \vdash A \qquad \Delta \vdash B \qquad \Delta \vdash A \ast B}{\Delta \vdash A \,\&\, B}\ \text{wf-}\mathsf{F}_{i}$$

Notice that F<sub>i</sub> has an extra disjointness condition Δ ⊢ *A* ∗ *B* in the premise. This is crucial for F<sub>i</sub>'s syntactic method for proving coherence, but also burdens the calculus with various syntactic restrictions and complicates its metatheory. For example, it requires extra effort to show that F<sub>i</sub> only produces disjoint intersection types. As a consequence, F<sub>i</sub> features a *weaker* substitution lemma than F<sub>i</sub><sup>+</sup>: Proposition 1 needs the extra premise Δ ⊢ *B* ∗ *C*, which is absent from Lemma 1.

**Proposition 1 (Type substitution in F<sub>i</sub>).** *If* Δ ⊢ *A*, Δ ⊢ *B*, (α ∗ *C*) ∈ Δ, Δ ⊢ *B* ∗ *C* *and well-formed context* [*B*/α]Δ, *then* [*B*/α]Δ ⊢ [*B*/α]*A*.

**Lemma 1 (Type substitution in F<sub>i</sub><sup>+</sup>).** *If* Δ ⊢ *A*, Δ ⊢ *B*, (α ∗ *C*) ∈ Δ *and well-formed context* [*B*/α]Δ, *then* [*B*/α]Δ ⊢ [*B*/α]*A*.

**Fig. 4.** Declarative subtyping (judgment form *A* <: *B* ⇝ co)

*Declarative subtyping.* F<sub>i</sub><sup>+</sup>'s subtyping judgment is another major difference to F<sub>i</sub>, because it features BCD-style subtyping and a rule for the bottom type. The full set of subtyping rules is shown in Fig. 4. The reader is advised to ignore the gray parts for now. Our subtyping rules extend the BCD-style subtyping rules from λ<sub>i</sub><sup>+</sup> [5] with a rule for parametric (disjoint) polymorphism (rule S-forall). Moreover, we have three new rules: rule S-bot for the bottom type, and rules S-distAll and S-topAll for distributivity of disjoint quantification. The subtyping relation is reflexive and transitive (rules S-refl and S-trans). Most of the rules are quite standard. ⊥ is a subtype of all types (rule S-bot). Subtyping of disjoint quantification is covariant in its body, and contravariant in its disjointness constraints (rule S-forall). Of particular interest are the so-called "distributivity" rules: rule S-distArr says intersections distribute over arrows; rule S-distRcd says intersections distribute over records. Similarly, rule S-distAll dictates that intersections may distribute over disjoint quantifiers.


**Fig. 5.** Bidirectional type system

*Typing rules.* F<sub>i</sub><sup>+</sup> features a bidirectional type system inherited from F<sub>i</sub>. The full set of typing rules is shown in Fig. 5. Again we ignore the gray parts and explain them in Sect. 3.3. The inference judgment Δ; Γ ⊢ *E* ⇒ *A* says that we can synthesize the type *A* under the contexts Δ and Γ. The checking judgment Δ; Γ ⊢ *E* ⇐ *A* asserts that *E* checks against the type *A* under the contexts Δ and Γ. Most of the rules are quite standard in the literature. The merge expression *E*<sub>1</sub> ,, *E*<sub>2</sub> is well-typed if both sub-expressions are well-typed, and their types are *disjoint* (rule T-merge). The disjointness relation will be explained in Sect. 3.2. To infer a type abstraction (rule T-tabs), we add disjointness constraints to the type context. For a type application (rule T-tapp), we check that the type argument satisfies the disjointness constraints. Rules T-merge and T-tapp are the only rules checking disjointness.


**Fig. 6.** Selected rules for disjointness

### **3.2 Disjointness**

We now turn to another core judgment of F<sub>i</sub><sup>+</sup>: the disjointness relation, shown in Fig. 6. The disjointness rules are mostly inherited from F<sub>i</sub> [2], but the new bottom type requires a notable change regarding disjointness with *top-like types*.

*Top-like types.* Top-like types are all types that are isomorphic to ⊤ (i.e., simultaneously sub- and supertypes of ⊤). Hence, they are inhabited by a single value, isomorphic to the ⊤ value. Figure 6 captures this notion in a syntax-directed fashion in the ⌉*A*⌈ predicate. As a historical note, the concept of top-like types was already known by Barendregt et al. [3]. The λ<sub>i</sub> calculus [36] re-discovered it and coined the term "top-like types"; the F<sub>i</sub> calculus [2] extended it with universal quantifiers. Note that in both calculi, top-like types are solely employed for enabling a syntactic method of proving coherence and, due to the lack of BCD subtyping, have no type-theoretic interpretation there.

*Disjointness rules.* The disjointness judgment Δ ⊢ *A* ∗ *B* is helpful to check whether the merge of two expressions of type *A* and *B* preserves coherence. Incoherence arises when both expressions produce distinct values for the same type, either directly when they are both of that same type, or through implicit upcasting to a common supertype. Of course we can safely disregard top-like types in this matter because they do not have two distinct values. In short, it suffices to check that the two types have only top-like supertypes in common.

**Fig. 7.** Syntax of F<sub>co</sub>

Because ⊥ and any other type *A* always have *A* as a common supertype, it follows that ⊥ is only disjoint with *A* when *A* is top-like. More generally, if *A* is a top-like type, then *A* is disjoint with any type. This is the rationale behind the two rules D-topL and D-topR, which generalize and subsume Δ ⊢ ⊤ ∗ *A* and Δ ⊢ *A* ∗ ⊤ from F<sub>i</sub>, and also cater to the bottom type. Two other interesting rules are D-tvarL and D-tvarR, which dictate that a type variable α is disjoint with some type *B* if its disjointness constraint *A* is a subtype of *B*. Disjointness axioms *A* ∗<sub>ax</sub> *B* (appearing in rule D-ax) take care of two types with different type constructors (e.g., Int and records). Axiom rules can be found in the appendix. Finally, we note that the disjointness relation is symmetric.
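The following Python sketch illustrates the reasoning above on a small monomorphic fragment (Int, ⊤, ⊥, functions, intersections, single-field records). It is not the paper's definition: the tuple encoding and the names `top_like` and `disjoint` are hypothetical, type variables and quantifiers are omitted, and the cases for functions, records and intersections are assumed to follow the usual λ<sub>i</sub>-style rules rather than being copied from Fig. 6.

```python
# Types as tagged tuples (an encoding assumed for this sketch only):
#   ("int",)  ("top",)  ("bot",)  ("arrow", A, B)  ("and", A, B)  ("rcd", label, A)
INT, TOP, BOT = ("int",), ("top",), ("bot",)

def arrow(a, b): return ("arrow", a, b)
def inter(a, b): return ("and", a, b)
def rcd(label, a): return ("rcd", label, a)

def top_like(t):
    """Syntax-directed check that t is isomorphic to the top type."""
    tag = t[0]
    if tag == "top":
        return True
    if tag == "and":
        return top_like(t[1]) and top_like(t[2])
    if tag in ("arrow", "rcd"):
        return top_like(t[2])      # codomain / field type must be top-like
    return False                   # Int and bottom are never top-like

def disjoint(a, b):
    """Do a and b have only top-like supertypes in common?"""
    if top_like(a) or top_like(b):             # D-topL / D-topR
        return True
    if a[0] == "and":                          # every component must be disjoint
        return disjoint(a[1], b) and disjoint(a[2], b)
    if b[0] == "and":
        return disjoint(a, b[1]) and disjoint(a, b[2])
    if a[0] == "bot" or b[0] == "bot":         # bottom shares the other side's supertypes
        return False
    if a[0] == "arrow" and b[0] == "arrow":    # compare codomains only
        return disjoint(a[2], b[2])
    if a[0] == "rcd" and b[0] == "rcd":        # distinct labels, or disjoint fields
        return a[1] != b[1] or disjoint(a[2], b[2])
    return a[0] != b[0]                        # axioms: different type constructors
```

For instance, `disjoint(BOT, INT)` is false because Int is a common supertype that is not top-like, while `disjoint(BOT, arrow(INT, TOP))` is true because Int → ⊤ is top-like.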

#### **3.3 Elaboration and Type Safety**

The dynamic semantics of F<sub>i</sub><sup>+</sup> is given by elaboration into a target calculus. The target calculus F<sub>co</sub> is the standard call-by-value System F extended with products and coercions. The syntax of F<sub>co</sub> is shown in Fig. 7.

*Type translation.* Definition 1 defines the type translation function |·| from F<sub>i</sub><sup>+</sup> types *A* to F<sub>co</sub> types τ. Most cases are straightforward. For example, ⊥ is mapped to an uninhabited type ∀α. α; disjoint quantification is mapped to universal quantification, dropping the disjointness constraints. |·| is naturally extended to work on contexts as well.

**Definition 1.** *Type translation* |·| *is defined as follows:*


**Fig. 8.** Selected reduction rules

*Coercions and coercive subtyping.* We follow prior work [5,6] by having a syntactic category for coercions [22]. In Fig. 7, we have several new coercions: bot, co<sub>∀</sub>, dist<sub>∀</sub> and top<sub>∀</sub>, due to the addition of polymorphism and the bottom type. As seen in Fig. 4, the coercive subtyping judgment has the form *A* <: *B* ⇝ co, which says that the subtyping derivation for *A* <: *B* produces a coercion co that converts terms of type |*A*| to terms of type |*B*|.

F<sub>co</sub> *static semantics.* The typing rules of F<sub>co</sub> are quite standard. We have one rule t-capp regarding coercion application, which uses the judgment co :: τ ▹ τ′ to type coercions. We show two representative rules ct-forall and ct-bot.

$$\frac{\Phi; \Psi \vdash e : \tau \qquad co :: \tau \triangleright \tau'}{\Phi; \Psi \vdash co\; e : \tau'} \text{ T-CAPP} \qquad \frac{co :: \tau\_1 \triangleright \tau\_2}{co\_\forall :: \forall \alpha.\, \tau\_1 \triangleright \forall \alpha.\, \tau\_2} \text{ CT-FORALL} \qquad \frac{}{\mathsf{bot} :: \forall \alpha.\, \alpha \triangleright \tau} \text{ CT-BOT}$$

F<sub>co</sub> *dynamic semantics.* The dynamic semantics of F<sub>co</sub> is mostly unremarkable. We write *e* −→ *e*′ to mean one-step reduction. Figure 8 shows selected reduction rules. The first line shows three representative rules regarding coercion reductions. They do not contribute to computation but merely rearrange coercions. Our coercion reduction rules are quite standard but not efficient in terms of space. Nevertheless, there is existing work on space-efficient coercions [23,50], which should be applicable to our work as well. Rule r-app is the usual β-rule that performs actual computation, and rule r-ctxt handles reduction under an evaluation context. As usual, −→<sup>∗</sup> is the reflexive, transitive closure of −→. Now we can show that F<sub>co</sub> is type safe:

#### **Theorem 1 (Preservation).** *If* •; • ⊢ *e* : τ *and* *e* −→ *e*′, *then* •; • ⊢ *e*′ : τ.

#### **Theorem 2 (Progress).** *If* •; • ⊢ *e* : τ, *then either* *e* *is a value, or* ∃*e*′. *e* −→ *e*′.

*Elaboration.* Now consider the translation parts in Fig. 5. The key idea of the translation follows prior work [2,5,16,36]: merges are elaborated to pairs (rule T-merge); disjoint quantification and disjoint type applications (rules T-tabs and T-tapp) are elaborated to regular universal quantification and type applications, respectively. Finally, the following lemma connects F<sub>i</sub><sup>+</sup> to F<sub>co</sub>:

#### **Lemma 2 (Elaboration soundness).** *We have that:*

- *If* *A* <: *B* ⇝ co, *then* co :: |*A*| ▹ |*B*|.
- *If* Δ; Γ ⊢ *E* ⇒ *A* ⇝ *e*, *then* |Δ|; |Γ| ⊢ *e* : |*A*|.
- *If* Δ; Γ ⊢ *E* ⇐ *A* ⇝ *e*, *then* |Δ|; |Γ| ⊢ *e* : |*A*|.

### **4 Algorithmic System and Decidability**

The subtyping relation in Fig. 4 is highly non-algorithmic due to the presence of a transitivity rule. This section presents an alternative algorithmic formulation. Our algorithm extends that of λ<sub>i</sub><sup>+</sup>, which itself was inspired by Pierce's decision procedure [38], to handle disjoint quantifiers and the bottom type. We then prove that the algorithm is sound and complete with respect to declarative subtyping.

Additionally, we prove that the subtyping and disjointness relations are decidable. Although the proofs of this fact are fairly straightforward, the result is nonetheless remarkable since it contrasts with the subtyping relation for (full) F<sub><:</sub> [10], which is undecidable [40]. Thus, while bounded quantification is infamous for its undecidability, disjoint quantification has the nicer property of being decidable.

#### **4.1 Algorithmic Subtyping Rules**

While Fig. 4 is a fine specification of how subtyping should behave, it cannot be read directly as a subtyping algorithm for two reasons: (1) the conclusions of rules S-refl and S-trans overlap with the other rules, and (2) the premises of rule S-trans mention a type that does not appear in the conclusion. Simply dropping the two offending rules from the system is not possible without losing expressivity [29]. Thus we need a different approach. Following λ<sub>i</sub><sup>+</sup>, we intend the algorithmic judgment Q ⊢ *A* <: *B* to be equivalent to *A* <: Q ⇒ *B*, where Q is a queue used to track record labels, domain types and disjointness constraints. The full rules of the algorithmic subtyping of F<sub>i</sub><sup>+</sup> are shown in Fig. 9.

**Definition 2 (**Q ::= [] | *l*, Q | *B*, Q | α ∗ *B*, Q**).** Q ⇒ *A is defined as follows:*

$$\begin{array}{ll} [] \Rightarrow A = A & (B, \mathcal{Q}) \Rightarrow A = B \rightarrow (\mathcal{Q} \Rightarrow A) \\ (l, \mathcal{Q}) \Rightarrow A = \{l : \mathcal{Q} \Rightarrow A\} & (\alpha \ast B, \mathcal{Q}) \Rightarrow A = \forall (\alpha \ast B).\ \mathcal{Q} \Rightarrow A \end{array}$$
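As a concrete reading of Definition 2, the following Python sketch computes Q ⇒ *A* by wrapping *A* with the queue entries from the last to the first, so that the head of the queue ends up outermost. The tuple encoding of types and queue entries and the name `apply_queue` are assumptions made for illustration only.

```python
# Queue entries (hypothetical encoding):
#   ("label", l)           a record label l
#   ("type", B)            a domain type B
#   ("constraint", a, B)   a disjointness constraint  a * B
# Types: ("int",), ("arrow", A, B), ("rcd", l, A), ("forall", a, B, C), ...
INT = ("int",)

def apply_queue(queue, a):
    """Compute Q => A following the four equations of Definition 2."""
    result = a
    for item in reversed(queue):          # innermost wrapper is the last entry
        kind = item[0]
        if kind == "label":               # (l, Q) => A  =  {l : Q => A}
            result = ("rcd", item[1], result)
        elif kind == "type":              # (B, Q) => A  =  B -> (Q => A)
            result = ("arrow", item[1], result)
        elif kind == "constraint":        # (a * B, Q) => A  =  forall (a * B). Q => A
            result = ("forall", item[1], item[2], result)
        else:
            raise ValueError(f"unknown queue entry: {item!r}")
    return result
```

With the queue `[("type", INT), ("label", "x")]` and the type Int, the result encodes Int → {x : Int}.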

For brevity of the algorithm, we use metavariable c to mean type constants:

Type Constants c ::= Int | ⊥ | α

The basic idea of Q ⊢ *A* <: *B* is to perform a case analysis on *B* until it reaches type constants. We explain the new rules regarding disjoint quantification and the bottom type. When a quantifier is encountered in *B*, rule A-forall pushes the type variable with its disjointness constraint onto Q and continues with the body. Correspondingly, in rule A-allConst, when a quantifier is encountered in *A* and the head of Q is a type variable, this variable is popped and we continue with the body. Rule A-bot is similar to its declarative counterpart. Two meta-functions ⟦Q⟧<sub>⊤</sub> and ⟦Q⟧<sub>&</sub> are meant to generate correct forms of coercions, and their definitions are shown in the appendix. For the other algorithmic rules, we refer to λ<sub>i</sub><sup>+</sup> [5] for detailed explanations.
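To make the shape of this algorithm concrete, here is a minimal Python sketch of a queue-based decision procedure for a monomorphic fragment only (Int, ⊤, ⊥, functions, intersections, single-field records). It is an approximation under stated assumptions, not a transcription of Fig. 9: type variables, quantifiers and coercion generation are omitted, the rules are assumed to follow the λ<sub>i</sub><sup>+</sup>-style algorithm, and the names `sub`, `arrow`, `inter` and `rcd` are hypothetical.

```python
# Types as tagged tuples (an encoding assumed for this sketch only).
INT, TOP, BOT = ("int",), ("top",), ("bot",)

def arrow(a, b): return ("arrow", a, b)
def inter(a, b): return ("and", a, b)
def rcd(label, a): return ("rcd", label, a)

def sub(queue, a, b):
    """Decide Q |- A <: B, read as A <: (Q => B); no coercions are produced."""
    tag = b[0]
    # Case analysis on B until it becomes a constant.
    if tag == "top":
        return True                                   # Q => Top is top-like
    if tag == "and":
        return sub(queue, a, b[1]) and sub(queue, a, b[2])
    if tag == "arrow":                                # remember the domain type
        return sub(queue + [("type", b[1])], a, b[2])
    if tag == "rcd":                                  # remember the label
        return sub(queue + [("label", b[1])], a, b[2])
    # B is now a constant (Int or bottom): analyse A against the queue.
    atag = a[0]
    if atag == "bot":
        return True                                   # bottom is below everything
    if atag == "and":
        return sub(queue, a[1], b) or sub(queue, a[2], b)
    if atag == "int":
        return not queue and b == INT
    if atag == "arrow":                               # pop a domain type (contravariant)
        return (bool(queue) and queue[0][0] == "type"
                and sub([], queue[0][1], a[1])
                and sub(queue[1:], a[2], b))
    if atag == "rcd":                                 # pop a matching label
        return (bool(queue) and queue[0] == ("label", a[1])
                and sub(queue[1:], a[2], b))
    return False                                      # A is Top, B is a constant
```

A BCD-style distributivity instance is accepted by this sketch: with `lhs = inter(arrow(INT, rcd("a", INT)), arrow(INT, rcd("b", INT)))` and `rhs = arrow(INT, inter(rcd("a", INT), rcd("b", INT)))`, the call `sub([], lhs, rhs)` succeeds because each record component of the codomain is checked separately against the same queue.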


**Fig. 9.** Algorithmic subtyping

*Correctness of the algorithm.* We prove that the algorithm is sound and complete with respect to the specification. We refer the reader to our Coq formalization for more details. We only show the two major theorems:

**Theorem 3 (Soundness).** *If* Q ⊢ *A* <: *B* ⇝ co, *then* *A* <: Q ⇒ *B* ⇝ co.

**Theorem 4 (Completeness).** *If* *A* <: *B* ⇝ co, *then* ∃co′. [] ⊢ *A* <: *B* ⇝ co′.

### **4.2 Decidability**

Moreover, we prove that our algorithmic type system is decidable. To see this, first notice that the bidirectional type system is syntax-directed, so we only need to show decidability of algorithmic subtyping and disjointness. The full (manual) proofs for decidability can be found in the appendix.

**Lemma 3 (Decidability of algorithmic subtyping).** *Given* Q, *A* *and* *B*, *it is decidable whether there exists* co *such that* Q ⊢ *A* <: *B* ⇝ co.

**Lemma 4 (Decidability of disjointness checking).** *Given* Δ, *A* *and* *B*, *it is decidable whether* Δ ⊢ *A* ∗ *B*.

One interesting observation here is that although our disjoint quantification has a similar shape to bounded quantification ∀(α <: *A*). *B* in F<sub><:</sub> [10], subtyping for F<sub><:</sub> is undecidable [40]. In F<sub><:</sub>, the subtyping rule for bounded quantification is:

$$\frac{\Delta \vdash A\_2 <: A\_1 \qquad \Delta, \alpha <: A\_2 \vdash B\_1 <: B\_2}{\Delta \vdash \forall (\alpha <: A\_1) . B\_1 <: \forall (\alpha <: A\_2) . B\_2} \text{ FSUB-FORALL}$$

Compared with rule S-forall, both rules are contravariant on bounded/disjoint types, and covariant on the body. However, with bounded quantification it is fundamental to track the bounds in the environment, which complicates the design of the rules and makes subtyping undecidable with rule fsub-forall. Decidability can be recovered by employing an invariant rule for bounded quantification (that is, by forcing *A*<sub>1</sub> and *A*<sub>2</sub> to be identical). Disjoint quantification does not require such an invariant rule for decidability.

#### **5 Establishing Coherence for F<sub>i</sub><sup>+</sup>**

In this section, we establish the coherence property for F<sub>i</sub><sup>+</sup>. The proof strategy mostly follows that of λ<sub>i</sub><sup>+</sup>, but the construction of the heterogeneous logical relation is significantly more complicated. Firstly, in Sect. 5.1 we discuss why adding BCD subtyping to disjoint polymorphism introduces significant complications. In Sect. 5.2, we discuss why a natural extension of System F's logical relation to deal with disjoint polymorphism fails. The technical difficulty is *well-foundedness*, stemming from the interaction between impredicativity and disjointness. Finally, in Sect. 5.3, we present our (predicative) logical relation that is specially crafted to prove coherence for F<sub>i</sub><sup>+</sup>.

### **5.1 The Challenge**

Before we tackle the coherence of F<sub>i</sub><sup>+</sup>, let us first consider how F<sub>i</sub> (and its predecessor λ<sub>i</sub>) enforces coherence. Its essentially syntactic approach is to make sure that there is at most one subtyping derivation for any two types. As an immediate consequence, the produced coercions are uniquely determined and thus the calculus is clearly coherent. Key to this approach is the invariant that the type system only produces *disjoint* intersection types. As we mentioned in Sect. 3, this invariant complicates the calculus and its metatheory, and leads to a weaker substitution lemma. Moreover, the syntactic coherence approach is incompatible with BCD subtyping, which leads to multiple subtyping derivations with different coercions and requires a more general substitution lemma. To accommodate BCD into λ<sub>i</sub>, Bi et al. [5] have created the λ<sub>i</sub><sup>+</sup> calculus and developed a semantically-founded proof method based on logical relations. Because λ<sub>i</sub><sup>+</sup> does not feature polymorphism, the problem at hand is to incorporate support for polymorphism in this semantic approach to coherence, which turns out to be more challenging than is apparent.

$$\begin{aligned} (v\_1, v\_2) \in \mathcal{V}\llbracket \mathsf{Int}; \mathsf{Int} \rrbracket &\triangleq \exists i.\ v\_1 = v\_2 = i \\ (v\_1, v\_2) \in \mathcal{V}\llbracket \tau\_1 \to \tau\_2; \tau\_1' \to \tau\_2' \rrbracket &\triangleq \forall (v, v') \in \mathcal{V}\llbracket \tau\_1; \tau\_1' \rrbracket.\ (v\_1\, v, v\_2\, v') \in \mathcal{E}\llbracket \tau\_2; \tau\_2' \rrbracket \\ (\langle v\_1, v\_2 \rangle, v\_3) \in \mathcal{V}\llbracket \tau\_1 \times \tau\_2; \tau\_3 \rrbracket &\triangleq (v\_1, v\_3) \in \mathcal{V}\llbracket \tau\_1; \tau\_3 \rrbracket \land (v\_2, v\_3) \in \mathcal{V}\llbracket \tau\_2; \tau\_3 \rrbracket \\ (v\_3, \langle v\_1, v\_2 \rangle) \in \mathcal{V}\llbracket \tau\_3; \tau\_1 \times \tau\_2 \rrbracket &\triangleq (v\_3, v\_1) \in \mathcal{V}\llbracket \tau\_3; \tau\_1 \rrbracket \land (v\_3, v\_2) \in \mathcal{V}\llbracket \tau\_3; \tau\_2 \rrbracket \end{aligned}$$

**Fig. 10.** Selected cases from λ<sub>i</sub><sup>+</sup>'s canonicity relation

#### **5.2 Impredicativity and Disjointness at Odds**

Figure 10 shows selected cases of *canonicity*, which is λ<sub>i</sub><sup>+</sup>'s (heterogeneous) logical relation used in the coherence proof. The definition captures that two values *v*<sub>1</sub> and *v*<sub>2</sub> of types τ<sub>1</sub> and τ<sub>2</sub> are in V⟦τ<sub>1</sub>; τ<sub>2</sub>⟧ iff either the types are disjoint or the types are equal and the values are semantically equivalent. Because both alternatives entail coherence, canonicity is key to λ<sub>i</sub><sup>+</sup>'s coherence proof.

*Well-foundedness issues.* For F<sub>i</sub><sup>+</sup>, we need to extend canonicity with additional cases to account for universally quantified types. For reasons that will become clear in Sect. 5.3, the type indices become source types (rather than target types as in Fig. 10). A naive formulation of one such case is:

$$\begin{aligned} (v\_1, v\_2) &\in \mathcal{V}\llbracket \forall (\alpha \ast A\_1).\, B\_1; \forall (\alpha \ast A\_2).\, B\_2 \rrbracket \triangleq \\ &\forall C\_1 \ast A\_1, C\_2 \ast A\_2.\ (v\_1\, |C\_1|, v\_2\, |C\_2|) \in \mathcal{E}\llbracket [C\_1/\alpha]B\_1; [C\_2/\alpha]B\_2 \rrbracket \end{aligned}$$

This case is problematic because it destroys the well-foundedness of λ<sub>i</sub><sup>+</sup>'s logical relation, which is based on structural induction on the type indices. Indeed, the type [*C*<sub>1</sub>/α]*B*<sub>1</sub> may well be larger than ∀(α ∗ *A*<sub>1</sub>). *B*<sub>1</sub>.

However, System F's well-known parametricity logical relation [43] provides us with a means to avoid this problem. Rather than performing the type substitution immediately as in the above rule, we can defer it to a later point by adding it to an extra parameter ρ of the relation, which accumulates the deferred substitutions. This yields a modified rule where the type indices in the recursive occurrences are indeed smaller:

$$\begin{aligned} (v\_1, v\_2) &\in \mathcal{V}\llbracket \forall (\alpha \ast A\_1).\, B\_1; \forall (\alpha \ast A\_2).\, B\_2 \rrbracket\_{\rho} \triangleq \\ &\forall C\_1 \ast A\_1, C\_2 \ast A\_2.\ (v\_1\, |C\_1|, v\_2\, |C\_2|) \in \mathcal{E}\llbracket B\_1; B\_2 \rrbracket\_{\rho[\alpha \mapsto (C\_1, C\_2)]} \end{aligned}$$

Of course, the deferred substitution has to be performed eventually, to be precise when the type indices are type variables.

$$(v\_1, v\_2) \in \mathcal{V}\llbracket \alpha; \alpha \rrbracket\_{\rho} \triangleq (v\_1, v\_2) \in \mathcal{V}\llbracket \rho\_1(\alpha); \rho\_2(\alpha) \rrbracket\_{\emptyset}$$

Unfortunately, this way we have moved not only the type substitution to the type variable case, but also the ill-foundedness problem. Indeed, this problem is also present in System F. The standard solution is to not fix the relation *R* by which values at type α are related to V⟦ρ<sub>1</sub>(α); ρ<sub>2</sub>(α)⟧, but instead to make it a parameter that is tracked by ρ. This yields the following two rules for disjoint quantification and type variables:

$$\begin{aligned} (v\_1, v\_2) \in \mathcal{V}\llbracket \forall (\alpha \ast A\_1).\, B\_1; \forall (\alpha \ast A\_2).\, B\_2 \rrbracket\_{\rho} &\triangleq \forall C\_1 \ast A\_1, C\_2 \ast A\_2, R \subseteq C\_1 \times C\_2. \\ &\qquad (v\_1\, |C\_1|, v\_2\, |C\_2|) \in \mathcal{E}\llbracket B\_1; B\_2 \rrbracket\_{\rho[\alpha \mapsto (C\_1, C\_2, R)]} \\ (v\_1, v\_2) \in \mathcal{V}\llbracket \alpha; \alpha \rrbracket\_{\rho} &\triangleq (v\_1, v\_2) \in \rho\_R(\alpha) \end{aligned}$$

Now we have finally recovered the well-foundedness of the relation. It is again structurally inductive on the size of the type indexes.

*Heterogeneous issues.* We have not yet accounted for one major difference between the parametricity relation, from which we have borrowed ideas, and the canonicity relation, to which we have been adding. The former is homogeneous (i.e., the types of the two values are the same) and therefore has one type index, while the latter is heterogeneous (i.e., the two values may have different types) and therefore has two type indices. Thus we must also consider cases like V⟦α; Int⟧. A definition that seems to handle this case appropriately is:

$$(v\_1, v\_2) \in \mathcal{V}\llbracket \alpha; \mathsf{Int} \rrbracket\_{\rho} \triangleq (v\_1, v\_2) \in \mathcal{V}\llbracket \rho\_1(\alpha); \mathsf{Int} \rrbracket\_{\emptyset} \tag{1}$$

Here is an example to motivate it. Let *E* = Λ(α ∗ ⊤). ((λ*x*. *x*) : α & Int → α & Int). We expect that *E* Int 1 evaluates to ⟨1, 1⟩. To prove that, we need to show (1, 1) ∈ V⟦α; Int⟧<sub>[α ↦ (Int, Int, *R*)]</sub>. According to Eq. (1), this is indeed the case. However, we run into an ill-foundedness issue again, because ρ<sub>1</sub>(α) could be larger than α. Alas, this time the parametricity relation has no solution for us.

#### **5.3 The Canonicity Relation for F<sub>i</sub><sup>+</sup>**

In light of the fact that substitution in the logical relation seems unavoidable in our setting, and that impredicativity is at odds with substitution, we turn to *predicativity*: we change rule T-tapp to its predicative version:

$$\frac{\Delta; \Gamma \vdash E \Rightarrow \forall (\alpha \ast B).\ C \leadsto e \qquad \Delta \vdash t \ast B}{\Delta; \Gamma \vdash E\ t \Rightarrow [t/\alpha]C \leadsto e\, |t|} \text{ T-TAPPMONO}$$

where metavariable *t* ranges over monotypes (types minus disjoint quantification). We do not believe that predicativity is a severe restriction in practice, since many source languages (e.g., those based on the Hindley-Milner type system [24,32] like Haskell and OCaml) are themselves predicative and do not require the full generality of an impredicative core language.

Luckily, substitution with monotypes does not prevent well-foundedness. Figure 11 defines the *canonicity* relation for F<sub>i</sub><sup>+</sup>. The canonicity relation is a family of binary relations over F<sub>co</sub> values that are *heterogeneous*, i.e., indexed by two F<sub>i</sub><sup>+</sup> types. Two points are worth mentioning. (1) An apparent difference from λ<sub>i</sub><sup>+</sup>'s logical relation is that our relation is now indexed by *source types*. The reason is that the type translation function (Definition 1) discards disjointness constraints, which are crucial in our setting, whereas λ<sub>i</sub><sup>+</sup>'s type translation does not lose information. (2) Heterogeneity allows relating values of different types, and in particular values whose types are disjoint. The rationale behind the canonicity relation is to combine equality checking from traditional (homogeneous) logical relations with disjointness checking. It consists of two relations: the value relation V⟦*A*; *B*⟧ relates *closed* values; and the expression relation E⟦*A*; *B*⟧, defined in terms of the value relation, relates closed expressions.

$$\rho \in \mathcal{D}\llbracket \Delta \rrbracket \triangleq \quad \frac{}{\emptyset \in \mathcal{D}\llbracket \bullet \rrbracket} \qquad \frac{\rho \in \mathcal{D}\llbracket \Delta \rrbracket \qquad \bullet \vdash t \ast \rho(B)}{\rho[\alpha \mapsto t] \in \mathcal{D}\llbracket \Delta, \alpha \ast B \rrbracket}$$

$$(\gamma\_1, \gamma\_2) \in \mathcal{G}\llbracket \Gamma \rrbracket\_{\rho} \triangleq \quad \frac{}{(\emptyset, \emptyset) \in \mathcal{G}\llbracket \bullet \rrbracket\_{\rho}} \qquad \frac{(\gamma\_1, \gamma\_2) \in \mathcal{G}\llbracket \Gamma \rrbracket\_{\rho} \qquad (v\_1, v\_2) \in \mathcal{V}\llbracket \rho(A); \rho(A) \rrbracket}{(\gamma\_1[x \mapsto v\_1], \gamma\_2[x \mapsto v\_2]) \in \mathcal{G}\llbracket \Gamma, x : A \rrbracket\_{\rho}}$$

**Fig. 11.** The canonicity relation for F<sub>i</sub><sup>+</sup>

The relation V⟦*A*; *B*⟧ is defined by induction on the structures of *A* and *B*. For integers, it requires the two values to be literally the same. For two records to behave the same, their fields must behave the same. For two functions to behave the same, they are required to produce outputs related at *B*<sub>1</sub> and *B*<sub>2</sub> when given related inputs at *A*<sub>1</sub> and *A*<sub>2</sub>. For the next two cases regarding intersection types, the relation distributes over the intersection constructor &. Of particular interest is the case for disjoint quantification. Notice that it *does not* quantify over arbitrary relations, but directly substitutes α with monotype *t* in *B*<sub>1</sub> and *B*<sub>2</sub>. This means that our canonicity relation *does not* entail parametricity. However, it suffices for our purpose of proving coherence. Another point worth noting is that we keep the invariant that *A* and *B* are closed types throughout the relation, so we no longer need to consider type variables, which simplifies the definition considerably. Note that when one type is ⊥, two values are vacuously related because there simply are no values of type ⊥. We need to show that the relation is indeed well-founded:

#### **Lemma 5 (Well-foundedness).** *The canonicity relation of* F<sub>i</sub><sup>+</sup> *is well-founded.*

*Proof.* Let |·|<sub>∀</sub> and |·|<sub>s</sub> be the number of ∀-quantifiers and the size of types, respectively. Consider the measure ⟨|·|<sub>∀</sub>, |·|<sub>s</sub>⟩, where ⟨...⟩ denotes lexicographic order. For the case of disjoint quantification, the number of ∀-quantifiers decreases. For the other cases, the measure of |·|<sub>∀</sub> does not increase, and the measure of |·|<sub>s</sub> strictly decreases.
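The proof measure can be checked on a small example. The Python sketch below, whose tuple encoding of types and function names are assumptions made for illustration, computes the pair (number of ∀-quantifiers, size) and shows that instantiating a quantified body with a monotype can increase the size while still decreasing the lexicographic measure.

```python
# Types as tagged tuples (an encoding assumed for this sketch only):
#   ("int",) ("top",) ("bot",) ("var", a) ("arrow", A, B) ("and", A, B)
#   ("rcd", label, A) ("forall", a, A, B)   meaning  forall (a * A). B
INT, TOP = ("int",), ("top",)

def children(t):
    """Sub-types of t (labels and variable names are strings, not tuples)."""
    return [x for x in t[1:] if isinstance(x, tuple)]

def num_forall(t):
    return (1 if t[0] == "forall" else 0) + sum(num_forall(c) for c in children(t))

def size(t):
    return 1 + sum(size(c) for c in children(t))

def measure(t):
    # Python compares tuples lexicographically, matching the order in the proof.
    return (num_forall(t), size(t))

def subst(t, alpha, mono):
    """[mono/alpha] t, assuming mono is a closed monotype (so no capture)."""
    if t == ("var", alpha):
        return mono
    if t[0] == "forall" and t[1] == alpha:
        # alpha is rebound in the body; only the constraint is affected
        return ("forall", alpha, subst(t[2], alpha, mono), t[3])
    return tuple(subst(x, alpha, mono) if isinstance(x, tuple) else x for x in t)

# forall (a * Top). a -> a, instantiated with the monotype Int -> Int
poly = ("forall", "a", TOP, ("arrow", ("var", "a"), ("var", "a")))
mono = ("arrow", INT, INT)
body = subst(poly[3], "a", mono)
```

Here `measure(poly)` is `(1, 5)` and `measure(body)` is `(0, 7)`: the size grows from 5 to 7, but the first component drops, so the measure strictly decreases. This relies on `mono` containing no quantifiers, which is exactly the predicativity restriction.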

#### **5.4 Establishing Coherence**

*Logical equivalence.* The canonicity relation can be lifted to open expressions in the standard way, i.e., by considering all possible interpretations of free type and term variables. The logical interpretations of type and term contexts are found in the bottom half of Fig. 11.

# **Definition 3 (Logical equivalence** ≃<sub>log</sub>**)**

$$\begin{aligned} \Delta; \Gamma \vdash e\_1 \simeq\_{\log} e\_2 : A; B \triangleq{} & |\Delta|; |\Gamma| \vdash e\_1 : |A| \land |\Delta|; |\Gamma| \vdash e\_2 : |B| \land {} \\ & (\forall \rho, \gamma\_1, \gamma\_2.\ \rho \in \mathcal{D}\llbracket \Delta \rrbracket \land (\gamma\_1, \gamma\_2) \in \mathcal{G}\llbracket \Gamma \rrbracket\_{\rho} \Longrightarrow (\gamma\_1(\rho\_1(e\_1)), \gamma\_2(\rho\_2(e\_2))) \in \mathcal{E}\llbracket \rho(A); \rho(B) \rrbracket). \end{aligned}$$

For conciseness, we write Δ; Γ ⊢ *e*<sub>1</sub> ≃<sub>log</sub> *e*<sub>2</sub> : *A* to mean Δ; Γ ⊢ *e*<sub>1</sub> ≃<sub>log</sub> *e*<sub>2</sub> : *A*; *A*.

*Contextual equivalence.* Following λ<sub>i</sub><sup>+</sup>, the notion of coherence is based on *contextual equivalence*. The intuition is that two programs are equivalent if we *cannot* tell them apart in any context. As usual, contextual equivalence is expressed using *expression contexts* (𝒞 and 𝒟 denote F<sub>i</sub><sup>+</sup> and F<sub>co</sub> expression contexts, respectively). Due to the bidirectional nature of the type system, the typing judgment of 𝒞 features 4 different forms (full rules are in the appendix). For example, 𝒞 : (Δ; Γ ⇒ *A*) ↦ (Δ′; Γ′ ⇒ *A*′) ⇝ 𝒟 reads: if Δ; Γ ⊢ *E* ⇒ *A* then Δ′; Γ′ ⊢ 𝒞{*E*} ⇒ *A*′. The judgment also generates a well-typed F<sub>co</sub> context 𝒟. The following two definitions capture the notion of contextual equivalence:

**Definition 4 (Kleene Equality ≃).** *Two complete programs (i.e., closed terms of type* Int*),* *e* *and* *e*′, *are Kleene equal, written* *e* ≃ *e*′, *iff there exists an integer* *i* *such that* *e* −→<sup>∗</sup> *i* *and* *e*′ −→<sup>∗</sup> *i*.

**Definition 5 (Contextual Equivalence** ≃<sub>ctx</sub>**)**

$$\begin{aligned} \Delta; \Gamma \vdash E\_1 \simeq\_{ctx} E\_2 : A \triangleq{} & \forall e\_1, e\_2.\ \Delta; \Gamma \vdash E\_1 \Rightarrow A \leadsto e\_1 \land \Delta; \Gamma \vdash E\_2 \Rightarrow A \leadsto e\_2 \land {} \\ & (\forall \mathcal{C}, \mathcal{D}.\ \mathcal{C} : (\Delta; \Gamma \Rightarrow A) \mapsto (\bullet; \bullet \Rightarrow \mathsf{Int}) \leadsto \mathcal{D} \Longrightarrow \mathcal{D}\{e\_1\} \simeq \mathcal{D}\{e\_2\}) \end{aligned}$$

*Coherence.* For space reasons, we directly show the coherence statement of F<sub>i</sub><sup>+</sup>. Its proof needs several technical lemmas, such as the compatibility lemmas and the fundamental property. The interested reader can refer to our Coq formalization.

**Theorem 5 (Coherence).** *We have that*

- *If* Δ; Γ ⊢ *E* ⇒ *A*, *then* Δ; Γ ⊢ *E* ≃<sub>ctx</sub> *E* : *A*.
- *If* Δ; Γ ⊢ *E* ⇐ *A*, *then* Δ; Γ ⊢ *E* ≃<sub>ctx</sub> *E* : *A*.

That is, coherence is a special case of Definition 5 where *E*<sup>1</sup> and *E*<sup>2</sup> are the same. At first glance, this appears underwhelming: of course *E* behaves the same as itself! The tricky part is that, if we expand it according to Definition 5, it is not *E* itself but all its translations *e*<sup>1</sup> and *e*<sup>2</sup> that behave the same!

# **6 Related Work**

*Coherence.* In calculi featuring coercive subtyping, a semantics that interprets the subtyping judgment by introducing explicit coercions is typically defined on typing derivations rather than on typing judgments. A natural question that arises for such systems is whether the semantics is *coherent*, i.e., distinct typing derivations of the same typing judgment possess the same meaning. Since Reynolds [45] proved the coherence of a calculus with intersection types, many researchers have studied the problem of coherence in a variety of typed calculi. Two approaches are commonly found in the literature. The first approach is to find a normal form for a representation of the derivation and show that normal forms are unique for a given typing judgment [8,15,47]. However, this approach cannot be directly applied to Curry-style calculi (where the lambda abstractions are not type annotated). Biernacki and Polesiuk [6] considered the coherence problem of coercion semantics. Their criterion for coherence of the translation is *contextual equivalence* in the target calculus. Inspired by this approach, Bi et al. [5] proposed the canonicity relation to prove coherence for a calculus with disjoint intersection types and BCD subtyping. As we have shown in Sect. 5, constructing a suitable logical relation for F<sub>i</sub><sup>+</sup> is challenging. On the one hand, the original approach by Alpuim et al. [2] in F<sub>i</sub> does not work any more due to the addition of BCD subtyping. On the other hand, simply combining System F's logical relation with λ<sub>i</sub><sup>+</sup>'s canonicity relation does not work as expected, due to the issue of well-foundedness. To solve the problem, we employ immediate substitutions and a restriction to predicative instantiations.

*BCD subtyping and decidability.* The BCD type system was first introduced by Barendregt et al. [3] to characterize exactly the strongly normalizing terms. The BCD type system features a powerful subtyping relation, which serves as a base for our subtyping relation. The decidability of BCD subtyping has been shown in several works [27,38,41,51]. Laurent [28] formalized the relation in Coq in order to eliminate transitivity cuts from it, but his formalization does not deliver an algorithm. Only recently, Laurent [30] presented a general way of defining a BCD-like subtyping relation extended with generic contravariant/covariant type constructors that enjoys the "sub-formula property". Our Coq formalization extends the approach used in λ<sub>i</sub><sup>+</sup>, which follows a different idea based on Pierce's decision procedure [38], with parametric (disjoint) polymorphism and corresponding distributivity rules. More recently, Muehlboeck and Tate [34] presented a decidable algorithmic system (proved in Coq) with union and intersection types. Similar to F<sub>i</sub><sup>+</sup>, their system also has distributive subtyping rules. They also discussed the addition of polymorphism, but left a Coq formalization for future work. In their work they regard intersections of disjoint types (e.g., String & Int) as uninhabitable, which is different from our interpretation. As a consequence, coherence is a non-issue for them.

*Intersection types, the merge operator and polymorphism.* Forsythe [44] has intersection types and a merge-like operator. However, to ensure coherence, various restrictions were added to limit the use of merges. In Forsythe, merges cannot contain more than one function. Castagna et al. [12] proposed a coherent calculus λ& to study overloaded functions. λ& has a special merge operator that works on functions only. Dunfield proposed a calculus [16] (which we call λ<sub>,,</sub>) that shows the significant expressiveness of type systems with unrestricted intersection types and an (unrestricted) merge operator. However, because of his unrestricted merge operator (allowing 1 ,, 2), his calculus lacks coherence. Blaauwbroek's λ<sup>∨</sup><sub>∧</sub> [7] enriched λ<sub>,,</sub> with BCD subtyping and computational effects, but he did not address coherence. The coherence issue for a calculus similar to λ<sub>,,</sub> was first addressed in λ<sub>i</sub> [36] with the notion of disjointness, but at the cost of dropping unrestricted intersections and adopting a strict notion of coherence (based on α-equivalence). Later, Bi et al. [5] improved calculi with disjoint intersection types by removing several restrictions, and adopted BCD subtyping and a semantic notion of coherence (based on contextual equivalence) proved using canonicity. The combination of intersection types, a merge operator and parametric polymorphism, while achieving coherence, was first studied in F<sub>i</sub> [2], which serves as a foundation for F<sub>i</sub><sup>+</sup>. However, F<sub>i</sub> suffered from the same problems as λ<sub>i</sub>. Additionally, in F<sub>i</sub> a bottom type is problematic due to interactions with disjoint polymorphism and the lack of unrestricted intersections. The issues can be illustrated with the well-typed F<sub>i</sub><sup>+</sup> expression Λ(α ∗ ⊥). λ*x* : α. *x* ,, *x*. In this expression the type of *x* ,, *x* is α & α. Such a merge does not violate disjointness because the only types that α can be instantiated with are top-like, and top-like types do not introduce incoherence. In F<sub>i</sub> a type variable α can never be disjoint with another type that contains α, but (as the previous expression shows) the addition of a bottom type allows expressions where such a (strict) condition does not hold. In this work, we removed those restrictions, extended BCD subtyping with polymorphism, and proposed a more powerful logical relation for proving coherence. Figure 12 summarizes the main differences between the aforementioned calculi.

**Fig. 12.** Summary of intersection calculi ( = yes, = no, = syntactic coherence)

There are also several other calculi with intersections and polymorphism. Pierce proposed F<sub>∧</sub> [39], a calculus combining intersection types and bounded quantification. Pierce translates F<sub>∧</sub> to System F extended with products, but he left coherence as a conjecture. More recently, Castagna et al. [14] proposed a polymorphic calculus with set-theoretic type connectives (intersections, unions, negations), but their calculus does not include a merge operator. Castagna and Lanvin also proposed a gradual type system [13] with intersection and union types, but also without a merge operator.

*Row polymorphism and bounded polymorphism.* Row polymorphism was originally proposed by Wand [54] as a mechanism to enable type inference for a simple object-oriented language based on recursive records. These ideas were later adopted into type systems for extensible records [19,21,31]. Our merge operator can be seen as a generalization of record extension/concatenation, and selection is also built-in. In contrast to most record calculi, restriction is not a primitive operation in F<sub>i</sub><sup>+</sup>, but can be simulated via subtyping. Disjoint quantification can simulate the *lacks* predicate often present in systems with row polymorphism. Recently Morris and McKinna presented a typed language [33], generalizing and abstracting existing systems of row types and row polymorphism. Alpuim et al. [2] informally studied the relationship between row polymorphism and disjoint polymorphism, but it would be interesting to study such a relationship more formally. The work of Morris and McKinna may be interesting for such a study in that it gives a general framework for row type systems.

Bounded quantification is currently the dominant mechanism in major mainstream object-oriented languages supporting both subtyping and polymorphism. F<sub><:</sub> [10] provides a simple model for bounded quantification, but type-checking in full F<sub><:</sub> is proved to be undecidable [40]. Pierce's thesis [39] discussed the relationship between calculi with simple polymorphism and intersection types and bounded quantification. He observed that there is a way to "encode" many forms of bounded quantification in a system with intersections and pure (unbounded) second-order polymorphism. That encoding can be easily adapted to F<sub>i</sub><sup>+</sup>:

$$\forall (\alpha <: A) . B \stackrel{\Delta}{=} \forall (\alpha * \top) . ([A \,\&\, \alpha/\alpha]B).$$

The idea is to replace bounded quantification by (unrestricted) universal quantification and all occurrences of α by *A* & α in the body. Such an encoding seems to indicate that F<sub>i</sub><sup>+</sup> could be used as a decidable alternative to (full) F<sub><:</sub>. It is worthwhile to note that this encoding does not work in F<sub>i</sub> because *A* & α is not well-formed (α is not disjoint to *A*). In other words, the encoding requires unrestricted intersections.
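As a small worked instance of the encoding (an illustration only; the body *B* := α → α is chosen here for concreteness and does not come from the text), substituting *A* & α for every occurrence of α gives:

$$\forall (\alpha <: A) .\ \alpha \to \alpha \;\stackrel{\Delta}{=}\; \forall (\alpha * \top) .\ (A \,\&\, \alpha) \to (A \,\&\, \alpha).$$

The intersection *A* & α on the right is exactly the kind of type that needs unrestricted intersections, since α is only required to be disjoint with ⊤ and not with *A*.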

# **7 Conclusion and Future Work**

We have proposed F<sub>i</sub><sup>+</sup>, a type-safe and coherent calculus with disjoint intersection types, BCD subtyping and parametric polymorphism. F<sub>i</sub><sup>+</sup> improves the state of the art of compositional designs, and enables the development of highly modular and reusable programs. One interesting and useful further extension would be implicit polymorphism. For that we want to combine Dunfield and Krishnaswami's approach [17] with our bidirectional type system. We would also like to study the parametricity of F<sub>i</sub><sup>+</sup>. As we have seen in Sect. 5.2, it is not at all obvious how to extend the standard logical relation of System F to account for disjointness, and avoid potential circularity due to impredicativity. A promising solution is to use step-indexed logical relations [1].

**Acknowledgments.** We thank the anonymous reviewers and Yaoda Zhou for their helpful comments. This work has been sponsored by the Hong Kong Research Grant Council projects number 17210617 and 17258816, and by the Research Foundation - Flanders.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Types by Need**

Beniamino Accattoli<sup>1</sup>, Giulio Guerrieri<sup>2</sup>(✉), and Maico Leberle<sup>1</sup>

<sup>1</sup> Inria & LIX, École Polytechnique, UMR 7161, Palaiseau, France
{beniamino.accattoli,maico-carlos.leberle}@inria.fr

<sup>2</sup> Department of Computer Science, University of Bath, Bath, UK
g.guerrieri@bath.ac.uk

**Abstract.** A cornerstone of the theory of λ-calculus is that intersection types characterise termination properties. They are a flexible tool that can be adapted to various notions of termination, and that also induces adequate denotational models.

Since the seminal work of de Carvalho in 2007, it is known that multi types (*i.e.* non-idempotent intersection types) refine intersection types with quantitative information and a strong connection to linear logic. Typically, type derivations provide bounds for evaluation lengths, and minimal type derivations provide exact bounds.

De Carvalho studied call-by-name evaluation, and Kesner used his system to show the termination equivalence of call-by-need and call-by-name. De Carvalho's system, however, cannot provide exact bounds on call-by-need evaluation lengths.

In this paper we develop a new multi type system for call-by-need. Our system produces exact bounds and induces a denotational model of call-by-need, providing the first tight quantitative semantics of call-by-need.

### **1 Introduction**

Duplications and erasures have always been considered as key phenomena in the λ-calculus—the λI-calculus, where erasures are forbidden, is an example of this. The advent of linear logic [38] gave them a new, prominent logical status. Forbidding erasure and duplication enables single-use resources, i.e. linearity, but limits expressivity, as every computation terminates in linear time. Their controlled reintroduction via the non-linear modality ! recovers the full expressive power of cut-elimination and allows a fine analysis of resource consumption. Duplication and erasure are therefore the key ingredients for logical expressivity, and—via Curry-Howard—for the expressivity of the λ-calculus. They are also essential to understand evaluation strategies.

In a λ-term there can be many β-redexes, that is, places where β-reduction can be applied. In this sense, the λ-calculus is non-deterministic. Non-determinism does not affect the result of evaluation, if any, but it affects whether evaluation terminates, and in how many steps. There are two natural deterministic evaluation strategies, call-by-name (shortened to CbN) and call-by-value (CbV), which have dual behaviour with respect to duplication and erasure.

Call-by-Name = Silly Duplication + Wise Erasure. CbN never evaluates arguments of β-redexes before the redexes themselves. As a consequence, it never evaluates in subterms that will be erased. This is wise, and makes CbN a normalising strategy, that is, a strategy that reaches a result whenever one exists<sup>1</sup>. A second consequence is that if the argument of the redex is duplicated then it may be evaluated more than once. This is silly, as it repeats work already done.

Call-by-Value = Wise Duplication + Silly Erasure. CbV, on the other hand, always evaluates arguments of β-redexes before the redexes themselves. Consequently, arguments are not re-evaluated—this is wise with respect to duplication—but they are also evaluated when they are going to be erased. For instance, on t := (λx.λy.y)Ω, where Ω is the famous looping λ-term, CbV evaluation diverges (it keeps evaluating Ω) while CbN converges in one β-step (simply erasing Ω). This CbV treatment of erasure is clearly as silly as the duplicated work of CbN.

Call-by-Need = Wise Duplication + Wise Erasure. It is natural to try to combine the advantages of both CbN and CbV. The strategy that is wise with respect to both duplications and erasures is usually called call-by-need (CbNeed); it was introduced by Wadsworth [57] in the '70s. Despite being at the core of Haskell, one of the most-used functional programming languages, and (in its strong variant) being at work in the kernel of Coq as designed by Barras [16], the theory of CbNeed is much less developed than that of CbN or CbV.
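The three slogans can be made concrete with a minimal sketch in Python, under the assumption that arguments are modelled as thunks (zero-argument functions). The function name `count_argument_evaluations` and the labels `a` and `b` are hypothetical and only serve the illustration: the body `x x` duplicates its first argument and erases its second one.

```python
def count_argument_evaluations(strategy):
    """Evaluate (\\x.\\y. x x) a b and count how often a and b are evaluated."""
    counts = {"a": 0, "b": 0}

    def make_arg(label):
        # An instrumented argument: evaluating it yields the identity function.
        def compute():
            counts[label] += 1
            return lambda z: z
        return compute

    def by_name(compute):
        # CbN: pass the unevaluated thunk, re-run it at every use.
        return compute

    def by_value(compute):
        # CbV: evaluate immediately, exactly once, even if never used.
        value = compute()
        return lambda: value

    def by_need(compute):
        # CbNeed: evaluate at first use only, then share the result.
        cache = []
        def force():
            if not cache:
                cache.append(compute())
            return cache[0]
        return force

    pass_arg = {"cbn": by_name, "cbv": by_value, "cbneed": by_need}[strategy]
    x = pass_arg(make_arg("a"))
    y = pass_arg(make_arg("b"))  # y is erased: the body never uses it
    x()(x())                     # the body x x uses x twice
    return counts


print(count_argument_evaluations("cbn"))     # {'a': 2, 'b': 0}
print(count_argument_evaluations("cbv"))     # {'a': 1, 'b': 1}
print(count_argument_evaluations("cbneed"))  # {'a': 1, 'b': 0}
```

Only the by-need variant evaluates the duplicated argument once and the erased argument never. The sketch counts argument evaluations only, not the micro-steps studied in the rest of the paper.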

One of the reasons for this is that it cannot be defined inside the λ-calculus without some hacking. Manageable presentations of CbNeed indeed require first-class sharing and micro-step operational semantics where variable occurrences are replaced one at a time (when needed), and not all at once as in the λ-calculus. Another reason is the less natural logical interpretation.

Linear Logic, Names, Values, and Needs. CbN and CbV have neat interpretations in linear logic. They correspond to two different representations of intuitionistic logic in linear logic, based on two different representations of implication<sup>2</sup>.

The logical interpretation of CbNeed—studied by Maraist et al. in [47]—is less neat than those of CbN and CbV. Within linear logic, CbNeed is usually understood as corresponding to the CbV representation where erasures are generalised to all terms, not only those under the scope of a ! modality. So, it is seen as a sort of affine CbV. Such an interpretation however is unusual, because it does not match exactly with cut-elimination in linear logic, as for CbN and CbV.

Call-by-Need, Abstractly. The main theorem of the theory of CbNeed is that it is termination equivalent to CbN, that is, on a fixed term, CbNeed evaluation terminates if and only if CbN evaluation terminates, and, moreover, they essentially produce the same result (up to some technical details that are irrelevant here). This is due to the fact that both strategies avoid silly divergent sequences such as that of (λx.λy.y)Ω. Termination equivalence is an abstract theorem stating that CbNeed erases as wisely as CbN. Curiously, in the literature there are no abstract theorems reflecting the dual fact that CbNeed duplicates as wisely as CbV; we provide one, as a side contribution of this paper.

<sup>1</sup> If a term t admits both converging and diverging evaluation sequences then the diverging sequences occur in erasable subterms of t, which is why CbN avoids them.

<sup>2</sup> The CbN translation maps A ⇒ B to (!A<sup>CbN</sup>) ⊸ B<sup>CbN</sup>, while the CbV translation maps it to !A<sup>CbV</sup> ⊸ !B<sup>CbV</sup>, or equivalently to !(A<sup>CbV</sup> ⊸ B<sup>CbV</sup>).

Call-by-Need and Denotational Semantics. CbNeed is then usually considered as a CbV optimisation of CbN. In particular, every denotational model of CbN is also a model of CbNeed, and adequacy—that is the fact that the denotation of t is not degenerated if and only if t terminates—transfers from CbN to CbNeed.

Denotational semantics is invariant by evaluation, and so is insensitive to evaluation lengths by definition. It then seems that denotational semantics cannot distinguish between CbN and CbNeed. The aim of this paper is, somewhat counter-intuitively, to separate CbN and CbNeed semantically. We develop a type system whose type judgements induce a model—this is typical of intersection type systems—and whose type derivations provide exact bounds for CbNeed evaluation—this is usually obtained via non-idempotent intersection types. Unsurprisingly, the design of the type system requires a delicate mix of erasure and duplication and builds on the linear logic understanding of CbN and CbV.

Multi Types. Our typing framework is given by multi types, which is an alternative name for non-idempotent intersection types<sup>3</sup>. Multi types characterise termination properties exactly as intersection types, having moreover the advantages that they are closely related to (the relational semantics of) linear logic, their type derivations provide quantitative information about evaluation lengths, and the proof techniques are simpler—no need for the reducibility method.

The seminal work of de Carvalho [23] (appeared in 2007 but unpublished until 2018, see also [22]) showed how to use multi types to obtain exact bounds on evaluation lengths in CbN. Ehrhard adapted multi types to CbV [34], and very recently Accattoli and Guerrieri adapted de Carvalho's study of exact bounds to Ehrhard's system and CbV evaluation [8]. Kesner used de Carvalho's CbN multi types to obtain a simple proof that CbNeed is termination equivalent to CbN [40] (first proved with other techniques by Maraist, Odersky, and Wadler [48] and Ariola and Felleisen [11] in the nineties), and then Kesner and coauthors continued exploring the theory of CbNeed via CbN multi types [14,15,42].

Kesner's use of CbN multi types to study CbNeed is qualitative, as it deals with termination and not with exact bounds. For a quantitative study of CbNeed, de Carvalho's CbN system cannot really be informative: CbN multi types provide bounds for CbNeed which cannot be exact because they already provide exact bounds for CbN, which generally takes more steps than CbNeed.

<sup>3</sup> The new terminology is due to the fact that a non-idempotent intersection A ∧ A ∧ B ∧ C can be seen as a multi-set [A, A, B, C].

Multi Types by Need. In this paper we provide the first multi type system characterising CbNeed termination and whose minimal type derivations provide exact bounds for CbNeed evaluation lengths. The design of the type system is delicate, as we explain in Sect. 6. One of the key points is that, in contrast to Ehrhard's system for CbV [34], multi types for CbNeed cannot be directly extracted by the relational semantics of linear logic, given that CbNeed does not have a clean representation in it. A by-product of our work is a new denotational semantics of CbNeed, the first one to precisely reflect its quantitative properties.

Beyond the result itself, the paper tries to stress how the key ingredients of our type system are taken from those for CbN and CbV and combined together. To this aim, we first present multi types for CbN and CbV, and only then we proceed to build the CbNeed system and prove its properties.

Along the way, we also prove the missing fundamental property of CbNeed, that is, that it duplicates as efficiently as CbV. The result dualizes the termination equivalence of CbN and CbNeed, which shows that CbNeed erases as wisely as CbN. Careful: the CbV system is correct but of course not complete with respect to CbNeed, because CbNeed may normalise when CbV diverges. The proof of the result is straightforward, because of our presentations of CbV and CbNeed. We adopt a liberal, non-deterministic formulation of CbV, and we assume (without loss of generality, see [1]) that garbage collection is always postponed. These two ingredients turn CbNeed into a fragment of CbV, obtaining the new fundamental result as a corollary of correctness of CbV multi types for CbV evaluation.

Technical Development. The paper is extremely uniform, technically speaking. The three evaluations are presented as strategies of Accattoli and Kesner's Linear Substitution Calculus (shortened to LSC) [1,6], a calculus with a simple but expressive form of explicit sharing. The LSC is strongly related to linear logic [2], and provides a neat and manageable presentation of CbNeed, introduced by Accattoli, Barenbaum, and Mazza in [3], and further developed by various authors in [4,5,10,14,15,40,42]. Our type systems count evaluation steps by annotating typing rules in the exact same way, and the proofs of correctness and completeness all follow the exact same structure. While the results for CbN are very minor variations with respect to those in the literature [7,23], those for CbV are the first ones with respect to a presentation of CbV with sharing.

As it is standard for CbNeed, we restrict our study to closed terms and weak evaluation (that is, out of abstractions). The main consequence of this fact is that normal forms are particularly simple (sometimes called answers in the literature). Compared with other recent works dealing with exact bounds such as Accattoli, Graham-Lengrand, and Kesner [7] and Accattoli and Guerrieri [8] the main difference is that the size of normal forms is not taken into account by type derivations. This is because of the simple notions of normal forms in the closed and weak case, and not because the type systems are not accurate.

Related Work About CbNeed. Call-by-need was introduced by Wadsworth [57] in the '70s. In the '90s, it was first reformulated as operational semantics by Launchbury [46], Maraist, Odersky, and Wadler [48], and Ariola and Felleisen [11,12], and then implemented by Sestoft [55] and further studied by Kutzner and Schmidt-Schauß [45]. More recent papers are Garcia, Lumsdaine, and Sabry [36], Ariola, Herbelin, and Saurin [13], Chang and Felleisen [26], Danvy and Zerny [29], Downen et al. [33], Pédrot and Saurin [53], and Balabonski et al. [14].

Related Work About Multi Types. Intersection types are a standard tool to study λ-calculi—see Coppo and Dezani [27,28], Pottinger [54], and Krivine [44]. Nonidempotent intersection types, i.e. multi types, were first considered by Gardner [37], and then by Kfoury [43], Neergaard and Mairson [50], and de Carvalho [23]—a survey is Bucciarelli, Kesner, and Ventura [20].

Many recent works rely on multi types or relational semantics to study properties of programs and proofs. Beyond the cited ones, Diaz-Caro, Manzonetto, and Pagani [32], Carraro and Guerrieri [21], Ehrhard and Guerrieri [35], and Guerrieri [39] deal with CbV, while Bernadet and Lengrand [17], de Carvalho, Pagani, and Tortora de Falco [24] provide exact bounds. Further related work is by Bucciarelli, Ehrhard, and Manzonetto [18], de Carvalho and Tortora de Falco [25], Tsukada and Ong [56], Kesner and Vial [41], Piccolo, Paolini and Ronchi Della Rocca [52], Ong [51], Mazza, Pellissier, and Vial [49], Bucciarelli, Kesner and Ronchi Della Rocca [19]—this list is not exhaustive.

Proofs. Proofs are omitted. They can be found in the technical report [9].

# **2 Closed** *λ***-Calculi**

In this section we define the CbN, CbV, and CbNeed evaluation strategies. We present them in the context of Accattoli and Kesner's linear substitution calculus (LSC) [1,6]. We mainly follow the uniform presentation of these strategies given by Accattoli, Barenbaum, and Mazza [3]. The only difference is that we adopt a non-deterministic presentation of CbV, subsuming both the left-to-right and the right-to-left strategies in [3], which makes our results slightly more general. Such non-determinism is harmless: not only is CbV evaluation confluent, it even has the diamond property, so that all evaluations have the same length. Moreover, the non-deterministic presentation, together with the postponement of erasing steps discussed below, allows us to see CbNeed as a fragment of CbV, which shall provide a free proof that CbNeed duplicates as wisely as CbV.

Terms and Contexts. The set of terms Λlsc of the LSC is given by the grammar below, where t[x←s] is an explicit substitution (shortened to ES), that is a more compact notation for let x = s in t (intuitively, "t where x will be substituted by s"). Both λx.t and t[x←s] bind x in t, with the usual notion of α-equivalence.

$$\text{LSC Terms} \quad t, s, u ::= x \mid v \mid ts \mid t[x \gets s] \qquad \text{LSC Values} \quad v ::= \lambda x. t$$

The set fv(t) of free variables of a term t is defined as expected, in particular, fv(t[x←s]) := (fv(t)\{x})∪fv(s). A term t is closed if fv(t) = ∅, open otherwise. As usual, terms are identified up to α-equivalence.

Contexts are terms with exactly one occurrence of the hole ⟨·⟩, an additional constant. We shall use many different contexts. The most general ones are weak contexts W (i.e. not under abstractions). The (evaluation) contexts C, V and E, used to define the CbN, CbV and CbNeed evaluation strategies, respectively, are special cases of weak contexts (in fact, CbV contexts coincide with weak contexts; the consequences of that are discussed below, in the paragraph on the CbV diamond property). To define evaluation strategies, substitution contexts (i.e. lists of explicit substitutions) also play a role.

$$\begin{aligned} \text{Weak contexts} \qquad & W ::= \langle \cdot \rangle \mid Wt \mid W[x \leftarrow t] \mid tW \mid t[x \leftarrow W] \\ \text{Substitution contexts} \qquad & S ::= \langle \cdot \rangle \mid S[x \leftarrow t] \\ \text{CbN contexts} \qquad & C ::= \langle \cdot \rangle \mid Ct \mid C[x \leftarrow t] \\ \text{CbV contexts} \qquad & V ::= W \\ \text{CbNeed contexts} \qquad & E ::= \langle \cdot \rangle \mid Et \mid E[x \leftarrow t] \mid E\langle\!\langle x \rangle\!\rangle[x \leftarrow E'] \end{aligned}$$

We write W⟨t⟩ for the term obtained by replacing the hole ⟨·⟩ in context W by the term t. This plugging operation, as usual with contexts, can capture variables: for instance ((⟨·⟩t)[x←s])⟨x⟩ = (xt)[x←s]. We write W⟨⟨t⟩⟩ when we want to stress that the context W does not capture the free variables of t.
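The definitions of free variables and of plugging can be sketched in Python as follows. The tuple encoding of terms (`("var", x)`, `("lam", x, t)`, `("app", t, s)`, `("es", t, x, s)` for t[x←s], and `("hole",)` for the hole) is an assumption made for this illustration, as are the function names `fv` and `plug`. In the example, the terms t and s of the text are represented by free variables named `"t"` and `"s"`.

```python
HOLE = ("hole",)

def fv(t):
    """Free variables; for t[x<-s], x is bound in t but not in s."""
    tag = t[0]
    if tag == "hole":
        return set()
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return fv(t[2]) - {t[1]}
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    return (fv(t[1]) - {t[2]}) | fv(t[3])   # explicit substitution

def plug(ctx, t):
    """Replace the hole of ctx by t, with no renaming: capture is possible."""
    tag = ctx[0]
    if tag == "hole":
        return t
    if tag == "var":
        return ctx
    if tag == "lam":
        return ("lam", ctx[1], plug(ctx[2], t))
    if tag == "app":
        return ("app", plug(ctx[1], t), plug(ctx[2], t))
    return ("es", plug(ctx[1], t), ctx[2], plug(ctx[3], t))

# The context (<.> t)[x<-s] from the text, and the plugged term x.
W = ("es", ("app", HOLE, ("var", "t")), "x", ("var", "s"))
x = ("var", "x")
plugged = plug(W, x)          # the term (x t)[x<-s]

print(fv(x))                  # {'x'}
print(sorted(fv(plugged)))    # ['s', 't']
```

The variable x is free in the plugged term taken alone but no longer free in W⟨x⟩, which is the capture phenomenon described above.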

Micro-step Semantics. The rewriting rules decompose the usual small-step semantics for λ-calculi, by substituting linearly one variable occurrence at a time, and only when such an occurrence is in evaluation position. We emphasise this fact by saying that we adopt a micro-step semantics. We now give the definitions; examples of evaluation sequences follow right after.

Formally, a micro-step semantics is defined by first giving its root-steps and then taking the closure of root-steps under suitable contexts.
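A sketch of the root-steps, written under the assumption that they take the usual form of the LSC literature (one multiplicative rule shared by the three strategies, and one exponential rule per strategy); the rule labels are chosen to match the side conditions stated next and the steps of Example 1:

$$\begin{array}{lrcl} \text{Multiplicative} & S\langle \lambda x.t \rangle s & \mapsto_{\mathsf{m}} & S\langle t[x \leftarrow s] \rangle \\ \text{Exponential CbN} & C\langle\!\langle x \rangle\!\rangle[x \leftarrow t] & \mapsto_{\mathsf{e}_{\mathsf{cbn}}} & C\langle\!\langle t \rangle\!\rangle[x \leftarrow t] \\ \text{Exponential CbV} & V\langle\!\langle x \rangle\!\rangle[x \leftarrow S\langle v \rangle] & \mapsto_{\mathsf{e}_{\mathsf{cbv}}} & S\langle V\langle\!\langle v \rangle\!\rangle[x \leftarrow v] \rangle \\ \text{Exponential CbNeed} & E\langle\!\langle x \rangle\!\rangle[x \leftarrow S\langle v \rangle] & \mapsto_{\mathsf{e}_{\mathsf{need}}} & S\langle E\langle\!\langle v \rangle\!\rangle[x \leftarrow v] \rangle \end{array}$$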


where, in the root-step ↦<sub>m</sub> (resp. ↦<sub>e<sub>cbv</sub></sub>; ↦<sub>e<sub>need</sub></sub>), if S := [y<sub>1</sub>←s<sub>1</sub>] ... [y<sub>n</sub>←s<sub>n</sub>] for some n ∈ ℕ, then fv(s) (resp. fv(V⟨⟨x⟩⟩); fv(E⟨⟨x⟩⟩)) and {y<sub>1</sub>, ..., y<sub>n</sub>} are disjoint. This condition can always be fulfilled by α-equivalence.

The evaluation strategies →<sub>cbn</sub> for CbN, →<sub>cbv</sub> for CbV, and →<sub>need</sub> for CbNeed, are defined as the closure of root-steps under CbN, CbV and CbNeed evaluation contexts, respectively (so, none of the evaluation strategies reduces under abstractions, since all such contexts are weak):

$$\begin{array}{lll} \text{CbN} & \text{CbV} & \text{CbNeed} \\ \rightarrow_{\mathsf{m}_{\mathsf{cbn}}} := C\langle \mapsto_{\mathsf{m}} \rangle & \rightarrow_{\mathsf{m}_{\mathsf{cbv}}} := V\langle \mapsto_{\mathsf{m}} \rangle & \rightarrow_{\mathsf{m}_{\mathsf{need}}} := E\langle \mapsto_{\mathsf{m}} \rangle \\ \rightarrow_{\mathsf{e}_{\mathsf{cbn}}} := C\langle \mapsto_{\mathsf{e}_{\mathsf{cbn}}} \rangle & \rightarrow_{\mathsf{e}_{\mathsf{cbv}}} := V\langle \mapsto_{\mathsf{e}_{\mathsf{cbv}}} \rangle & \rightarrow_{\mathsf{e}_{\mathsf{need}}} := E\langle \mapsto_{\mathsf{e}_{\mathsf{need}}} \rangle \\ \rightarrow_{\mathsf{cbn}} := C\langle \mapsto_{\mathsf{m}} \cup \mapsto_{\mathsf{e}_{\mathsf{cbn}}} \rangle & \rightarrow_{\mathsf{cbv}} := V\langle \mapsto_{\mathsf{m}} \cup \mapsto_{\mathsf{e}_{\mathsf{cbv}}} \rangle & \rightarrow_{\mathsf{need}} := E\langle \mapsto_{\mathsf{m}} \cup \mapsto_{\mathsf{e}_{\mathsf{need}}} \rangle \end{array}$$

where the notation → := W⟨↦⟩ means that, given a root-step ↦, the evaluation → is defined as follows: t → s if and only if there are terms t′ and s′ and a context W such that t = W⟨t′⟩, s = W⟨s′⟩ and t′ ↦ s′.

Note that the evaluations →<sub>cbn</sub>, →<sub>cbv</sub> and →<sub>need</sub> can equivalently be defined as →<sub>m<sub>cbn</sub></sub> ∪ →<sub>e<sub>cbn</sub></sub>, →<sub>m<sub>cbv</sub></sub> ∪ →<sub>e<sub>cbv</sub></sub> and →<sub>m<sub>need</sub></sub> ∪ →<sub>e<sub>need</sub></sub>, respectively.

Given an evaluation sequence d : t →<sup>∗</sup><sub>cbn</sub> s we denote by |d| the length of d, and by |d|<sub>m</sub> and |d|<sub>e</sub> the number of multiplicative and exponential steps in d, respectively; similarly for →<sub>cbv</sub> and →<sub>need</sub>.

Erasing Steps. The reader may be surprised by our evaluation strategies, as none of them includes erasing steps, despite the absolute relevance of erasures pointed out in the introduction. There are no contradictions: in the LSC—in contrast to the λ-calculus—erasing steps can always be postponed (see [1]), and so they are often simply omitted. This is actually close to programming language practice, as the garbage collector acts asynchronously with respect to the evaluation flow. For the sake of clarity let us spell out the erasing rules—they shall nonetheless be ignored in the rest of the paper. In CbN and CbNeed every term is erasable, so the root erasing step takes the following form

$$t[x \leftarrow s] \mapsto_{\mathsf{gc}} t \qquad \text{if } x \notin \mathsf{fv}(t)$$

and it is then closed by weak evaluation contexts.

In CbV only values are erasable; so, the root erasing step in CbV is:

$$t[x \leftarrow S\langle v \rangle] \mapsto_{\mathsf{gc}} S\langle t \rangle \qquad \text{if } x \notin \mathsf{fv}(t),$$

and it is then closed by weak evaluation contexts.

Example 1. A good example to observe the differences between CbN, CbV, and CbNeed is given by the term t := ((λx.λy.xx)(II))(II) where I := λz.z is the identity combinator. In CbN, it evaluates with 5 multiplicative steps and 5 exponential steps, as follows:

$$\begin{array}{ll} \begin{array}{l} t \rightarrow \mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u} \left( \lambda y.xx \right) [x \leftarrow II](II) & \rightarrow\_{\mathsf{a}\mathsf{e}\mathsf{c}\mathsf{b}u} (xx)[y \leftarrow II][x \leftarrow II] \\ \rightarrow\_{\mathsf{a}\mathsf{e}\mathsf{b}u} \left( (II)x \middle][y \leftarrow II][x \leftarrow II] & \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} \left( z[z \leftarrow I]x \right)[y \leftarrow II][x \leftarrow II] \end{array} \\ \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} \left( I[z \leftarrow I]x \middle][y \leftarrow II][x \leftarrow II] \right) & \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} w[w \leftarrow x][z \leftarrow I][y \leftarrow II][x \leftarrow II] \\ \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} x[w \leftarrow x][z \leftarrow I][y \leftarrow II][x \leftarrow II] & \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} \left( II \right][w \leftarrow x][z \leftarrow I][y \leftarrow II][x \leftarrow II] \\ \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} x'[x' \leftarrow I][w \leftarrow x][z \leftarrow I][y \leftarrow II][x \leftarrow I] & \rightarrow\_{\mathsf{a}\_{\mathsf{e}\mathsf{c}\mathsf{b}u}} I[x' \leftarrow I][w \leftarrow x][z \leftarrow I][x \leftarrow II][x \leftarrow II] \end{array}$$

In CbV, t evaluates with 5 multiplicative steps and 5 exponential steps, for instance from right to left, as follows:

$$\begin{array}{llll} t \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (\lambda x.\lambda y.xx)(II)(z[z\leftarrow I]) & \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (\lambda x.\lambda y.xx)(II)(I[z\leftarrow I]) \\ \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (\lambda x.\lambda y.xx)(w[w\leftarrow I])(I[z\leftarrow I]) & \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (\lambda x.\lambda y.xx)(I[w\leftarrow I])(I[z\leftarrow I]) \\ \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (\lambda y.xx)[x\leftarrow I[w\leftarrow I][w\leftarrow I])(I[z\leftarrow I]) & \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (xx)[y\leftarrow I[z\leftarrow I]][x\leftarrow I[w\leftarrow I]] \\ \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (xI)[y\leftarrow I[z\leftarrow I][x\leftarrow I][w\leftarrow I]][w\leftarrow I] & \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} (II)[y\leftarrow I[z\leftarrow I]][x\leftarrow I][w\leftarrow I] \\ \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} x'[x'\leftarrow I][y\leftarrow I[z\leftarrow I][w\leftarrow I]][w\leftarrow I] & \rightarrow\_{\mathsf{a}\_{\mathsf{clr}}} I[x'\leftarrow I][y\leftarrow I[z\leftarrow I]][x\leftarrow I][w\leftarrow I] \end{array}$$

Note that the fact that CbN and CbV take the same number of steps is by chance, as they reduce different redexes: CbN never reduces the unneeded redex II associated to y, but it reduces twice the needed II redex associated to x, while CbV reduces both, but each one only once.

In CbNeed, t evaluates in 4 multiplicative steps and 4 exponential steps.

$$\begin{array}{rll} t & \rightarrow_{\mathsf{m}_{\mathsf{need}}} & (\lambda y.xx)[x \leftarrow II](II) \\ & \rightarrow_{\mathsf{m}_{\mathsf{need}}} & (xx)[y \leftarrow II][x \leftarrow II] \\ & \rightarrow_{\mathsf{m}_{\mathsf{need}}} & (xx)[y \leftarrow II][x \leftarrow z[z \leftarrow I]] \\ & \rightarrow_{\mathsf{e}_{\mathsf{need}}} & (xx)[y \leftarrow II][x \leftarrow I[z \leftarrow I]] \\ & \rightarrow_{\mathsf{e}_{\mathsf{need}}} & (Ix)[y \leftarrow II][x \leftarrow I][z \leftarrow I] \\ & \rightarrow_{\mathsf{m}_{\mathsf{need}}} & (w[w \leftarrow x])[y \leftarrow II][x \leftarrow I][z \leftarrow I] \\ & \rightarrow_{\mathsf{e}_{\mathsf{need}}} & (w[w \leftarrow I])[y \leftarrow II][x \leftarrow I][z \leftarrow I] \\ & \rightarrow_{\mathsf{e}_{\mathsf{need}}} & (I[w \leftarrow I])[y \leftarrow II][x \leftarrow I][z \leftarrow I] \end{array}$$
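The step counts of Example 1 for CbN and CbNeed can be replayed with a small micro-step evaluator. This is a sketch only, not the authors' artefact: it assumes closed input terms, a tuple encoding of terms (`("var", x)`, `("lam", x, t)`, `("app", t, s)`, `("es", t, x, s)` for t[x←s]), and the usual LSC form of the root-steps (a multiplicative step S⟨λx.t⟩s ↦ S⟨t[x←s]⟩, and exponential steps that replace the variable occurrence in evaluation position). All function names are hypothetical, and every copied term is renamed with fresh bound variables so that no capture can occur.

```python
from itertools import count

fresh = count()

def rename(t, env=None):
    """Copy t, giving every bound variable a globally fresh name."""
    env = env or {}
    tag = t[0]
    if tag == "var":
        return ("var", env.get(t[1], t[1]))
    if tag == "lam":
        x = f"v{next(fresh)}"
        return ("lam", x, rename(t[2], {**env, t[1]: x}))
    if tag == "app":
        return ("app", rename(t[1], env), rename(t[2], env))
    x = f"v{next(fresh)}"                 # body[x<-arg]: x binds in body only
    return ("es", rename(t[1], {**env, t[2]: x}), x, rename(t[3], env))

def unwrap(t):
    """Split t = S<core> into the substitutions of S (outermost first) and core."""
    subs = []
    while t[0] == "es":
        subs.append((t[2], t[3]))
        t = t[1]
    return subs, t

def rewrap(core, subs):
    for x, s in reversed(subs):
        core = ("es", core, x, s)
    return core

def needed(t, by_need):
    """Variable in evaluation position, or None if there is none."""
    tag = t[0]
    if tag == "var":
        return t[1]
    if tag == "app":
        return needed(t[1], by_need)
    if tag == "es":
        n = needed(t[1], by_need)
        # CbNeed contexts E<<x>>[x<-E'] also look inside the substitution.
        return needed(t[3], by_need) if by_need and n == t[2] else n
    return None

def replace(t, v, by_need):
    """Replace the variable occurrence in evaluation position of t by v."""
    tag = t[0]
    if tag == "var":
        return v
    if tag == "app":
        return ("app", replace(t[1], v, by_need), t[2])
    if by_need and needed(t[1], by_need) == t[2]:
        return ("es", t[1], t[2], replace(t[3], v, by_need))
    return ("es", replace(t[1], v, by_need), t[2], t[3])

def step(t, by_need):
    """One micro-step: ("m" or "e", new term), or None if t is normal."""
    tag = t[0]
    if tag == "app":
        subs, core = unwrap(t[1])
        if core[0] == "lam":              # multiplicative root-step
            return "m", rewrap(("es", core[2], core[1], t[2]), subs)
        r = step(t[1], by_need)
        return r and (r[0], ("app", r[1], t[2]))
    if tag == "es":
        body, x, arg = t[1], t[2], t[3]
        r = step(body, by_need)
        if r:
            return r[0], ("es", r[1], x, arg)
        if needed(body, by_need) != x:
            return None
        if not by_need:                   # CbN: copy the argument as it is
            return "e", ("es", replace(body, rename(arg), False), x, arg)
        subs, core = unwrap(arg)
        if core[0] == "lam":              # CbNeed: copy only a value
            new = ("es", replace(body, rename(core), True), x, core)
            return "e", rewrap(new, subs)
        r = step(arg, by_need)            # otherwise evaluate the shared argument
        return r and (r[0], ("es", body, x, r[1]))
    return None

def run(t, by_need):
    """Return (multiplicative steps, exponential steps) to normal form."""
    t, m, e = rename(t), 0, 0
    while (r := step(t, by_need)):
        kind, t = r
        m, e = m + (kind == "m"), e + (kind == "e")
    return m, e

I = ("lam", "z", ("var", "z"))
II = ("app", I, I)
t = ("app", ("app", ("lam", "x", ("lam", "y",
     ("app", ("var", "x"), ("var", "x")))), II), II)

print(run(t, by_need=False))  # (5, 5)
print(run(t, by_need=True))   # (4, 4)
```

CbV is left out to keep the sketch short; its non-deterministic contexts would need a choice of redex, although by the diamond property discussed next any choice yields the same counts. The function `run` loops forever on terms that diverge under the chosen strategy.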

CbV Diamond Property. CbV contexts coincide with weak ones. As a consequence, our presentation of CbV is non-deterministic, as for instance one can have

$$x[x \leftarrow I](y[y \leftarrow I]) \;{}_{\mathsf{m}_{\mathsf{cbv}}}{\leftarrow}\; (II)(y[y \leftarrow I]) \rightarrow_{\mathsf{e}_{\mathsf{cbv}}} (II)(I[y \leftarrow I])$$

but it is easily seen that diagrams can be closed in exactly one step (if the two reducts are different). For instance,

$$x[x \leftarrow I](y[y \leftarrow I]) \rightarrow_{\mathsf{e}_{\mathsf{cbv}}} x[x \leftarrow I](I[y \leftarrow I]) \;{}_{\mathsf{m}_{\mathsf{cbv}}}{\leftarrow}\; (II)(I[y \leftarrow I])$$

Moreover, the kind of steps is preserved, as the example illustrates. This is an instance of the strong form of confluence called diamond property. A consequence is that either all evaluation sequences normalise or all diverge, and if they normalise they have all the same length and the same number of steps of each kind. Roughly, the diamond property is a form of relaxed determinism. In particular, it makes sense to talk about the number of multiplicative/exponential steps to normal form, independently of the evaluation sequence. The proof of the property is an omitted routine check of diagrams.

Normal Forms. We use two predicates to characterise normal forms, one for both CbN and CbNeed normal forms, for which ES can contain whatever term, and one for CbV normal forms, where ES can only contain normal terms:

$$\dfrac{}{\mathsf{normal}(\lambda x.t)} \qquad \dfrac{\mathsf{normal}(t)}{\mathsf{normal}(t[x \leftarrow s])} \qquad \dfrac{}{\mathsf{normal}_{\mathsf{cbv}}(\lambda x.t)} \qquad \dfrac{\mathsf{normal}_{\mathsf{cbv}}(t) \quad \mathsf{normal}_{\mathsf{cbv}}(s)}{\mathsf{normal}_{\mathsf{cbv}}(t[x \leftarrow s])}$$

### **Proposition 1 (Syntactic characterization of closed normal forms).** Let t be a closed term.

1. CbN and CbNeed: For r ∈ {cbn, need}, t is r-normal if and only if normal(t).
2. CbV: t is cbv-normal if and only if normal<sub>cbv</sub>(t).

The simple structure of normal forms is the main point where the restriction to closed calculi plays a role in this paper.

From the syntactic characterization of normal forms (Proposition 1) it follows immediately that among closed terms, normal forms for CbN and CbNeed coincide, while normal forms for CbV are a subset of them. Such a subset is proper since the closed term I[x←δδ] (where I := λz.z and δ := λy.yy) is normal for CbN and CbNeed but not for CbV (and it cannot normalise in CbV).
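The two predicates and the separating example I[x←δδ] can be checked with a few lines of Python. This is a sketch under the assumption that terms are encoded as tuples (`("var", x)`, `("lam", x, t)`, `("app", t, s)`, `("es", t, x, s)` for t[x←s]); the function names are hypothetical, and the predicates follow the reading given in the text: explicit substitutions may contain any term for CbN and CbNeed, and only normal terms for CbV.

```python
def normal(t):
    """CbN/CbNeed normal forms: an abstraction under arbitrary substitutions."""
    if t[0] == "lam":
        return True
    if t[0] == "es":
        return normal(t[1])
    return False

def normal_cbv(t):
    """CbV normal forms: the substituted terms must be normal as well."""
    if t[0] == "lam":
        return True
    if t[0] == "es":
        return normal_cbv(t[1]) and normal_cbv(t[3])
    return False

I = ("lam", "z", ("var", "z"))
delta = ("lam", "y", ("app", ("var", "y"), ("var", "y")))
term = ("es", I, "x", ("app", delta, delta))   # I[x <- delta delta]

print(normal(term))      # True
print(normal_cbv(term))  # False
```

The substituted term δδ is an application, hence not CbV-normal, while it is irrelevant for the CbN and CbNeed predicate.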

# **3 Preliminaries About Multi Types**

In this section we define basic notions about multi types, type contexts, and (type) judgements that are shared by the three typing systems of the paper.

Multi-sets. The type systems are based on two layers of types, defined in a mutually recursive way: linear types L and finite multi-sets M of linear types. The intuition is that a linear type L corresponds to a single use of a term, and that an argument t is typed with a multi-set M of n linear types if it is going to end up (at most) n times in evaluation position, with respect to the strategy associated with the type system. The three systems differ in the definition of linear types, which is therefore not specified here, while all adopt the same notion of finite multi-set M of linear types (named multi type), which we now introduce:

$$\text{MULTI TYPE} \qquad M, N ::= [L\_i]\_{i \in J} \quad \text{(for any finite set } J\text{)}$$

where [...] denotes the multi-set constructor. The empty multi-set [ ] (the multi type obtained for J = ∅) is called the empty (multi) type and is denoted by the special symbol **0**. An example of a multi-set is [L, L, L′], which contains two occurrences of L and one occurrence of L′. Multi-set union is noted ⊎.
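As a concrete reading of these definitions, multi types can be modelled with Python's `collections.Counter`. This is only a sketch: representing linear types as plain strings and the helper names `multi`, `union` and `ZERO` are assumptions made for the example.

```python
from collections import Counter

def multi(*linear_types):
    """Build the multi type [L_1, ..., L_k]; linear types are plain strings here."""
    return Counter(linear_types)

ZERO = multi()                      # the empty multi type 0
M = multi("L", "L", "L'")           # the example [L, L, L']

def union(a, b):
    """Multi-set union: multiplicities add up."""
    return a + b

print(M["L"], M["L'"])              # 2 1
print(union(M, ZERO) == M)          # True
print(union(M, multi("L'"))["L'"])  # 2
```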

Type Contexts. A type context Γ is a (total) map from variables to multi types such that only finitely many variables are not mapped to **0**. The domain of Γ is the set dom(Γ) := {x | Γ(x) ≠ **0**}. The type context Γ is empty if dom(Γ) = ∅.

Multi-set union is extended to type contexts point-wise, i.e. (Γ ⊎ Π)(x) := Γ(x) ⊎ Π(x) for each variable x. This notion is extended to a finite family of type contexts as expected, so that ⊎<sub>i∈J</sub> Γ<sub>i</sub> denotes a finite union of type contexts; it stands for the empty context when J = ∅. A type context Γ is denoted by x<sub>1</sub> : M<sub>1</sub>, ..., x<sub>n</sub> : M<sub>n</sub> (for some n ∈ ℕ) if dom(Γ) ⊆ {x<sub>1</sub>, ..., x<sub>n</sub>} and Γ(x<sub>i</sub>) = M<sub>i</sub> for all 1 ≤ i ≤ n. Given two type contexts Γ and Π such that dom(Γ) ∩ dom(Π) = ∅, the type context Γ, Π is defined by (Γ, Π)(x) := Γ(x) if x ∈ dom(Γ), (Γ, Π)(x) := Π(x) if x ∈ dom(Π), and (Γ, Π)(x) := **0** otherwise.
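Type contexts and their point-wise union admit an equally direct sketch. Here a context is assumed to be a dictionary from variable names to `Counter` multi-sets, with missing variables implicitly mapped to **0**; the function names `dom` and `ctx_union` are hypothetical.

```python
from collections import Counter

def dom(ctx):
    """Variables mapped to a non-empty multi type."""
    return {x for x, m in ctx.items() if sum(m.values()) > 0}

def ctx_union(gamma, pi):
    """Point-wise multi-set union of two type contexts."""
    return {x: gamma.get(x, Counter()) + pi.get(x, Counter())
            for x in gamma.keys() | pi.keys()}

gamma = {"x": Counter(["L"])}
pi = {"x": Counter(["L'"]), "y": Counter()}   # y : 0 does not count for dom

u = ctx_union(gamma, pi)
print(u["x"] == Counter(["L", "L'"]))   # True
print(sorted(dom(u)))                   # ['x']
print(dom({}) == set())                 # True: the empty context
```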

$$\begin{array}{c}
\dfrac{}{x : [L] \vdash^{(0,1)} x : L}\,\mathsf{ax}
\qquad
\dfrac{}{\vdash^{(0,0)} \lambda x.t : \mathsf{normal}}\,\mathsf{normal}
\qquad
\dfrac{\Gamma, x : M \vdash^{(m,e)} t : L}{\Gamma \vdash^{(m,e)} \lambda x.t : M \multimap L}\,\mathsf{fun}
\\[2ex]
\dfrac{(\Pi\_i \vdash^{(m\_i,e\_i)} t : L\_i)\_{i \in J}}{\biguplus\_{i \in J} \Pi\_i \vdash^{(\sum\_{i \in J} m\_i,\,\sum\_{i \in J} e\_i)} t : [L\_i]\_{i \in J}}\,\mathsf{many}
\qquad
\dfrac{\Gamma \vdash^{(m,e)} t : M \multimap L \quad \Pi \vdash^{(m',e')} s : M}{\Gamma \uplus \Pi \vdash^{(m+m'+1,\,e+e')} ts : L}\,\mathsf{app}
\\[2ex]
\dfrac{\Gamma, x : M \vdash^{(m,e)} t : L \quad \Pi \vdash^{(m',e')} s : M}{\Gamma \uplus \Pi \vdash^{(m+m',\,e+e')} t[x \leftarrow s] : L}\,\mathsf{ES}
\end{array}$$

**Fig. 1.** Type system for CbN evaluation

Judgements. Type judgements are of the form Γ ⊢<sup>(m,e)</sup> t : L or Γ ⊢<sup>(m,e)</sup> t : M (noted also ⊢<sup>(m,e)</sup> t : L and ⊢<sup>(m,e)</sup> t : M, respectively, when Γ is the empty context), where the indices m and e are natural numbers whose intended meaning is that t evaluates to normal form in m multiplicative steps and e exponential steps, with respect to the evaluation strategy associated with the type system.

To make clear in which type system the judgement is derived, we write Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : L if Φ is a derivation in the CbN system ending in the judgement Γ ⊢<sup>(m,e)</sup> t : L, and similarly for CbV and CbNeed.

# **4 Types by Name**

In this section we introduce the CbN multi type system, together with intuitions about multi types. We also prove that derivations provide exact bounds on CbN evaluation sequences, and define the induced denotational model.

CbN Types. The system is essentially a reformulation of de Carvalho's system R [23], itself a type-based presentation of the relational model of the CbN λ-calculus induced by the relational model of linear logic via the CbN translation of λ-calculus into linear logic. Definitions:

– CbN linear types are given by the following grammar:

$$\text{CBN LINEAR TYPES} \qquad \qquad L, L' ::= \mathsf{normal} \mid M \multimap L'$$

Multi(-sets) types are defined as in Sect. 3, relatively to CbN linear types. Note the linear constant normal (used to type abstractions, which are normal terms): it plays a crucial role in our quantitative analysis of CbN evaluation.


– The size of a derivation Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : L is the sum m + e of the indices. A quick look at the typing rules shows that indices on typing judgements are not needed, as m can be recovered as the number of app rules, and e as the number of ax rules. It is however handy to note them explicitly.
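The remark that the indices can be recovered by counting rules can be illustrated on derivation skeletons. In the sketch below a derivation is assumed to be a tree of rule names only (judgements are omitted); `Node`, `count` and `indices` are names introduced for the example. The two skeletons have the shape of derivations typing II with normal and with [normal] ⊸ normal.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    rule: str                                   # "ax", "normal", "fun", "many", "app", "ES"
    premises: list = field(default_factory=list)

def count(node, rule):
    """Number of occurrences of `rule` in the derivation tree."""
    return int(node.rule == rule) + sum(count(p, rule) for p in node.premises)

def indices(node):
    """(m, e): m counts the app rules, e counts the ax rules."""
    return (count(node, "app"), count(node, "ax"))

# Skeleton for II : normal
d1 = Node("app", [Node("fun", [Node("ax")]),
                  Node("many", [Node("normal")])])
# Skeleton for II : [normal] -o normal
d2 = Node("app", [Node("fun", [Node("ax")]),
                  Node("many", [Node("fun", [Node("ax")])])])

print(indices(d1), indices(d2))   # (1, 1) (1, 2)
```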

Subtleties and Easy Facts. Let us overview some facts about our presentation of the type system.


**Lemma 1 (Type contexts and variable occurrences for CbN).** Let Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : L be a derivation. If x ∉ fv(t) then x ∉ dom(Γ).

Lemma 1 implies that derivations of closed terms have empty type context. Note that there can be free variables of t not in dom(Γ): the ones only occurring in subterms not touched by the evaluation strategy.

Key Ingredients. Two key points of the CbN system that play a role in the design of the CbNeed one in Sect. 6 are:


Tight Derivations. A term may have several derivations, indexed by different pairs (m, e). They always provide upper bounds on CbN evaluation lengths. The interesting aspect of our type systems, however, is that there is a simple description of a class of derivations that provide exact bounds for these quantities, as we shall show. Their definition relies on the normal type constant.

**Definition 1 (Tight derivations for CbN).** A derivation Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : L is tight (for CbN) if L = normal and Γ is empty.

Example 2. Let us return to the term t := ((λx.λy.xx)(II))(II) used in Example 1 for explaining the difference in reduction lengths among the different strategies. We now give a derivation for it in the CbN type system.

First, let us shorten normal to n. Then, we define Φ as the following derivation for the subterm λx.λy.xx of t:

$$\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{}{x : [[n] \multimap n] \vdash^{(0,1)} x : [n] \multimap n}\,\mathsf{ax}
      \quad
      \dfrac{\dfrac{}{x : [n] \vdash^{(0,1)} x : n}\,\mathsf{ax}}{x : [n] \vdash^{(0,1)} x : [n]}\,\mathsf{many}
    }{x : [n, [n] \multimap n] \vdash^{(1,2)} xx : n}\,\mathsf{app}
  }{x : [n, [n] \multimap n] \vdash^{(1,2)} \lambda y.xx : \mathbf{0} \multimap n}\,\mathsf{fun}
}{\vdash^{(1,2)} \lambda x.\lambda y.xx : [n, [n] \multimap n] \multimap (\mathbf{0} \multimap n)}\,\mathsf{fun}$$

Now, we need two derivations for II, one of type n, given by Ψ as follows

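$$\dfrac{
  \dfrac{\dfrac{}{z : [n] \vdash^{(0,1)} z : n}\,\mathsf{ax}}{\vdash^{(0,1)} \lambda z.z : [n] \multimap n}\,\mathsf{fun}
  \quad
  \dfrac{\dfrac{}{\vdash^{(0,0)} \lambda w.w : n}\,\mathsf{normal}}{\vdash^{(0,0)} \lambda w.w : [n]}\,\mathsf{many}
}{\vdash^{(1,1)} II : n}\,\mathsf{app}$$

and one of type [n] ⊸ n, given by Ξ as follows

$$\dfrac{
  \dfrac{\dfrac{}{z : [[n] \multimap n] \vdash^{(0,1)} z : [n] \multimap n}\,\mathsf{ax}}{\vdash^{(0,1)} \lambda z.z : [[n] \multimap n] \multimap ([n] \multimap n)}\,\mathsf{fun}
  \quad
  \dfrac{\dfrac{\dfrac{}{w : [n] \vdash^{(0,1)} w : n}\,\mathsf{ax}}{\vdash^{(0,1)} \lambda w.w : [n] \multimap n}\,\mathsf{fun}}{\vdash^{(0,1)} \lambda w.w : [[n] \multimap n]}\,\mathsf{many}
}{\vdash^{(1,2)} II : [n] \multimap n}\,\mathsf{app}$$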

Finally, we put Φ, Ψ and Ξ together in the following derivation Θ for t = (s(II))(II), where s := λx.λy.xx and n<sup>[n]</sup> := [n] ⊸ n

$$\dfrac{
  \dfrac{
    \begin{array}{c} \vdots\,\Phi \\ \vdash^{(1,2)} s : [n, n^{[n]}] \multimap (\mathbf{0} \multimap n) \end{array}
    \quad
    \dfrac{
      \begin{array}{c} \vdots\,\Psi \\ \vdash^{(1,1)} II : n \end{array}
      \quad
      \begin{array}{c} \vdots\,\Xi \\ \vdash^{(1,2)} II : n^{[n]} \end{array}
    }{\vdash^{(2,3)} II : [n, n^{[n]}]}\,\mathsf{many}
  }{\vdash^{(4,5)} s(II) : \mathbf{0} \multimap n}\,\mathsf{app}
  \quad
  \dfrac{}{\vdash^{(0,0)} II : \mathbf{0}}\,\mathsf{many}
}{\vdash^{(5,5)} (s(II))(II) : n}\,\mathsf{app}$$

Note that Θ is a tight derivation and the indices (5, 5) correspond to the number of m<sub>cbn</sub>-steps and e<sub>cbn</sub>-steps, respectively, from t to its cbn-normal form, as shown in Example 1. Theorem 1 below shows that this is not by chance: tight derivations for CbN are minimal and provide exact bounds to evaluation lengths in CbN.
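The bookkeeping of Example 2 can be replayed mechanically. The following is a minimal sketch of the CbN rules ax, normal, fun, many and app as Python functions on judgements; it tracks type contexts, types and indices but not the typed term, so it checks that the types and indices compose as the rules prescribe, not that the subject is t. The encoding (multi-sets as canonically sorted tuples, the string `"n"` for normal, the tag `"-o"` for ⊸) and all function names are assumptions made for the example.

```python
from typing import NamedTuple

def mset(*items):
    """Multi-set as a canonically ordered tuple, so equality ignores order."""
    return tuple(sorted(items, key=repr))

def arrow(M, L):
    """Linear type M -o L."""
    return ("-o", M, L)

class J(NamedTuple):
    ctx: dict    # variable -> multi-set (absent variables have type 0)
    typ: object  # linear type, or multi-set for the conclusion of `many`
    m: int
    e: int

def ctx_union(g, p):
    out = dict(g)
    for x, M in p.items():
        out[x] = mset(*out.get(x, ()), *M)
    return out

def ax(x, L):                 # x:[L] |-(0,1) x : L
    return J({x: mset(L)}, L, 0, 1)

def normal():                 # |-(0,0) λx.t : normal
    return J({}, "n", 0, 0)

def fun(j, x):                # discharge x and build M -o L
    ctx = dict(j.ctx)
    M = ctx.pop(x, ())
    return J(ctx, arrow(M, j.typ), j.m, j.e)

def many(*js):                # collect the premises into a multi-set
    ctx, m, e = {}, 0, 0
    for j in js:
        ctx, m, e = ctx_union(ctx, j.ctx), m + j.m, e + j.e
    return J(ctx, mset(*(j.typ for j in js)), m, e)

def app(jt, js):              # function of type M -o L, argument of type M
    assert isinstance(jt.typ, tuple) and jt.typ[:1] == ("-o",), "not an arrow type"
    _, M, L = jt.typ
    assert M == js.typ, "argument multi type mismatch"
    return J(ctx_union(jt.ctx, js.ctx), L, jt.m + js.m + 1, jt.e + js.e)

n = "n"
n_n = arrow(mset(n), n)                                      # [n] -o n
phi = fun(fun(app(ax("x", n_n), many(ax("x", n))), "y"), "x")
psi = app(fun(ax("z", n), "z"), many(normal()))
xi = app(fun(ax("z", n_n), "z"), many(fun(ax("w", n), "w")))
theta = app(app(phi, many(psi, xi)), many())

print(theta.typ, (theta.m, theta.e), theta.ctx)              # n (5, 5) {}
```

The final judgement has type normal and an empty context, so the replayed derivation is tight in the sense of Definition 1, with indices (5, 5).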

The next two subsections prove the two halves of the properties of the CbN type system, namely correctness and completeness.

#### **4.1 CbN Correctness**

Correctness is the fact that every typable term is CbN normalising. In our setting it comes with additional quantitative information: the indices m and e of a derivation Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : L provide upper bounds on the length of the CbN evaluation of t, which are exact when the derivation is tight.

The proof technique is standard. Moreover, the correctness theorems for CbV and CbNeed in the next sections follow exactly the same structure. The proof relies on a quantitative subject reduction property showing that m decreases by exactly one at each m<sub>cbn</sub>-step, and similarly for e and e<sub>cbn</sub>-steps. In turn, subject reduction relies on a linear substitution lemma. Last, correctness for tight derivations requires a further property of normal forms.

Let us point out that correctness is stated with respect to closed terms only, but the auxiliary results have to deal with open terms, since they are proved by inductions (over predicates defined by induction) over the structure of terms.

Linear Substitution. The linear substitution lemma states that substituting over a variable occurrence as in the exponential rule consumes exactly one linear type and decreases the exponential index e by one.

**Lemma 2 (CbN linear substitution).** If Φ ▷<sub>cbn</sub> Γ, x : M ⊢<sup>(m,e)</sup> C⟨x⟩ : L then there is a splitting M = [L′] ⊎ N such that for every derivation Ψ ▷<sub>cbn</sub> Π ⊢<sup>(m′,e′)</sup> t : L′ there is a derivation Φ′ ▷<sub>cbn</sub> Γ ⊎ Π, x : N ⊢<sup>(m+m′,e+e′−1)</sup> C⟨t⟩ : L.

The proof is by induction over CbN evaluation contexts.

Quantitative Subject Reduction. A key point of multi types is that the size of type derivations shrinks after every evaluation step, which is what allows one to bound evaluation lengths. Remarkably, the size (defined as the sum of the indices) shrinks by exactly 1 at every evaluation step.

**Proposition 2 (Quantitative subject reduction for CbN).** Let Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : L be a derivation.


The proof is by induction on t →<sub>m<sub>cbn</sub></sub> s and t →<sub>e<sub>cbn</sub></sub> s, using the linear substitution lemma for the root exponential step.

Tightness and Normal Forms. Since the indices are always non-negative, quantitative subject reduction (Proposition 2) implies that they bound evaluation lengths. The bound is not necessarily exact, as derivations of normal forms can have strictly positive indices. If they are tight, however, they are indexed by (0, 0), as we now show. The proof of this fact (by induction on the predicate normal) requires a slightly different statement, for the induction to go through.

**Proposition 3 (**normal **typing of normal forms for CbN).** Let t be such that normal(t), and Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> t : normal be a derivation. Then Γ is empty, and so Φ is tight, and m = e = 0.

The Tight Correctness Theorem. The theorem is then proved by a straightforward induction on the evaluation length relying on quantitative subject reduction (Proposition 2) for the inductive case, and the properties of tight typings for normal forms (Proposition 3) for the base case.

**Theorem 1 (CbN tight correctness).** Let t be a closed term. If Φ ▷<sub>cbn</sub> ⊢<sup>(m,e)</sup> t : L then there is s such that d : t →<sup>∗</sup><sub>cbn</sub> s, with normal(s), |d|<sub>m</sub> ≤ m and |d|<sub>e</sub> ≤ e. Moreover, if Φ is tight then |d|<sub>m</sub> = m and |d|<sub>e</sub> = e.

Note that Theorem 1 implicitly states that tight derivations have minimal size among derivations.
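The minimality remark can be unfolded as a short chain of (in)equalities. The following is a sketch under one assumption that is not restated in this section: the number of steps of each kind from t to its normal form does not depend on the evaluation sequence. The names Φ′, m′ and e′, for an arbitrary derivation of the same closed term t, are introduced here only for illustration; Φ is tight with indices (m, e) and d is the evaluation of t to normal form.

$$m \;=\; |d|\_{\mathsf{m}} \;\le\; m' \qquad\text{and}\qquad e \;=\; |d|\_{\mathsf{e}} \;\le\; e' \qquad\Longrightarrow\qquad m + e \;\le\; m' + e'$$

The equalities come from the tight case of Theorem 1 applied to Φ, the inequalities from its general case applied to Φ′.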

#### **4.2 CbN Completeness**

Completeness is the fact that every CbN normalising term has a (tight) type derivation. As for correctness, the completeness theorem is always obtained via three intermediate steps, dual to those for correctness.

Normal Forms. The first step is to prove (by induction on the predicate normal) that every normal form is typable, and is actually typable with a tight derivation.

**Proposition 4 (Normal forms are tightly typable for CbN).** Let t be such that normal(t). Then there is a tight derivation Φ ▷<sub>cbn</sub> ⊢<sup>(0,0)</sup> t : normal.

Linear Removal. In order to prove subject expansion, we have to first show that typability can also be pulled back along substitutions, via a linear removal lemma dual to the linear substitution lemma.

**Lemma 3 (Linear removal for CbN).** Let Φ ▷<sub>cbn</sub> Γ, x : M ⊢<sup>(m,e)</sup> C⟨s⟩ : L, where x ∉ fv(s). Then there exist


– Type contexts: Γ = Γ′ ⊎ Π.

– Indices: (m, e) = (m′ + m″, e′ + e″ − 1).

Quantitative Subject Expansion. This property is the dual of subject reduction.

**Proposition 5 (Quantitative subject expansion for CbN).** Let Φ ▷<sub>cbn</sub> Γ ⊢<sup>(m,e)</sup> s : L be a derivation.


The proof is by induction on t →<sub>m<sub>cbn</sub></sub> s and t →<sub>e<sub>cbn</sub></sub> s, using the linear removal lemma for the root exponential step.

The Tight Completeness Theorem. The theorem is proved by a straightforward induction on the evaluation length relying on quantitative subject expansion (Proposition 5) in the inductive case, and the existence of tight typings for normal forms (Proposition 4) in the base case.

**Theorem 2 (CbN tight completeness).** Let t be a closed term. If d : t →<sup>∗</sup><sub>cbn</sub> s and normal(s) then there is a tight derivation Φ ▷<sub>cbn</sub> ⊢<sup>(|d|<sub>m</sub>,|d|<sub>e</sub>)</sup> t : normal.

Back to Erasing Steps. Our system can be easily adapted to measure also garbage collection steps (the CbN erasing rule is just before Example 1). First, a new, third index g on judgements is necessary. Second, one needs to distinguish the erasing and non-erasing cases of the app and ES rules, discriminated by the **0** type. For instance, the ES rules are (the app rules are similar):

$$\dfrac{\Gamma \vdash^{(m,e,g)} t : L \quad \Gamma(x) = \mathbf{0}}{\Gamma \vdash^{(m,e,g+1)} t[x \leftarrow s] : L}\,\mathsf{ES}\_{\mathsf{gc}} \qquad \dfrac{\Gamma, x : M \vdash^{(m,e,g)} t : L \quad \Pi \vdash^{(m',e',g')} s : M \quad M \neq \mathbf{0}}{\Gamma \uplus \Pi \vdash^{(m+m',\,e+e',\,g+g')} t[x \leftarrow s] : L}\,\mathsf{ES}\_{\mathsf{c}}$$

The right premise of rule ES<sub>gc</sub> has been removed because the only way to introduce **0** is via a many rule with no premises. The index g bounds the number of erasing steps. In the closed case, however, the bound cannot be, in general, exact. Variables typed with **0** by Γ do not exactly match variables not appearing in the typed term (that is the condition triggering the erasing step), because a variable typed with **0** may appear in the body of abstractions typed with the normal rule, as such bodies are not typed.

It is reasonable to assume that exact bounds for erasing steps can only be provided by a type system characterising strong evaluation, whose typing rules have to inspect abstraction bodies. These erasing typing rules are nonetheless going to play a role in the design of the CbNeed system in Sect. 6.

#### **4.3 CbN Model**

The idea to build the denotational model from the multi type system is that the interpretation (or semantics) of a term is simply the set of its type assignments, i.e. the set of its derivable types together with their type contexts. More precisely, let t be a term and x<sub>1</sub>, ..., x<sub>n</sub> (with n ≥ 0) be pairwise distinct variables. If fv(t) ⊆ {x<sub>1</sub>, ..., x<sub>n</sub>}, we say that the list x⃗ = (x<sub>1</sub>, ..., x<sub>n</sub>) is suitable for t. If x⃗ = (x<sub>1</sub>, ..., x<sub>n</sub>) is suitable for t, the (relational) semantics of t for x⃗ is

$$[\![t]\!]^{\text{CbN}}\_{\vec{x}} := \{((M\_1, \ldots, M\_n), L) \mid \exists\, \Phi \triangleright\_{\text{cbn}} x\_1 : M\_1, \ldots, x\_n : M\_n \vdash^{(m,e)} t : L\}$$

Subject reduction (Proposition 2) and expansion (Proposition 5) guarantee that the semantics [[t]]<sup>CbN</sup><sub>x⃗</sub> of t (for any term t, possibly open) is invariant by CbN evaluation. Correctness (Theorem 1) and completeness (Theorem 2) guarantee that, given a closed term t, its interpretation [[t]]<sup>CbN</sup><sub>x⃗</sub> is non-empty if and only if t is CbN normalisable, that is, they imply that relational semantics is adequate.

$$\begin{array}{c}
\dfrac{}{x : M \vdash^{(0,1)} x : M}\,\mathsf{ax}
\qquad
\dfrac{\Gamma, x : N \vdash^{(m,e)} t : M}{\Gamma \vdash^{(m,e)} \lambda x.t : N \multimap M}\,\mathsf{fun}
\qquad
\dfrac{(\Pi\_i \vdash^{(m\_i,e\_i)} \lambda x.t : L\_i)\_{i \in J}}{\biguplus\_{i \in J} \Pi\_i \vdash^{(\sum\_{i \in J} m\_i,\,\sum\_{i \in J} e\_i)} \lambda x.t : [L\_i]\_{i \in J}}\,\mathsf{many}
\\[2ex]
\dfrac{\Gamma \vdash^{(m,e)} t : [N \multimap M] \quad \Pi \vdash^{(m',e')} s : N}{\Gamma \uplus \Pi \vdash^{(m+m'+1,\,e+e')} ts : M}\,\mathsf{app}
\qquad
\dfrac{\Gamma, x : N \vdash^{(m,e)} t : M \quad \Pi \vdash^{(m',e')} s : N}{\Gamma \uplus \Pi \vdash^{(m+m',\,e+e')} t[x \leftarrow s] : M}\,\mathsf{ES}
\end{array}$$

**Fig. 2.** Type system for CbV evaluation.

In fact, adequacy also holds with respect to open terms. The issue in that case is that the characterisation of tight derivations is more involved, see Accattoli, Graham-Lengrand and Kesner [7]. Said differently, weaker correctness and completeness theorems without exact bounds also hold in the open case. The same is true for the CbV and CbNeed systems of the next sections.

# **5 Types by Value**

Here we introduce Ehrhard's CbV multi type system [34] adapted to our presentation of CbV in the LSC, and prove its properties. The system is similar, and yet in many aspects dual, to the CbN one, in particular the grammar of types is different. Linear types for CbV are defined by:

$$\text{CBV LINEAR TYPES} \qquad \qquad L, L' ::= M \multimap N$$

Multi(-sets) types are defined as in Sect. 3, relatively to CbV linear types. Note that linear types now have a multi type both as source and as target, and that the normal constant is absent—in CbV, its role is played by **0**.

The typing rules are in Fig. 2. It is a type-based presentation of the relational model of the CbV λ-calculus induced by the relational model of linear logic via the CbV translation of λ-calculus into linear logic. Some remarks:


Intuitions: The Empty Type **0**. The empty multi-set type **0** plays a special role in CbV. As in CbN, it is the type of terms that can be erased, but, in contrast to CbN, not every term is erasable in CbV.

In the CbN multi type system every term, even a diverging one, is typable with **0**. On the one hand, this is correct, because in CbN every term can be erased, and erased terms can also be divergent, because they are never evaluated. On the other hand, adequacy is formulated with respect to non-empty types: a term terminates if and only if it is typable with a non-empty type.

In CbV, instead, terms have to be evaluated before being erased; and, of course, their evaluation has to terminate. Thus, terminating terms and erasable terms coincide. Since the multi type system is meant to characterise terminating terms, in CbV a term is typable if and only if it is typable with **0**, as we shall prove in this section. Then the empty type is not a degenerate type excluded for adequacy from the interesting types of a term, as in CbN, it rather is the type, characterising (adequate) typability altogether. And this is also the reason for the absence of the constant normal—one way to see it is that in CbV normal = **0**.

Note that, in particular, in a type judgement Γ ⊢ t : M the type context Γ may give the empty type to a variable x occurring in t, as for instance in the axiom x : **0** ⊢ x : **0**. This may seem very strange to people familiar with CbN multi types. We hope that, given the intuition that **0** is the type of termination, it will instead seem natural.

**Definition 2 (Tight derivation for CbV).** A derivation Φ ▷<sub>cbv</sub> Γ ⊢<sup>(m,e)</sup> t : M is tight (for CbV) if M = **0** and Γ is empty.

Example 3. Let's consider again the term t := ((λx.λy.xx)(II))(II) of Example 1 (where I := λz.z), for which a CbN tight derivation was given in Example 2, and let us type it in the CbV system with a tight derivation.

We define the following derivation Φ<sub>1</sub> for the subterm s := λx.λy.xx of t

$$\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{
          \dfrac{}{x : [\mathbf{0} \multimap \mathbf{0}] \vdash^{(0,1)} x : [\mathbf{0} \multimap \mathbf{0}]}\,\mathsf{ax}
          \quad
          \dfrac{}{x : \mathbf{0} \vdash^{(0,1)} x : \mathbf{0}}\,\mathsf{ax}
        }{x : [\mathbf{0} \multimap \mathbf{0}] \vdash^{(1,2)} xx : \mathbf{0}}\,\mathsf{app}
      }{x : [\mathbf{0} \multimap \mathbf{0}] \vdash^{(1,2)} \lambda y.xx : \mathbf{0} \multimap \mathbf{0}}\,\mathsf{fun}
    }{x : [\mathbf{0} \multimap \mathbf{0}] \vdash^{(1,2)} \lambda y.xx : [\mathbf{0} \multimap \mathbf{0}]}\,\mathsf{many}
  }{\vdash^{(1,2)} s : [\mathbf{0} \multimap \mathbf{0}] \multimap [\mathbf{0} \multimap \mathbf{0}]}\,\mathsf{fun}
}{\vdash^{(1,2)} s : [[\mathbf{0} \multimap \mathbf{0}] \multimap [\mathbf{0} \multimap \mathbf{0}]]}\,\mathsf{many}$$

Note that [**0** ⊸ **0**] ⊎ **0** = [**0** ⊸ **0**], which explains the shape of the type context in the conclusion of the app rule. Next, we define the derivation Φ<sub>2</sub> as follows


and the derivation Φ<sub>3</sub> as follows

$$\dfrac{
  \dfrac{\dfrac{\dfrac{}{x' : \mathbf{0} \vdash^{(0,1)} x' : \mathbf{0}}\,\mathsf{ax}}{\vdash^{(0,1)} \lambda x'.x' : \mathbf{0} \multimap \mathbf{0}}\,\mathsf{fun}}{\vdash^{(0,1)} \lambda x'.x' : [\mathbf{0} \multimap \mathbf{0}]}\,\mathsf{many}
  \quad
  \dfrac{}{\vdash^{(0,0)} I : \mathbf{0}}\,\mathsf{many}
}{\vdash^{(1,1)} II : \mathbf{0}}\,\mathsf{app}$$

Finally, we put Φ<sub>1</sub>, Φ<sub>2</sub> and Φ<sub>3</sub> together in the following derivation Φ for t

$$\dfrac{
  \dfrac{
    \begin{array}{c} \vdots\,\Phi\_1 \\ \vdash^{(1,2)} s : [[\mathbf{0} \multimap \mathbf{0}] \multimap [\mathbf{0} \multimap \mathbf{0}]] \end{array}
    \quad
    \begin{array}{c} \vdots\,\Phi\_2 \\ \vdash^{(1,2)} II : [\mathbf{0} \multimap \mathbf{0}] \end{array}
  }{\vdash^{(3,4)} (\lambda x.\lambda y.xx)(II) : [\mathbf{0} \multimap \mathbf{0}]}\,\mathsf{app}
  \quad
  \begin{array}{c} \vdots\,\Phi\_3 \\ \vdash^{(1,1)} II : \mathbf{0} \end{array}
}{\vdash^{(5,5)} ((\lambda x.\lambda y.xx)(II))(II) : \mathbf{0}}\,\mathsf{app}$$

Note that Φ is a tight derivation and the indices (5, 5) correspond to the number of m<sub>cbv</sub>-steps and e<sub>cbv</sub>-steps, respectively, from t to its cbv-normal form, as shown in Example 1. Theorem 3 below shows that this is not by chance: tight derivations for CbV are minimal and provide exact bounds to evaluation lengths in CbV.
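As for CbN, the indices of Example 3 can be replayed with a small sketch of the CbV rules ax, fun, many and app of Fig. 2. The encoding is an assumption made for the example (multi-sets as canonically sorted tuples, `ZERO` for **0**, the tag `"-o"` for ⊸), and the typed term is not tracked: only type contexts, types and indices are checked. The sub-derivation named `phi2` is one possible derivation of II with type [**0** ⊸ **0**], built here for the sake of the computation.

```python
from typing import NamedTuple

def mset(*items):
    """Multi-set as a canonically ordered tuple, so equality ignores order."""
    return tuple(sorted(items, key=repr))

ZERO = mset()                       # the empty multi type 0

def arrow(N, M):
    """Linear type N -o M (both N and M are multi types in CbV)."""
    return ("-o", N, M)

class J(NamedTuple):
    ctx: dict    # variable -> multi type
    typ: object  # multi type (or linear type right after `fun`)
    m: int
    e: int

def ctx_union(g, p):
    out = dict(g)
    for x, M in p.items():
        out[x] = mset(*out.get(x, ()), *M)
    return out

def ax(x, M):                       # x:M |-(0,1) x : M
    return J({x: M}, M, 0, 1)

def fun(j, x):                      # discharge x and build the linear type N -o M
    ctx = dict(j.ctx)
    N = ctx.pop(x, ZERO)
    return J(ctx, arrow(N, j.typ), j.m, j.e)

def many(*js):                      # collect linear types into a multi type
    ctx, m, e = {}, 0, 0
    for j in js:
        ctx, m, e = ctx_union(ctx, j.ctx), m + j.m, e + j.e
    return J(ctx, mset(*(j.typ for j in js)), m, e)

def app(jt, js):                    # t : [N -o M] and s : N give ts : M
    assert len(jt.typ) == 1 and jt.typ[0][:1] == ("-o",), "need a singleton [N -o M]"
    _, N, M = jt.typ[0]
    assert N == js.typ, "argument multi type mismatch"
    return J(ctx_union(jt.ctx, js.ctx), M, jt.m + js.m + 1, jt.e + js.e)

A = mset(arrow(ZERO, ZERO))         # [0 -o 0]
phi1 = many(fun(many(fun(app(ax("x", A), ax("x", ZERO)), "y")), "x"))
phi2 = app(many(fun(ax("z", A), "z")), many(fun(ax("w", ZERO), "w")))
phi3 = app(many(fun(ax("x'", ZERO), "x'")), many())
phi = app(app(phi1, phi2), phi3)

print(phi.typ == ZERO, (phi.m, phi.e), phi.ctx)   # True (5, 5) {}
```

The final judgement has type **0** and an empty context, so the replayed derivation is tight in the sense of Definition 2.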

Correctness (i.e. typability implies normalisability) and completeness (i.e. normalisability implies typability) of the CbV type system with respect to CbV evaluation (together with quantitative information about evaluation lengths) follow exactly the same pattern as the CbN case, mutatis mutandis.

#### **5.1 CbV Correctness**

**Lemma 4 (CbV linear substitution).** Let Φ ▷<sub>cbv</sub> Γ, x : M ⊢<sup>(m,e)</sup> V⟨x⟩ : N and v be a value. There is a splitting M = O ⊎ P such that, for any derivation Ψ ▷<sub>cbv</sub> Π ⊢<sup>(m′,e′)</sup> v : O, there is a derivation Φ′ ▷<sub>cbv</sub> Γ ⊎ Π, x : P ⊢<sup>(m+m′,e+e′−1)</sup> V⟨v⟩ : N.

**Proposition 6 (Quantitative subject reduction for CbV).** Let Φ ▷<sub>cbv</sub> Γ ⊢<sup>(m,e)</sup> t : M be a derivation.


**Proposition 7 (Tight typings for normal forms for CbV).** Let Φ ▷<sub>cbv</sub> Γ ⊢<sup>(m,e)</sup> t : **0** be a derivation, with normal<sub>cbv</sub>(t). Then Γ is empty, and so Φ is tight, and m = e = 0.

**Theorem 3 (CbV tight correctness).** Let t be a closed term. If Φ ▷<sub>cbv</sub> Γ ⊢<sup>(m,e)</sup> t : M then there is s such that d : t →<sup>∗</sup><sub>cbv</sub> s, with normal<sub>cbv</sub>(s), |d|<sub>m</sub> ≤ m and |d|<sub>e</sub> ≤ e. Moreover, if Φ is tight then |d|<sub>m</sub> = m and |d|<sub>e</sub> = e.

#### **5.2 CbV Completeness**

**Proposition 8 (Normal forms are tightly typable for CbV).** Let t be such that normal<sub>cbv</sub>(t). Then there exists a tight derivation Φ ▷<sub>cbv</sub> ⊢<sup>(0,0)</sup> t : **0**.

**Lemma 5 (Linear removal for CbV).** Let Φ ▷<sub>cbv</sub> Γ, x : M ⊢<sup>(m,e)</sup> V⟨v⟩ : N and v be a value, where x ∉ fv(v). Then, there exist

– a multi type M′ and two type contexts Γ′ and Π,

– a derivation Φ′ ▷<sub>cbv</sub> Γ′ ⊢<sup>(m′,e′)</sup> v : M′ and

– a derivation Ψ ▷<sub>cbv</sub> Π, x : M ⊎ M′ ⊢<sup>(m″,e″)</sup> V⟨x⟩ : N

such that

– Type contexts: Γ = Γ′ ⊎ Π,

– Indices: (m, e) = (m′ + m″, e′ + e″ − 1).

**Proposition 9 (Quantitative subject expansion for CbV).** Let Φ ▷<sub>cbv</sub> Γ ⊢<sup>(m,e)</sup> t : M be a derivation.


**Theorem 4 (CbV tight completeness).** Let t be a closed term. If d : t →<sup>∗</sup><sub>cbv</sub> s with normal<sub>cbv</sub>(s), then there is a tight derivation Φ ▷<sub>cbv</sub> ⊢<sup>(|d|<sub>m</sub>,|d|<sub>e</sub>)</sup> t : **0**.

CbV Model. The interpretation of terms with respect to the CbV system is defined as follows (where x⃗ = (x<sub>1</sub>, ..., x<sub>n</sub>) is a list of variables suitable for t):

$$[\![t]\!]^{\text{CbV}}\_{\vec{x}} := \{((M\_1, \ldots, M\_n), N) \mid \exists\, \Phi \triangleright\_{\text{cbv}} x\_1 : M\_1, \ldots, x\_n : M\_n \vdash^{(m,e)} t : N\}$$

Note that rule fun assigns a linear type but the interpretation considers only multi types. The invariance and the adequacy of [[t]]<sup>CbV</sup><sub>x⃗</sub> with respect to CbV evaluation are obtained exactly as for the CbN case.

# **6 Types by Need**

CbNeed as a Blend of CbN and CbV. The multi type system for CbNeed is obtained by carefully blending ingredients from the CbN and CbV ones:


$$\begin{array}{c}
\dfrac{}{x : M \vdash^{(0,1)} x : M}\,\mathsf{ax}
\qquad
\dfrac{\Gamma \vdash^{(m,e)} t : [N \multimap M] \quad \Pi \vdash^{(m',e')} s : N}{\Gamma \uplus \Pi \vdash^{(m+m'+1,\,e+e')} ts : M}\,\mathsf{app}
\qquad
\dfrac{}{\vdash^{(0,0)} t : \mathbf{0}}\,\mathsf{many}\_0
\\[2ex]
\dfrac{(\Pi\_i \vdash^{(m\_i,e\_i)} \lambda x.t : L\_i)\_{i \in J} \quad J \neq \emptyset}{\biguplus\_{i \in J} \Pi\_i \vdash^{(\sum\_{i \in J} m\_i,\,\sum\_{i \in J} e\_i)} \lambda x.t : [L\_i]\_{i \in J}}\,\mathsf{many}\_{>0}
\qquad
\dfrac{\Gamma, x : N \vdash^{(m,e)} t : M}{\Gamma \vdash^{(m,e)} \lambda x.t : N \multimap M}\,\mathsf{fun}
\\[2ex]
\dfrac{\Gamma, x : N \vdash^{(m,e)} t : M \quad \Pi \vdash^{(m',e')} s : N}{\Gamma \uplus \Pi \vdash^{(m+m',\,e+e')} t[x \leftarrow s] : M}\,\mathsf{ES}
\qquad
\dfrac{}{\vdash^{(0,0)} \lambda x.t : \mathsf{normal}}\,\mathsf{normal}
\end{array}$$

**Fig. 3.** Naïve type system for CbNeed evaluation.

It seems then that a type system for CbNeed can easily be obtained by basically adopting the CbV system plus


Therefore, the grammar of linear types is:

$$\text{CBNEED LINEAR TYPES} \qquad \qquad L, L' ::= \mathsf{normal} \mid M \multimap N$$

Multi(-sets) types are defined as in Sect. 3, relatively to CbNeed linear types. The rules of this naïve system for CbNeed are in Fig. 3.

Issue with the Naïve System. Unfortunately, the naïve system does not work: tight derivations (defined as expected: empty type context and the term typed with [normal]) do not provide exact bounds. The problem is that the naïve blend of ingredients allows derivations of **0** with strictly positive indices m and e. Instead, derivations of **0** should always have 0 in both indices (as is the case when they are derived with a many<sub>0</sub> rule with 0 premises) because they correspond to terms to be erased, which are not evaluated in CbNeed. Indeed, for any term t one can build, for instance, the following derivation Φ:

$$\dfrac{
  \dfrac{\dfrac{\dfrac{}{\vdash^{(0,0)} x : \mathbf{0}}\,\mathsf{many}\_0}{\vdash^{(0,0)} \lambda x.x : \mathbf{0} \multimap \mathbf{0}}\,\mathsf{fun}}{\vdash^{(0,0)} \lambda x.x : [\mathbf{0} \multimap \mathbf{0}]}\,\mathsf{many}\_{>0}
  \quad
  \dfrac{}{\vdash^{(0,0)} t : \mathbf{0}}\,\mathsf{many}\_0
}{\vdash^{(1,0)} (\lambda x.x)t : \mathbf{0}}\,\mathsf{app}$$

Note that introducing ⊢<sup>(0,1)</sup> x : **0** with rule ax rather than via many<sub>0</sub> (the typing context x : **0** is equivalent to the empty type context) would give a derivation with final judgement ⊢<sup>(1,1)</sup> (λx.x)t : **0**. Thus, the system messes up both indices.

Such bad derivations of **0** are not a problem per se, because in CbNeed one expects correctness and completeness to hold only for derivations of non-empty multi types. However, they do mess up also derivations of non-empty multi types because they can still appear inside tight derivations, as sub-derivations of subterms to be erased; consider for instance:

normal (0,0) <sup>I</sup> : normal many<sup>&</sup>gt;<sup>0</sup> (0,0) <sup>I</sup> : [normal] fun (0,0) λy.I : **<sup>0</sup>** - [normal] many<sup>&</sup>gt;<sup>0</sup> (0,0) λy.I : [**<sup>0</sup>** - [normal]] . . . . Φ (1,0) (λx.x)t: **<sup>0</sup>** app (2,0) (λy.I)((λx.x)t):[normal]

The term normalises in just 1 mneed-step to I[y←(λx.x)t] but the multiplicative index of the derivation is 2. The mismatch is due to a bad derivation of **0** used as right premise of an app rule. Similarly, the induced typing of I[y←(λx.x)t] is an example of a bad derivation used as right premise of a rule ES:

$$
\dfrac{
  \dfrac{\dfrac{}{\vdash^{(0,0)} I : \mathsf{normal}}\ \mathsf{normal}}{\vdash^{(0,0)} I : [\mathsf{normal}]}\ \mathsf{many}_{>0}
  \qquad
  \dfrac{\vdots\ \Phi}{\vdash^{(1,0)} (\lambda x.x)t : \mathbf{0}}
}{\vdash^{(1,0)} I[y{\leftarrow}(\lambda x.x)t] : [\mathsf{normal}]}\ \mathsf{ES}
$$

The Actual Type System. Our solution to this issue is to modify the system so as to prevent derivations of **0** from appearing as right premises of rules app and ES. We follow the schema of the rules for counting erasing steps given right after Theorem 2.

Therefore, we add two dedicated rules app<sub>gc</sub> and ES<sub>gc</sub>, and constrain the right premise of rules app and ES to have a non-empty type. The system is in Fig. 4 and it is based on the same grammar of types as the naïve system. Note that rules many and ax can still introduce **0**. These **0**s, however, can no longer mess up the indices of tight derivations, as we are going to show.
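The effect of the dedicated rule can be checked on the indices alone. The following Python sketch is an illustration under simplifying assumptions: it ignores types and contexts and tracks only the pair (m, e); the function names `app_naive` and `app_gc` are ours. It replays the application (λy.I)((λx.x)t) discussed above, whose argument is erased.

```python
# Index bookkeeping only: types and type contexts are ignored in this sketch.

def app_naive(left, right):
    """Naive app rule: sum the indices of both premises and count one m-step."""
    (m, e), (m2, e2) = left, right
    return (m + m2 + 1, e + e2)

def app_gc(left):
    """app_gc rule: the argument has type 0, so no derivation for it is used."""
    m, e = left
    return (m + 1, e)

function_indices = (0, 0)  # typing of the function part, λy.I
bad_zero = (1, 0)          # indices of the bad derivation of (λx.x)t : 0

naive = app_naive(function_indices, bad_zero)
actual = app_gc(function_indices)
print(naive, actual)
```

With the naive rule the multiplicative index is 2, whereas the rule that discards the erased argument yields 1, matching the single multiplicative step of the term.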

Note that the indices m and e are incremented and summed exactly as in the CbN and CbV type systems.

**Definition 3 (Tight derivations for CbNeed).** A derivation Φ ▷<sub>need</sub> Γ ⊢<sup>(m,e)</sup> t : M is tight (for CbNeed) if M = [normal] and Γ is empty.

Example 4. We return to the term t := ((λx.λy.xx)(II))(II) used in Example 1 and we give it a tight derivation in the CbNeed type system.

Again, we shorten normal to n. Then, we define Ψ as follows

$$
\begin{array}{cc}
\dfrac{}{x : M \vdash^{(0,1)} x : M}\ \mathsf{ax}
&
\dfrac{}{\vdash^{(0,0)} \lambda x.t : \mathsf{normal}}\ \mathsf{normal}
\\[3ex]
\dfrac{\Gamma, x : N \vdash^{(m,e)} t : M}{\Gamma \vdash^{(m,e)} \lambda x.t : N \multimap M}\ \mathsf{fun}
&
\dfrac{(\Gamma_i \vdash^{(m_i,e_i)} \lambda x.t : L_i)_{i \in J}}{\biguplus_{i \in J} \Gamma_i \vdash^{(\sum_{i \in J} m_i,\ \sum_{i \in J} e_i)} \lambda x.t : [L_i]_{i \in J}}\ \mathsf{many}
\\[3ex]
\dfrac{\Gamma \vdash^{(m,e)} t : [\mathbf{0} \multimap M]}{\Gamma \vdash^{(m+1,e)} ts : M}\ \mathsf{app_{gc}}
&
\dfrac{\Gamma \vdash^{(m,e)} t : [N \multimap M] \qquad \Pi \vdash^{(m',e')} s : N \qquad N \neq \mathbf{0}}{\Gamma \uplus \Pi \vdash^{(m+m'+1,\ e+e')} ts : M}\ \mathsf{app}
\\[3ex]
\dfrac{\Gamma \vdash^{(m,e)} t : M \qquad \Gamma(x) = \mathbf{0}}{\Gamma \vdash^{(m,e)} t[x{\leftarrow}s] : M}\ \mathsf{ES_{gc}}
&
\dfrac{\Gamma, x : N \vdash^{(m,e)} t : M \qquad \Pi \vdash^{(m',e')} s : N \qquad N \neq \mathbf{0}}{\Gamma \uplus \Pi \vdash^{(m+m',\ e+e')} t[x{\leftarrow}s] : M}\ \mathsf{ES}
\end{array}
$$

**Fig. 4.** Type system for CbNeed evaluation.


and, shortening [n] ⊸ [n] to [n]<sup>[n]</sup>, we define Θ as follows


Finally, we put Ψ and Θ together in the following derivation Φ for t

$$
\dfrac{
  \dfrac{
    \dfrac{\vdots\ \Psi}{\vdash^{(1,2)} \lambda x.\lambda y.xx : [[[\mathsf{n}]^{[\mathsf{n}]}, \mathsf{n}] \multimap [\mathbf{0} \multimap [\mathsf{n}]]]}
    \qquad
    \dfrac{\vdots\ \Theta}{\vdash^{(1,2)} II : [[\mathsf{n}]^{[\mathsf{n}]}, \mathsf{n}]}
  }{\vdash^{(3,4)} (\lambda x.\lambda y.xx)(II) : [\mathbf{0} \multimap [\mathsf{n}]]}\ \mathsf{app}
}{\vdash^{(4,4)} ((\lambda x.\lambda y.xx)(II))(II) : [\mathsf{n}]}\ \mathsf{app_{gc}}
$$

Note that the indices (4, 4) correspond exactly to the number of m<sub>need</sub>-steps and e<sub>need</sub>-steps, respectively, from t to its need-normal form (as shown in Example 1), and that Φ is a tight derivation. Forthcoming Theorem 5 shows once again that this is not by chance: tight derivations for CbNeed are minimal and provide exact bounds to evaluation lengths in CbNeed.
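The indices of Example 4 can be recomputed mechanically from the rules of Fig. 4. The Python sketch below is illustrative only: it tracks the pair (m, e) and ignores types and contexts, and the shapes chosen for Ψ and Θ are our own reading of how the rules apply to λx.λy.xx and II, so they should be taken as an assumption rather than as the paper's exact derivations.

```python
# Each rule of Fig. 4, restricted to its effect on the indices (m, e).

def ax():
    return (0, 1)

def normal():
    return (0, 0)

def fun(premise):
    return premise

def many(*premises):
    return (sum(m for m, _ in premises), sum(e for _, e in premises))

def app(left, right):
    return (left[0] + right[0] + 1, left[1] + right[1])

def app_gc(left):
    return (left[0] + 1, left[1])

# Assumed shape of Psi: two axioms for the two occurrences of x in xx,
# one app, then fun/many for each of the two abstractions.
psi = many(fun(many(fun(app(ax(), ax())))))

# Assumed shape of Theta for II: the left I is typed via ax, fun, many;
# the right I gets a two-element multi type (one fun over ax, one normal).
left_identity = many(fun(ax()))
right_identity = many(fun(ax()), normal())
theta = app(left_identity, right_identity)

# Phi: app of Psi and Theta, then app_gc for the erased second argument.
phi = app_gc(app(psi, theta))
print(psi, theta, phi)
```

Under these assumptions both Ψ and Θ carry indices (1, 2), and the final judgement carries (4, 4), in agreement with the step counts stated for t.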

Remarkably, the technical development to prove correctness and completeness of the CbNeed type system with respect to CbNeed evaluation follows smoothly along the same lines of the two other systems, mutatis mutandis.

#### **6.1 CbNeed Correctness**

**Lemma 6 (CbNeed linear substitution).** Let Φ ▷<sub>need</sub> Γ, x : M ⊢<sup>(m,e)</sup> E⟨x⟩ : N and v be a value. There is a splitting M = O ⊎ P such that for any derivation Ψ ▷<sub>need</sub> Π ⊢<sup>(m′,e′)</sup> v : O there exists Φ′ ▷<sub>need</sub> Γ ⊎ Π, x : P ⊢<sup>(m+m′,e+e′−1)</sup> E⟨v⟩ : N.

**Proposition 10 (Quantitative subject reduction for CbNeed).** Let Φ ▷<sub>need</sub> Γ ⊢<sup>(m,e)</sup> t : M be a derivation such that M ≠ **0**.


Note the condition M ≠ **0** in the statement of subject reduction, which is in contrast to the CbV system but akin to the CbN one. It is due to the way multi types are used as arguments, via rules ES<sub>gc</sub> and app<sub>gc</sub>. The restriction is necessary: the CbNeed type system derives ⊢<sup>(0,1)</sup> x[x←δδ] : **0**, but x[x←δδ] is not normalising for CbNeed evaluation. This is expected, as it amounts to the fact that adequacy holds only with respect to non-empty types, as for CbN, and as stressed when introducing the CbNeed type system. The same restriction appears in Theorem 5, Proposition 13 and Theorem 6 below, for the same reason.

**Proposition 11 (**[normal] **typings for normal forms for CbNeed).** Let Φ ▷<sub>need</sub> Γ ⊢<sup>(m,e)</sup> t : [normal] be a derivation, with normal(t). Then Γ is empty, and so Φ is tight, and m = e = 0.

**Theorem 5 (CbNeed tight correctness).** Let t be a closed term. If Φ ▷<sub>need</sub> ⊢<sup>(m,e)</sup> t : M where M ≠ **0**, then there is s such that d : t →<sup>∗</sup><sub>need</sub> s, with normal(s), |d|<sub>m</sub> ≤ m and |d|<sub>e</sub> ≤ e. Moreover, if Φ is tight then |d|<sub>m</sub> = m and |d|<sub>e</sub> = e.

### **6.2 CbNeed Completeness**

**Proposition 12 (Normal forms are tightly typable for CbNeed).** Let t be such that normal(t). Then there is a tight derivation Φ ▷<sub>need</sub> ⊢<sup>(0,0)</sup> t : [normal].

**Lemma 7 (Linear removal for CbNeed).** Let Φ ▷<sub>need</sub> Γ, x : M ⊢<sup>(m,e)</sup> E⟨v⟩ : N be a derivation and v be a value, with x ∉ fv(v). Then there exist

– a multi type M′ and two type contexts Γ′ and Π,


such that


**Proposition 13 (Quantitative subject expansion for CbNeed).** Let Φ ▷<sub>need</sub> Γ ⊢<sup>(m,e)</sup> s : M be a derivation such that M ≠ **0**. Then,

– Multiplicative: if t →<sub>m<sub>need</sub></sub> s then there is a derivation Φ′ ▷<sub>need</sub> Γ ⊢<sup>(m+1,e)</sup> t : M,

– Exponential: if t →<sub>e<sub>need</sub></sub> s then there is a derivation Φ′ ▷<sub>need</sub> Γ ⊢<sup>(m,e+1)</sup> t : M.

**Theorem 6 (CbNeed tight completeness).** Let t be a closed term. If d : t →<sup>∗</sup><sub>need</sub> s and normal(s) then there exists a tight derivation Φ ▷<sub>need</sub> ⊢<sup>(|d|<sub>m</sub>,|d|<sub>e</sub>)</sup> t : [normal].

CbNeed Model. The interpretation ⟦t⟧<sup>CbNeed</sup><sub>x⃗</sub> with respect to the CbNeed system is defined as the set (where x⃗ = (x<sub>1</sub>, ..., x<sub>n</sub>) is a list of variables suitable for t):

$$\{ ((M\_1, \ldots, M\_n), N) \mid \exists \Phi \rhd\_{\text{need}} \; x\_1 \colon M\_1, \ldots, x\_n \colon M\_n \; \vdash^{(m,e)} t : N \text{ and } N \neq \mathbf{0} \} \; . $$

Note that the right multi type is required to be non-empty. The invariance and the adequacy of ⟦t⟧<sup>CbNeed</sup><sub>x⃗</sub> with respect to CbNeed evaluation are obtained exactly as for the CbN and CbV cases.

# **7 A New Fundamental Theorem for Call-by-Need**

CbNeed Erases Wisely. In the literature, the theorem about CbNeed is the fact that it is operationally equivalent to CbN. This result was first proven independently by two groups, Maraist, Odersky, and Wadler [48], and Ariola and Felleisen [11], in the nineties, using heavy rewriting techniques.

Recently, Kesner gave a much simpler proof via CbN multi types [40]. She uses multi types to first show termination equivalence of CbN and CbNeed, from which she then infers operational equivalence. Termination equivalence means that a given term terminates in CbN if and only if it terminates in CbNeed, and it is a consequence of our slogan that CbN and CbNeed both erase wisely.

With our terminology and notations, Kesner's result takes the following form.

**Theorem 7 (Kesner** [40]**).** Let t be a closed term.


Note that, with respect to the other similar theorems in this paper, the result does not cover tight derivations and it does not provide exact bounds. In fact, the CbN system cannot provide exact bounds for CbNeed, because it provides them for CbN evaluation, which in general is slower than CbNeed. Consider for instance the term t in Example 1 and its CbN tight derivation in Example 2: the derivation provides indices (5, 5) for t (and so t evaluates in 10 CbN steps), but t evaluates in 8 CbNeed steps. Closing such a gap is the main motivation behind this paper, achieved by the CbNeed multi type system in Sect. 6.

CbNeed Duplicates Wisely. Curiously, in the literature there are no dual results showing that CbNeed duplicates as wisely as CbV. One of the reasons is that it is a theorem that does not admit a simple formulation such as operational or termination equivalence, because CbNeed and CbV are not in such relationships. Morally, this is subsumed by the logical interpretation according to which CbNeed corresponds to an affine variant of the linear logic representation of CbV. Yet, it would be nice to have a precise, formal statement establishing that CbNeed duplicates as wisely as CbV —we provide it here.

Our result is that the CbV multi type system is correct with respect to CbNeed evaluation. In particular, the indices (m, e) provided by a CbV type derivation provide bounds for CbNeed evaluation lengths. Two important remarks before we proceed with the formal statement:


CbV Correctness with Respect to CbNeed. Pleasantly, our presentations of CbV and CbNeed make the proof of the result straightforward. It is enough to observe that, since we do not consider garbage collection and we adopt a nondeterministic formulation of CbV, CbNeed is a subsystem of CbV. Formally, if t →<sub>need</sub> s then t →<sub>cbv</sub> s, as it is easily seen from the definitions (CbNeed reduces only some subterms of applications and ES, while CbV reduces all such subterms). The result is then a corollary of the correctness theorem for CbV.

**Corollary 1 (CbV correctness w.r.t. CbNeed).** Let t be a closed term and Φ ▷<sub>cbv</sub> ⊢<sup>(m,e)</sup> t : M be a derivation. Then there exists s such that d : t →<sup>∗</sup><sub>need</sub> s and normal(s), with |d|<sub>m</sub> ≤ m and |d|<sub>e</sub> ≤ e.

Since the CbNeed system provides exact bounds (Theorem 5), we obtain that CbNeed duplicates as wisely as CbV, when the comparison makes sense, that is, on CbV normalisable terms.

**Corollary 2 (CbNeed duplicates as wisely as CbV).** Let d : t →<sup>∗</sup><sub>cbv</sub> u with normal<sub>cbv</sub>(u). Then there is d′ : t →<sup>∗</sup><sub>need</sub> s with normal(s) and |d′|<sub>m</sub> ≤ |d|<sub>m</sub> and |d′|<sub>e</sub> ≤ |d|<sub>e</sub>.

# **8 Conclusions**

Contributions. This paper introduces a multi type system for CbNeed evaluation, carefully blending ingredients from multi type systems for CbN and CbV evaluation in the literature. Notably, it is the first type system whose minimal derivations—explicitly characterised—provide exact bounds for evaluation lengths. It also characterises CbNeed termination, and thus its judgements provide an adequate relational semantics.

The technical development is simple, and uniform with respect to those of CbN and CbV multi type systems. The typing rules count evaluation steps following exactly the same schema of the CbN and CbV rules. The proofs of correctness and completeness also follow exactly the same structure.

A further side contribution of the paper is a new fundamental result of CbNeed, formally stating that it duplicates as wisely as CbV. More precisely, the CbV multi type system is (quantitatively) correct with respect to CbNeed evaluation. Pleasantly, our presentations of CbV and CbNeed provide the result for free. This result dualizes the other fundamental theorem stating that CbNeed erases as wisely as CbN, usually formulated as termination equivalence, and recently re-proved by Kesner using CbN multi types [40].

Future Work. Recently, Barenbaum et al. extended CbNeed to strong evaluation [14], and it is natural to try to extend our type system as well. The definition of the system, in particular the extension of tight derivations to that setting, seems however far from evident. Barenbaum, Bonelli, and Mohamed also apply CbN multi types to a CbNeed calculus extended with pattern matching and fixpoints [15], which might be interesting to refine along the lines of our work.

An orthogonal direction is the study of the denotational models of CbNeed. It would be interesting to have a categorical semantics of CbNeed, as well as a categorical way of discriminating our quantitative precise model from the quantitatively lax one given by CbN multi types. It would also be interesting to obtain game semantics of CbNeed, hopefully satisfying a strong correspondence with our multi types in the style of what happens in CbN [30,31,51,56].

A further, unconventional direction is to dualise the inception of the CbNeed type system trying to mix silly duplication from CbN and silly erasure from CbV, obtaining—presumably—a multi types system measuring a perpetual strategy.

**Acknowledgements.** This work has been partially funded by the ANR JCJC grant COCA HOLA (ANR-16-CE40-004-01) and by the EPSRC grant EP/R029121/1 "Typed Lambda-Calculi with Sharing and Unsharing".

# **References**



# **Verifiable Certificates for Predicate Subtyping**

Frederic Gilbert

Inria, Cachan, France frederic.a.gilbert@inria.fr

**Abstract.** Adding *predicate subtyping* to higher-order logic yields a very expressive language in which type-checking is undecidable, making the definition of a system of verifiable certificates challenging. This work presents a solution to this issue with a minimal formalization of predicate subtyping, named PVS-Core, together with a system of verifiable certificates for PVS-Core, named PVS-Cert. PVS-Cert is based on the introduction of proof terms and explicit coercions. Its design is similar to that of PTSs with dependent pairs, with the exception of the definition of conversion, which is based on a specific notion of reduction →<sub>β∗</sub>, corresponding to β-reduction combined with the *erasure of coercions*. The use of this reduction instead of the more standard reduction →<sub>βσ</sub> makes it possible to establish a simple correspondence between PVS-Core and PVS-Cert. On the other hand, a type-checking algorithm is designed for PVS-Cert, built on proofs of type preservation of →<sub>βσ</sub> and strong normalization of both →<sub>βσ</sub> and →<sub>β∗</sub>. Combining these results, PVS-Cert judgements are used as verifiable certificates for predicate subtyping. In addition, the reduction →<sub>βσ</sub> is used to define a cut elimination procedure for predicate subtyping. This definition provides a new tool to study the properties of predicate subtyping, as illustrated with a proof of consistency.

**Keywords:** Higher-order logic · Predicate subtyping · Type theory · Proof theory

# **1 Introduction**

Extending higher-order logic with *predicate subtyping* yields a very expressive type system, used notably at the core of the proof system PVS [17]. However, proof judgements and typing judgements become entangled in the presence of predicate subtyping, making type-checking undecidable. As a consequence, defining a language of verifiable proofs for predicate subtyping becomes challenging. In pure higher-order logic, complete judgement derivations are too heavy to be used in practice as certificates, but lighter certificates can be produced by removing typing rules, recording deduction rules only: as this approach requires the decidability of type-checking, it doesn't apply directly to predicate subtyping.

This paper presents a new formal language, PVS-Cert, designed to be used as a language of verifiable certificates for predicate subtyping. PVS-Cert is built starting from a minimal formalization of predicate subtyping named PVS-Core, by adding explicit proofs and coercions. PVS-Cert is also equipped with a notion of *cut elimination*, which can be used directly to study both PVS-Cert and PVS-Core meta-theoretical properties.

#### **1.1 Extending Higher-Order Logic with Predicate Subtyping**

Higher-order logic is characterized by the coexistence of *types* and *predicates* as two radically different kinds of attributes to mathematical expressions. For instance, the mathematical expression 1 + 1 can be assigned a type *Nat* expressing that it is a natural number, or a predicate *Even* expressing that it is divisible by two. The assignment of types remains very simple: in particular, type-checking is decidable in higher-order logic. In return, most attributes of mathematical expressions formulated as predicates cannot be formulated as types: for instance, being a natural number different from 0 is expressible as a predicate, but not as a type.

Predicate subtyping makes it possible to recover a symmetry between the expressivity of types and predicates. It is defined as the addition of new types, referred to as *predicate subtypes*. Given a predicate P defined on a domain A (e.g. *Even*, defined on the domain *Nat*), the predicate subtype {x : A | P(x)} is defined. An expression t can be assigned this type if and only if it can be assigned the type A and P(t) is provable. For instance, if *Nonzero* is a predicate of domain *Nat* expressing that a natural number is different from 0, proving *Nonzero*(1) makes it possible to conclude that 1 admits the type {x : *Nat* | *Nonzero*(x)}.

This augmented expressivity of the language of types makes it possible to exclude many unwanted expressions from reasoning. For instance, defining the domain of denominators of Euclidean division as {x : *Nat* | *Nonzero*(x)}, all divisions in which the denominator is not provably different from zero become ill-typed.

As expressions may have several types, predicate subtyping induces a form of subtyping: for instance, as any expression of type {x : *Nat* | *Nonzero*(x)} also admits the type *Nat*, the former can be considered as a subtype of the latter.

As previously mentioned, a major counterpart of this extension of higher-order logic is the fact that typing judgements and proof judgements become entangled. For instance, proving the equality (1/1) = 1 requires that 1 can be assigned the type {x : *Nat* | *Nonzero*(x)}, which, in turn, requires proving *Nonzero*(1). As a direct consequence, type-checking is not decidable in the presence of predicate subtyping.
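The membership condition for a predicate subtype can be mimicked in Python, with one important caveat: the sketch below replaces provability of P(t) by a decidable run-time check, which is exactly what predicate subtyping does not assume. All names (`is_nat`, `nonzero`, `in_predicate_subtype`, `euclidean_div`) are hypothetical and chosen for illustration.

```python
def is_nat(value: object) -> bool:
    """Stand-in for the base type Nat: non-negative Python integers."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0

def nonzero(n: int) -> bool:
    """Stand-in for the predicate Nonzero, checked by evaluation, not by proof."""
    return n != 0

def in_predicate_subtype(value, in_base, predicate) -> bool:
    """t has type {x : A | P(x)} iff t has type A and P(t) holds."""
    return in_base(value) and predicate(value)

def euclidean_div(numerator: int, denominator: int) -> int:
    """Division whose denominator must lie in {x : Nat | Nonzero(x)}."""
    if not is_nat(numerator):
        raise TypeError("numerator is not a Nat")
    if not in_predicate_subtype(denominator, is_nat, nonzero):
        raise TypeError("denominator is not in {x : Nat | Nonzero(x)}")
    return numerator // denominator

print(euclidean_div(1, 1))
```

In this analogy, the call `euclidean_div(1, 1)` is accepted only because the check corresponding to *Nonzero*(1) succeeds, mirroring how the typing of (1/1) depends on a proof obligation.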

#### **1.2 Contributions**

**PVS-Core.** Higher-order logic, as well as its extension with predicate subtyping, can be defined in various ways. The first contribution of this paper is the formalization, in Sect. 2, of a minimal system for predicate subtyping, denoted PVS-Core. Besides its minimality, the main design choice for this system is the use of β-equivalence as a conversion relation (or definitional equality).

**PVS-Cert and Its Basic Properties.** Starting from PVS-Core, the second contribution of this work is the formalization, in Sect. 3, of a language of verifiable proofs for PVS-Core. This new language, denoted PVS-Cert, is designed from PVS-Core with the addition of explicit proof terms, formalized as λ-terms, as well as the addition, at the level of expressions, of explicit coercions based on these proof terms. The addition of explicit proof terms follows the Curry-Howard isomorphism in the sense that PVS-Cert proof terms are typed by their corresponding formulas.

PVS-Cert is an extension of the Pure Type System (PTS) λ-HOL (see for instance [4], where λ-HOL as well as the general notion of PTS are defined). More precisely, PVS-Cert is designed to extend λ-HOL in the same way that PVS-Core extends higher-order logic (denoted HOL in the following). This situation is illustrated in this diagram, where vertical arrows represent extensions and horizontal arrows represent the introduction of explicit proofs (and, in the case of PVS-Core and PVS-Cert, of explicit coercions).

This choice of a PTS-like system is well-suited to describe reasoning modulo β: all steps of β-reduction or β-expansion are kept implicit in proof terms, which keeps them compact. As detailed in Sect. 3.3, PVS-Cert is comparable to the formalism of PTSs with dependent pairs. However, conversion in PVS-Cert is neither defined as ≡<sub>β</sub> nor as its extension ≡<sub>βσ</sub> (see for instance [16]) used in PTSs with dependent pairs: instead, it uses a new conversion relation ≡<sub>β∗</sub> corresponding to syntactical equality modulo β-reduction *and coercion erasure* (defined in Sect. 3.1). This distinctive definition makes it possible to define a simple correspondence between PVS-Core and PVS-Cert, presented later in Sect. 9.

Basic properties of PVS-Cert are presented in Sect. 4, containing notably the Church-Rosser property for the reduction →<sub>β∗</sub> underlying the conversion ≡<sub>β∗</sub>, as well as the uniqueness of types: contrary to the case of PVS-Core, a well-typed term admits a unique type up to ≡<sub>β∗</sub>.

As in λ-HOL, well-typed terms are organized according to a stratification, presented in Sect. 5, which includes a class of *types*, a class of *expressions* (containing notably propositions), and a class of *proof terms*. This stratification is at the core of the correspondence between PVS-Cert and PVS-Core.

**Type Preservation and Strong Normalization.** In contrast to the case of the reduction →<sub>βσ</sub> in PTSs with dependent pairs, →<sub>β∗</sub> is not a type preserving reduction in PVS-Cert. We prove however in Sect. 6 that →<sub>βσ</sub> is a type preserving reduction in PVS-Cert (Theorem 6).

In Sect. 7, we present the main ideas leading to a proof of strong normalization for both →<sub>β∗</sub> and →<sub>βσ</sub> (Theorem 7); the details of the proof can be found in the author's PhD dissertation [1]. Moreover, the strong normalization of the type preserving reduction →<sub>βσ</sub> defines a *cut elimination theorem* (Theorem 8). This theorem is used in the remainder of this section to prove the consistency of PVS-Cert. This result is used in turn at the very end of this work to conclude the consistency of PVS-Core, illustrating how cut elimination in PVS-Cert can be used to study the meta-theoretical properties of predicate subtyping.

**Type-Checking in PVS-Cert.** We present in Sect. 8 the design of a type-checking algorithm for PVS-Cert, showing that, contrary to the case of PVS-Core, type-checking is decidable in PVS-Cert. This algorithm is based on the type preservation of →<sub>βσ</sub> as well as the strong normalization of →<sub>β∗</sub> and →<sub>βσ</sub>.

**Using PVS-Cert as a System of Verifiable Certificates for PVS-Core.** The connection between PVS-Core and PVS-Cert is formalized in Sect. 9. On the one hand, a translation from PVS-Cert to PVS-Core is defined through the erasure of coercions. On the other hand, the choice of conversion ≡<sub>β∗</sub> in PVS-Cert makes it possible to define a very simple translation from PVS-Core derivations to PVS-Cert derivable judgements (Definition 7 and Theorem 11).

These translations are used in Sect. 10 together with the PVS-Cert type-checking algorithm to define how to use PVS-Cert judgements as verifiable certificates for PVS-Core, achieving the first purpose of this paper. Such certificates are much lighter than the PVS-Core *derivations* represented through them, as they only require recording a single judgement.

Last, the translations between PVS-Core and PVS-Cert are exploited to transpose the consistency property, established in PVS-Cert using cut elimination, to PVS-Core. This illustrates how the PVS-Cert cut elimination theorem can be used to study both PVS-Cert and PVS-Core meta-theoretical properties.

#### **1.3 Related Works**

The most important related work is the author's PhD dissertation [1], which contains detailed versions of all proofs presented in this paper.

The introduction of predicate subtyping can be traced back to the first-order language OBJ2 [9] and its *sort constraints*, which make it possible to restrict some typing relations to the satisfaction of a predicate. This idea was later refined and combined with higher-order logic in the proof system PVS, which is one of the most important systems based on predicate subtyping. Overviews of the PVS specification language and its use of predicate subtyping are given for instance in [17] and [20].

In the present work, the issue of the undecidability of predicate subtyping is handled with the introduction of an alternative system, PVS-Cert. An alternative approach to this issue is to weaken the definition of predicate subtyping sufficiently to obtain systems in which type-checking remains decidable. This approach has been followed in [13,19]. An intermediate approach is followed in [15], in which predicate subtyping is weakened sufficiently to allow for run-time type-checking verifications. However, contrary to the case of PVS, predicate subtyping is not fully represented in these different systems.

As mentioned in the previous section, PVS-Cert is an adaptation of the formalism of Pure Type Systems (PTSs), sometimes also referred to as Generalized Type Systems (GTSs), presented for instance in [4]. The definition of PTSs is itself the result of several successive works, including notably [3,7,11,24–26]. More specifically, PVS-Cert is derived from the notion of PTSs *with dependent pairs*, which has its roots in the system ECC [16]. A subsystem of PVS-Cert, named PVS-Cert<sup>−</sup> and presented in Sect. 3, corresponds directly to a fragment of ECC (PVS-Cert<sup>−</sup> is the system obtained from PVS-Cert by replacing ≡<sub>β∗</sub> by the standard conversion ≡<sub>βσ</sub> of PTSs with dependent pairs). PVS-Cert<sup>−</sup> is also comparable to the notion of *subset types* in Coq [5]. However, contrary to PVS-Cert, PVS-Cert<sup>−</sup> and subset types are not well-suited to reflect predicate subtyping, as conversion in these systems does not reflect conversion in PVS-Core; more precisely, Proposition 5 doesn't hold with ≡<sub>βσ</sub>.

Another important related work is [8], in which two systems are presented: ICC<sub>Σ</sub>, a type system with *implicit* type constructions, and AICC<sub>Σ</sub>, a system obtained from ICC<sub>Σ</sub> by adding *explicit* coercions. ICC<sub>Σ</sub> contains several advanced features, including a generalization of predicate subtypes. The construction of PVS-Cert from PVS-Core follows the same idea as the construction of AICC<sub>Σ</sub> from ICC<sub>Σ</sub>: adding the missing information explicitly in the terms of the language to recover the decidability of type-checking. The main difference between the two approaches lies in the complexity of the respective languages. ICC<sub>Σ</sub> is a very rich and complex language, making its analysis difficult; in particular, strong normalization in ICC<sub>Σ</sub> is kept as a conjecture, on which the decidability of type-checking itself relies. Conversely, PVS-Core is designed as a minimal language including predicate subtyping, making its analysis simpler.

A variant of predicate subtyping was also formalized as an extension of the calculus of constructions in [22]. As in the present work, this presentation contains two systems connected with each other. On the one hand, it includes one system, named Russell, which is comparable to a weakened version of PVS-Core in which a term t of type A admits the type {x : A | P} even when P[t/x] is not provable. In this variant of predicate subtyping named *subset equivalence*, type-checking is decidable. On the other hand, this work includes a system with explicit coercions which is comparable to PVS-Cert. Contrary to PVS-Core, Russell derivations are not intended to contain all information necessary to build complete terms with explicit coercions: instead, a translation producing incomplete terms in the system with explicit coercions is presented. This system makes it possible to write programs and specifications together in Russell, and to prove their correctness in a second step by filling all proof holes produced through the translation, in a way which is similar to the functioning of PVS.

Contrary to the case of PVS-Core and Russell, PVS-Cert and the counterpart of Russell with explicit coercions have similar characteristics. Although its theoretical properties are not formalized, this latter system is presented as a simple extension of the proof-irrelevant type theory presented in [27]. There exists indeed a tight connection between proof irrelevance and PVS-Cert: if one considers for instance the usual predicate *Even* on natural numbers expressing divisibility by two, the predicate subtype *even* = {x : *Nat* | *Even*(x)}, and two expressions with explicit coercions ⟨2, p⟩<sub>*even*</sub> and ⟨2, q⟩<sub>*even*</sub> of this type with p and q two proofs of *Even*(2), then the hypothesis of proof irrelevance ensures that the expressions ⟨2, p⟩<sub>*even*</sub> and ⟨2, q⟩<sub>*even*</sub> are convertible, as does the choice of conversion relation ≡<sub>β∗</sub> in PVS-Cert.
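The role of coercion erasure in this example can be sketched in a few lines of Python. This is a deliberately minimal model, not the PVS-Cert definition: it covers only constants and coercions, omits β-reduction entirely, and the class names `Const` and `Coerce` as well as the function `erase` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    """An atomic expression such as the numeral 2."""
    name: str

@dataclass(frozen=True)
class Coerce:
    """An explicit coercion of expr into a predicate subtype, justified by proof."""
    expr: object
    proof: str
    subtype: str

def erase(term):
    """Remove every coercion, keeping only the underlying expression."""
    if isinstance(term, Coerce):
        return erase(term.expr)
    return term

two_via_p = Coerce(Const("2"), proof="p", subtype="even")
two_via_q = Coerce(Const("2"), proof="q", subtype="even")

# Syntactically distinct terms become identical once coercions are erased.
print(two_via_p == two_via_q, erase(two_via_p) == erase(two_via_q))
```

The two coerced terms differ only in their proof component, so a conversion that compares terms after erasure identifies them, which is the behaviour attributed above to both proof irrelevance and the conversion of PVS-Cert.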

This relation between proof irrelevance and predicate subtyping is explored further in [27]. Besides the fact that this work is based on the calculus of constructions and besides some technical differences in the precise definition of conversion between the system presented in this paper and PVS-Cert, analyzing the strong relation between these two systems appears as a very interesting future work. In particular, it would provide a possible strategy for building a proof of strong normalization for this system from the proof of strong normalization presented in Sect. 7. Also following the relation between proof irrelevance and predicate subtyping, the system IITT presented in [2], which is equipped with explicit occurrences of irrelevant terms, also admits some similarities with PVS-Cert. However, it is restricted to predicative type theory, in which higher-order reasoning cannot be expressed.

Another important work carried out on predicate subtyping is the presentation of a *formal semantics* for PVS in [18]. This work defines, for some fragment of the PVS language including predicate subtyping but also other features such as *parametric theories*, set-theoretical interpretations of types and expressions. These interpretations are limited to *standard* interpretations: the interpretation of a function type is the set of all functions from the interpretation of the domain to the interpretation of the co-domain, and the interpretation of the type of propositions is a set containing exactly two elements, distinguishing *true* propositions from *false* ones. Such an approach is complementary to the presented paper, which is only focused on the distinction between *provable* propositions and *unprovable* ones. As a possible future work, it would be interesting to adapt the work presented in [18] to obtain a notion of *standard model* for PVS-Core.

# **2 PVS-Core: A Minimal Extension of HOL with Predicate Subtyping**

This section is dedicated to the first contribution of this work: the formalization of a minimal system for predicate subtyping. This system is named PVS-Core, in reference to PVS [17]. The main distinctive design choice for PVS-Core is the introduction of a conversion relation (or definitional equality), corresponding to β-equivalence.

#### **2.1 Definitions**

**Variables and Terms.** We first define a set of **variables** 𝒱 as the disjoint union of two countably infinite sets of symbols 𝒱<sub>expressions</sub> and 𝒱<sub>types</sub>. We introduce the generic notation v or w to refer to a variable in general, as well as the following specific notations:


Then, we define a set of **terms** as the disjoint union of the three following sets. The last two are defined together recursively.


*Remark 1.* There is no formal distinction between the expressions denoted t or u and the expressions denoted P or Q, as all of them refer to expressions in general. Yet, in the following, the notations P and Q will often be used to refer to expressions admitting the type *Prop*, also referred to as *formulas* or *propositions*.

**Declarations, Contexts, Judgements.** We define:


We use the notation *DV*(Γ) to refer to the set of variables declared in a context Γ: for instance, *DV*(P, x : A, X : *Type*) = {x, X}.
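As a small illustration of this notation, the following Python sketch computes the set of declared variables of a context. The three entry classes (`TypeDecl`, `EltDecl`, `Assumption`) and the representation of types and formulas as plain strings are assumptions made for the example, not definitions taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class TypeDecl:
    """Hypothetical encoding of a type variable declaration X : Type."""
    var: str


@dataclass(frozen=True)
class EltDecl:
    """Hypothetical encoding of an expression variable declaration x : A."""
    var: str
    type_: str  # the type is kept abstract as a string


@dataclass(frozen=True)
class Assumption:
    """Hypothetical encoding of an assumed formula P (declares no variable)."""
    formula: str


Entry = Union[TypeDecl, EltDecl, Assumption]


def declared_vars(context: List[Entry]) -> Set[str]:
    """Return DV(context): the variables declared by the context entries."""
    return {e.var for e in context if not isinstance(e, Assumption)}


# The context P, x : A, X : Type from the text.
gamma = [Assumption("P"), EltDecl("x", "A"), TypeDecl("X")]
print(declared_vars(gamma) == {"x", "X"})
```

Assumptions contribute nothing to the result, which matches the example above where only x and X are declared.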

**Reduction.** We equip PVS-Core terms with the usual β-reduction. In the following, we use the notation ▷<sub>β</sub> for the reduction of a β-redex, →<sub>β</sub> for the context closure of ▷<sub>β</sub>, ↠<sub>β</sub> for the reflexive transitive closure of →<sub>β</sub>, and ≡<sub>β</sub> for the symmetric closure of ↠<sub>β</sub>, i.e. β-conversion.

**Derivation Rules.** The rules of PVS-Core are the following:

#### **Well-formed contexts**

$$\frac{}{\emptyset \vdash \mathit{WF}}\ \text{Empty} \qquad \frac{\Gamma \vdash \mathit{WF} \qquad X \in \mathcal{V}\_{types} \setminus DV(\Gamma)}{\Gamma, X : \mathit{Type} \vdash \mathit{WF}}\ \text{TypeDecl}$$

$$\frac{\Gamma \vdash A : \mathit{Type} \qquad x \in \mathcal{V}\_{expressions} \setminus DV(\Gamma)}{\Gamma, x : A \vdash \mathit{WF}}\ \text{EltDecl} \qquad \frac{\Gamma \vdash P : \mathit{Prop}}{\Gamma, P \vdash \mathit{WF}}\ \text{Assumption}$$

#### **Well-formed types**

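$$\frac{\Gamma \vdash \mathit{WF} \qquad (X : \mathit{Type}) \in \Gamma}{\Gamma \vdash X : \mathit{Type}}\ \text{TypeVar} \qquad \frac{\Gamma \vdash \mathit{WF}}{\Gamma \vdash \mathit{Prop} : \mathit{Type}}\ \text{Prop}$$

$$\frac{\Gamma, x : A \vdash B : \mathit{Type}}{\Gamma \vdash \Pi x : A.B : \mathit{Type}}\ \text{Pi} \qquad \frac{\Gamma, x : A \vdash P : \mathit{Prop}}{\Gamma \vdash \{x : A \mid P\} : \mathit{Type}}\ \text{Subtype}$$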

#### **Well-typed expressions**

$$\frac{\Gamma \vdash \mathit{WF} \qquad (x : A) \in \Gamma}{\Gamma \vdash x : A}\ \text{EltVar} \qquad \frac{\Gamma \vdash t : \{x : A \mid P\}}{\Gamma \vdash t : A}\ \text{SubtypeElim1}$$

$$\frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x : A.t : \Pi x : A.B}\ \text{Lam} \qquad \frac{\Gamma \vdash t : \Pi x : A.B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B[u/x]}\ \text{App}$$

$$\frac{\Gamma, x : A \vdash P : \mathit{Prop}}{\Gamma \vdash \forall x : A.P : \mathit{Prop}}\ \text{Forall} \qquad \frac{\Gamma, P \vdash Q : \mathit{Prop}}{\Gamma \vdash P \Rightarrow Q : \mathit{Prop}}\ \text{Imply}$$

$$\frac{\Gamma \vdash t : A \qquad \Gamma \vdash P[t/x] \qquad \Gamma \vdash \{x : A \mid P\} : \mathit{Type}}{\Gamma \vdash t : \{x : A \mid P\}}\ \text{SubtypeIntro}$$

$$\frac{\Gamma \vdash t : A \qquad \Gamma \vdash B : \mathit{Type} \qquad A \equiv\_{\beta} B}{\Gamma \vdash t : B}\ \text{TypeConversion}$$

#### **Deductions**

$$\frac{\Gamma \vdash \mathit{WF} \qquad P \in \Gamma}{\Gamma \vdash P}\ \text{Axiom} \qquad \frac{\Gamma \vdash P \qquad \Gamma \vdash Q : \mathit{Prop} \qquad P \equiv\_{\beta} Q}{\Gamma \vdash Q}\ \text{PropConversion}$$

$$\frac{\Gamma, P \vdash Q}{\Gamma \vdash P \Rightarrow Q}\ \text{ImplyIntro} \qquad \frac{\Gamma \vdash P \Rightarrow Q \qquad \Gamma \vdash P}{\Gamma \vdash Q}\ \text{ImplyElim}$$

$$\frac{\Gamma, x : A \vdash P}{\Gamma \vdash \forall x : A.P}\ \text{ForallIntro} \qquad \frac{\Gamma \vdash \forall x : A.P \qquad \Gamma \vdash t : A}{\Gamma \vdash P[t/x]}\ \text{ForallElim}$$

$$\frac{\Gamma \vdash t : \{x : A \mid P\}}{\Gamma \vdash P[t/x]}\ \text{SubtypeElim2}$$

#### **2.2 A Minimal System Expressing Predicate Subtyping**

Predicate subtyping is expressed in PVS-Core with the term construction {x : A | P} and the following four rules: Subtype, SubtypeIntro, SubtypeElim1, and SubtypeElim2.


The system obtained from PVS-Core by removing the construction {x : A | P} and these four rules is a formulation of constructive higher-order logic. In particular, the types of this subsystem correspond to the expected simple types: for any type of the form Πx : A.B in this subsystem, x cannot appear free in B, hence this type is a non-dependent function type. As a consequence, the rule TypeConversion can be safely removed from this subsystem to obtain a simpler but equivalent formulation of higher-order logic.

PVS-Core is a minimal constructive system, which can be extended with classical reasoning or extensionality principles through the addition of axioms.

The rule PropConversion allows reasoning *modulo* β, which will be useful in the definition of PVS-Core to keep proof terms compact. The rule TypeConversion is its counterpart at the level of types, allowing typing *modulo* β as well.

# **3 PVS-Cert: Verifiable Certificates for PVS-Core**

This section is dedicated to the presentation of an alternative system, PVS-Cert, which will be used to achieve the purpose of the work: defining a language of verifiable certificates for predicate subtyping.

At first glance, there is no need to introduce any new system to design PVS-Core certificates: the language of PVS-Core derivations itself is a language of verifiable proofs for PVS-Core. However, this language is heavy as many parts of PVS-Core derivations contain unnecessary or redundant information. As a comparison, in higher-order logic, as type-checking is decidable, only the deduction rules need to be recorded.

The main idea in the definition of PVS-Cert as a language of certificates for predicate subtyping is to formalize proofs as new kinds of terms, in addition to the types and expressions which are already present in PVS-Core, and to introduce explicit coercions based on these proof terms in order to ensure the decidability of type-checking. As a consequence, a complete certificate is simply the typing judgement of some proof term with its corresponding theorem. Such certificates are much lighter than PVS-Core derivations, as only one single judgement is recorded.

Moreover, PVS-Cert will be equipped (in Sect. 7) with a definition of *cut elimination*, defined as a computation rule on proof terms.

#### **3.1 Definitions**

As detailed further in Sect. 3.2, the definition of PVS-Cert is strongly related to the formalism of PTSs, presented for instance in [4].

**Terms.** We define:

– **Sorts** 𝒮 = {*Prop*, *Type*, *Kind*}

We use the notation s to refer to a sort.


**Contexts, Judgements.** We define:


As in PVS-Core, the set of variables declared in a context Γ is denoted *DV*(Γ).

**Reduction.** The main specificity of PVS-Cert is the use of a distinctive notion of reduction and conversion. In addition to the usual β-redex reduction (λv : T.M)N ▷<sub>β</sub> M[N/v], we introduce a new reduction relation ▷<sub>∗</sub>, defined with the following rules:

– ⟨M<sub>1</sub>, M<sub>2</sub>⟩<sub>T</sub> ▷<sub>∗</sub> M<sub>1</sub>

– π<sub>1</sub>(M) ▷<sub>∗</sub> M

We denote the union of ▷<sub>β</sub> and ▷<sub>∗</sub> as ▷<sub>β∗</sub>. As in the definition of PVS-Core, we use the notation →<sub>β∗</sub> for the context closure of ▷<sub>β∗</sub>, ↠<sub>β∗</sub> for the reflexive transitive closure of →<sub>β∗</sub>, and ≡<sub>β∗</sub> for the symmetric closure of ↠<sub>β∗</sub>.

The new relation ▷<sub>∗</sub>, which can be interpreted as the elimination of a coercion at the head of a term, allows the expression of predicate subtyping in PVS-Cert. More detailed motivations and justifications for this definition are given in Sect. 3.3.
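The reduction just described can be prototyped in a few lines. The following Python sketch is only an illustration under stated assumptions: the term classes are hypothetical and cover only variables, abstractions, applications, pairs and projections (sorts, products and subtype types are omitted), and substitution is naive, assuming that bound names never capture free variables of the substituted term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:   # λv : T. M
    var: str
    ty: object
    body: object


@dataclass(frozen=True)
class App:   # M N
    fun: object
    arg: object


@dataclass(frozen=True)
class Pair:  # ⟨M, N⟩_T
    fst: object
    snd: object
    ty: object


@dataclass(frozen=True)
class Proj:  # π_i(M), with i in {1, 2}
    index: int
    arg: object


def subst(term, v, n):
    """Return term[n/v]; naive, assumes no variable capture can occur."""
    if isinstance(term, Var):
        return n if term.name == v else term
    if isinstance(term, Lam):
        body = term.body if term.var == v else subst(term.body, v, n)
        return Lam(term.var, subst(term.ty, v, n), body)
    if isinstance(term, App):
        return App(subst(term.fun, v, n), subst(term.arg, v, n))
    if isinstance(term, Pair):
        return Pair(subst(term.fst, v, n), subst(term.snd, v, n),
                    subst(term.ty, v, n))
    return Proj(term.index, subst(term.arg, v, n))


def head_step(term):
    """One head step: a β-redex, a pair, or a first projection; else None."""
    if isinstance(term, App) and isinstance(term.fun, Lam):
        return subst(term.fun.body, term.fun.var, term.arg)
    if isinstance(term, Pair):
        return term.fst
    if isinstance(term, Proj) and term.index == 1:
        return term.arg
    return None


def normalize(term):
    """Normal form for the context closure of head_step.

    May loop forever on terms that are not normalizing.
    """
    nxt = head_step(term)
    while nxt is not None:
        term, nxt = nxt, head_step(nxt)
    if isinstance(term, Lam):
        return Lam(term.var, normalize(term.ty), normalize(term.body))
    if isinstance(term, App):
        fun, arg = normalize(term.fun), normalize(term.arg)
        reduced = App(fun, arg)
        return normalize(reduced) if isinstance(fun, Lam) else reduced
    if isinstance(term, Proj):  # only second projections can remain here
        return Proj(term.index, normalize(term.arg))
    return term


# Two coercions of the same expression with different proofs p and q.
two, p, q, even = Var("2"), Var("p"), Var("q"), Var("even")
left = normalize(Pair(two, p, even))
right = normalize(Pair(two, q, even))
print(left == right == two)
```

Both pairs normalize to the underlying expression, so the proof component plays no role in the comparison of normal forms.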

**Derivation Rules.** The rules of PVS-Cert are defined as follows:

$$\frac{}{\emptyset \vdash \mathit{WF}}\ \text{Empty} \qquad \frac{\Gamma \vdash T : s \qquad v \in \mathcal{V}\_s \setminus DV(\Gamma)}{\Gamma, v : T \vdash \mathit{WF}}\ \text{Decl}$$

$$\frac{\Gamma \vdash \mathit{WF} \qquad (v : T) \in \Gamma}{\Gamma \vdash v : T}\ \text{Var} \qquad \frac{\Gamma \vdash M : T \qquad \Gamma \vdash U : s \qquad T \equiv\_{\beta\ast} U}{\Gamma \vdash M : U}\ \text{Conversion}$$

$$\frac{\Gamma \vdash \mathit{WF} \qquad (s\_1, s\_2) \in \mathcal{A}}{\Gamma \vdash s\_1 : s\_2}\ \text{Sort} \qquad \frac{\Gamma \vdash T : s\_1 \qquad \Gamma, v : T \vdash U : s\_2 \qquad (s\_1, s\_2, s\_3) \in \mathcal{R}}{\Gamma \vdash \Pi v : T.U : s\_3}\ \text{Prod}$$

$$\frac{\Gamma, v : T \vdash M : U \qquad \Gamma \vdash \Pi v : T.U : s}{\Gamma \vdash \lambda v : T.M : \Pi v : T.U}\ \text{Lam} \qquad \frac{\Gamma \vdash M : \Pi v : T.U \qquad \Gamma \vdash N : T}{\Gamma \vdash M\,N : U[N/v]}\ \text{App}$$

$$\frac{\Gamma \vdash T : \mathit{Type} \qquad \Gamma, v : T \vdash U : \mathit{Prop}}{\Gamma \vdash \{v : T \mid U\} : \mathit{Type}}\ \text{Subtype}$$

$$\frac{\Gamma \vdash M : T \qquad \Gamma \vdash N : U[M/v] \qquad \Gamma \vdash \{v : T \mid U\} : \mathit{Type}}{\Gamma \vdash \langle M, N \rangle\_{\{v : T \mid U\}} : \{v : T \mid U\}}\ \text{Pair}$$

$$\frac{\Gamma \vdash M : \{v : T \mid U\}}{\Gamma \vdash \pi\_1(M) : T}\ \text{Proj1} \qquad \frac{\Gamma \vdash M : \{v : T \mid U\}}{\Gamma \vdash \pi\_2(M) : U[\pi\_1(M)/v]}\ \text{Proj2}$$

#### **3.2 An Extension of λ-HOL**

PVS-Cert is an extension of the PTS λ-HOL (see for instance [4]). More precisely, λ-HOL can be obtained from PVS-Cert by removing the term constructions {v : T | U}, π<sub>i</sub>(M), and ⟨M, N⟩<sub>T</sub>, removing the rules Subtype, Pair, Proj1, and Proj2, and replacing ≡<sub>β∗</sub> by ≡<sub>β</sub> in the Conversion rule.

Like other PTS-like systems, the formalism of PVS-Cert supports reasoning modulo β: all steps of β-reduction or β-expansion in reasoning are kept implicit, which keeps proof terms compact and makes PVS-Cert more scalable. Moreover, formalizing PVS-Cert as a PTS-like system makes it possible to transpose some PTS properties to PVS-Cert, such as the thinning property and the substitution property mentioned in the next section. It also makes it possible to describe this system using a small number of rules in comparison with PVS-Core, making the proof of certain expected properties of PVS-Cert lighter.

The well-typed terms of PVS-Cert are classified into the same classes as in the case of λ-HOL, involving a class of *types*, a class of *expressions*, and a class of *proof terms*. This property is presented in Sect. 5, and referred to as *stratification*.

#### **3.3 Expressing Predicate Subtyping**

The expression of predicate subtyping in PVS-Cert is clarified by the *stratification*: indeed, in any derivable judgement,


As mentioned in the introduction, this formalism used to express predicate subtyping is very similar to the formalism of dependent pairs, used for instance in the type system ECC [16]. More precisely, the terms {v : T | U} are comparable with types of dependent pairs (usually denoted Σv : T.U), the terms ⟨M, N⟩<sub>T</sub> are comparable with dependent pairs, and the terms π<sub>i</sub>(M) are comparable with projections.

The only difference between PVS-Cert and the formalism of dependent pairs lies in the choice of conversion ≡<sub>β∗</sub>: in the case of a system with dependent pairs, ≡<sub>β∗</sub> is replaced by the more standard conversion ≡<sub>βσ</sub>. This conversion is defined from the usual reduction π<sub>i</sub>⟨M<sub>1</sub>, M<sub>2</sub>⟩<sub>T</sub> ▷<sub>σ</sub> M<sub>i</sub>. We define the relations ▷<sub>βσ</sub>, →<sub>βσ</sub>, ↠<sub>βσ</sub>, and ≡<sub>βσ</sub> in a similar way to the definitions of ▷<sub>β∗</sub>, →<sub>β∗</sub>, ↠<sub>β∗</sub>, and ≡<sub>β∗</sub>.

Applied to types or expressions, the conversion ≡<sub>β∗</sub> includes the more standard conversion ≡<sub>βσ</sub> (this property is a direct consequence of Theorem 5 together with the Church-Rosser property of →<sub>βσ</sub>). However, this inclusion is strict: for instance, it is not difficult to find two well-typed terms ⟨M, N<sub>1</sub>⟩<sub>T</sub> and ⟨M, N<sub>2</sub>⟩<sub>T</sub> which are not convertible using ≡<sub>βσ</sub>, although they are convertible using ≡<sub>β∗</sub>.
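One possible witness of this strictness can be written out explicitly. It is given here as an illustrative sketch: the context Γ = x : *Prop*, h<sub>1</sub> : x, h<sub>2</sub> : x and the names h<sub>1</sub>, h<sub>2</sub> are chosen for the example, and T = {y : *Prop* | y}. Both pairs below are derivable with the rule Pair:

$$\Gamma \vdash \langle x, h\_1 \rangle\_T : T \qquad \Gamma \vdash \langle x, h\_2 \rangle\_T : T$$

Each of them reduces to x in one step of the coercion-eliminating reduction, so they are convertible using ≡<sub>β∗</sub>:

$$\langle x, h\_1 \rangle\_T \;\triangleright\_{\ast}\; x \qquad \langle x, h\_2 \rangle\_T \;\triangleright\_{\ast}\; x$$

On the other hand, neither pair contains a β-redex or a projection, so both are normal forms for →<sub>βσ</sub>. As they are syntactically distinct, the Church-Rosser property of →<sub>βσ</sub> implies that they are not convertible using ≡<sub>βσ</sub>.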

As a direct consequence of this property, PVS-Cert is an extension of the system obtained from it by replacing ≡<sub>β∗</sub> by ≡<sub>βσ</sub>, and this extension is strict. In this paper, this subsystem will be referred to as PVS-Cert<sup>−</sup>. It is a PTS with dependent pairs, and corresponds more precisely to the system obtained from the PTS λ-HOL by adding the single dependent pair rule (*Type*, *Prop*, *Type*). It is strictly included in the type system ECC presented in [16].

As mentioned in the introduction, this choice of a strictly more flexible conversion allows a very simple translation from PVS-Core derivations to PVS-Cert derivable judgements (Definition 7 and Theorem 11). Indeed, using ≡<sub>β∗</sub> ensures that two PVS-Cert types (resp. expressions) are convertible as long as the corresponding types (resp. expressions) in PVS-Core are also convertible.

The reduction →<sub>β∗</sub> underlying conversion does not preserve typing: for instance, the judgement x : *Prop*, h : x ⊢ ⟨x, h⟩<sub>T</sub> : T with T = {y : *Prop* | y} is derivable, and ⟨x, h⟩<sub>T</sub> →<sub>β∗</sub> x, but x : *Prop*, h : x ⊢ x : T is not derivable. However, as presented in Sect. 6, the reduction →<sub>βσ</sub> is type preserving, and will be used both as a definition of cut elimination for PVS-Cert proofs (Sect. 7) and in the definition of a type-checking algorithm (Sect. 8).

# **4 Properties of PVS-Cert**

One of the most important properties satisfied by PVS-Cert is the Church-Rosser property.

**Theorem 1 (Church-Rosser for →<sub>β∗</sub>).** *Whenever M<sub>1</sub> ≡<sub>β∗</sub> M<sub>2</sub>, there exists N such that M<sub>1</sub> ↠<sub>β∗</sub> N and M<sub>2</sub> ↠<sub>β∗</sub> N.*

*Proof.* The set of terms 𝒯 equipped with →<sub>β∗</sub> is an orthogonal combinatory reduction system (as defined in [14]), as rules are left-linear and non-overlapping. As proved in [14], such a system admits the Church-Rosser property.

In the case of PTSs, the Church-Rosser property of →<sub>β</sub> is at the core of the type preservation of →<sub>β</sub>. In the case of PVS-Cert, the situation is different, as →<sub>β∗</sub> is not a type preserving reduction. However, in a first step, the Church-Rosser property of →<sub>β∗</sub> will be used to establish the expected stratification theorem, presented in Sect. 5. In a second step, the Church-Rosser property of →<sub>β∗</sub> will be used again together with the stratification theorem to establish the type preservation of an alternative reduction, →<sub>βσ</sub>, used both as a definition of cut elimination (Sect. 7) and at the core of the definition of a type-checking algorithm (Sect. 8).

Another important property of PVS-Cert used to design a type-checking algorithm is the uniqueness of types modulo conversion. As presented in Sect. 8, this property, together with the decidability of ≡<sub>β∗</sub> on well-typed terms, reduces the problem of *type-checking* to a problem of *type inference*. This property also underlines the fact that, even though PVS-Cert is designed to reflect predicate subtyping, it does not admit any subtyping itself. The proof of type uniqueness is standard and does not involve any specific difficulty.

**Theorem 2 (Uniqueness of types).** *If two judgements Γ ⊢ M : T<sub>0</sub> and Γ ⊢ M : T<sub>1</sub> are derivable, then T<sub>0</sub> ≡<sub>β∗</sub> T<sub>1</sub>.*

PVS-Cert also satisfies several other standard properties expected from PTSs and PTSs extended with dependent pairs, among which thinning and substitution, described for instance in [4], as well as context conversion, described for instance in [21], which is based on the extension of conversion to contexts. In these three cases, the corresponding proofs are straightforwardly adapted from the case of PTS.

We end this section with the following important theorem, which also holds in λ-HOL. The proof is adapted from the case of λ-HOL and does not involve any specific difficulty.

**Theorem 3.** *If Γ ⊢ M : T is derivable and T ≠ Kind, there exists a sort s such that Γ ⊢ T : s.*

# **5 Stratification in PVS-Cert**

The stratification of terms in PVS-Cert reveals a strong link between PVS-Cert and PVS-Core (defined in Sect. 9), in the same way that the stratification of terms in λ-HOL reveals its link with higher-order logic. The property of stratification holds for several other systems, such as the injective PTSs presented in [11] (in that paper, PTSs are referred to as GTSs, and this result is referred to as classification).

The main lemma used to establish such a result is the fact that, whenever the rule of conversion is used in some derivation, the two terms involved in the conversion belong to the same class of terms. The simplest way to prove this result is to choose classes of terms that are stable under reduction and to conclude using the Church-Rosser theorem. In the case of injective PTSs, these classes are specific classes of well-typed terms, and the stability under reduction follows from the type preservation of →β.

However, as mentioned in Sect. 3.3, type preservation does not hold for →<sub>β∗</sub> in PVS-Cert. For this reason, we will choose a relaxed definition of stratified terms, where the different classes are not restricted to well-typed terms. Using this relaxed definition, it will be possible to prove, even in the absence of type preservation for →<sub>β∗</sub>, that most classes of stratified terms are stable by reduction with →<sub>β∗</sub>.

We first present three classes of terms: **types**, **expressions**, and **proofs**. The expected property of stability by reduction will only be proved for types and expressions (Proposition 1), which is not problematic as the conversion rules are never directly applied to proofs in valid derivations.

**Definition 1 (Variables stratification).** *We introduce the notations:*


**Definition 2 (Stratified terms).** *We define stratified terms as follows.*


*Remark 2.* As in the case of PVS-Core (Remark 1), there is no formal distinction between the notations t, u, P, and Q although, in the following, the notations P and Q will be preferred for expressions of type *Prop*.

The most important remark on the definition of stratified terms is the fact that any pair ⟨t, M⟩<sub>A</sub> (where t is an expression and A is a type) is accepted as a correct expression: the term M used in it can be arbitrary, and in particular it is not required to be a proof term. This choice is due to the fact that proofs are not stable by →<sub>β∗</sub>: for instance, (λh : x.h)y is a proof, but y is not. Hence, compared to the alternative of restricting pairs to terms of the form ⟨t, p⟩<sub>A</sub>, the present relaxed definition is necessary to ensure the stability of types and expressions under →<sub>β∗</sub>, which is formalized in the following proposition. The proof does not involve any specific difficulty, as the definitions of types and expressions are designed to satisfy this property.

**Proposition 1.** *Whenever M →<sub>β∗</sub> N and M is a type (resp. an expression), so is N.*

Beyond its use in the proof of the stratification theorem (Theorem 4), this stability property is also directly useful in the proof of the strong normalization theorem for →<sub>β∗</sub> and →<sub>βσ</sub>, as briefly mentioned in Sect. 7.

Finally, we present the expected stratification theorem, based on the following definitions.

**Definition 3 (Stratified contexts, stratified judgements).** *We define*



**Theorem 4 (Stratification).** *Any derivable judgement is stratified.*

*Proof.* The proof is straightforward by induction on the derivation. In the case of Conversion, Proposition 1 and the Church-Rosser property of →<sub>β∗</sub> are used together to conclude that the two convertible terms are either both expressions, both types, both *Type*, or both *Kind*. Basic stability properties of types and expressions under substitution are also involved in the cases Proj2 and App. They are proved directly by induction.

# **6 A Type Preserving Reduction**

Contrary to the case of PTSs (resp. PTSs with dependent pairs), in which →<sub>β</sub> (resp. →<sub>βσ</sub>) is a type preserving reduction, →<sub>β∗</sub> is not a type preserving reduction in PVS-Cert. Instead, we present in this section the type preservation of the reduction →<sub>βσ</sub> in PVS-Cert. This reduction will be used both as a definition of cut elimination for PVS-Cert proofs (Sect. 7) and in the type-checking algorithm (Sect. 8).

The specificity of this proof of type preservation compared to similar results for PTSs lies in the fact that M →<sub>βσ</sub> N does not imply M ≡<sub>β∗</sub> N in general. However, this implication always holds if M is either a type or an expression; the corresponding proof involves no particular difficulty.

**Theorem 5.** *Whenever M →<sub>βσ</sub> N and M is a type (resp. an expression), so is N, and M ≡<sub>β∗</sub> N.*

Finally, the type preservation theorem for →βσ is the following.

**Theorem 6.** *Given a derivable judgement Γ ⊢ M : T, and N such that M →<sub>βσ</sub> N, the judgement Γ ⊢ N : T is derivable.*

*Proof.* The proof is done by induction on the derivation. The situations where M ⋫<sub>βσ</sub> N (the reduction takes place in a strict subterm of M) and the cases where M ▷<sub>βσ</sub> N (M is itself the reduced redex) are treated separately. We present here one case for each situation; the full proof can be found in the author's PhD dissertation [1].

– We illustrate the situation where M ⋫<sub>βσ</sub> N with the case of the rule Prod, which involves Theorem 5. Discarding the notations of the original statement, we describe the last inference step with the following new notations:

$$\frac{\Gamma \vdash T : s\_1 \qquad \Gamma, v : T \vdash U : s\_2 \qquad (s\_1, s\_2, s\_3) \in \mathcal{R}}{\Gamma \vdash \Pi v : T.U : s\_3}\ \text{Prod}$$

If the reduction occurs in U, we conclude directly by induction hypothesis. If the reduction occurs in T, we write T →<sub>βσ</sub> T′. By induction hypothesis, Γ ⊢ T′ : s<sub>1</sub> is derivable. By the stratification theorem, v ∈ 𝒱<sub>s<sub>1</sub></sub>, hence Γ, v : T′ ⊢ *WF* is derivable using the Decl rule. By the stratification theorem and Theorem 5, T ≡<sub>β∗</sub> T′. Hence, using the second premise and context conversion (mentioned in Sect. 4), Γ, v : T′ ⊢ U : s<sub>2</sub> is derivable. Finally, using Prod, Γ ⊢ Πv : T′.U : s<sub>3</sub> is derivable.

– We illustrate the situation where M ▷<sub>βσ</sub> N with the case of the rule Proj1. As M is a first projection and M ▷<sub>βσ</sub> N, M is a σ-redex. We replace the notations M ▷<sub>βσ</sub> N and T of the original statement by π<sub>1</sub>⟨M, N⟩<sub>T</sub> ▷<sub>βσ</sub> M and T′. In this setting, the last inference step has the following form:

$$\frac{\Gamma \vdash \langle M, N \rangle\_T : \{v : T' \mid U' \}}{\Gamma \vdash \pi\_1 \langle M, N \rangle\_T : T'} \text{PROJ}\_1$$

Analyzing the derivation of the premise (and more precisely the last rule different from Conversion used in it, which is necessarily Pair), we conclude that T has the form {v : T″ | U″} where {v : T″ | U″} ≡<sub>β∗</sub> {v : T′ | U′} and Γ ⊢ ⟨M, N⟩<sub>T</sub> : {v : T″ | U″} admits a derivation ending with an inference step of the form

$$\frac{\Gamma \vdash M : T'' \qquad \Gamma \vdash N : U''[M/v] \qquad \Gamma \vdash \{v : T'' \mid U''\} : \mathit{Type}}{\Gamma \vdash \langle M, N \rangle\_T : \{v : T'' \mid U''\}}\ \text{Pair}$$

We derive the expected judgement Γ ⊢ M : T′ from the first premise of this latter derivation using conversion. For this, we need to prove T″ ≡<sub>β∗</sub> T′ and to derive Γ ⊢ T′ : s for some s. These two requirements are proved as follows. On the one hand, we establish T″ ≡<sub>β∗</sub> T′ from {v : T″ | U″} ≡<sub>β∗</sub> {v : T′ | U′} using the Church-Rosser property (Theorem 1). On the other hand, by the stratification theorem, T′ ≠ *Kind*, hence we can use Theorem 3 on the original conclusion to establish that Γ ⊢ T′ : s is derivable for some sort s, as expected.

# **7 Strong Normalization and Cut Elimination**

This section is dedicated to the strong normalization of both →<sub>βσ</sub> and →<sub>β∗</sub> on well-typed PVS-Cert terms. These two reductions will be used separately in Sect. 8 to define a type-checking algorithm for PVS-Cert: more precisely, the reduction →<sub>β∗</sub> is used to decide whether two well-typed terms are convertible with ≡<sub>β∗</sub>, while the type preserving reduction →<sub>βσ</sub> will be used in the type-checking of applications. Moreover, the strong normalization of →<sub>βσ</sub> combined with its type preservation property provides a cut elimination theorem, which is a powerful tool to study properties of both PVS-Cert and PVS-Core. Its use is illustrated in a proof of consistency of PVS-Cert (Theorem 9), used in turn to establish the consistency of PVS-Core (Theorem 12) at the end of this paper.

#### **7.1 Strong Normalization**

A direct approach to prove the strong normalization of →<sub>βσ</sub> and →<sub>β∗</sub> for well-typed terms would be to prove the strong normalization for well-typed terms of their union, referred to as →<sub>βσ∗</sub>. Unfortunately, this reduction is not strongly terminating on well-typed terms, as shown in the following proposition.

**Proposition 2.** *There exists a well-typed term admitting an infinite reduction using →<sub>βσ∗</sub>.*

*Proof.* We first define two well-typed terms M and N such that MN admits an infinite reduction. It is simple to find two such terms, using the fact that PVS-Cert is an extension of System F [12]. For instance:


Using these terms, we build the expected counter-example of normalization of →<sub>βσ∗</sub> as follows:


Because of Proposition 2, the strong normalization theorem for PVS-Cert is formulated separately for each of the two reductions, as follows.

**Theorem 7 (Strong normalization).** *For any derivable judgement Γ ⊢ M : T, M is strongly normalizing under both →<sub>βσ</sub> and →<sub>β∗</sub>:*


The proof of this theorem is left out of the scope of this paper. It is detailed in the author's PhD dissertation [1]. We simply highlight here some of its specificities, which illustrate the consequences of the choice, in PVS-Cert, of a conversion relation which is not based on a type-preserving reduction.


#### **7.2 Cut Elimination in PVS-Cert**

The following cut elimination theorem is a direct corollary of the strong normalization theorem and the type preservation of →βσ.

**Theorem 8 (Cut elimination).** *Whenever some PVS-Cert judgement of the form Γ ⊢ p : P is derivable for some proposition P and some proof p, p can be reduced using the reduction →<sub>βσ</sub> to a normal form q such that the judgement Γ ⊢ q : P is derivable.*

*Proof.* By the strong normalization theorem, p can be reduced to a normal form q using the reduction →<sub>βσ</sub>. By the type preservation theorem (Theorem 6), the judgement Γ ⊢ q : P is derivable.

We conclude this section by showing how the cut elimination theorem can be used, together with the properties of terms in normal form with respect to →<sub>βσ</sub>, as a tool to analyze some meta-theoretical properties of PVS-Cert. As presented at the end of this work, this approach will also make it possible to use cut elimination in PVS-Cert to analyze some meta-theoretical properties of PVS-Core. This use of cut elimination is illustrated with the following proof of consistency.

**Theorem 9.** *PVS-Cert is consistent: there exists no proof term p such that ⊢ p : Πx : Prop.x is derivable.*

We use the following notion of *elimination context* in the proof:

**Definition 4 (Elimination contexts).** *We define the set of elimination contexts ℰ with the grammar* e := • | π<sub>i</sub>(e) | e M*.*

*For any term* N *we define the instantiation* e[N] *by*

$$\bullet[N] = N \qquad \pi\_i(e)[N] = \pi\_i(e[N]) \qquad (e\,M)[N] = (e[N])\,M$$
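The instantiation of an elimination context is a direct structural recursion on the context. The following Python sketch illustrates it; the class names (`Hole`, `ProjCtx`, `AppCtx`) and the encoding of terms as strings and tagged tuples are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    """The empty context •."""


@dataclass(frozen=True)
class ProjCtx:
    """The context π_i(e)."""
    index: int
    inner: object


@dataclass(frozen=True)
class AppCtx:
    """The context e M."""
    inner: object
    arg: object


def plug(e, n):
    """Return e[n], following the three defining equations.

    Terms are encoded, for this sketch only, as strings (variables) or
    tagged tuples ("proj", i, t) and ("app", t, u).
    """
    if isinstance(e, Hole):
        return n
    if isinstance(e, ProjCtx):
        return ("proj", e.index, plug(e.inner, n))
    if isinstance(e, AppCtx):
        return ("app", plug(e.inner, n), e.arg)
    raise TypeError("not an elimination context")


# The context π_2(•) M instantiated with the variable x.
ctx = AppCtx(ProjCtx(2, Hole()), "M")
print(plug(ctx, "x") == ("app", ("proj", 2, "x"), "M"))
```

The hole always ends up at the head position of the resulting term, under a spine of projections and applications.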

*Proof (Theorem 9).* We suppose that there exists a proof p such that the judgement ⊢ p : Πx : *Prop*.x admits some derivation, and find a contradiction in the following way. Using the thinning property (mentioned in Sect. 4), x : *Prop* ⊢ p : Πx : *Prop*.x is also derivable. Hence, applying the rule App followed by the rule Lam, ⊢ λx : *Prop*.(p x) : Πx : *Prop*.x is derivable.

By the cut elimination theorem (Theorem 8), λx : *Prop*.(p x) admits a normal form λx : *Prop*.q with respect to →<sub>βσ</sub>, which is such that the judgement ⊢ λx : *Prop*.q : Πx : *Prop*.x is derivable.

Considering the last rule different from Conversion used in such a derivation (which is necessarily Lam), and using the stratification theorem, there exists a derivable judgement x : *Prop* ⊢ q : t for some expression t ≡<sub>β∗</sub> x. Hence, using Conversion, x : *Prop* ⊢ q : x is also derivable. We consider a derivation 𝒟 of this judgement.

As q is a proof and is in normal form with respect to →<sub>βσ</sub>, we conclude from a careful case analysis that q has one of the following forms: λv : T.M or e[v]. We discard the first possibility as follows. If q = λv : T.M, considering the last rule different from Conversion used in 𝒟 (which is necessarily Lam), there exists some term of the form Πv : T.U such that Πv : T.U ≡<sub>β∗</sub> x. By the Church-Rosser property (Theorem 1), this conversion cannot hold. As a consequence, q has the form e[v] for some elimination context e and some variable v.

Considering the last rule different from Conversion, Proj1, Proj2, or App used in 𝒟 (which is necessarily Var), some judgement of the form x : *Prop* ⊢ v : T is derivable, and v = x. As q is a proof, e[x] = q ≠ x. Hence, 𝒟 admits some subderivation of a judgement of the form x : *Prop* ⊢ x t : T′ or x : *Prop* ⊢ π<sub>i</sub>(x) : T′. Considering the last rule different from Conversion in such a derivation, and using the uniqueness of types (Theorem 2), this implies that there exists a term U of the form Πv : T<sub>1</sub>.T<sub>2</sub> or {v : T<sub>1</sub> | T<sub>2</sub>} such that U ≡<sub>β∗</sub> *Prop*. By the Church-Rosser property (Theorem 1), this conversion cannot hold. As a consequence, there exists no proof term p such that the judgement ⊢ p : Πx : *Prop*.x is derivable.

# **8 Type-Checking in PVS-Cert**

The purpose of this section is to present the main ideas leading to the definition of a type-checking algorithm for PVS-Cert. The decidability of type-checking is one of the most important results expected for PVS-Cert. In particular, it will be used in Sect. 10 together with the translation from PVS-Core derivations to PVS-Cert established in Sect. 9 to show that PVS-Cert judgements can be used as verifiable certificates for PVS-Core.

This algorithm is mainly based on the type preservation Theorem 6 and the strong normalization Theorem 7 presented in the previous sections. In this section, we will only focus on the main specificities of the algorithm. Its precise definition, as well as the proofs of its soundness, termination, and completeness can be found in the author's PhD dissertation [1].

The algorithm is comparable to the algorithm presented in [6] for the general case of injective PTSs (which applies to λ-HOL). Besides the fact that our algorithm is extended to handle predicate subtypes, coercions $\langle M, N \rangle_T$ and projections $\pi_i(M)$, the main difference between the two is the use of both reductions $\to_{\beta*}$ and $\to_{\beta\sigma}$ in the case of PVS-Cert, while only $\to_{\beta}$ is used for injective PTSs.

On the one hand, $\to_{\beta*}$-normalization is used to check $\equiv_{\beta*}$-conversion on well-typed terms: by the Church-Rosser property and strong normalization, two well-typed terms are $\equiv_{\beta*}$-equivalent if and only if they admit the same normal form, which is unique. As in [6], this decision procedure for conversion on well-typed terms is used in turn together with the uniqueness of types (Theorem 2) to define type-checking from type inference, which is itself defined recursively.
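The decision procedure described above (normalize both terms, then compare normal forms) can be sketched on a toy calculus. This is only an illustration under stated assumptions: the term classes below are hypothetical, type annotations are omitted, de Bruijn indices stand in for named variables, and the coercion-erasing steps are assumed to drop a coercion or a first projection. Termination is only guaranteed for terms that are strongly normalizing, which the text obtains from well-typedness.

```python
from dataclasses import dataclass

# Hypothetical toy syntax with de Bruijn indices (not the full PVS-Cert syntax).
@dataclass(frozen=True)
class Var:
    index: int

@dataclass(frozen=True)
class Lam:
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Coerce:  # stands for a coercion <term, proof>_T, annotation omitted
    term: object
    proof: object

@dataclass(frozen=True)
class Proj1:  # stands for pi_1(term)
    term: object

def shift(t, d, cutoff=0):
    """Add d to every free variable index >= cutoff."""
    if isinstance(t, Var):
        return Var(t.index + d) if t.index >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    if isinstance(t, App):
        return App(shift(t.fun, d, cutoff), shift(t.arg, d, cutoff))
    if isinstance(t, Coerce):
        return Coerce(shift(t.term, d, cutoff), shift(t.proof, d, cutoff))
    return Proj1(shift(t.term, d, cutoff))

def subst(t, j, s):
    """Replace variable j by s in t."""
    if isinstance(t, Var):
        return s if t.index == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    if isinstance(t, App):
        return App(subst(t.fun, j, s), subst(t.arg, j, s))
    if isinstance(t, Coerce):
        return Coerce(subst(t.term, j, s), subst(t.proof, j, s))
    return Proj1(subst(t.term, j, s))

def normalize(t):
    """Full normalization with beta steps and coercion-erasing steps."""
    if isinstance(t, Var):
        return t
    if isinstance(t, Lam):
        return Lam(normalize(t.body))
    if isinstance(t, (Coerce, Proj1)):
        return normalize(t.term)  # assumed erasing step: keep only the coerced term
    fun = normalize(t.fun)
    if isinstance(fun, Lam):  # beta step
        body = subst(fun.body, 0, shift(t.arg, 1))
        return normalize(shift(body, -1))
    return App(fun, normalize(t.arg))

def convertible(m, n):
    """Two terms are convertible iff their normal forms coincide."""
    return normalize(m) == normalize(n)
```

With de Bruijn indices, syntactic equality of normal forms already accounts for renaming of bound variables, which keeps the comparison step trivial.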

*Remark 3.* In order to avoid redundant context well-formedness verifications in the multiple recursive calls of the type inference algorithm, we choose here to check the well-formedness of a context Γ beforehand when inferring a type for some term M in Γ. For this reason, type inference and type-checking are defined in two steps. First, we define auxiliary type inference and type-checking algorithms which are only ensured to operate soundly with well-formed contexts. Then, we use these auxiliary functions to define context well-formedness verification as well as complete type inference and type-checking algorithms, which operate soundly with any context.

On the other hand, →βσ is used in type inference to handle applications:

$$\frac{\Gamma \vdash M : \Pi v : T_1.T_2 \qquad \Gamma \vdash N : T_1}{\Gamma \vdash MN : T_2[N/v]} \text{ App}$$

In this situation, the recursive call on the first premise may produce a term $U$ such that $\Gamma \vdash M : U$ is derivable, but $U$ is not guaranteed to have the form $\Pi v : U_1.U_2$; counterexamples can easily be found when $M$ is a proof and $U$ is a proposition. The usual solution to this issue, used e.g. in [6], is to reduce $U$ using the reduction underlying conversion (or more specifically its restriction to weak head reduction, which is more economical): indeed, using the uniqueness of types as well as strong normalization, type preservation, and the Church-Rosser property, it can be proved that a term $U'$ will be obtained, that $M$ admits the type $U'$, and that $U'$ has the form $\Pi v : U_1.U_2$ if $M$ admits a type of this form.

However, in the case of PVS-Cert, this approach cannot be followed directly, as the reduction underlying conversion, which is $\to_{\beta*}$, is not type preserving: $U'$ is not necessarily a valid type for $M$. For this reason, we use instead the type preserving reduction $\to_{\beta\sigma}$ (again, we use more specifically its restriction to weak head reduction, which is more economical). Using the strong normalization theorem, this operation terminates and yields some term $U'$. As a direct corollary of type preservation (based on Theorems 3 and 5), $M$ admits the type $U'$. What is left is to prove that $U'$ has the form $\Pi v : U_1.U_2$ if $M$ admits a type of this form, which is done as follows. If $M$ admits a type of the form $\Pi v : T_1.T_2$, then $U' \equiv_{\beta*} \Pi v : T_1.T_2$ by the uniqueness of types. Hence, analyzing the possible forms of the weak head normal form $U'$ and using the Church-Rosser property, we conclude that $U'$ has the form $\Pi v : U_1.U_2$, as expected.

Compared to [6], new cases must be added for predicate subtypes, coercions $\langle M, N \rangle_T$, and projections $\pi_i(M)$. These cases are handled in a similar way as in the case of PTSs with dependent pairs (see for instance ECC [16]), and do not involve any specific difficulty. Instead, a more distinctive specificity of the algorithm lies in the case of λ-abstraction:

$$\frac{\Gamma, v:T \vdash M:U \qquad \Gamma \vdash \Pi v:T.U:s}{\Gamma \vdash \lambda v:T.M:\Pi v:T.U} \text{ LAM}$$

As in the case of injective PTSs studied in [6], applying a recursive call on this second premise would be problematic. On the one hand, it would make the algorithm slower. On the other hand, it would break the simplicity of the proof of termination, based on the fact that recursive calls of type inference are done on subterms exclusively.

A general solution for this issue, applicable to any injective PTS, is presented in [6] using some classification of terms to avoid this unwanted recursive call. The solution selected for PVS-Cert follows the same approach, adapted to the stratified terms of PVS-Cert. It relies on a classifying algorithm Level(·), which ensures that whenever $M$ is either an expression, a type, $Type$, or $Kind$, then Level($M$) is either 1, 2, 3, or 4 respectively. As it is specifically suited to PVS-Cert, this definition is simpler than the classification presented in [6], which is intended to be applicable to a wide family of type systems. The algorithm is defined as follows:

**Definition 5.** *We define the algorithm* Level(·) *by recursion on its argument. The possible cases are the following.*


# **9 Expressing PVS-Core in PVS-Cert**

The final purpose of PVS-Cert is to encode PVS-Core derivations as PVS-Cert judgements, and to use the type-checking algorithm presented in Sect. 8 to use these judgements as verifiable certificates. In this perspective, we define a correspondence between PVS-Core and PVS-Cert. This correspondence reflects the fact that, even though these two systems are very different at the level of terms and judgements, they are almost identical at the level of derivations.

#### **9.1 An Erasing Function from PVS-Cert to PVS-Core**

We begin the description of this correspondence with a translation from PVS-Cert to PVS-Core, referred to as *erasing*. This translation mainly consists in the erasure of PVS-Cert explicit coercions $\langle \cdot, M \rangle_A$ and $\pi_1(\cdot)$.

**Definition 6.** *We define an erasure function* $\llbracket \cdot \rrbracket$ *from PVS-Cert expressions, types, and* $Type$ *to PVS-Core terms recursively as follows.*


*Then, we extend straightforwardly* $\llbracket \cdot \rrbracket$ *from PVS-Cert stratified contexts to PVS-Core contexts: for instance,* $\llbracket P, x : A, X : Type \rrbracket = \llbracket P \rrbracket, x : \llbracket A \rrbracket, X : Type$*.*

*Last, we extend straightforwardly* $\llbracket \cdot \rrbracket$ *from all PVS-Cert stratified judgements except those of the form* $\Gamma \vdash Type : Kind$ *to PVS-Core judgements. For instance,* $\llbracket x : A, X : Type \vdash p : P \rrbracket = x : \llbracket A \rrbracket, X : Type \vdash \llbracket P \rrbracket$*. The PVS-Cert judgements of the form* $\Gamma \vdash Type : Kind$ *are not translated.*
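A minimal sketch of how such an erasure could be implemented on a toy expression syntax is given below. The class names and the chosen set of constructors are hypothetical; only the two coercion cases (dropping the proof and annotation of a coercion, and dropping a first projection) are taken from the description of erasing given earlier in this section. The remaining cases are assumed to be purely structural.

```python
from dataclasses import dataclass

# Hypothetical toy syntax for PVS-Cert expressions (illustration only).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    ty: object
    body: object

@dataclass(frozen=True)
class Coerce:  # stands for <term, proof>_ty
    term: object
    proof: object
    ty: object

@dataclass(frozen=True)
class Proj1:  # stands for pi_1(term)
    term: object

def erase(t):
    """Remove explicit coercions and first projections, keeping the rest."""
    if isinstance(t, Var):
        return t
    if isinstance(t, App):
        return App(erase(t.fun), erase(t.arg))
    if isinstance(t, Lam):
        return Lam(t.var, erase(t.ty), erase(t.body))
    if isinstance(t, Coerce):
        return erase(t.term)  # the proof and the subtype annotation disappear
    if isinstance(t, Proj1):
        return erase(t.term)
    raise TypeError(f"unsupported term: {t!r}")
```

On this toy syntax, erasing is idempotent: a term without coercions or first projections is left unchanged.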

By the stratification theorem in PVS-Cert, all PVS-Cert derivable judgements are stratified judgements. Hence, unless they have the form $\Gamma \vdash Type : Kind$, their erasure in PVS-Core is well-defined. We will prove in Theorem 10 that they are derivable in PVS-Core. This theorem relies in particular on the fact that conversion in PVS-Cert and PVS-Core are related through the erasure function $\llbracket \cdot \rrbracket$, established in the following proposition. The corresponding proof does not involve any specific difficulty.

**Proposition 3.** *For all terms* $M$ *and* $N$ *which are either expressions, types, or* $Type$*, whenever* $M \equiv_{\beta*} N$*, then* $\llbracket M \rrbracket \equiv_{\beta} \llbracket N \rrbracket$*.*

Using the two previous propositions and the stratification theorem in PVS-Cert, we conclude the following theorem, which allows us to map PVS-Cert derivations to PVS-Core derivations.

**Theorem 10.** *Every derivable PVS-Cert judgement either has the form* $\Gamma \vdash Type : Kind$ *or admits an image through* $\llbracket \cdot \rrbracket$*. In the latter case, this image is derivable in PVS-Core.*

*Proof.* The first part of the proof is a direct consequence of the stratification theorem. The second part is proved by induction on the height of PVS-Cert derivations. All cases are straightforward, using the stratification theorem when necessary to establish a correspondence between stratified versions of PVS-Cert rules and PVS-Core rules. For instance:


#### **9.2 Expressing PVS-Core Derivations as PVS-Cert Judgements**

Theorem 10 shows that a PVS-Cert derivable judgement can testify to the PVS-Core derivability of another judgement: its erasure. In this section, we show conversely that, given any PVS-Core derivation, we can build such a PVS-Cert judgement. For this purpose, we first present an algorithm Certificate, which translates a PVS-Core derivation into a PVS-Cert judgement. In a second step, we will prove that such PVS-Cert judgements are always derivable in PVS-Cert.

**Definition 7.** *For any PVS-Core derivation* $D$*, we define recursively the PVS-Cert stratified judgement* Certificate($D$) *such that* $\llbracket$Certificate($D$)$\rrbracket$ *corresponds to the conclusion of* $D$*.*

*In this definition, we use an injective function* $h(\cdot)$ *mapping natural numbers to PVS-Cert proof variables, which can be chosen arbitrarily. We present two cases:* Assumption*, which shows how* $h(\cdot)$ *is used, and* ImplyElim*. This latter case (as well as* ForallElim*) is more complex than others as it involves the computation of a normal form with respect to* $\to_{*}$*, i.e. the erasure of coercions at the head of a term. The other cases are detailed in the author's PhD dissertation* [1]*.*

$$\frac{\Gamma \vdash P : Prop}{\Gamma, P \vdash WF} \text{ Assumption}$$

*We consider* $D_1$ *the derivation of* $\Gamma \vdash P : Prop$*.* Certificate($D_1$) *has the form* $\Gamma_1 \vdash P_1 : Prop$*. We consider* $n$ *the number of declarations of the form* $(h : Q)$ *in* $\Gamma_1$*, and we define* Certificate($D$) $= \Gamma_1, h(n) : P_1 \vdash WF$*.*

$$\frac{\Gamma \vdash P \Rightarrow Q \qquad \Gamma \vdash P}{\Gamma \vdash Q} \text{ ImplyElim}$$

*We consider* $D_1$ *and* $D_2$ *the respective derivations of* $\Gamma \vdash P \Rightarrow Q$ *and* $\Gamma \vdash P$*.* Certificate($D_2$) *has the form* $\Gamma_2 \vdash p_2 : P_2$ *and* Certificate($D_1$) *has the form* $\Gamma_1 \vdash p_1 : Q'_1$*. As* $\llbracket Q'_1 \rrbracket = (P \Rightarrow Q)$*, its normal form with respect to* $\to_{*}$ *has the form* $\Pi h : P_1.Q_1$*. We define* Certificate($D$) $= \Gamma_1 \vdash p_1 p_2 : Q_1[p_2/h]$*. As all proof terms are deleted through the erasure function,* $\llbracket Q_1[p_2/h] \rrbracket = \llbracket Q_1 \rrbracket$*. On the other hand, by induction hypothesis,* $\llbracket Q_1 \rrbracket = Q$*, hence the erasure of this judgement is* $\Gamma \vdash Q$*, as expected.*
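The Assumption case of this definition can be illustrated with a small sketch. The representation is an assumption made for illustration only: a context is a list of `(kind, name, type)` triples, propositions are plain strings, and `h` is one arbitrary injective naming function among many possible choices.

```python
def h(n):
    """Hypothetical injective map from natural numbers to proof variable names."""
    return f"h{n}"

def certificate_assumption(context, prop):
    """Extend a certificate context with a fresh proof variable for `prop`.

    `context` is a list of (kind, name, type) triples, where kind is
    "proof" for declarations of the form (h : Q) and anything else otherwise.
    """
    # n is the number of proof variable declarations already in the context.
    n = sum(1 for kind, _name, _ty in context if kind == "proof")
    return context + [("proof", h(n), prop)]
```

Because `h` is injective and `n` grows with every proof declaration, two assumptions added in sequence never share a name.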

#### **9.3 Relating Conversion in PVS-Core and PVS-Cert**

In order to prove that the outputs of the algorithm Certificate are derivable in PVS-Cert (presented in Theorem 11), the main required lemma is the converse of Proposition 3: for any terms $M$ and $N$ which are either expressions, types, or $Type$ and which satisfy $\llbracket M \rrbracket \equiv_{\beta} \llbracket N \rrbracket$, we have $M \equiv_{\beta*} N$. More precisely, this property will be used in the proof of Theorem 11 to handle the cases of conversion rules TypeConversion and PropConversion.

We first establish a modified version of this expected result, using equality and $\equiv_{*}$ instead of $\equiv_{\beta}$ and $\equiv_{\beta*}$ respectively. The proof is straightforward by induction on the two involved terms.

**Proposition 4.** *For all terms* $M$ *and* $N$ *which are either expressions, types, or* $Type$*, whenever* $\llbracket M \rrbracket = \llbracket N \rrbracket$*, then* $M \equiv_{*} N$*.*

Then, we establish the expected converse of Proposition 3 as follows.

**Proposition 5.** *For all terms* $M$ *and* $N$ *which are either expressions, types, or* $Type$*, whenever* $\llbracket M \rrbracket \equiv_{\beta} \llbracket N \rrbracket$*, then* $M \equiv_{\beta*} N$*.*

*Proof.* We present a proof based on the definition of a simple translation of PVS-Core terms as PVS-Cert expressions, types, or Type, which does not introduce any explicit coercion: for instance,

– [Πx : A.B] = Πx : [A].[B]

– [P ⇒ Q] = Πh : [P].[Q] for an arbitrary proof variable h

We first show straightforwardly that the respective images through $[\cdot]$ of two terms related by $\equiv_{\beta}$ are also related by $\equiv_{\beta}$. As a consequence, $[\llbracket M \rrbracket] \equiv_{\beta} [\llbracket N \rrbracket]$.

On the other hand, it is straightforward to show that $[\cdot]$ is a right inverse of the erasure function $\llbracket \cdot \rrbracket$. Hence, $\llbracket [\llbracket M \rrbracket] \rrbracket = \llbracket M \rrbracket$. By Proposition 4, we conclude that $[\llbracket M \rrbracket] \equiv_{*} M$. Following the same reasoning, $[\llbracket N \rrbracket] \equiv_{*} N$.

As a consequence, $M \equiv_{\beta*} [\llbracket M \rrbracket] \equiv_{\beta*} [\llbracket N \rrbracket] \equiv_{\beta*} N$.

#### **9.4 Soundness of the Synthesis of Certificates**

The last proposition needed to prove the soundness of the algorithm Certificate is the following. It shows that the operation of normalization through $\to_{*}$ (which erases the coercions $\pi_1(\cdot)$ and $\langle \cdot, M \rangle_T$ at the head of a term) is safely used in the definition of Certificate.

**Proposition 6.** *For any derivable PVS-Cert judgement of the form* $\Gamma \vdash t : \{x_n : \ldots \{x_1 : Prop \mid Q_1\} \ldots \mid Q_n\}$*, if* $t$ *admits a normal form with respect to* $\to_{*}$ *which has the form* $\Pi v : M.T$*, then* $\Gamma \vdash \Pi v : M.T : Prop$ *is derivable.*

In fact, only the specific case n = 0 is used in the proof of soundness of Certificate, but this generalization is preferred as it admits a direct proof by induction on t, which does not involve any specific difficulty.

Last, we present the expected soundness property for Certificate:

**Theorem 11.** *For any PVS-Core derivation* D*,* Certificate(D) *is derivable in PVS-Cert.*

*Proof.* The proof is done by induction on D. Most cases are proved without any specific difficulty. In particular, the cases of conversion rules TypeConversion and PropConversion are straightforward using Proposition 5.

The most complex cases correspond to the rules ImplyElim and ForallElim which involve, by definition of Certificate, some normalization with respect to $\to_{*}$. In such cases, Proposition 6 is used to handle the specific difficulties related to this normalization. We present the case ImplyElim:

$$\frac{\Gamma \vdash P \Rightarrow Q \qquad \Gamma \vdash P}{\Gamma \vdash Q} \text{ IMPLYELIM}$$

We consider $D_1$ and $D_2$ the respective derivations of $\Gamma \vdash P \Rightarrow Q$ and $\Gamma \vdash P$. Certificate($D_2$) has the form $\Gamma_2 \vdash p_2 : P_2$ and Certificate($D_1$) has the form $\Gamma_1 \vdash p_1 : Q'_1$. As $\llbracket Q'_1 \rrbracket = (P \Rightarrow Q)$, its normal form with respect to $\to_{*}$ has the form $\Pi h : P_1.Q_1$. In this setting, Certificate($D$) $= \Gamma_1 \vdash p_1 p_2 : Q_1[p_2/h]$. By induction hypothesis, $\Gamma_1 \vdash p_1 : Q'_1$ and $\Gamma_2 \vdash p_2 : P_2$ are derivable in PVS-Cert. By Proposition 3 and the stratification theorem, $\Gamma_1 \vdash Q'_1 : Prop$ is derivable in PVS-Cert. Hence, by Proposition 6, $\Gamma_1 \vdash \Pi h : P_1.Q_1 : Prop$ is derivable as well. As $Q'_1 \equiv_{\beta*} \Pi h : P_1.Q_1$, we conclude applying the Conversion rule that $\Gamma_1 \vdash p_1 : \Pi h : P_1.Q_1$ is derivable.

On the other hand, using Proposition 4, we can conclude from $\llbracket \Gamma_1 \rrbracket = \Gamma = \llbracket \Gamma_2 \rrbracket$ that $\Gamma_1 \equiv_{*} \Gamma_2$ as long as both contexts admit the same list of declared proof variables, in the same order. This is the case as, by straightforward induction on PVS-Core derivations, this list is $h(1), h(2), \ldots, h(n)$, where $h(\cdot)$ is the injective function used in the definition of Certificate and $n$ is the number of proof variable declarations in $\Gamma_1$ and $\Gamma_2$. Hence, $\Gamma_1 \equiv_{*} \Gamma_2$.

As $\Gamma_1 \vdash p_1 : \Pi h : P_1.Q_1$ is derivable, by Theorem 3 and the stratification theorem, $\Gamma_1 \vdash \Pi h : P_1.Q_1 : Prop$ is derivable. Hence, considering the last rule different from Conversion used in such a derivation (which is necessarily Prod), and using the stratification theorem, $\Gamma_1 \vdash P_1 : Prop$ is derivable as well. As a consequence, using context conversion (mentioned in Sect. 4), $\Gamma_1 \vdash p_2 : P_1$ is derivable in PVS-Cert. Hence, applying the rule App, $\Gamma_1 \vdash p_1 p_2 : Q_1[p_2/h]$ is derivable, as expected.

# **10 Using PVS-Cert as a System of Verifiable Certificates for PVS-Core**

This final section shows how to use the different results presented in this paper to answer the main question addressed in the current work: defining a system of verifiable certificates for PVS-Core.

A PVS-Cert judgement $\Gamma \vdash p : P$ can be used as a certificate for its PVS-Core erasure $\llbracket \Gamma \rrbracket \vdash \llbracket P \rrbracket$ (Definition 6), which is verifiable using the type-checking algorithm presented in Sect. 8. On the one hand, this approach is sound: whenever the type-checking algorithm succeeds, $\Gamma \vdash p : P$ is derivable in PVS-Cert, hence $\llbracket \Gamma \rrbracket \vdash \llbracket P \rrbracket$ is derivable in PVS-Core by Theorem 10.

On the other hand, valid certificates can be generated for arbitrary PVS-Core theorems in the following way. Given some PVS-Core judgement $\Delta \vdash Q$ derivable through some derivation $D$, the PVS-Cert judgement Certificate($D$) can be used as a certificate of $\Delta \vdash Q$. Indeed, using the notation $\Gamma \vdash p : P$ for Certificate($D$), the following statements hold.


These PVS-Cert certificates represent PVS-Core derivations in a very compact way. As each of the different constructions of types, expressions, and proofs in PVS-Cert corresponds to some PVS-Core derivation rule, the size of a PVS-Cert certificate is comparable, as a rough estimation, with the size of a corresponding PVS-Core derivation in which all PVS-Core judgements are deleted.

We finally show that, through the construction of certificates, the PVS-Cert cut elimination theorem can be used to study meta-theoretical properties of PVS-Core. This possible use is illustrated with the case of consistency, proved in PVS-Cert in Theorem 9 using cut elimination.

**Theorem 12.** *The system PVS-Core is consistent: the judgement* $\vdash \forall x : Prop.x$ *is not derivable.*

*Proof.* If the judgement $\vdash \forall x : Prop.x$ admits a PVS-Core derivation $D$, we consider $(\vdash p : P) =$ Certificate($D$). By definition, $\llbracket P \rrbracket = \forall x : Prop.x = \llbracket \Pi x : Prop.x \rrbracket$. Hence, by Proposition 5, $P \equiv_{\beta*} \Pi x : Prop.x$. As $\vdash \Pi x : Prop.x : Prop$ is derivable in PVS-Cert, we can apply the conversion rule to conclude that $\vdash p : \Pi x : Prop.x$ is derivable in PVS-Cert, which is impossible by Theorem 9.

# **References**



Security and Incremental Computation

# Robustly Safe Compilation

Marco Patrignani<sup>1,2</sup>(✉) and Deepak Garg<sup>3</sup>

<sup>1</sup> Stanford University, Stanford, USA mp@cs.stanford.edu

<sup>2</sup> CISPA Helmholtz Center for Information Security, Saarbrücken, Germany

<sup>3</sup> Max Planck Institute for Software Systems, Saarbrücken, Germany

Abstract. Secure compilers generate compiled code that withstands many target-level attacks such as alteration of control flow, data leaks or memory corruption. Many existing secure compilers are proven to be fully abstract, meaning that they reflect and preserve observational equivalence. Fully abstract compilation is strong and useful but, in certain cases, comes at the cost of requiring expensive runtime constructs in compiled code. These constructs may have no relevance for security, but are needed to accommodate differences between the source and target languages that fully abstract compilation necessarily needs.

As an alternative to fully abstract compilation, this paper explores a different criterion for secure compilation called robustly safe compilation or *RSC*. Briefly, this criterion means that the compiled code preserves relevant safety properties of the source program against all adversarial contexts interacting with the compiled program. We show that *RSC* can be proved more easily than fully abstract compilation and also often results in more efficient code. We also develop two illustrative robustly-safe compilers and, through them, illustrate two different proof techniques for establishing that a compiler attains *RSC*. Based on these, we argue that proving *RSC* can be simpler than proving full abstraction.

*To better explain and clarify notions, this paper uses colours. For a better experience, please print or view this paper in colours.*<sup>1</sup>

### 1 Introduction

Low-level adversaries, such as those written in C or assembly, can attack co-linked code written in a high-level language in ways that may not be feasible in the high-level language itself. For example, such an adversary may manipulate or hijack control flow, cause buffer overflows, or directly access private memory, all in contravention to the abstractions of the high-level language. Specific countermeasures such as Control Flow Integrity [3] or Code Pointer Integrity [41] have been devised to address some of these attacks *individually*. An alternative approach is to devise a *secure compiler*, which seeks to defend against entire *classes* of such attacks. Secure compilers often achieve security by relying on different protection mechanisms, e.g., cryptographic primitives [4,5,22,26], types [10,11], address space layout randomisation [6,37], protected module architectures [9,53,57,59] (also known as enclaves [46]), tagged architectures [7,39], etc. Once designed, the question researchers face is how to formalise that such a compiler is indeed secure, and how to prove this. Basically, we want a criterion that specifies secure compilation. A widely-used criterion for compiler security is fully abstract compilation (*FAC*) [2,35,52], which has been shown to preserve many interesting security properties like confidentiality, integrity, invariant definitions, well-bracketed control flow and hiding of local state [9,37,53,54].

<sup>1</sup> Specifically, in this paper we use a blue, sans-serif font for source elements, an **orange**, **bold** font for **target** elements and a *black*, *italic* font for elements common to both languages (to avoid repeating similar definitions twice). Thus, C is a source-level component, **C** is a target-level component and *C* is generic notation for either a source-level or a target-level component.

Informally, a compiler is fully abstract if it preserves and reflects observational equivalence of source-level components (i.e., partial programs) in their compiled counterparts. Most existing work instantiates observational equivalence with contextual equivalence: co-divergence of two components in any larger context they interact with. Fully abstract compilation is a very strong property, which preserves *all* source-level abstractions.

Unfortunately, preserving *all* source-level abstractions also has downsides. In fact, while *FAC* preserves many relevant security properties, it also preserves a plethora of other non-security ones, and the latter may force inefficient checks in the compiled code. For example, when the target is assembly, two observationally equivalent components must compile to code of the same size [9,53], else full abstraction is trivially violated. This requirement is security-irrelevant in most cases. Additionally, *FAC* is not well-suited for source languages with undefined behaviour (e.g., C and LLVM) [39] and, if used naïvely, it can fail to preserve even simple safety properties [60] (though, fortunately, no *existing* work falls prey to this naïvety).

Motivated by this, recent work started investigating alternative secure compilation criteria that overcome these limitations. These security-focussed criteria take the form of preservation of hyperproperties or classes of hyperproperties, such as hypersafety properties or safety properties [8,33]. This paper investigates one of these criteria, namely, *Robustly Safe Compilation* (*RSC*), which has clear security guarantees and can often be attained more efficiently than *FAC*.

Informally, a compiler attains *RSC* if it is correct and it preserves *robust safety* of source components in the target components it produces. Robust safety is an important security notion that has been widely adopted to formalize security, e.g., of communication protocols [14,17,34]. Before explaining *RSC*, we explain robust safety as a language property.

*Robust Safety as a Language Property.* Informally, a program property is a safety property if it encodes that "bad" sequences of events do not happen when the program executes [13,63]. A program is *robustly safe* if it has relevant (specified) safety properties *despite* active attacks from adversaries. As the name suggests, robust safety relies on the notions of safety and robustness which we now explain.

Safety. As mentioned, safety asserts that "no bad sequence of events happens", so we can specify a safety property by the set of *finite observations* which characterise all bad sequences of events. A whole program has a safety property if its behaviours exclude these bad observations. Many security properties can be encoded as safety, including integrity, weak secrecy and functional correctness.

*Example 1 (Integrity).* Integrity ensures that an attacker does not tamper with code invariants on state. For example, consider the function charge\_account(n) which deducts amount n from an account as part of an electronic card payment. A card PIN is required if n is larger than 10 euros. So the function checks whether n > 10, requests the PIN if this is the case, and then changes the account balance. We expect this function to have a safety (integrity) property in the account balance: a reduction of more than 10 euros in the account balance must be preceded by a call to request\_pin(). Here, the relevant observation is a trace (sequence) of account balances and calls to request\_pin(). Bad observations for this safety property are those where an account balance is more than 10 euros less than the previous one, without a call to request\_pin() in between. Note that this function seems to have this safety property, but it may not have the safety property *robustly*: a target-level adversary may transfer control directly to the "else" branch of the check n > 10 after setting n to more than 10, to violate the safety property.
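The integrity property of Example 1 can be made concrete with a small trace monitor. This is only a sketch under assumptions: the event encoding (tuples tagged `"balance"` or `"request_pin"`), the explicit `trace` parameter and the helper name `violates_integrity` are illustrative choices, not part of the formal development.

```python
LIMIT = 10  # euros; reductions above this amount require a PIN request

def charge_account(balance, n, trace):
    """Deduct n from balance, recording observable events in trace."""
    if n > LIMIT:
        trace.append(("request_pin",))
    balance -= n
    trace.append(("balance", balance))
    return balance

def violates_integrity(trace):
    """Return True if the trace contains a bad observation: a balance that is
    more than LIMIT below the previous one with no request_pin in between."""
    previous = None
    pin_requested = False
    for event in trace:
        if event[0] == "request_pin":
            pin_requested = True
        elif event[0] == "balance":
            if previous is not None and previous - event[1] > LIMIT and not pin_requested:
                return True
            previous = event[1]
            pin_requested = False
    return False
```

A trace produced only by calls to `charge_account` never triggers the monitor, whereas a trace such as `[("balance", 100), ("balance", 40)]`, which models an adversary skipping the check, does.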

*Example 2 (Weak Secrecy).* Weak secrecy asserts that a program secret never flows *explicitly* to the attacker. For example, consider code that manages network\_h, a handler (socket descriptor) for a sensitive network interface. This code does not expose network\_h directly to external code but it provides an API to use it. This API makes some security checks internally. If the handler is directly accessible to outer code, then it can be misused in insecure ways (since the security checks may not be made). If the code has weak secrecy with respect to network\_h then we know that the handler is never passed to an attacker. In this case we can define bad observations as those where network\_h is passed to external code (e.g., as a parameter, as a return value, or on the heap).
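Bad observations for Example 2 can be sketched in the same style. The encoding is an assumption made for illustration: each event is a pair of a channel name and the value that crosses to external code, and the secret handler is modelled by a unique Python object.

```python
def leaks_handler(trace, secret):
    """Return True if the secret itself is passed to external code.

    Only explicit flows are detected (the secret object appearing as a
    value), which matches the 'weak' in weak secrecy.
    """
    return any(value is secret for _channel, value in trace)

network_h = object()  # stands for the sensitive socket descriptor

good_trace = [("parameter", b"hello"), ("return", 200)]
bad_trace = [("parameter", b"hello"), ("heap", network_h)]
```

The monitor deliberately ignores implicit flows, for example a value computed from the handler, since weak secrecy only concerns explicit ones.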

*Example 3 (Correctness).* Program correctness can also be formalized as a safety property. Consider a program that computes the nth Fibonacci number. The program reads n from an input source and writes its output to an output source. Correctness of this program is a safety property. Our observations are pairs of an input (read by the program) and the corresponding output. A bad observation is one where the input is n (for some n) but the output is different from the nth Fibonacci number.
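Example 3 can be phrased as a check over input/output pairs. The sketch below assumes the convention that the 0th Fibonacci number is 0 and the 1st is 1; the function names are illustrative.

```python
def fib(n):
    """Reference definition, assuming fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def is_bad_observation(observation):
    """An observation is a pair (input n, output); it is bad if the output
    differs from the nth Fibonacci number."""
    n, output = observation
    return output != fib(n)

def satisfies_correctness(trace):
    """A trace satisfies the safety property if it has no bad observation."""
    return not any(is_bad_observation(obs) for obs in trace)
```

The check only looks at the pairs themselves, so it places no constraint on how the program under observation computes its outputs.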

These examples not only illustrate the expressiveness of safety properties, but also show that safety properties are quite *coarse-grained*: they are only concerned with (sequences of) relevant events like calls to specific functions, changes to specific heap variables, inputs, and outputs. They do not specify or constrain how the program computes between these events, leaving the programmer and the compiler considerable flexibility in optimizations. However, safety properties are not a panacea for security, and there are security properties that are not safety. For example, noninterference [70,72], the standard information flow property, is not safety. Nonetheless, many interesting security properties are safety. In fact, many non-safety properties including noninterference can be conservatively approximated as safety properties [20]. Hence, safety properties are a meaningful goal to pursue for secure compilation.

Robustness. We often want to reason about properties of a component of interest that hold irrespective of any other components the component interacts with. These other components may be the libraries the component is linked against, or the language runtime. Often, these surrounding components are modelled as the *program context* whose hole the component of interest fills. From a security perspective the context represents the attacker in the threat model. When the component of interest links to a context, we have a whole program that can run. A property holds *robustly* for a component if it holds in *any* context that the component of interest can be linked to.

*Robust Safety Preservation as a Compiler Property.* A compiler attains robustly safe compilation or *RSC* if it maps any source component that has a safety property *robustly* to a compiled component that has the *same* safety property robustly. Thus, safety has to hold robustly in the target language, which often does not have the powerful abstractions (e.g., typing) that the source language has. Hence, the compiler must insert enough defensive runtime checks into the compiled code to prevent the more powerful target contexts from launching attacks (violations of safety properties) that source contexts could not launch. This is unlike correct compilation, which either considers only those target contexts that behave like source contexts [40,49,65] or considers only whole programs [43].

As mentioned, safety properties are usually quite coarse-grained. This means that *RSC* still allows the compiler to optimise code internally, as long as the sequence of observable events is not affected. For example, when compiling the Fibonacci function of Example 3, the compiler can do any internal optimisation such as caching intermediate results, as long as the end result is correct. Crucially, however, these intermediate results must be protected from tampering by a (target-level) attacker, else the output can be incorrect, breaking *RSC*.

An *RSC*-attaining compiler focuses only on preserving security (as captured by robust safety) instead of contextual equivalence (typically captured by full abstraction). So, such a compiler can produce code that is more efficient than code compiled with a fully abstract compiler as it does not have to preserve *all* source abstractions (we illustrate this later).

Finally, robust safety scales naturally to thread-based concurrency [1,34,58]. Thus *RSC* also scales naturally to thread-based concurrency (we demonstrate this too). This is unlike *FAC*, where thread-based concurrency can introduce additional undesired abstractions that also need to be preserved.

*RSC* is a very recently proposed criterion for secure compilers. Recent work [8,33] defines *RSC* abstractly in terms of preservation of program behaviours, but its development is limited to the definition only. Our goal in this paper is to examine how *RSC* can be realized and established, and to show that in certain cases it leads to compiled code that is more efficient than what *FAC* leads to. To this end, we consider a specific setting where observations are values in specific (sensitive) heap locations at cross-component calls. We define robust safety and *RSC* for this specific setting (Sect. 2). Unlike previous work [8,33], which assumed that the domain of traces (behaviours) is the same in the source and target languages, our *RSC* definition allows for different trace domains in the source and target languages, as long as they can be suitably related. The second contribution of our paper is two proof techniques to establish *RSC*.


To argue that *RSC* is general and is not limited to compilation targets based on capabilities, we also develop a third compiler. This compiler starts from the same source language as our second compiler but targets an untyped concurrent language with support for *coarse-grained memory isolation*, modelling recent hardware extensions such as Intel's SGX [46]. Due to space constraints, we report this result only in the companion technical report [61].

The final contribution of this paper is a comparison between *RSC* and *FAC*. For this, we describe changes that would be needed to attain *FAC* for the first compiler and argue that these changes make generated code inefficient and also complicate the backtranslation proof significantly (Sect. 5).

Due to space constraints, we elide some technical details and limit proofs to sketches. These are fully resolved in the companion technical report [61].

# 2 Robustly Safe Compilation

This section first discusses robust safety as a language (not a compiler) property (Sect. 2.1) and then presents *RSC* as a compiler property along with an informal discussion of techniques to prove it (Sect. 2.2).

#### 2.1 Safety and Robust Safety

To explain robust safety, we first describe a general *imperative* programming model that we use. Programmers write *components* on which they want to enforce safety properties robustly. A component is a list of function definitions that can be linked with other components (the context) in order to have a runnable whole program (functions in "other" components are like extern functions in C). Additionally, every component declares a set of "sensitive" locations that contain all the data that is safety-relevant. For instance, in Example 1 this set may contain the account balance and in Example 3 it may contain the I/O buffers. We explain the relevance of this set after we define safety properties.

We want safety properties to specify that a component never executes a "bad" sequence of events. For this, we first need to fix a notion of events. We have several choices here, e.g., our events could be inputs and outputs, all syscalls, all changes to the heap (as in CompCert [44]), etc. Here, we make a specific choice motivated by our interest in robustness: We define events as calls/returns that cross a component boundary, together with the state of the heap at that point. Consequently, our safety properties can constrain the contents of the heap at component boundaries. This choice of component boundaries as the point of observation is meaningful because, in our programming model, control transfers to/from an adversary happen only at component boundaries (more precisely, they happen at cross-component function calls and returns). This allows the compiler complete flexibility in optimizing code within a component, while not reducing the ability of safety properties to constrain observations of the adversary.

Concretely, a component behaviour is a *trace*, i.e., a sequence of *actions* recording component boundary interactions and, in particular, the heap at these points. *Actions*, the items on a trace, have the following grammar:

$$\textit{Actions} \quad \alpha ::= \texttt{call}\ f\ v\ H? \mid \texttt{call}\ f\ v\ H! \mid \texttt{ret}\ H! \mid \texttt{ret}\ H?$$

These actions respectively capture call and callback to a function f with parameter v when the heap is H, as well as return and returnback with a certain heap H.<sup>2</sup> We use ? and ! decorations to indicate whether the control flow of the action goes from the context to the component (?) or from the component to the context (!). Well-formed traces have alternations of ? and ! decorated actions, starting with ? since execution starts in the context. For a sequence of actions α, relevant(α) is the list of heaps H mentioned in the actions of α.

<sup>2</sup> A callback is a call from the component to the context, so it generates label call *f v H*!. A returnback is a return from such a callback, i.e., the context returning to the component, and it generates the label ret *H*?.
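As an illustration, actions, the well-formedness condition and relevant( · ) can be sketched as follows. This is a minimal sketch and not the paper's formalisation: the `Action` record, the encoding of heaps as dictionaries from location names to integers, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: heaps map location names to integer values.
Heap = Dict[str, int]


@dataclass
class Action:
    kind: str                        # "call" or "ret"
    direction: str                   # "?" context -> component, "!" component -> context
    heap: Heap                       # the heap H recorded at the boundary crossing
    function: Optional[str] = None   # callee name, only meaningful for calls
    argument: Optional[int] = None   # parameter v, only meaningful for calls


def well_formed(trace: List[Action]) -> bool:
    """Decorations alternate, and the first action (if any) is decorated with '?'."""
    expected = "?"
    for action in trace:
        if action.direction != expected:
            return False
        expected = "!" if expected == "?" else "?"
    return True


def relevant(trace: List[Action]) -> List[Heap]:
    """The list of heaps mentioned in the actions of the trace, in order."""
    return [action.heap for action in trace]
```

For instance, the trace call f 3 H?, call g 0 H!, ret H′?, ret H′! (a call, a callback, a returnback and a return) is well formed, whereas its suffix starting at the callback is not.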

Next, we need a representation of safety properties. Generally, properties are sets of traces, but safety properties specifically can be specified as automata (or monitors in the sequel) [63]. We choose this representation since monitors are less abstract than sets of traces and they are closer to enforcement mechanisms used for safety properties, e.g., runtime monitors. Briefly, a safety property is a monitor that transitions states in response to events of the program trace. At any point, the monitor may refuse to transition (it gets *stuck*), which encodes property violation. While a monitor can transition, the property has not been violated. Schneider [63] argues that all properties codable this way are safety properties and that all enforceable safety properties can be coded this way.

Formally, a monitor *M* in our setting consists of a set of abstract states {σ ···}, the transition relation ⇝, an initial state σ<sub>0</sub>, the set of heap locations that matter for the monitor, {*l* ···}, and the current state σ<sub>c</sub> (we indicate a set of elements of class e as {e ···}). The transition relation is a set of triples of the form (σ<sub>s</sub>, *H*, σ<sub>f</sub>) consisting of a starting state σ<sub>s</sub>, a final state σ<sub>f</sub> and a heap *H*. The transition (σ<sub>s</sub>, *H*, σ<sub>f</sub>) is interpreted as "*state σ<sub>s</sub> transitions to σ<sub>f</sub> when the heap is H*". When determining the monitor transition in response to a program action, we restrict the program's heap to the location set {*l* ···}, i.e., to the set of locations the monitor cares about. This heap restriction is written *H*|<sub>{*l* ···}</sub>. We assume determinism of the transition relation: for any σ<sub>s</sub> and (restricted heap) *H*, there is at most one σ<sub>f</sub> such that (σ<sub>s</sub>, *H*, σ<sub>f</sub>) ∈ ⇝.

Given the behaviour of a program as a trace α and a monitor *M* specifying a safety property, *M* ⊢ α denotes that the trace satisfies the safety property. Intuitively, to satisfy a safety property, the sequence of heaps in the actions of a trace must never get the monitor stuck (Rule Valid trace). Every single heap must allow the monitor to step according to its transition relation (Rule Monitor Step). Note that we overload the notation ⇝ here to also denote an auxiliary relation, the *monitor small-step semantics* (Rule Monitor Step-base and Rule Monitor Step-ind).

$$\begin{array}{c} \text{(Valid trace)}\\ M; \texttt{relevant}(\overline{\alpha}) \leadsto M'\\ \hline M \vdash \overline{\alpha} \end{array} \quad \begin{array}{c} \text{(Monitor Step-base)}\\ \\ \hline M; \varnothing \leadsto M \end{array} \quad \begin{array}{c} \text{(Monitor Step-ind)}\\ M; \overline{H} \leadsto M'' \qquad M''; H \leadsto M'\\ \hline M; \overline{H} \cdot H \leadsto M' \end{array}$$

$$\begin{array}{c} \text{(Monitor Step)}\\ M = (\{\sigma \cdots\}, \leadsto, \sigma\_{0}, \{l \cdots\}, \sigma\_{c}) \qquad M' = (\{\sigma \cdots\}, \leadsto, \sigma\_{0}, \{l \cdots\}, \sigma\_{f})\\ (\sigma\_{c}, H|\_{\{l \cdots\}}, \sigma\_{f}) \in \leadsto\\ \hline M; H \leadsto M' \end{array}$$
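The monitor semantics can be read operationally. The following is a minimal sketch, assuming that states are strings, that heaps are dictionaries from location names to integers, and that the transition relation is stored as a dictionary (which makes determinism hold by construction). The names `restrict`, `step` and `accepts` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Heap = Dict[str, int]
RestrictedHeap = FrozenSet[Tuple[str, int]]      # hashable form of a restricted heap
Transitions = Dict[Tuple[str, RestrictedHeap], str]


def restrict(heap: Heap, locations: Iterable[str]) -> RestrictedHeap:
    """Keep only the bindings for locations the monitor cares about."""
    wanted = set(locations)
    return frozenset((loc, val) for loc, val in heap.items() if loc in wanted)


def step(transitions: Transitions, locations: Iterable[str],
         state: str, heap: Heap) -> Optional[str]:
    """One monitor step: the next state, or None when the monitor is stuck."""
    return transitions.get((state, restrict(heap, locations)))


def accepts(transitions: Transitions, locations: Iterable[str],
            initial: str, heaps: List[Heap]) -> bool:
    """Fold the single-step relation over the list of heaps (base and inductive cases)."""
    state = initial
    for heap in heaps:
        next_state = step(transitions, locations, state, heap)
        if next_state is None:
            return False          # stuck: the safety property is violated
        state = next_state
    return True                   # the empty list is accepted in the initial state
```

As a usage example, a monitor over a single location `flag` with transitions (low, flag=0, low), (low, flag=1, high) and (high, flag=1, high) encodes "once the flag is set it is never cleared": it gets stuck on the heap sequence flag=1, flag=0, and it ignores every location other than `flag`.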

With this setup in place, we can formalise safety, attackers and robust safety. In defining (robust) safety for a component, we only admit monitors (safety properties) whose {*l* ···} agrees with the sensitive locations declared by the component. Making the set of safety-relevant locations explicit in the component and the monitor gives the compiler more flexibility by telling it precisely which locations need to be protected against target-level attacks (the compiler may choose to not protect the rest). At the same time, it allows for expressive modelling. For instance, in Example 3 the safety-relevant locations could be the I/O buffers from which the program performs inputs and outputs, and the safety property can constrain the input and output buffers at corresponding call and return actions involving the Fibonacci function.

#### Definition 1 (Safety, attacker and robust safety).

$$\begin{aligned} M \vdash C : \mathit{safe} &\stackrel{\mathsf{def}}{=} \text{if } \vdash C : \mathit{whole} \text{ then if } \Omega\_{0}(C) \stackrel{\overline{\alpha}}{\Longrightarrow} \\_ \text{ then } M \vdash \overline{\alpha} \\ C \vdash A : \mathit{atk} &\stackrel{\mathsf{def}}{=} C = \{l \cdots\}, \overline{F} \text{ and } \{l \cdots\} \cap \mathsf{fn}(A) = \varnothing \\ M \vdash C : \mathit{rs} &\stackrel{\mathsf{def}}{=} \forall A.\ \text{if } M \frown C \text{ and } C \vdash A : \mathit{atk} \text{ then } M \vdash A[C] : \mathit{safe} \end{aligned}$$

A whole program *C* is safe for a monitor *M*, written *M* ⊢ *C* : *safe*, if the monitor accepts any trace the program generates from its initial state (Ω<sub>0</sub>(*C*)).

An attacker *A* is valid for a component *C*, written *C* ⊢ *A* : *atk*, if *A*'s free names (denoted fn(*A*)) do not refer to the locations that the component cares about. This is a basic sanity check: if we allow an attacker to mention heap locations that the component cares about, the attacker will be able to modify those locations, causing all but trivial safety properties to not hold robustly.

A component *C* is robustly safe wrt monitor *M*, written *M* ⊢ *C* : *rs*, if *C* composed with *any* attacker is safe wrt *M*. As mentioned, for this setup to make sense, the monitor and the component must agree on the locations that are safety-relevant. This agreement is denoted *M* ⌢ *C*.

#### 2.2 Robustly Safe Compilation

Robustly-safe compilation ensures that robust safety properties *and their meanings* are preserved across compilation. But what does it mean to preserve meanings across languages? If a source safety property says never write 3 to a location, and we compile to an assembly language by mapping numbers to binary, the corresponding target property should say **never write 0b11 to an address**.

In order to relate properties across languages, we assume a relation ≈ : v × **v** between source and target values that is *total*, so it maps any source value v to a target value **v**: ∀v. ∃**v**. v ≈ **v**. This value relation is used to define a relation between heaps: H ≈ **H**, which intuitively holds when related locations point to related values. This is then used to define a relation between actions: α ≈ **α**, which holds when the two actions are the "same" modulo this relation, i.e., call ··· ? only relates to **call** ··· **?** and the arguments of the action (values and heap) are related. Next, we require a relation M ≈ **M** between source and target monitors, which means that the source monitor M and the target monitor **M** code the same safety property, modulo the relation ≈ on values assumed above. The precise definition of this relation depends on the source and target languages; specific instances are shown in Sects. 3.3 and 4.3.<sup>3</sup>
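To make the cross-language relation tangible, here is a small sketch for the "numbers to binary" scenario mentioned above. It assumes that source values are non-negative integers, that target values are binary digit strings, and that a source location is related to the target location of the same name. None of these choices come from the paper, whose concrete relations are given in Sects. 3.3 and 4.3.

```python
from typing import Dict


def to_target(source_value: int) -> str:
    """Witness of totality: every source value has a related target value."""
    return format(source_value, "b")


def values_related(source_value: int, target_value: str) -> bool:
    """v is related to the binary digit string that denotes it."""
    return target_value == to_target(source_value)


def heaps_related(source_heap: Dict[str, int], target_heap: Dict[str, str]) -> bool:
    """Related locations (here: equal names) must point to related values."""
    if source_heap.keys() != target_heap.keys():
        return False
    return all(values_related(source_heap[loc], target_heap[loc]) for loc in source_heap)
```

Under this relation, a source property that forbids the value 3 at a location corresponds to a target property that forbids the string `"11"` at the related location.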

<sup>3</sup> Accounting for the difference in the representation of safety properties sets us apart from recent work [8,33], which assumes that the source and target languages have the same trace alphabet. The latter works only in some settings.

We denote a compiler from language S to language **T** by ⟦·⟧<sup>S</sup><sub>**T**</sub>. A compiler ⟦·⟧<sup>S</sup><sub>**T**</sub> attains *RSC* if it maps any component C that is robustly safe wrt M to a component **C** that is robustly safe wrt **M**, provided that M ≈ **M**.

#### Definition 2 (Robustly Safe Compilation).

$$\vdash [\![\cdot]\!]^{S}\_{\mathbf{T}} : \mathit{RSC} \stackrel{\mathsf{def}}{=} \forall C, M, \mathbf{M}.\ \text{if } M \vdash C : \mathit{rs} \text{ and } M \approx \mathbf{M} \text{ then } \mathbf{M} \vdash [\![C]\!]^{S}\_{\mathbf{T}} : \mathbf{rs}$$

A consequence of the universal quantification over monitors here is that the compiler cannot be property-sensitive. A robustly-safe compiler preserves all robust safety properties, not just a specific one, e.g., it does not just enforce that fibonacci is correct. This seemingly strong goal is sensible as compiler writers will likely not know what safety properties individual programmers will want to preserve.

*Remark.* Some readers may wonder why we do not follow existing work and specify safety as "programmer-written assertions never fail" [31,34,45,68]. Unfortunately, this approach does not yield a meaningful criterion for specifying a compiler, since assertions in the compiled program (if any) are generated by the compiler itself. Thus a compiler could just erase all assertions and the compiled code it generates would be trivially (robustly) safe – no assertion can fail if there are no assertions in the first place!

Proving *RSC* . Proving that a compiler attains *RSC* can be done either by proving that a compiler satisfies Definition 2 or by proving something *equivalent*. To this end, Definition 3 below presents an alternative, equivalent formulation of *RSC*. We call this characterisation *property-free* as it does not mention monitors explicitly (it mentions the relevant( · ) function for reasons we explain below).

Definition 3 (Property-Free *RSC*).

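$$\begin{aligned} \vdash [\![\cdot]\!]^{S}\_{\mathbf{T}} : \mathit{PF\text{-}RSC} \stackrel{\mathsf{def}}{=} \forall C, \mathbf{A}, \overline{\boldsymbol{\alpha}}.\ &\text{if } [\![C]\!]^{S}\_{\mathbf{T}} \vdash \mathbf{A} : \mathbf{atk} \text{ and } \vdash \mathbf{A}\left[[\![C]\!]^{S}\_{\mathbf{T}}\right] : \mathbf{whole} \text{ and } \boldsymbol{\Omega}\_{\mathbf{0}}\left(\mathbf{A}\left[[\![C]\!]^{S}\_{\mathbf{T}}\right]\right) \stackrel{\overline{\boldsymbol{\alpha}}}{\Longrightarrow} \\_ \\ &\text{then } \exists A, \overline{\alpha}.\ C \vdash A : \mathit{atk} \text{ and } \vdash A[C] : \mathit{whole} \text{ and } \Omega\_{0}(A[C]) \stackrel{\overline{\alpha}}{\Longrightarrow} \\_ \text{ and } \mathsf{relevant}(\overline{\alpha}) \approx \mathsf{relevant}(\overline{\boldsymbol{\alpha}}) \end{aligned}$$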

Specifically, *PF*-*RSC* states that the compiled code produces behaviours that *refine* source level behaviours *robustly* (taking contexts into account).

*PF*-*RSC* and *RSC* should, in general, be equivalent (Proposition 1).

Proposition 1 (*PF-RSC* and *RSC* are equivalent).

$$\forall [\![\cdot]\!]^{S}\_{\mathbf{T}}.\ \vdash [\![\cdot]\!]^{S}\_{\mathbf{T}} : \mathit{PF\text{-}RSC} \iff \vdash [\![\cdot]\!]^{S}\_{\mathbf{T}} : \mathit{RSC}$$

Informally, a property is safety if and only if it implies programs not having any trace prefix from a given set of bad prefixes (i.e., finite traces). Hence, *not* having a safety property robustly amounts to some context being able to induce a bad prefix. Consequently, preserving *all* robust safety properties (*RSC* ) amounts to ensuring that all target prefixes can be generated (by some context) in the source too (*PF*-*RSC* ). Formally, since Definition 2 relies on the monitor relation, we can prove Proposition 1 only after such a relation is finalised. We give such a monitor relation and proof in Sect. 3.3 (see Theorem 3). However, in general this result should hold for any cross-language monitor relation that correctly relates safety properties. If the proposition does not hold, then the relation does not capture how safety in one language is represented in the other.

Assuming Proposition 1, we can prove *PF*-*RSC* for a compiler in place of *RSC*. *PF*-*RSC* can be proved with a *backtranslation* technique. This technique has often been used to prove full abstraction [7–9,33,39,50,53,54,59] and it aims at building a source context starting from a target one. In fact, *PF*-*RSC* leads directly to a backtranslation-based proof technique since it can be rewritten (eliding irrelevant details) as:

$$\begin{aligned} \text{If } \exists \mathbf{A}, \overline{\boldsymbol{\alpha}}.\ \boldsymbol{\Omega}\_{\mathbf{0}}\left(\mathbf{A}\left[[\![C]\!]^{S}\_{\mathbf{T}}\right]\right) &\stackrel{\overline{\boldsymbol{\alpha}}}{\Longrightarrow} \\_ \\ \text{then } \exists A, \overline{\alpha}.\ \Omega\_{0}\left(A[C]\right) &\stackrel{\overline{\alpha}}{\Longrightarrow} \\_ \text{ and } \mathsf{relevant}(\overline{\alpha}) \approx \mathsf{relevant}(\overline{\boldsymbol{\alpha}}). \end{aligned}$$

Essentially, given a target context **A**, a compiled program ⟦C⟧<sup>S</sup><sub>**T**</sub> and a target trace **α** that **A** causes ⟦C⟧<sup>S</sup><sub>**T**</sub> to have, we need to construct, or *backtranslate* to, a source context A that will cause the source program C to simulate **α**. Such backtranslation-based proofs can be quite difficult, depending on the features of the languages and the compiler. However, backtranslation for *RSC* (as we show in Sect. 3.3) is not as complex as backtranslation for *FAC* (Sect. 5.2).

A simpler proof strategy is also viable for *RSC* when we compile only those source programs that have been *verified* to be robustly safe (e.g., using a type system). The idea is this: from the verification of the source program, we can find an invariant which is always maintained by the target code, and which, in turn, implies the robust safety of the target code. For example, if the safety property is that values in the heap always have their expected types, then the invariant can simply be that values in the target heap are always related to the source ones (which have their expected types). This is tantamount to proving type preservation in the target in the presence of an active adversary. This is harder than standard type preservation (because of the active adversary) but is still much easier than backtranslation as there is no need to map target constructs to source contexts syntactically. We illustrate this proof technique in Sect. 4.

*RSC* Implies Compiler Correctness. As stated in Sect. 1, *RSC* implies (a form of) compiler correctness. While this may not be apparent from Definition 2, it is more apparent from its equivalent characterization in Definition 3. We elaborate this here.

Whether concerned with whole programs or partial programs, compiler correctness states that the behaviour of compiled programs *refines* the behaviour of source programs [18,36,40,44,49,65]. So, if {**α** ···} and {α ···} are the sets of compiled and source behaviours, then a compiler should force {**α** ···} ⊂∼ {α ···}, where ⊂∼ is the composition of ⊆ and of the relation ≈<sup>−1</sup>.

If we consider a source component C that is whole, then it can only link against empty contexts, both in the source and in the target. Hence, in this special case, *PF*-*RSC* simplifies to standard refinement of traces, i.e., whole program compiler correctness. Hence, assuming that the correctness criterion for a compiler is concerned with the same observations as safety properties (values in safety-relevant heap locations at component crossings in our illustrative setting), *PF*-*RSC* implies whole program compiler correctness.

However, *PF*-*RSC* (or, equivalently, *RSC* ) does not imply, nor is implied by, any form of *compositional compiler correctness* (CCC) [40,49,65]. CCC requires that the behaviours produced by a compiled component linked against a target context that is related (in behaviour) to a source context can also be produced by the source component linked against the *related* source context. In contrast, *PF*-*RSC* allows picking *any* source context to simulate the behaviours. Hence, *PF*-*RSC* does not imply CCC. On the other hand, *PF*-*RSC* universally quantifies over all target contexts, while CCC only quantifies over target contexts related to a source context, so CCC does not imply *PF*-*RSC* either. Hence, compositional compiler correctness, if desirable, must be imposed in addition to *PF*-*RSC* . Note that this lack of implications is unsurprising: *PF*-*RSC* and CCC capture two very different aspects of compilation: security (against all contexts) and compositional preservation of behaviour (against well-behaved contexts).

# 3 *RSC* via Trace-Based Backtranslation

This section illustrates how to prove that a compiler attains *RSC* by means of a trace-based backtranslation technique [7,53,59]. To present such a proof, we first introduce our source language L<sup>U</sup>, an untyped, first-order imperative language with abstract references and hidden local state (Sect. 3.1). Then, we present our target language **L<sup>P</sup>**, an untyped imperative target language with a concrete heap, whose locations are natural numbers that the context can compute. **L<sup>P</sup>** provides hidden local state via a fine-grained capability mechanism on heap accesses (Sect. 3.2). Finally, we present the compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> and prove that it attains *RSC* (Sect. 3.3) by means of a trace-based backtranslation. The section concludes with an example detailing why *RSC* preserves security (Example 4).

To avoid focussing on mundane details, we deliberately use source and target languages that are fairly similar. However, they differ substantially in one key point: the heap model. This affords the target-level adversary attacks like guessing private locations and writing to them that do not obviously exist in the source (and makes our proofs nontrivial). We believe that (with due effort) the ideas here will generalize to languages with larger gaps and more features.

#### 3.1 The Source Language L<sup>U</sup>

L<sup>U</sup> is an untyped imperative while language [51]. Components C are triples of function definitions, interfaces and a special location written ℓ<sub>root</sub>, so C ::= ℓ<sub>root</sub>; F; I. Each function definition maps a function name and a formal argument to a body s: F ::= f(x) ↦ s; return;. An interface is a list of functions that the component relies on the context to provide (similar to C's extern declarations). The special location ℓ<sub>root</sub> defines the locations that are monitored for safety, as explained below. Attackers A (program contexts) are function definitions that represent untrusted code that a component interacts with. A function's body is a statement, s. Statements are rather standard, so we omit a formal syntax. Briefly, they can manipulate the heap (location creation let x = new e in s, assignment x := e), do recursive function calls (call f e), branch (if-then-else), define local variables (let-in) and loop. Statements use effect-free expressions, e, which contain standard boolean expressions (e ⊗ e), arithmetic expressions (e ⊕ e), pairing (⟨e, e⟩) and projections, and location dereference (!e). Heaps H are maps from abstract locations ℓ to values v.

As explained in Sect. 2.1, safety properties are specified by monitors. L<sup>U</sup>'s monitors have the form: M ::= ({σ ···}, ⇝, σ<sub>0</sub>, ℓ<sub>root</sub>, σ<sub>c</sub>). Note that in place of the set {*l* ···} of safety-relevant locations, the description of a monitor here (as well as a component above) contains a *single* location ℓ<sub>root</sub>. The interpretation is that any location *reachable* in the heap starting from ℓ<sub>root</sub> is relevant for safety. This set of locations can change as the program executes, and hence this is more flexible than statically specifying all of {*l* ···} upfront. This representation of the set by a single location is made explicit in the following monitor rule:

$$\begin{array}{c} (\mathsf{L}^{\mathsf{U}}\text{-Monitor Step}) \\ \mathsf{M} = (\{\sigma \cdots\}, \leadsto, \sigma\_{0}, \ell\_{\text{root}}, \sigma\_{c}) \qquad \mathsf{M}' = (\{\sigma \cdots\}, \leadsto, \sigma\_{0}, \ell\_{\text{root}}, \sigma\_{f}) \\ (\sigma\_{c}, \mathsf{H}', \sigma\_{f}) \in \leadsto \qquad \mathsf{H}' \subseteq \mathsf{H} \qquad \mathsf{dom}(\mathsf{H}') = \mathsf{reach}(\ell\_{\text{root}}, \mathsf{H}) \\ \hline \mathsf{M}; \mathsf{H} \leadsto \mathsf{M}' \end{array}$$
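The reach( · , · ) function used in this rule can be computed by a standard worklist traversal of the heap. The sketch below is only illustrative: it assumes that values are integers, locations or (nested) pairs, that pairs are encoded as Python tuples, and that locations absent from the heap are not counted as reachable (so that the result can be the domain of a sub-heap of H). The names `Loc`, `locations_in`, `reach` and `restrict_to_reachable` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Loc:
    """An abstract location, identified by name (an encoding assumption)."""
    name: str


def locations_in(value: object) -> Set[Loc]:
    """All locations mentioned in a value; pairs are modelled as tuples."""
    if isinstance(value, Loc):
        return {value}
    if isinstance(value, tuple):
        found: Set[Loc] = set()
        for component in value:
            found |= locations_in(component)
        return found
    return set()


def reach(root: Loc, heap: Dict[Loc, object]) -> Set[Loc]:
    """Locations of the heap reachable from root by following stored locations."""
    seen: Set[Loc] = set()
    worklist = [root]
    while worklist:
        loc = worklist.pop()
        if loc in seen or loc not in heap:
            continue
        seen.add(loc)
        worklist.extend(locations_in(heap[loc]))
    return seen


def restrict_to_reachable(root: Loc, heap: Dict[Loc, object]) -> Dict[Loc, object]:
    """The sub-heap H' with H' contained in H and dom(H') = reach(root, H)."""
    return {loc: heap[loc] for loc in reach(root, heap)}
```

Because the reachable set is recomputed from the current heap at every monitor step, a component can grow or shrink its safety-relevant state at run time simply by linking locations to, or unlinking them from, the structure rooted at ℓ<sub>root</sub>.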

Other than this small point, monitors, safety, robust safety and *RSC* are defined as in Sect. 2. In particular, a monitor and a component agree if they mention the same ℓ<sub>root</sub>: M ⌢ C ≝ (M = ({σ ···}, ⇝, σ<sub>0</sub>, ℓ<sub>root</sub>, σ<sub>c</sub>)) and (C = (ℓ<sub>root</sub>; F; I)).

A program state C, H ▷ (s)<sub>f</sub> (denoted with Ω) includes the function bodies C, the heap H, a statement s being executed and a stack of function calls f (often omitted in the rules for simplicity). The latter is used to populate judgements of the form I ⊢ f, f′ : internal/in/out. These determine whether calls and returns are internal (within the attacker or within the component), directed from the attacker to the component (in) or directed from the component to the attacker (out). This information is used to determine whether the semantics should generate a label, as in Rules EL<sup>U</sup>-return to EL<sup>U</sup>-retback, or no label, as in Rules EL<sup>U</sup>-ret-internal and EL<sup>U</sup>-call-internal, since internal calls should not be observable. L<sup>U</sup> has a big-step semantics for expressions (H ▷ e ↪→ v) that relies on evaluation contexts, a small-step semantics for statements (Ω <sup>λ</sup>−→ Ω′) that has labels λ ::= ε | α, and a semantics that accumulates labels in traces (Ω <sup>α</sup>⇒ Ω′) by omitting silent actions ε and concatenating the rest. Unlike existing work on compositional compiler correctness, which relies only on having the component [40], the semantics relies on having both the component and the context.

#### 3.2 The Target Language **L<sup>P</sup>**

**L<sup>P</sup>** is an untyped, imperative language that follows the structure of L<sup>U</sup> and it has similar expressions and statements. However, there are critical differences (that make the compiler interesting). The main difference is that heap locations in **L<sup>P</sup>** are concrete natural numbers. Upfront, an adversarial context can guess locations used as private state by a component and clobber them. To support hidden local state, a location can be "hidden" explicitly via the statement **let x** = **hide e in s**, which allocates a new capability **k**, an abstract token that grants access to the location **n** to which **e** points [64]. Subsequently, all reads and writes to **n** must be authenticated with the capability, so reading and writing a location take another parameter as follows: !**e with e** and **x** := **e with e**. In both cases, the **e** after the **with** is the capability. Unlike locations, capabilities cannot be guessed. To make a location private, the compiler can make the capability of the location private. To bootstrap this hiding process, we assume that a component has one location that can only be accessed by it, a priori in the semantics (in our formalization, we always focus on only one component and we assume that, for this component, this special location is at address **0**).

In detail, **L<sup>P</sup>** heaps **H** are maps from natural numbers (locations) **n** to values **v** and a tag η, as well as capabilities, so **H** ::= ∅ | **H**; **n** ↦ **v** : η | **H**; **k**. The tag η can be ⊥, which means that **n** is globally available (not protected), or a capability **k**, which protects **n**. A globally available location can be freely read and written but one that is protected by a capability requires the capability to be supplied at the time of read/write (Rule E**L<sup>P</sup>**-assign, Rule E**L<sup>P</sup>**-deref).

**L<sup>P</sup>** also has a big-step semantics for expressions, a labelled small-step semantics and a semantics that accumulates traces analogous to that of L<sup>U</sup>.

$$\frac{\mathbf{n} \mapsto \mathbf{v} : \eta \in \mathbf{H} \qquad (\eta = \bot) \text{ or } (\eta = \mathbf{k} \text{ and } \mathbf{v}' = \mathbf{k})}{\mathbf{H} \triangleright\ !\mathbf{n} \textbf{ with } \mathbf{v}' \hookrightarrow \mathbf{v}} \quad (\text{E}\mathbf{L^P}\text{-deref})$$

$$\frac{\mathbf{H} = \mathbf{H}_1; \mathbf{n} \mapsto (\mathbf{v}, \eta) \qquad \mathbf{H} \triangleright \mathbf{e} \hookrightarrow \mathbf{v} \qquad \mathbf{H}' = \mathbf{H}; \mathbf{n}+\mathbf{1} \mapsto \mathbf{v} : \bot}{\mathbf{C}, \mathbf{H} \triangleright \textbf{let } \mathbf{x} = \textbf{new } \mathbf{e} \textbf{ in } \mathbf{s} \longrightarrow \mathbf{C}, \mathbf{H}' \triangleright \mathbf{s}[\mathbf{n}+\mathbf{1} / \mathbf{x}]} \quad (\text{E}\mathbf{L^P}\text{-new})$$

$$\frac{\mathbf{H} \triangleright \mathbf{e} \hookrightarrow \mathbf{n} \qquad \mathbf{k} \notin \mathrm{dom}(\mathbf{H}) \qquad \mathbf{H} = \mathbf{H}_1; \mathbf{n} \mapsto \mathbf{v} : \bot; \mathbf{H}_2 \qquad \mathbf{H}' = \mathbf{H}_1; \mathbf{n} \mapsto \mathbf{v} : \mathbf{k}; \mathbf{H}_2; \mathbf{k}}{\mathbf{C}, \mathbf{H} \triangleright \textbf{let } \mathbf{x} = \textbf{hide } \mathbf{e} \textbf{ in } \mathbf{s} \longrightarrow \mathbf{C}, \mathbf{H}' \triangleright \mathbf{s}[\mathbf{k} / \mathbf{x}]} \quad (\text{E}\mathbf{L^P}\text{-hide})$$

$$\frac{\mathbf{H} \triangleright \mathbf{e} \hookrightarrow \mathbf{v} \qquad \mathbf{H} = \mathbf{H}_1; \mathbf{n} \mapsto \_ : \eta; \mathbf{H}_2 \qquad \mathbf{H}' = \mathbf{H}_1; \mathbf{n} \mapsto \mathbf{v} : \eta; \mathbf{H}_2 \qquad (\eta = \bot) \text{ or } (\eta = \mathbf{k} \text{ and } \mathbf{v}' = \mathbf{k})}{\mathbf{C}, \mathbf{H} \triangleright \mathbf{n} := \mathbf{e} \textbf{ with } \mathbf{v}' \longrightarrow \mathbf{C}, \mathbf{H}' \triangleright \textbf{skip}} \quad (\text{E}\mathbf{L^P}\text{-assign})$$
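The four rules can also be read operationally. The following is a minimal Python sketch under stated assumptions, not the paper's formalization: the class name `Heap`, the initial cell at address 0, the use of fresh `object()` instances as unguessable capabilities, and raising `PermissionError` where the semantics would simply be stuck are all choices made here for illustration.

```python
class Heap:
    """Toy model of an L^P heap: address -> [value, tag]; tag None plays the role of ⊥."""

    def __init__(self):
        self.cells = {0: [0, None]}   # assumption: the heap starts with one cell at address 0
        self.caps = set()             # capabilities recorded in the heap

    def new(self, value):
        # new: allocate at the next address; the fresh location is unprotected (⊥).
        addr = max(self.cells) + 1
        self.cells[addr] = [value, None]
        return addr

    def hide(self, addr):
        # hide: only an unprotected location can be hidden; returns a fresh capability.
        if self.cells[addr][1] is not None:
            raise PermissionError("location is already protected")
        cap = object()                # fresh identity, so it cannot be guessed
        self.cells[addr][1] = cap
        self.caps.add(cap)
        return cap

    def _check(self, addr, cap):
        # Side condition shared by deref and assign: tag is ⊥, or it matches cap.
        tag = self.cells[addr][1]
        if tag is not None and tag is not cap:
            raise PermissionError("wrong capability")

    def deref(self, addr, cap=None):
        self._check(addr, cap)
        return self.cells[addr][0]

    def assign(self, addr, value, cap=None):
        self._check(addr, cap)
        self.cells[addr][0] = value
```

When the tag is ⊥ the supplied capability is ignored, which mirrors the first disjunct of the side condition in the deref and assign rules.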

A second difference between **L<sup>P</sup>** and L<sup>U</sup> is that **L<sup>P</sup>** has no booleans, while L<sup>U</sup> has them. This makes the compiler and the related proofs interesting, as discussed in the proof of Theorem 1.

In **L<sup>P</sup>**, the locations of interest to a monitor are all those that can be reached from the address **0**. **0** itself is protected with a capability **k<sub>root</sub>** that is assumed to occur only in the code of the component in focus, so a component is defined as **C** ::= **k<sub>root</sub>**; **F**; **I**. We can now give a precise definition of component-monitor agreement for **L<sup>P</sup>** as well as a precise definition of attacker, which must care about the **k<sub>root</sub>** capability.

$$\begin{aligned} \mathbf{M} \frown \mathbf{C} &\stackrel{\text{def}}{=} \left( \mathbf{M} = (\{ \sigma \cdots \}, \leadsto, \sigma_0, \mathbf{k}_{\text{root}}, \sigma_{\mathbf{c}}) \right) \text{ and } \left( \mathbf{C} = (\mathbf{k}_{\text{root}}; \overline{\mathbf{F}}; \overline{\mathbf{I}}) \right) \\ \mathbf{C} \vdash \mathbf{A} : \text{atk} &\stackrel{\text{def}}{=} \mathbf{C} = (\mathbf{k}_{\text{root}}; \overline{\mathbf{F}}; \overline{\mathbf{I}}),\ \mathbf{A} = \overline{\mathbf{F}'},\ \mathbf{k}_{\text{root}} \notin \mathsf{fn}(\overline{\mathbf{F}'}) \end{aligned}$$

# 3.3 Compiler from L<sup>U</sup> to **L<sup>P</sup>**

We now present ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub>, the compiler from L<sup>U</sup> to **L<sup>P</sup>**, detailing how it uses the capabilities of **L<sup>P</sup>** to achieve *RSC*. Then, we prove that ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> attains *RSC*.

Compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> takes as input an L<sup>U</sup> component C and returns an **L<sup>P</sup>** component (excerpts of the translation are shown below). The compiler performs a simple pass on the structure of functions, expressions and statements. Each L<sup>U</sup> location is encoded as a pair of an **L<sup>P</sup>** location and the capability to access the location; location update and dereference are compiled accordingly. The compiler encodes the source booleans true as **0** and false as **1**, and the source number n as its target counterpart **n**.
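The value encoding just described can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the paper's translation, which is defined on syntax: the tagged-tuple representation of source values, the function name `compile_value`, and the `loc_table` argument are all hypothetical.

```python
def compile_value(v, loc_table):
    """Encode a source value as a target value.

    v is a bool, a natural number, a ("loc", name) source location, or a
    ("pair", v1, v2) tuple. loc_table maps source location names to
    (address, capability) pairs in the target (an assumed representation).
    """
    if isinstance(v, bool):           # test bool first: in Python, bool is a subclass of int
        return 0 if v else 1          # true becomes 0, false becomes 1
    if isinstance(v, int):
        return v                      # a number n becomes its target counterpart
    if v[0] == "loc":
        return loc_table[v[1]]        # a location becomes a (location, capability) pair
    if v[0] == "pair":
        return (compile_value(v[1], loc_table), compile_value(v[2], loc_table))
    raise ValueError(f"not a source value: {v!r}")
```

Both `True` and `0` are encoded as `0`, so the encoding is not injective on values.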


This compiler solely relies on the capability abstraction of the target language as a defence mechanism to attain *RSC*. Unlike existing secure compilers, ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> needs neither dynamic checks nor other constructs that introduce runtime overhead to attain *RSC* [9,32,39,53,59].

Proof of *RSC*. Compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> attains *RSC* (Theorem 1). In order to set up this theorem, we need to instantiate the cross-language relation for values, which we write as ≈<sub>β</sub> here. The relation is parametrised by a partial bijection β : ℓ × **n** × η from source heap locations to target heap locations, which determines when a source location and a target location (and its capability) are related. On values, ≈<sub>β</sub> is defined as follows: true ≈<sub>β</sub> **0**; false ≈<sub>β</sub> **n** when **n** ≠ **0**; n ≈<sub>β</sub> **n**; ℓ ≈<sub>β</sub> ⟨**n**, **k**⟩ if (ℓ, **n**, **k**) ∈ β; ℓ ≈<sub>β</sub> ⟨**n**, \_⟩ if (ℓ, **n**, ⊥) ∈ β; ⟨v<sub>1</sub>, v<sub>2</sub>⟩ ≈<sub>β</sub> ⟨**v<sub>1</sub>**, **v<sub>2</sub>**⟩ if v<sub>1</sub> ≈<sub>β</sub> **v<sub>1</sub>** and v<sub>2</sub> ≈<sub>β</sub> **v<sub>2</sub>**. This relation is then used to define the heap, monitor state and action relations. Heaps are related, written H ≈<sub>β</sub> **H**, when locations related in β point to related values. States are related, written Ω ≈<sub>β</sub> **Ω**, when they have related heaps. The action relation (α ≈<sub>β</sub> **α**) is defined as in Sect. 2.2.
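As a reading aid, the value relation can be written as a recursive check. The Python sketch below is illustrative only and rests on assumed encodings: source locations are `("loc", name)`, source pairs are `("pair", v1, v2)`, target pairs are 2-tuples, β is a set of triples, and `None` stands for ⊥. The function name `related` is hypothetical.

```python
def related(src, tgt, beta):
    """Return True if source value src is related to target value tgt under beta."""
    if isinstance(src, bool):
        # true is related to 0; false is related to any nonzero number
        return isinstance(tgt, int) and ((tgt == 0) if src else (tgt != 0))
    if isinstance(src, int):
        return isinstance(tgt, int) and src == tgt
    if src[0] == "loc":
        if not (isinstance(tgt, tuple) and len(tgt) == 2):
            return False
        addr, cap = tgt
        # protected location: the capability must match;
        # unprotected location: the second field of the pair is ignored
        return (src[1], addr, cap) in beta or (src[1], addr, None) in beta
    if src[0] == "pair":
        return (isinstance(tgt, tuple) and len(tgt) == 2
                and related(src[1], tgt[0], beta)
                and related(src[2], tgt[1], beta))
    return False
```

The target value `0` is related to both the source number `0` and the source boolean `true`, so the relation is not injective from source to target.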

*Monitor Relation.* In Sect. 2.2, we left the monitor relation abstract. Here, we define it for our two languages. Two monitors are related when they can *simulate* each other on related heaps. Given a monitor-specific relation σ ≈ **σ** on monitor states, we say that a relation R on source and target monitors is a *bisimulation* if the following hold whenever M = ({σ ···}, ⇝, σ<sub>0</sub>, ℓ<sub>root</sub>, σ<sub>c</sub>) and **M** = ({**σ** ···}, **⇝**, **σ<sub>0</sub>**, **k<sub>root</sub>**, **σ<sub>c</sub>**) are related by R:

	- (a) (σ<sub>c</sub>, H, \_) ∈ ⇝ iff (**σ<sub>c</sub>**, **H**, \_) ∈ **⇝**, and
	- (b) (σ<sub>c</sub>, H, σ′) ∈ ⇝ and (**σ<sub>c</sub>**, **H**, **σ′**) ∈ **⇝** imply ({σ ···}, ⇝, σ<sub>0</sub>, ℓ<sub>root</sub>, σ′) R ({**σ** ···}, **⇝**, **σ<sub>0</sub>**, **k<sub>root</sub>**, **σ′**).

In words, R is a bisimulation only if M R **M** implies that M and **M** simulate each other on heaps related by *any* β that relates ℓ<sub>root</sub> to **0**. In particular, this means that neither M nor **M** can be sensitive to the *specific* addresses allocated during the run of the program. However, they can be sensitive to the "shape" of the heap or the values stored in the heap. Note that the union of any two bisimulations is a bisimulation. Hence, there is a largest bisimulation, which we denote as ≈. Intuitively, M ≈ **M** implies that M and **M** encode the same safety property (up to the aforementioned relation on values ≈<sub>β</sub>). With all the boilerplate for *RSC* in place, we state our main theorem.

#### Theorem 1 (⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> attains *RSC*). ⊢ ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> : *RSC*

We outline our proof of Theorem 1, which relies on a backtranslation ⟪·⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub>. Intuitively, ⟪·⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub> takes a target trace **α** and builds a *set* of source contexts such that *one* of them, when linked with C, produces a related trace α in the source (Theorem 2). In prior work, backtranslations return a single context [10,11,21,28,50,53,59]. This is because they all, explicitly or implicitly, assume that ≈ is injective from source to target. Under this assumption, the backtranslation is unique: a target value **v** will be related to at most one source value v. We do away with this assumption (e.g., the target value **0** is related to both source values 0 and true) and thus there can be multiple source values related to any given target value. This results in a set of backtranslated contexts, of which at least one will reproduce the trace as we need it.

Fig. 1. Example of a trace and its backtranslated code.

We bypass the lengthy technical setup for this proof and provide an informal description of why the backtranslation achieves what it is supposed to. As an example, Fig. 1 contains a trace **α** and the output of ⟪**α**⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub>.

⟪·⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub> first generates empty method bodies for all context methods called by the compiled component. Then it backtranslates each *action* on the given trace, generating code blocks that mimic that action, and places that code inside the appropriate method body. Figure 1 shows the code blocks generated for each action. Backtranslated code maintains a support data structure at runtime, a list of locations denoted L, where locations are added (::) and looked up (L(n)) based on their second field n, which is their target-level address. In order to backtranslate the first call, we need to set up the heap with the right values and then perform the call. In the diagram, dotted lines describe which source statement generates which part of the heap. The return only generates code that will update the list L to ensure that the context has access to all the locations it knows in the target too. In order to backtranslate the last call, we look up the locations to be updated in L so we can ensure that when the call f 2 statement is executed, the heap is in the right state.
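The support structure L can be pictured as a small association list. The Python sketch below is only an illustration of the two operations named above; the class name `LocationList`, the method names, and the pair layout `(source_location, target_address)` are assumptions, and the backtranslated code in the paper is source-language code, not Python.

```python
class LocationList:
    """Support structure L: entries are (source_location, target_address) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, src_loc, tgt_addr):
        # The '::' operation: put a new entry at the front of the list.
        self.entries.insert(0, (src_loc, tgt_addr))

    def lookup(self, tgt_addr):
        # L(n): find the source location whose second field is the target address n.
        for src_loc, addr in self.entries:
            if addr == tgt_addr:
                return src_loc
        raise KeyError(tgt_addr)
```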

For the backtranslation to be used in the proof we need to prove its correctness, i.e., that ⟪**α**⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub> generates a context A that, together with C, generates a trace α related to the given target trace **α**.

Theorem 2 (⟪·⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub> is correct)

*if* **A**[⟦C⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub>] ⟹<sup>**α**</sup> **Ω** *then* ∃A ∈ ⟪**α**⟫<sup>**L<sup>P</sup>**</sup><sub>L<sup>U</sup></sub>. A[C] ⟹<sup>α</sup> Ω *and* α ≈<sub>β</sub> **α** *and* Ω ≈<sub>β</sub> **Ω**.

This theorem immediately implies that ⊢ ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> : *PF*-*RSC*, which, by Theorem 3 below, implies that ⊢ ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> : *RSC*.

Theorem 3 (*PF-RSC* and *RSC* are equivalent for ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub>).

$$\vdash [\![ \cdot ]\!]^{L^U}_{\mathbf{L^P}} : PF\text{-}RSC \iff \vdash [\![ \cdot ]\!]^{L^U}_{\mathbf{L^P}} : RSC$$

*Example 4 (Compiling a secure program).* To illustrate *RSC* at work, let us consider the following source component C<sub>a</sub>, which manages an account whose balance is security-relevant. Accordingly, the balance is stored in a location ℓ<sub>root</sub> that is tracked by the monitor. C<sub>a</sub> provides functions to deposit to the account as well as to print the account balance.

```
deposit(x) → let q=abs(x) in let amt = !root in root := amt + q
balance() → !root
```
C<sub>a</sub> never leaks any sensitive location (ℓ<sub>root</sub>) to an attacker. Additionally, an attacker has no way to decrement the amount of the balance since deposit only adds the absolute value abs(x) of its input x to the existing balance.

By compiling C<sub>a</sub> with ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub>, we obtain the following target program.

```
deposit(x) → let q=abs(x) in
              let amt=!0 with kroot in 0 := amt + q with kroot
 balance() → !0 with kroot
```
Recall that location ℓ<sub>root</sub> is mapped to location **0** and protected by the **k<sub>root</sub>** capability. In the compiled code, while location **0** is freely computable by a target attacker, capability **k<sub>root</sub>** is not. Since that capability is not leaked to an attacker, an attacker will not be able to tamper with the balance stored in location **0**.
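The compiled component of Example 4 can be mimicked in a few lines of Python. This is a toy simulation under stated assumptions, not the target language itself: `KROOT` is a fresh `object()` standing for the root capability, the dictionary `heap` and the helpers `read`, `write` and `attack` are hypothetical, and Python offers no real isolation, so the protection here holds only by the convention that the attacker code never mentions `KROOT`.

```python
KROOT = object()                          # stands for the root capability
heap = {0: {"value": 0, "tag": KROOT}}    # address 0 is protected by KROOT

def read(addr, cap=None):
    cell = heap[addr]
    if cell["tag"] is not None and cell["tag"] is not cap:
        raise PermissionError("capability required")
    return cell["value"]

def write(addr, value, cap=None):
    cell = heap[addr]
    if cell["tag"] is not None and cell["tag"] is not cap:
        raise PermissionError("capability required")
    cell["value"] = value

# Compiled component: the only code that mentions KROOT.
def deposit(x):
    q = abs(x)
    amt = read(0, KROOT)
    write(0, amt + q, KROOT)

def balance():
    return read(0, KROOT)

# Attacker: it can compute the address 0, but only has capabilities of its own making.
def attack():
    try:
        write(0, -1000, object())
        return "tampered"
    except PermissionError:
        return "blocked"
```

Calling `deposit(-50)` and then `deposit(20)` leaves a balance of 70, and `attack()` fails without changing it.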

# 4 *RSC* via Bisimulation

If the source language has a verification system that enforces robust safety, proving that a compiler attains *RSC* can be simpler than that of Sect. 3; it may not require a backtranslation. To demonstrate this, we consider a specific class of monitors, namely those that enforce type invariants on a specific set of locations. Our source language, L<sup>τ</sup>, is similar to L<sup>U</sup> but it has a type system that accepts only those source programs whose traces the source monitor never rejects. Our compiler ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub> is directed by typing derivations, and its proof of *RSC* establishes a specific cross-language invariant on program execution, rather than a backtranslation. A second, independent goal of this section is to show that *RSC* is compatible with concurrency. Consequently, our source and target languages include constructs for forking threads.

# 4.1 The Source Language L<sup>τ</sup>

L<sup>τ</sup> extends L<sup>U</sup> with concurrency, so it has a fork statement (∥ s), processes and process soups [19]. Components define a set of safety-relevant locations Δ, so C ::= Δ; F; I, and heaps carry type information, so H ::= ∅ | H; ℓ ↦ v : τ. Δ also specifies a type for each safety-relevant location, so Δ ::= ∅ | Δ; (ℓ : τ).

L<sup>τ</sup> has an unconventional type system that enforces *robust type safety* [1,14,31,34,45,58], which means that no context can cause the static types of sensitive heap locations to be violated at runtime. Using a special type UN that is described below, a program component statically partitions the heap locations it deals with into those it cares about (sensitive or "trusted" locations) and those it does not care about ("untrusted" locations). Call a value *shareable* if only untrusted locations can be extracted from it using the language's elimination constructs. The type system then ensures that a program component only ever shares shareable values with the context. This ensures that the context cannot violate any invariants (including static types) of the trusted locations, since it can never get direct access to them.

Technically, the type system considers the types τ ::= Bool | Nat | τ × τ | Ref τ | UN and the following typing judgements (Γ maps variables to types).

Type UN stands for "untrusted" or "shareable" and contains all values that can be passed to the context. Every type that is not a subtype of UN is implicitly trusted and cannot be passed to the context. Untrusted locations are explicitly marked UN at their allocation points in the program. Other types are deemed shareable via subtyping. Intuitively, a type is safe if values in it can only yield locations of type UN by the language elimination constructs. For example, UN × UN is a subtype of UN. We write τ ⊢ ◦ to mean that τ is a subtype of UN.

Further, L<sup>τ</sup> contains an *endorsement* statement (endorse x = e as ϕ in s) that dynamically checks the top-level constructor of a value of type UN and gives it a more precise superficial type ϕ ::= Bool | Nat | UN × UN | Ref UN [24]. This allows a program to safely inspect values coming from the context. It is similar to existing type casts [48] but it only inspects one structural layer of the value (this simplifies the compilation).
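The one-layer check performed by endorsement can be sketched as follows. This Python fragment is an illustration under assumptions: the string spellings of the superficial types, the `Ref` class, the function name `endorse`, and the choice to raise `TypeError` on a failed check are all hypothetical, since the text above does not fix how a failed endorsement behaves.

```python
class Ref:
    """Stand-in for a reference value (assumed representation)."""
    def __init__(self, addr):
        self.addr = addr

def endorse(value, phi):
    """Check only the top-level constructor of an untrusted value.

    phi is one of "Bool", "Nat", "UNxUN", "RefUN". On success the value is
    returned unchanged; the components of a pair are not inspected.
    """
    if phi == "Bool":
        ok = isinstance(value, bool)
    elif phi == "Nat":
        # exclude bool explicitly, since bool is a subclass of int in Python
        ok = isinstance(value, int) and not isinstance(value, bool) and value >= 0
    elif phi == "UNxUN":
        ok = isinstance(value, tuple) and len(value) == 2
    elif phi == "RefUN":
        ok = isinstance(value, Ref)
    else:
        raise ValueError(f"unknown superficial type: {phi}")
    if not ok:
        raise TypeError(f"value does not have top-level shape {phi}")
    return value
```

Endorsing a pair at `"UNxUN"` succeeds whatever its components are; each component still has type UN and needs its own endorsement before it can be used at a more precise type.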

The operational semantics of L<sup>τ</sup> updates that of L<sup>U</sup> to deal with concurrency and endorsement. The latter performs a runtime check on the endorsed value [62].

Monitors M ::= ({σ ···}, ⇝, σ<sub>0</sub>, Δ, σ<sub>c</sub>) check at runtime that the trusted heap locations in Δ have values of their intended static types. Accordingly, the description of the monitor includes a list of trusted locations and their expected types (in the form of an environment Δ). The type τ of any location in Δ must be trusted, so τ ⊬ ◦. To facilitate checks of the monitor, every heap location carries a type at runtime (in addition to a value). The monitor transitions should therefore be of the form (σ, Δ, σ′), but since Δ never changes, we write the transitions as (σ, σ′).

A monitor and a component agree if they have the same Δ: M ⌢ C ≝ ({σ ···}, ⇝, σ<sub>0</sub>, Δ, σ<sub>c</sub>) ⌢ (Δ; F; I). Other definitions (safety, robust safety and actions) are as in Sect. 2. Importantly, a well-typed component generates traces that are always accepted, so every component typed at UN is robustly safe.
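The check such a monitor performs on a heap can be sketched in Python. The sketch makes several simplifying assumptions that are not in the text: the monitor state is dropped, the heap maps location names directly to values, types are the strings `"Bool"` and `"Nat"` or a tuple `("Pair", t1, t2)`, reference types are left out, and the names `has_type` and `monitor_accepts` are hypothetical.

```python
def has_type(value, ty):
    """Check a value against a type: "Bool", "Nat", or ("Pair", t1, t2)."""
    if ty == "Bool":
        return isinstance(value, bool)
    if ty == "Nat":
        return isinstance(value, int) and not isinstance(value, bool) and value >= 0
    if isinstance(ty, tuple) and ty[0] == "Pair":
        return (isinstance(value, tuple) and len(value) == 2
                and has_type(value[0], ty[1]) and has_type(value[1], ty[2]))
    return False  # reference types and UN are not modelled in this sketch

def monitor_accepts(delta, heap):
    """Accept iff every trusted location in delta holds a value of its declared type."""
    return all(loc in heap and has_type(heap[loc], ty) for loc, ty in delta.items())
```

Locations outside Δ, such as untrusted scratch locations, are ignored by the check.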

### Theorem 4 (Typability Implies Robust Safety in L<sup>τ</sup> )

*If* ⊢ C : UN *and* C ⌢ M *then* M ⊢ C : rs

*Richer Source Monitors.* In L<sup>τ</sup> , source language monitors only enforce the property of type safety on specific memory locations (robustly). This can be generalized substantially to enforce arbitrary invariants other than types on locations. The only requirement is to find a type system (e.g., based on refinements or Hoare logics) that can enforce robust safety in the source (cf. [68]). Our compilation and proof strategy should work with little modification. Another easy generalization is allowing the set of locations considered by the monitor to grow over time, as in Sect. 3.

#### 4.2 The Target Language **L<sup>π</sup>**

Our target language, **L<sup>π</sup>**, extends the previous target language **L<sup>P</sup>** with support for concurrency (forking, processes and process soups), for atomic co-creation of a protected location and its protecting capability (**let x** = **newhide e in s**), and for examining the top-level construct of a value (**destruct x** = **e as B in s or s′**) according to a pattern (**B** ::= **nat** | **pair**).

$$\frac{\mathbf{H} \triangleright \mathbf{e} \hookrightarrow \mathbf{n}}{\mathbf{C}, \mathbf{H} \triangleright \textbf{destruct } \mathbf{x} = \mathbf{e} \textbf{ as nat in } \mathbf{s} \textbf{ or } \mathbf{s}' \longrightarrow \mathbf{C}, \mathbf{H} \triangleright \mathbf{s}[\mathbf{n} / \mathbf{x}]} \quad (\text{E}\mathbf{L}^{\pi}\text{-destruct-nat})$$

$$\frac{\mathbf{H} = \mathbf{H}_1; \mathbf{n} \mapsto (\mathbf{v}, \eta) \qquad \mathbf{H} \triangleright \mathbf{e} \hookrightarrow \mathbf{v} \qquad \mathbf{k} \notin \mathrm{dom}(\mathbf{H}) \qquad \mathbf{s}' = \mathbf{s}[\langle \mathbf{n}+\mathbf{1}, \mathbf{k} \rangle / \mathbf{x}]}{\mathbf{C}, \mathbf{H} \triangleright \textbf{let } \mathbf{x} = \textbf{newhide } \mathbf{e} \textbf{ in } \mathbf{s} \longrightarrow \mathbf{C}, \mathbf{H}; \mathbf{n}+\mathbf{1} \mapsto \mathbf{v} : \mathbf{k}; \mathbf{k} \triangleright \mathbf{s}'} \quad (\text{E}\mathbf{L}^{\pi}\text{-new})$$

Monitors are also updated to consider a fixed set of locations (a heap **H<sub>0</sub>**), so **M** ::= ({**σ** ···}, **⇝**, **σ<sub>0</sub>**, **H<sub>0</sub>**, **σ<sub>c</sub>**). The atomic creation of capabilities is provided to match modern security architectures such as Cheri [71] (which implement capabilities at the hardware level). This atomicity is not strictly necessary and we prove that *RSC* is attained both by a compiler relying on it and by one that allocates a location and then protects it non-atomically. The former compiler (with this atomicity in the target) is a bit easier to describe, so for space reasons, we only describe that here and defer the other one to the companion report [61].

# 4.3 Compiler from L<sup>τ</sup> to **L<sup>π</sup>**

The high-level structure of the compiler, ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub>, is similar to that of our earlier compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> (Sect. 3.3). However, ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub> is defined by induction on the type derivation of the component to be compiled. The case for allocation (presented below) explicitly uses type information to achieve security efficiently, protecting only those locations whose type is not UN.

$$\left[\!\!\left[ \frac{\Delta, \Gamma \vdash e : \tau \qquad C, \Delta, \Gamma; x : \text{Ref } \tau \vdash s}{C, \Delta, \Gamma \vdash \text{let } x = \text{new}_{\tau}\ e \text{ in } s} \right]\!\!\right]^{L^{\tau}}_{\mathbf{L}^{\pi}} = \begin{cases} \textbf{let } \mathbf{xo} = \textbf{new } [\![ \Delta, \Gamma \vdash e : \tau ]\!]^{L^{\tau}}_{\mathbf{L}^{\pi}} \textbf{ in let } \mathbf{x} = \langle \mathbf{xo}, \mathbf{0} \rangle \textbf{ in } [\![ C, \Delta, \Gamma; x : \text{Ref } \tau \vdash s ]\!]^{L^{\tau}}_{\mathbf{L}^{\pi}} & \text{if } \tau = \text{UN} \\ \textbf{let } \mathbf{x} = \textbf{newhide } [\![ \Delta, \Gamma \vdash e : \tau ]\!]^{L^{\tau}}_{\mathbf{L}^{\pi}} \textbf{ in } [\![ C, \Delta, \Gamma; x : \text{Ref } \tau \vdash s ]\!]^{L^{\tau}}_{\mathbf{L}^{\pi}} & \text{otherwise} \end{cases}$$
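The two branches of the allocation case can be read as a small code generator. The Python sketch below is an assumption-laden illustration: it emits target code as plain strings, takes the already compiled subterms as arguments, and the function name `compile_new` and the angle-bracket pair syntax are hypothetical.

```python
def compile_new(tau, compiled_e, compiled_s, x="x"):
    """Emit target code (as a string) for the source statement `let x = new_tau e in s`.

    compiled_e and compiled_s are the already compiled subterms.
    """
    if tau == "UN":
        # Untrusted location: plain allocation, paired with the dummy capability 0.
        return (f"let {x}o = new {compiled_e} in "
                f"let {x} = <{x}o, 0> in {compiled_s}")
    # Trusted location: allocate and protect atomically with newhide.
    return f"let {x} = newhide {compiled_e} in {compiled_s}"
```

Only the second branch pays for a capability, which matches the remark that only locations whose type is not UN are protected.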

*New Monitor Relation.* As monitors have changed, we also need a new monitor relation M ≈ **M**. Informally, a source and a target monitor are related if the target monitor can always step whenever the target heap satisfies the types specified in the source monitor (up to renaming by the partial bijection β).

We write ⊢ H : Δ to mean that for each location ℓ ∈ Δ, ⊢ H(ℓ) : Δ(ℓ). Given a partial bijection β from source to target locations, we say that a target monitor **M** = ({**σ** ···}, **⇝**, **σ<sub>0</sub>**, **H<sub>0</sub>**, **σ<sub>c</sub>**) is good, written ⊢ **M** : β, Δ, if for all **σ** ∈ {**σ** ···} and all H ≈<sub>β</sub> **H** such that ⊢ H : Δ, there is a **σ′** such that (**σ**, **H**, **σ′**) ∈ **⇝**. For a fixed partial bijection β<sub>0</sub> between the domains of Δ and **H<sub>0</sub>**, we say that the source monitor M and the target monitor **M** are related, written M ≈ **M**, if ⊢ **M** : β<sub>0</sub>, Δ for the Δ in M. With this setup, we define *RSC* as in Sect. 2.

#### Theorem 5 (Compiler ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub> attains *RSC*). ⊢ ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub> : *RSC*

To prove that ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub> attains *RSC* we do not rely on a backtranslation. Here, we know statically which locations can be monitor-sensitive: they must all be trusted, i.e., must have a type τ satisfying τ ⊬ ◦. Using this, we set up a simple cross-language relation and show it to be an invariant on runs of source and compiled target components. The relation captures the following:


We need to prove that this relation is preserved by reductions both in compiled and in attacker code. The former follows from source robust safety (Theorem 4). The latter is simple since all trusted locations are protected with capabilities, attackers have no access to trusted locations, and capabilities are unforgeable and unguessable (by the semantics of **L<sup>π</sup>**). At this point, knowing that monitors are related, and that source traces are always accepted by source monitors, we can conclude that target traces are always accepted by target monitors too. Note that this kind of argument requires all compilable source programs to be robustly safe and is, therefore, impossible for our first compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub>. Avoiding the backtranslation results in a proof much simpler than that of Sect. 3.

# 5 Fully Abstract Compilation

Our next goal is to compare *RSC* to *FAC* at an intuitive level. We first define fully abstract compilation or *FAC* (Sect. 5.1). Then, we present an example of how *FAC* may result in inefficient compiled code and use that to present in Sect. 5.2 what would be needed to write a fully abstract compiler from L<sup>U</sup> to **L<sup>P</sup>** (the languages of our first compiler). We use this example to compare *RSC* and *FAC* concretely, showing that, at least on this example, *RSC* permits more efficient code and affords simpler proofs than *FAC*.

However, this does not imply that one should always prefer *RSC* to *FAC* blindly. In some cases, one may want to establish full abstraction for reasons other than security. Also, when the target language is typed [10,11,21,50] or has abstractions similar to those of the source, full abstraction may have no downsides (in terms of efficiency of compiled code and simplicity of proofs) relative to *RSC*. However, in many settings, including those we consider, target languages are not typed, and often differ significantly from the source in their abstractions. In such cases, *RSC* is a worthy alternative.

#### 5.1 Formalising Fully Abstract Compilation

As stated in Sect. 1, *FAC* requires the preservation and reflection of observational equivalence, and most existing work instantiates observational equivalence with contextual equivalence (≃<sub>*ctx*</sub>). Contextual equivalence and *FAC* are defined below. Informally, two components *C<sub>1</sub>* and *C<sub>2</sub>* are contextually equivalent if no context *A* interacting with them can tell them apart, i.e., they are *indistinguishable*. Contextual equivalence can encode security properties such as confidentiality, integrity, invariant maintenance and non-interference [6,9,53,60]. We do not explain this well-known observation here, but refer the interested reader to the survey of Patrignani *et al.* [54]. Informally, a compiler ⟦·⟧<sup>S</sup><sub>**T**</sub> is fully abstract if it translates (only) contextually-equivalent source components into contextually-equivalent target ones.

#### Definition 4 (Contextual equivalence and fully abstract compilation).

$$\begin{aligned} C_1 \simeq_{ctx} C_2 &\stackrel{\text{def}}{=} \forall A.\ A[C_1] \Uparrow \iff A[C_2] \Uparrow, \text{ where } \Uparrow \text{ means execution divergence} \\ \vdash [\![ \cdot ]\!]^{S}_{\mathbf{T}} : FAC &\stackrel{\text{def}}{=} \forall C_1, C_2.\ C_1 \simeq_{ctx} C_2 \iff [\![ C_1 ]\!]^{S}_{\mathbf{T}} \simeq_{ctx} [\![ C_2 ]\!]^{S}_{\mathbf{T}} \end{aligned}$$

The security-relevant part of *FAC* is the ⇒ implication [29]. This part is security-relevant because the proof thesis concerns target contextual equivalence (≃<sub>*ctx*</sub>). Unfolding the definition of ≃<sub>*ctx*</sub> on the right of the implication yields a universal quantification over all possible target contexts **A**, which captures malicious attackers. In fact, there may be target contexts **A** that can interact with compiled code in ways that are impossible in the source language. Compilers that attain *FAC* with untyped target languages often insert checks in compiled code that detect such interactions and respond to them securely [60], often by halting the execution [6,9,29,37,39,42,53,54]. These checks are often inefficient, but must be performed even if the interactions are not security-relevant. We now present an example of this.

*Example 5 (Wrappers for heap resources).* Consider a password manager written in an object-oriented language that is compiled to an assembly-like language. The password manager defines a private List object where it stores the passwords locally. Shown below are two implementations of the newList method inside List, which we call C<sub>one</sub> and C<sub>two</sub>. The only difference between C<sub>one</sub> and C<sub>two</sub> is that C<sub>two</sub> allocates two lists internally; one of these (shadow) is used for internal purposes only.

```
public newList(): List {
    ell = new List();
    return ell;
}
```

```
public newList(): List {
    shadow = new List(); // diff
    ell = new List();
    return ell;
}
```
C<sub>one</sub> and C<sub>two</sub> are equivalent in a source language that does not allow pointer comparison (like our source languages). To attain *FAC* when the target allows pointer comparisons (as in our target languages), the pointers returned by newList in the two implementations must be the same, but this is very difficult to ensure since the second implementation does more allocations. A simple solution to this problem is to wrap ell in a proxy object and return the proxy [9,47,53,59]. Compiled code needs to maintain a lookup table mapping the proxy to the original object, and proxies must have allocation-independent addresses. Proxies work but they are inefficient due to the need to look up the table on every object access.
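The distinguishing power of address comparison can be made concrete with a short Python simulation. It is a sketch under assumptions: the bump allocator `Allocator`, the two functions standing for the compiled methods, and the context `context_distinguishes` are hypothetical, and addresses start at 1 purely by choice.

```python
class Allocator:
    """Hands out consecutive natural-number addresses, as in a target heap."""

    def __init__(self):
        self.next_addr = 1

    def new(self):
        addr = self.next_addr
        self.next_addr += 1
        return addr

def new_list_one(alloc):
    ell = alloc.new()
    return ell

def new_list_two(alloc):
    shadow = alloc.new()   # internal only, never returned
    ell = alloc.new()
    return ell

def context_distinguishes(new_list):
    # A target context that inspects the concrete address it receives.
    return new_list(Allocator()) == 1
```

The context returns `True` for `new_list_one` and `False` for `new_list_two`, so the two compiled methods are not contextually equivalent in the target even though their sources are equivalent.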

In this example, *FAC* forces all privately allocated locations to be wrapped in proxies. However, *RSC* does not require this. Our target languages **L<sup>P</sup>** and **L<sup>π</sup>** support address comparison (addresses are natural numbers in their heaps) but ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> and ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>π</sup>**</sub> just use capabilities to attain security efficiently, while ⟦·⟧<sup>L<sup>τ</sup></sup><sub>**L<sup>I</sup>**</sub> relies on memory isolation. On the other hand, for attaining *FAC*, capabilities alone would be insufficient since they do not hide addresses. We explain this in detail in the next subsection.

*Remarks.* Our technical report lists many other cases of *FAC* forcing security-irrelevant inefficiency in compiled code [61]. All of these can be avoided by just replacing contextual equivalence with a different notion of equivalence in the statement of *FAC*. However, it is not clear how this can be done generally for any given kind of inefficiency, and what the security consequences of such instantiations of the statement of *FAC* are. On the other hand, *RSC* is *uniform* and it does not induce any of these inefficiencies.

A security issue that cannot be addressed just by tweaking equivalences is information leaks on side channels, as side channels are, by definition, not expressible in the language. Neither *FAC* nor *RSC* deals with side channels.

# 5.2 Towards a Fully Abstract Compiler from L<sup>U</sup> to **L<sup>P</sup>**

To further compare *FAC* and *RSC*, we now sketch what *would* be needed to construct a fully abstract compiler from L<sup>U</sup> to **L<sup>P</sup>**. In particular, this compiler should not suffer from the "attack" described in Example 5.

*Inefficiency.* Consider a (hypothetical) new compiler from L<sup>U</sup> to **L<sup>P</sup>** that attains *FAC*; we refer to it below as the *FAC* compiler. We describe informally what code generated by this compiler would have to do. We know that fully abstract compilation preserves *all* source abstractions in the target language. One abstraction that distinguishes **L<sup>P</sup>** from L<sup>U</sup> is that locations are abstract in L<sup>U</sup>, but concrete natural numbers in **L<sup>P</sup>**. Thus, locations allocated by compiled code must not be passed directly to the context as this would reveal the allocation order. Instead of passing the location ⟨**n**, **k**⟩ to the context, the compiler arranges for an opaque handle ⟨**n′**, **k<sub>com</sub>**⟩ (that cannot be used to access any location directly) to be passed. Such an opaque handle is often called a *mask* or *seal* in the literature [66].

To ensure that masking is done properly, · LU **<sup>L</sup><sup>P</sup>** can insert code at entry and exit points of compiled code, *wrapping* the compiled code in a way that enforces masking [32,59]. The wrapper keeps a list **L** of component-allocated locations that are shared with the context in order to know their masks. When a component-allocated location is shared, it is added to the list **L**. The mask of a location is its index in this list. If the same location is shared again it is not added again but its previous index is used. To implement lookup in **L** we must compare capabilities too, so we need to add that expression to the target language. To ensure capabilities do not leak to the context, the second field of the pair is a constant capability **kcom** which compiled code does not use otherwise. Clearly, this wrapping can increase the cost of all cross-component calls and returns.

However, this wrapping is not sufficient to attain *FAC*. A component-allocated location could be passed to the context on the heap, so before passing control to the context the compiled code needs to *scan the whole heap* where a location can be passed and mask all found component-allocated locations. Dually, when receiving control the compiled code must scan the heap to unmask any masked location so it can use the location. The problem now is determining what parts of the heap to scan and how. Specifically, the compiled code needs to keep track of all the locations (and related capabilities) that are shared, i.e., (i) passed from the context to the component and (ii) passed from the component to the context. Both keeping track of these locations and scanning them on every cross-component control transfer are likely to be *very* expensive.

Finally, masked locations cannot be used directly by the context to be read and written. Thus, compiled code must provide a **read** and a **write** function that implement reading and writing to masked locations. The additional unmasking in these functions (as opposed to native reads and writes) adds to the inefficiency.

It should be clear that, as opposed to the *RSC* compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> (Sect. 3), the *FAC* compiler ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> just sketched is likely to generate far less efficient code.

*Proof Difficulty.* Proving that ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> attains *FAC* can only be done by backtranslating *traces*, not contexts alone, since the newly-added target expressions cannot be directly backtranslated to valid source ones [7,9,59]. For this, we need a trace semantics that captures all information available to the context. This is often called a fully abstract trace semantics [38,55,56]. However, the trace semantics we defined for **L<sup>P</sup>** is not fully abstract, as its actions record the entire heap in every action, including private parts of the heap. Hence, we cannot use this trace semantics for proving *FAC* and would need to design a new one. Building a fully abstract trace semantics for **L<sup>P</sup>** is challenging because we have to keep track of locations that have been shared with the context in the past. This substantially complicates both the definition of traces and the proofs that build on the definition.

Finally, the source context that the backtranslation constructs from a target trace must simulate the shared part of the heap at every context switch. Since locations in the target may be masked, the source context has to maintain a map from the source locations to the corresponding masked target ones, which complicates the backtranslation and the proof substantially. 

To summarize, it should be clear that the proof of *FAC* for ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub> would be much harder than the proof of *RSC* for ⟦·⟧<sup>L<sup>U</sup></sup><sub>**L<sup>P</sup>**</sub>, even though the source and target languages are the same and so is the broad proof technique (backtranslation).

# 6 Related Work

Recent work [8,33] presents new criteria for secure compilation that ensure preservation of subclasses of hyperproperties. Hyperproperties [25] are a formal representation of predicates on programs, i.e., they are predicates on sets of traces. Hyperproperties capture many security-relevant properties including not just conventional safety and liveness, which are predicates on traces, but also properties like non-interference, which is a predicate on pairs of traces. Modulo technical differences, our definition of *RSC* coincides with the criterion of "robust safety property preservation" in [8,33]. We show, through concrete instances, that this criterion can be easily realized by compilers, and develop two proof techniques for establishing it. We further show that the criterion leads to more efficient compiled code than does *FAC*. Additionally, the criteria in [8,33] assume that behaviours in the source and target are represented using the same alphabet. Hence, the definitions (somewhat unrealistically or ideally) do not require a translation of source properties to target properties. In contrast, we consider differences in the representation of behaviour in the source and in the target and this is accounted for in our monitor relation M ≈ **M**. A slightly different account of this difference is presented by Patrignani and Garg [60] in the context of reactive black-box programs.

Abate *et al.* [7] define a variant of robustly-safe compilation called RSCC specifically tailored to the case where (source) components can perform undefined behaviour. RSCC does not consider attacks from arbitrary target contexts but from compiled components that can become compromised and behave in arbitrary ways. To demonstrate RSCC, Abate *et al.* [7] rely on two backends for their compiler: software fault isolation and tag-based monitors. On the other hand, we rely on capability machines and memory isolation (the latter in the companion report). RSCC also preserves (a form of) safety properties and can be achieved by relying on a trace-based backtranslation; it is unclear whether proofs can be simplified when the source is verified and concurrent, as in our second compiler.

ASLR [6,37], protected module architectures [9,42,53,59], tagged architectures [39], capability machines [69] and cryptographic primitives [4,5,22,26] have been used as targets for *FAC*. We believe all of these can also be used as targets of *RSC*-attaining compilers. In fact, some targets such as capability machines seem to be better suited to *RSC* than *FAC*, as we demonstrated.

Ahmed *et al.* prove full abstraction for several compilers between typed languages [10,11,50]. As compiler intermediate languages are often typed, and as these types often serve as the basis for complex static analyses, full abstraction seems like a reasonable goal for (fully typed) intermediate compilation steps. In the last few steps of compilation, where the target languages are unlikely to be typed, one could establish robust safety preservation and combine the two properties (vertically) to get an end-to-end security guarantee.

There are three other criteria for secure compilation that we would like to mention: securely compartmentalised compilation (SCC) [39], trace-preserving compilation (TPC) [60] and non-interference-preserving compilation (NIPC) [12,15,16,27]. SCC is a re-statement of the "hard" part of full abstraction (the forward implication), but adapted to languages with undefined behaviour and a strict notion of components. Thus, SCC suffers from many of the same efficiency drawbacks as *FAC*. TPC is a stronger criterion than *FAC*, which most existing fully abstract compilers also attain. Again, compilers attaining TPC also suffer from the drawbacks of compilers attaining *FAC*.

NIPC preserves a single property: noninterference (NI). However, this line of work does not consider active target-level adversaries yet. Instead, the focus is on compiling whole programs. Since noninterference is not a safety property, it is difficult to compare NIPC to *RSC* directly. However, noninterference can also be approximated as a safety property [20]. So, in principle, *RSC* (with adequate massaging of observations) can be applied to stronger end-goals than NIPC.

Swamy *et al.* [67] embed an F<sup>∗</sup> model of a gradually and robustly typed variant of JavaScript into an F<sup>∗</sup> model of JavaScript. Gradual typing supports constructs similar to our endorsement construct in L<sup>τ</sup>. Their type-directed compiler is proven to attain memory isolation as well as static and dynamic memory safety. However, they do not consider general safety properties, nor a specific, general criterion for compiler security.

Two of our target languages rely on capabilities for restricting access to sensitive locations from the context. Although capabilities are not mainstream in any processor, fully functional research prototypes such as Cheri exist [71]. Capability machines have previously been advocated as a target for efficient secure compilation [30] and preliminary work on compiling C-like languages to them exists, but the criterion applied is *FAC* [69].

# 7 Conclusion

This paper has examined robustly safe compilation (*RSC*), a soundness criterion for compilers with direct relevance to security. We have shown that the criterion is easily realizable and may lead to more efficient code than does fully abstract compilation with respect to contextual equivalence. We have also presented two techniques for establishing that a compiler attains *RSC*. One is an adaptation of an existing technique, backtranslation, and the other is based on inductive invariants.

Acknowledgements. The authors would like to thank Dominique Devriese, Akram El-Korashy, Cătălin Hriţcu, Frank Piessens, David Swasey and the anonymous reviewers for useful feedback and discussions on an earlier draft.

This work was partially supported by the German Federal Ministry of Education and Research (BMBF) through funding for the CISPA-Stanford Center for Cybersecurity (FKZ: 13N1S0762).

# References



# Compiling Sandboxes: Formally Verified Software Fault Isolation

Frédéric Besson<sup>1(B)</sup>, Sandrine Blazy<sup>1</sup>, Alexandre Dang<sup>1</sup>, Thomas Jensen<sup>1</sup>, and Pierre Wilke<sup>2</sup>

<sup>1</sup> Inria, Univ Rennes, CNRS, IRISA, Rennes, France frederic.besson@inria.fr

<sup>2</sup> CentraleSupélec, Inria, Univ Rennes, CNRS, IRISA, Rennes, France

Abstract. Software Fault Isolation (SFI) is a security-enhancing program transformation for instrumenting an untrusted binary module so that it runs inside a dedicated isolated address space, called a sandbox. To ensure that the untrusted module cannot escape its sandbox, existing approaches such as Google's Native Client rely on a binary verifier to check that all memory accesses are within the sandbox. Instead of relying on *a posteriori* verification, we design, implement and prove correct a program instrumentation phase as part of the formally verified compiler CompCert that enforces a sandboxing security property *a priori*. This eliminates the need for a binary verifier and, instead, leverages the soundness proof of the compiler to prove the security of the sandboxing transformation. The technical contributions are a novel sandboxing transformation that has a well-defined C semantics and which supports arbitrary function pointers, and a formally verified C compiler that implements SFI. Experiments show that our formally verified technique is a competitive way of implementing SFI.

# 1 Introduction

Isolating programs with various levels of trustworthiness is a fundamental security concern, be it on a cloud computing platform running untrusted code provided by customers, or in a web browser running untrusted code coming from different origins. In these contexts, it is of the utmost importance to provide adequate isolation mechanisms so that a faulty or malicious computation cannot compromise the host or neighbouring computations.

There exist a number of mechanisms for enforcing isolation that intervene at various levels, from the hardware up to the operating system. Hypervisors [10], virtual machines [2] but also system processes [17] can ensure strong isolation properties, at the expense of costly context switches and limited flexibility in the interaction between components. Language-based techniques such as strong typing offer alternative techniques for ensuring memory safety, upon which access control policies and isolation can be implemented. This approach is implemented e.g. by the Java language for which it provides isolation guarantees, as proved by Leroy and Rouaix [21]. The isolation is fine-grained and very flexible but the security mechanisms, e.g. stack inspection, may be hard to reason about [7]. In the web browser realm, JavaScript is dynamically typed and also ensures memory safety upon which access control can be implemented [29].

#### 1.1 Software Fault Isolation

Software Fault Isolation (SFI) is an alternative for unsafe languages, e.g. C, where memory safety is not granted but needs to be enforced at runtime by program instrumentation. Pioneered by Wahbe *et al.* [35] and popularised by Google's Native Client [30,37,38], SFI is a program transformation which confines a software component to a memory sandbox. This is done by prefixing every memory access with a carefully designed code sequence which efficiently ensures that the memory access occurs within the sandbox. In practice, the sandbox is aligned and the sandbox addresses are thus of the form 0xYZ where Y is a fixed bit-pattern and Z is an arbitrary bit-pattern, *i.e.*, Z ∈ [0x0...0, 0xF...F]. Hence, enforcing that memory accesses are within the sandbox range of addresses can be efficiently implemented by a *masking* operation which exploits the binary representation of pointers: it retains the lowest bits Z and sets the highest bits to the bit-pattern Y.
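As a rough illustration of the masking operation on plain integer addresses, consider the following Python sketch. The concrete layout (a 32-bit address whose top byte Y is `0xAB` and whose low 24 bits form Z) and all names are assumptions chosen for the example; they are not taken from any particular SFI implementation.

```python
SANDBOX_BITS = 24                         # assumed width of the free part Z
SANDBOX_TAG = 0xAB                        # assumed fixed bit-pattern Y
OFFSET_MASK = (1 << SANDBOX_BITS) - 1     # selects the lowest bits Z
SANDBOX_BASE = SANDBOX_TAG << SANDBOX_BITS


def sandbox(address):
    """Keep the lowest bits Z of `address` and force the highest bits to Y."""
    return SANDBOX_BASE | (address & OFFSET_MASK)


def in_sandbox(address):
    """Check that `address` has the form 0xYZ."""
    return SANDBOX_BASE <= address <= (SANDBOX_BASE | OFFSET_MASK)
```

An address that is already inside the sandbox is left unchanged, while any other address is redirected to some address inside it. No branch is needed, which is why the operation is cheap.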

Traditionally, the SFI transformation is performed at the binary level and is followed by an *a posteriori* verification by a trusted SFI verifier [23,31,35]. Because the verifier can assume that the code has undergone the SFI transformation, it can be kept simple (almost syntactic), thereby reducing both verification time and the Trusted Computing Base (TCB). This approach to SFI can be viewed as a simple instance of Proof Carrying Code [25] where the compiler is untrusted and the binary verifier is either trusted or verified.

Traditional SFI is well suited for executing binary code from an untrusted origin that must, for an adequate user experience, start running as soon as possible. Google's Native Client [30,37] is a state-of-the-art SFI implementation which has been deployed in the Chrome web browser for isolating binary code in untrusted pages. ARMor [39] features the first fully verified SFI implementation where the TCB is reduced to the formal ARM semantics in the HOL proof assistant [9]. RockSalt [24] is a formally verified implementation of an SFI verifier for the x86 architecture, demonstrating that an efficient binary verifier can be obtained from a machine-checked specification.

#### 1.2 Software Fault Isolation Through Compilation

A downside of the traditional SFI approach is that it hinders most compiler optimisations because the optimised code no longer respects the simple properties that the SFI verifier is capable of checking. For example, the SFI verifier expects that every memory access is immediately preceded by a specific syntactic code pattern that implements the sandboxing operation. A semantically equivalent but syntactically different code sequence would be rejected. An alternative to the *a posteriori* binary verifier approach is Portable Software Fault Isolation (PSFI), proposed by Kroll *et al.* [16]. In this methodology, there is no verifier to trust. Instead isolation is obtained by compilation with a machine-checked compiler, such as CompCert [18]. Portability comes from the fact that PSFI can reuse existing compiler back-ends and therefore target all the architectures supported by the compiler without additional effort.

PSFI is applicable in scenarios where the source code is available or the binary code is provided by a trusted third-party that controls the build process. For example, the original motivation for Proof Carrying Code [25] was to provide safe kernel extensions [26] as binary code to replace scripts written in an interpreted language. This falls within the scope of PSFI. Another PSFI scenario is when the binary code is produced in a controlled environment and/or by a trusted party. In this case, the primary goal is not to protect against an attacker trying to insert malicious code but to prevent honest parties from exposing a host platform to exploitable bugs. This is the case *e.g.* in the avionics industry, where software from different third-parties is integrated on the same host that needs to ensure strong isolation properties between tasks whose levels of criticality differ. In those cases, PSFI can deliver both security and a performance advantage. In Sect. 8, we provide experimental evidence that PSFI is competitive and sometimes outperforms SFI in terms of efficiency of the binary code.

#### 1.3 Challenges in Formally Verified SFI

PSFI inserts the masking operations during compilation and does away with the *a posteriori* SFI verifier. The challenge is then to ensure that the security, enforced at an intermediate representation of the code, still holds for the running code. Indeed, compiler optimisation often breaks such security [33]. The insight of Kroll *et al.* is that a safety theorem of the compiled code (i.e., that its behaviour is well-defined) can be exploited to obtain a security theorem for that same compiled code, guaranteeing that it makes no memory accesses outside its sandbox. We explain this in more detail in Sect. 2.2.

One challenge we face with this approach is that it is far from evident that the sandboxing operations and hence the transformed program have well-defined behaviour. An unsafe language such as C admits undefined behaviours (e.g. bitwise operations on pointers), which means that it is possible for the observational behaviour of a program to differ depending on the level of optimisation. This is not a compiler bug: compilers only guarantee semantics preservation *if* the code to compile has a well-defined semantics [36]. Therefore, our SFI transformation must turn any program into a program with a well-defined semantics.

The seminal paper of Kroll *et al.* emphasises that the absence of undefined behaviour is a prerequisite but they do not provide a transformation that enforces this property. More precisely, their transformation may produce a program with undefined behaviours (*e.g.* because the input program had undefined behaviours). This fact was one of the motivations for the present work, and explains the need for a new PSFI technique. One difficulty is to remove undefined behaviours due to restrictions on pointer arithmetic. For example, bitwise operators on pointers have undefined C semantics, but traditional masking operations of SFI rely heavily on these operators. Another difficulty is to deal with indirect function calls and ensure that, as prescribed by the C standard, they are resolved to valid function pointers. To tackle these problems, we propose an original sandboxing transformation which, unlike previous proposals, is compliant with the C standard [13] and therefore has well-defined behaviour.

### 1.4 Contributions

We have developed and proved correct CompCertSfi, the first full-fledged, fully verified implementation of SFI inside a C compiler. The SFI transformation is performed early in the compilation chain, thereby permitting the generated code to benefit from existing optimisations that are performed by the back-end. The technical contributions behind CompCertSfi can be summarised as follows.


The rest of the paper is organised as follows. In Sect. 2, we present background information about the CompCert compiler (Sect. 2.1) and the PSFI approach (Sect. 2.2). Section 3 provides an overview of the layout of the sandbox and the masking operations implementing our SFI. In Sect. 4 we explain how to overcome the problem with undefined pointer arithmetic and define masking operations with a well-defined C semantics. Section 5 describes how control-flow integrity in the presence of function pointers can be achieved by a slightly more flexible SFI policy which allows reads in well-defined areas outside the sandbox. Section 6 specifies the SFI policy in more detail, and describes the formal Coq proofs of safety and security. Section 7 presents the design of our runtime library and how it exploits compiler support. Experimental results are detailed in Sect. 8. Section 9 presents related work and Sect. 10 concludes.

# 2 Background

This section presents background information about the CompCert compiler [18] and the Portable Software Fault Isolation proposed by Kroll *et al.* [16].

### 2.1 CompCert

The CompCert compiler [18] is a machine-checked compiler programmed and proved correct using the Coq proof-assistant [22]. It compiles C programs down to assembly code through a succession of compiler passes which are shown to be semantics preserving. CompCert features an architecture-independent front-end. The back-end supports four main architectures: x86, ARM, PowerPC and RiscV. To target all the back-ends without additional effort, our secure transformation is performed in the compiler front-end, at the level of the Cminor language that is the last architecture-independent language of the CompCert compiler chain. Our transformation can obviously be applied on C programs by first compiling them into Cminor, and then applying the transformation itself.

$$\begin{array}{rcl}
\textit{constant} \ni c & ::= & i32 \mid i64 \mid f32 \mid f64 \mid \&gl \mid stk \\
\textit{chunk} \ni \kappa & ::= & is_{8} \mid iu_{8} \mid is_{16} \mid iu_{16} \mid i_{32} \mid i_{64} \mid f_{32} \mid f_{64} \\
\textit{expr} \ni e & ::= & x \mid c \mid \rhd e \mid e_1 \mathbin{\Box} e_2 \mid [e]_\kappa \\
\textit{stmt} \ni s & ::= & \mathtt{skip} \mid x := e \mid [e_1]_\kappa := e_2 \mid \mathtt{return}\ e \mid x := e(e_1, \ldots, e_n)_\sigma \\
 & \mid & \mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2 \mid s_1; s_2 \mid \mathtt{loop}\ s \mid \{s\} \mid \mathtt{exit}\ n \mid \mathtt{goto}\ lb
\end{array}$$

Fig. 1. Cminor syntax

The Cminor language is a minimal imperative language with explicit stack allocation of certain local variables [19]. Its syntax is given in Fig. 1. Constants range over 32-bit and 64-bit integers but also IEEE floating-point numbers. It is possible to get the address of a global variable *gl* or the address of the stack allocated local variables (i.e., *stk* denotes the address of the current stack frame). In CompCert parlance, a memory chunk κ specifies how many bytes need to be read (resp. written) from (resp. to) memory and whether the result should be interpreted as a signed or unsigned quantity. For instance, the memory chunk is<sub>16</sub> denotes a 16-bit signed integer and f<sub>64</sub> denotes a 64-bit floating-point number. In Cminor, memory accesses, written [e]<sub>κ</sub>, are annotated with the relevant memory chunk κ. Expressions are built from pseudo-registers, constants, unary (▷) and binary (□) operators. CompCert features the relevant unary and binary operators needed to encode the semantics of C. Expressions are side-effect free but may contain memory reads.

Instructions are fairly standard. Similarly to a memory read, a memory store [e<sub>1</sub>]<sub>κ</sub> := e<sub>2</sub> is annotated by a memory chunk κ. In Cminor, a function call such as e(e<sub>1</sub>, ..., e<sub>n</sub>)<sub>σ</sub> represents an indirect function call through a function pointer denoted by the expression e, σ is the signature of the function and e<sub>1</sub>, ..., e<sub>n</sub> are the arguments. A direct call is a special case where the expression e is a constant (function) pointer. Cminor is a structured language and features a conditional, a block construct {s} and an infinite loop **loop** s. Exiting the n<sup>th</sup> enclosing loop or block can be done using an **exit** n instruction. Cminor is structured but **goto**s towards a symbolic label *lb* are also possible. Returning from a function is done by a **return** instruction. Cminor is equipped with a small-step operational semantics. The intra-procedural and inter-procedural control flows are modelled using an explicit continuation which therefore contains a call stack.

CompCert Soundness Theorem. Each compiler pass is proved to be semantics preserving using a simulation argument. Theorem 1 states semantics preservation.

Theorem 1 (Semantics Preservation). *If the compilation of program* p *succeeds and generates a target program tp, then for any behaviour beh of program* tp *there exists a behaviour of* p*, beh′, such that beh* improves *beh′.*

In this statement, a behaviour is a trace of observable events that are typically generated when performing external function calls. CompCert classifies behaviours depending on whether the program terminates normally, diverges or goes wrong. A *goes wrong* behaviour corresponds to a situation where the program semantics gets stuck (i.e., has an undefined behaviour). In this situation, the compiler has the liberty to generate a program with an *improved* behaviour i.e., the semantics of the transformed program may be more defined (i.e., it may not get stuck at all or may get stuck later on).

The consequence is that Theorem 1 is not sufficient to preserve a safety property because the target program *tp* may have behaviours that are not accounted for in the program p and could therefore violate the property. Corollary 1 states that in the absence of going-wrong behaviour, the behaviours of the target program are a subset of the behaviours of the source program.

Corollary 1 (Safety preservation). *Let* p *be a program and tp be a target program. Consider that none of the behaviours of* p *is a going-wrong behaviour. If the compilation of* p *succeeds and generates a target program tp, then any behaviour of program tp is a behaviour of* p*.*

As a consequence, any (safety) property of the behaviours of p is preserved by the target program *tp*. In Sect. 2.2, we show how the PSFI approach leverages Corollary 1 to transfer an isolation property obtained at the Cminor level to the assembly code.
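The gap between Theorem 1 and Corollary 1 can be made concrete with a small Python model. It is a simplification under stated assumptions: behaviours are pairs of an outcome (`"terminates"`, `"diverges"` or `"wrong"`) and a finite tuple of events, and "improves" is read as "equal, or the source goes wrong after a trace that the target extends". The function names are hypothetical.

```python
def improves(target, source):
    """True if `target` equals `source`, or `source` goes wrong after a trace
    that is a prefix of the target's trace (the target is 'more defined')."""
    if target == source:
        return True
    source_outcome, source_trace = source
    _, target_trace = target
    return source_outcome == "wrong" and target_trace[:len(source_trace)] == source_trace


def preserves_semantics(target_behaviours, source_behaviours):
    """Shape of Theorem 1: every target behaviour improves some source behaviour."""
    return all(
        any(improves(t, s) for s in source_behaviours)
        for t in target_behaviours
    )


def is_safe(behaviours):
    """No behaviour is a going-wrong behaviour."""
    return all(outcome != "wrong" for outcome, _ in behaviours)
```

When the source can go wrong, a target behaviour may satisfy `preserves_semantics` without being a source behaviour at all. Once `is_safe` holds for the source, the second disjunct of `improves` can never apply, so every target behaviour must literally be a source behaviour, which is the content of Corollary 1.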

Going-wrong behaviours in CompCert. As safety is an essential property of our PSFI transformation, we give below a detailed account of the going-wrong behaviours of the CompCert languages with a focus on Cminor.

*Undefined evaluation of expressions.* CompCert's runtime values are dynamically typed and defined below:

```
values  v ::= undef | int(i32) | long(i64) | single(f32) | float(f64) | ptr(b, o)
```
Values are built from numeric values (32-bit and 64-bit integers and floating point numbers), the **undef** value representing an indeterminate value, and pointer values made of a pair (b, o) where b is a memory block identifier and o is an offset which, depending on the architecture, is either a 32-bit or a 64-bit integer.

For Cminor, like all languages of CompCert, the unary (▷) and binary (□) operators are not total. They may directly produce going-wrong behaviours *e.g.* in case of division by **int**(0). They may also return **undef** if (i) the arguments are not in the right range *e.g.* the left-shift **int**(i) << **int**(32); or (ii) the arguments are not well-typed *e.g.* **int**(i) +<sub>int</sub> **float**(f). Pointer arithmetic is strictly conforming to the C standard [13] and any pointer operation that is implementation-defined according to the standard returns **undef**.
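The distinction between an operator that returns **undef** and one that goes wrong can be illustrated with a toy Python model of dynamically typed values. This is a sketch and not CompCert's definitions: values are tagged tuples, only 32-bit unsigned arithmetic is modelled, and treating an ill-typed division as going wrong is an assumption of the example.

```python
UNDEF = ("undef",)


class GoesWrong(Exception):
    """Raised when evaluation gets stuck (a going-wrong behaviour)."""


def _is_int(value):
    return value[0] == "int"


def shl_int(a, b):
    """Left shift: undef when the shift amount is out of range or a type is wrong."""
    if _is_int(a) and _is_int(b) and 0 <= b[1] < 32:
        return ("int", (a[1] << b[1]) & 0xFFFFFFFF)
    return UNDEF


def add_int(a, b):
    """Integer addition: undef when an argument is not an int."""
    if _is_int(a) and _is_int(b):
        return ("int", (a[1] + b[1]) & 0xFFFFFFFF)
    return UNDEF


def divu_int(a, b):
    """Unsigned division: gets stuck on division by int(0)."""
    if not (_is_int(a) and _is_int(b)) or b[1] == 0:
        raise GoesWrong("division is not defined for these arguments")
    return ("int", a[1] // b[1])
```

An `UNDEF` result can still flow through the program harmlessly, whereas `GoesWrong` stops evaluation immediately.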

$$\begin{array}{lcll}
\mathtt{ptr}(b,o) \pm \mathtt{long}(l) &=& \mathtt{ptr}(b, o \pm l) & \\
\mathtt{ptr}(b,o) - \mathtt{ptr}(b,o') &=& \mathtt{long}(o - o') & \\
\mathtt{ptr}(b,o) \mathrel{!=} \mathtt{long}(0) &=& \mathtt{tt} & \text{if } W(b,o) \\
\mathtt{ptr}(b,o) == \mathtt{long}(0) &=& \mathtt{ff} & \text{if } W(b,o) \\
\mathtt{ptr}(b,o) \star \mathtt{ptr}(b,o') &=& o \star o' & \text{if } W(b,o) \wedge W(b,o') \\
\mathtt{ptr}(b,o) == \mathtt{ptr}(b',o') &=& \mathtt{ff} & \text{if } b \neq b' \wedge V(b,o) \wedge V(b',o') \\
\mathtt{ptr}(b,o) \mathrel{!=} \mathtt{ptr}(b',o') &=& \mathtt{tt} & \text{if } b \neq b' \wedge V(b,o) \wedge V(b',o')
\end{array}$$

where ⋆ ∈ {<, ≤, ==, !=, ≥, >}

Fig. 2. Pointer arithmetic in CompCert

The precise semantics of pointer operations is given in Fig. 2. For simplicity, we provide the semantics for a 64-bit architecture. Pointer operations are often only defined provided that the pointers are valid, written V, or weakly valid, written W. This validity condition requires that the offset o of a pointer **ptr**(b, o) is strictly within the bounds of the block b. The weakly valid condition refers to a pointer whose offset is either valid or one-past-the-end of the block b. Any pointer arithmetic operation that is not listed in Fig. 2 returns **undef**. This is in particular the case for bitwise operations which are typically used for the masking operation needed to implement SFI.
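The role of the two validity conditions can be sketched in Python. The model below is an assumption-laden illustration: pointers are (block, offset) pairs, `sizes` maps each block to its size, and the equality rules follow one reading of Fig. 2 (same-block comparison needs weak validity, cross-block comparison needs validity).

```python
UNDEF = "undef"


def valid(sizes, block, offset):
    """V(b, o): the offset is strictly within the bounds of the block."""
    return block in sizes and 0 <= offset < sizes[block]


def weakly_valid(sizes, block, offset):
    """W(b, o): valid, or one past the end of the block."""
    return block in sizes and 0 <= offset <= sizes[block]


def ptr_eq(sizes, p, q):
    """Pointer equality; UNDEF whenever the validity side condition fails."""
    (b1, o1), (b2, o2) = p, q
    if b1 == b2:
        if weakly_valid(sizes, b1, o1) and weakly_valid(sizes, b2, o2):
            return o1 == o2
        return UNDEF
    if valid(sizes, b1, o1) and valid(sizes, b2, o2):
        return False
    return UNDEF


def ptr_and(p, mask):
    """Bitwise operations on pointers are not listed in Fig. 2, hence undef."""
    return UNDEF
```

A one-past-the-end pointer can be compared with pointers into its own block but not with pointers into another block, and `ptr_and` shows why a traditional bit-masking sequence has no defined meaning in this semantics.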

The indeterminate value **undef** is not *per se* a going-wrong behaviour. Yet, branching over a test evaluating to **undef**, performing a memory access over an **undef** address and returning **undef** from the main function are going-wrong behaviours.

*Memory accesses* are ruled by a unified memory model [20] that is used throughout the whole compiler. The memory is made of a collection of separated blocks. For a given block, each offset o below the block size is given a permission p ∈ {**r**, **w**, ...} and contains a memory value

$$\mathit{mval} \ni mv ::= \mathtt{undef} \mid \mathtt{byte}(b) \mid [\mathtt{ptr}(b,o)]_{n}$$

where b is a concrete byte value and [**ptr**(b, o)]<sub>n</sub> represents the n<sup>th</sup> byte of the pointer **ptr**(b, o) for n ∈ {1 ... 8}. A memory write *storev*(κ, m, a, v) is only defined if the address a is a pointer **ptr**(b, o) to an existing block b such that the memory locations (b, o), ..., (b, o + |κ| − 1) have the permission **w** and the offset o satisfies the alignment constraint of κ. A memory read *loadv*(κ, m, a) is only defined under similar conditions with the additional restriction that not reading all the consecutive fragments of a pointer returns **undef**.
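The definedness condition of a memory write can be paraphrased as a small Python predicate. This is a sketch, not CompCert's memory model: permissions are stored per (block, offset) in a dictionary, the chunk names and sizes in `CHUNK_SIZE` are illustrative, and the alignment of a chunk is assumed to equal its size.

```python
# Assumed chunk sizes in bytes; alignment is taken to equal the size.
CHUNK_SIZE = {"i8": 1, "i16": 2, "i32": 4, "i64": 8}


def store_defined(perms, chunk, address):
    """True if a write of `chunk` at `address` is defined.

    `perms` maps (block, offset) to a set of permissions such as {"r", "w"};
    offsets outside an existing block simply have no entry.
    """
    if address[0] != "ptr":
        return False                      # the address must be a pointer value
    _, block, offset = address
    size = CHUNK_SIZE[chunk]
    if offset % size != 0:
        return False                      # alignment constraint of the chunk
    return all(
        "w" in perms.get((block, offset + i), set())
        for i in range(size)              # locations (b, o) ... (b, o + |chunk| - 1)
    )
```

Every byte touched by the write must be writable, so a store that straddles the end of a block is undefined even if its first byte is in bounds.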

*Control-flow transfers* may go wrong if the target of the control-flow transfer is not well-defined. Hence, a **goto** *lb* instruction goes wrong if, in the current function, there is no statement labelled by *lb*; and an **exit** n instruction goes wrong if there are fewer than n enclosing blocks around the statement containing the exit instruction. A conditional **if** e **then** s<sub>1</sub> **else** s<sub>2</sub> goes wrong if the expression e does not evaluate to **int**(i) for some i. Also, the execution goes wrong if the last statement of a function is not a **return** instruction. Last but not least, a function call x := e(e<sub>1</sub>, ..., e<sub>n</sub>)<sub>σ</sub> goes wrong if the expression e does not evaluate to a pointer **ptr**(b, 0) where b identifies a function with signature σ.
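The condition on indirect calls can be expressed as a short check. The Python sketch below is illustrative only: `functions` is a hypothetical table from block identifiers to signatures (written as plain strings), and values are tagged tuples.

```python
class GoesWrong(Exception):
    """Raised when the call target is not well-defined."""


def resolve_call(functions, target, signature):
    """Return the function block called through `target`, or get stuck.

    The call is defined only if `target` is ptr(b, 0) and block b holds a
    function whose signature matches the one annotated on the call.
    """
    if target[0] != "ptr":
        raise GoesWrong("call target is not a pointer")
    _, block, offset = target
    if offset != 0:
        raise GoesWrong("call target does not point to a function entry")
    if block not in functions or functions[block] != signature:
        raise GoesWrong("no function with the expected signature")
    return block
```

A sandboxing transformation therefore has to guarantee all three conditions for every indirect call, not merely that the target address lies in some permitted range.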

We show in Sect. 4 how our transformation ensures that pointer arithmetic and memory accesses are always well-defined. Section 5 shows how we make sure indirect calls are always correctly resolved. Section 6 shows that, together with other statically checkable verifications, our PSFI transformation rules out all possible going-wrong behaviours.

#### 2.2 Portable Software Fault Isolation

Kroll, Stewart and Appel have pioneered the concept of Portable Software Fault Isolation (PSFI) [16] whereby SFI is enforced by a pass of the compiler front-end that is architecture independent. The main expected advantage is that isolation is implemented, once and for all, for any target architecture. Moreover, the generated code is optimised by the back-end passes of the compiler. Compared to traditional SFI, there is no architecture-specific binary verifier but instead the compiler enters the TCB. The key insight of Kroll *et al.* is to leverage a formally verified compiler, namely CompCert, to transfer a security proof of isolation obtained at the Cminor level through the compiler back-end, with minimal proof effort. In the following, we recall only the basic properties that a Cminor SFI transformation needs to satisfy so that isolation holds at assembly level.

In CompCert's terms, the sandbox is identified by a dedicated memory block *sb*. A Cminor program is secure (Property 1) under the condition that all its memory accesses are performed within the sandbox.

*Property 1 (Program security).* A Cminor program p is secure if all its memory accesses are within the sandbox block *sb*.

After compilation, the assembly code is secure if its observable behaviours are the same as the observable behaviours of the Cminor program. In order to apply CompCert's semantics preservation theorem (more precisely Corollary 1), it remains to ensure that the Cminor program has a well-defined semantics (Property 2).

*Property 2 (Program safety).* A Cminor program p is safe if all its behaviours are well-defined, i.e., not wrong.

Kroll *et al.* state Property 1 by means of an instrumented Cminor semantics which gets stuck in case of memory accesses outside the sandbox. They prove formally that the additional semantic safeguards are never triggered for a transformed program.

Kroll *et al.* also sketch some necessary steps to prove Property 2 (safety) but do not propose a formal proof. This leaves open a number of challenging issues such as whether it is feasible to define a masking operation that has a defined Cminor semantics and how to deal with indirect function calls through function pointers. More generally, the work leaves open whether a formal proof of Property 2 on safety is possible given the restrictions of CompCert's semantics (notably pointer arithmetic) and without relying on axioms asserting properties of an external masking primitive. One of the central contributions of this work is to provide a positive answer to this question and propose solutions to these issues where neither the sandboxing of memory accesses nor the sandboxing of function pointers is part of a TCB. The transformation that circumvents the limitations imposed by pointer arithmetic is original and, we surmise, is a necessary component to transfer security down to assembly. (For a precise comparison with Kroll *et al.*, see Sect. 9.)

### 3 A Thread-Aware Sandbox

The memory address space of a C program is partitioned into a runtime stack of frames, a heap and a dedicated space for global variables. The address space of a sandboxed program is re-organised to fit into a single global variable, *sb*, where the global variables, the heap and the stack frames are relocated. Figure 3a depicts the memory layout of the program after our SFI transformation. Each global variable is relocated and allocated in the sandbox at a given offset, and each global memory access of the program is translated into a memory access in the sandbox. For managing the heap it suffices to use a sandbox-aware malloc implementation that allocates memory inside the sandbox.

To prevent buffer overflows, a standard approach consists in introducing a so-called *shadow stack* that is used to store the function stack frames. Our implementation supports multi-threaded applications and therefore there are as many shadow stacks as there are threads. Upon thread creation, we allocate a new shadow stack in the sandbox. The shadow-stack pointer is passed as an additional argument to each function call. This is efficient when arguments are passed by register, with the only drawback of reserving an additional register. Frames are allocated by incrementing the shadow-stack pointer at function entry. All accesses to the original stack are then translated into accesses to the sandbox shadow stack. The following Example 1 and the code snippet in Fig. 3 illustrate the essence of the transformation.

Fig. 3. Sandbox transformation

*Example 1.* The Cminor program of Fig. 3b declares a global variable g initialised to the 64-bit integer 5. The function foo allocates a stack frame of 8 bytes that will be used to store a 64-bit local variable. By convention, the current stack frame is called stk. The function foo calls the function bar with as arguments the value of g and the address of the local variable stk; and returns the value, presumably updated by bar, of the local variable.

Syntactically, the program of Fig. 3c only performs memory accesses on the global sandbox variable sb. The size of the sb variable is 2<sup>k</sup> for some predefined k. At thread creation, a shadow stack is allocated by our sandbox-aware malloc in the sandbox after the statically allocated global variables. For our program, the unique global variable g is stored at offset 0 and spans 8 bytes. Therefore, the initial value of the shadow-stack pointer sp is 8. After the transformation, the function foo reserves the space for the local variable stk by incrementing the pseudo-register sp. The function bar is called with the incremented shadow-stack pointer sp1, the value stored at offset 0 in the sandbox (i.e., the value of the global variable g) and the address of the local variable stk, which is given by the value of the stack pointer sp. At function exit, the value of the local variable stk is returned by dereferencing the shadow-stack pointer sp.
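The relocation described in Example 1 can be mimicked in plain C. The following is only a sketch under stated assumptions, not the Cminor code of Fig. 3: the sandbox size, the helper names `sb_load64`, `sb_store64` and `sb_init`, and the body of `bar` (which the text does not give) are all hypothetical. Addresses are represented as integer offsets into `sb`, and no masking is applied yet, since masking is the topic of Sect. 4.

```c
#include <stdint.h>
#include <string.h>

#define SB_SIZE 65536u                 /* hypothetical sandbox size 2^k, k = 16 */
static unsigned char sb[SB_SIZE];      /* the single sandbox variable */

enum { G_OFF = 0, SP_INIT = 8 };       /* g lives at offset 0; the stack starts after it */

/* Unmasked 64-bit accesses at a sandbox offset (bounds are not enforced here). */
static int64_t sb_load64(uint64_t off) {
    int64_t v;
    memcpy(&v, sb + off, sizeof v);
    return v;
}

static void sb_store64(uint64_t off, int64_t v) {
    memcpy(sb + off, &v, sizeof v);
}

/* Relocated initialisation of the global g = 5. */
static void sb_init(void) {
    sb_store64(G_OFF, 5);
}

/* Hypothetical callee: writes x + 1 into the 8-byte slot at offset p. */
static void bar(uint64_t sp, int64_t x, uint64_t p) {
    (void)sp;                          /* bar allocates no frame of its own */
    sb_store64(p, x + 1);
}

/* Transformed foo: the shadow-stack pointer sp is an extra argument. */
static int64_t foo(uint64_t sp) {
    uint64_t sp1 = sp + 8;             /* reserve 8 bytes for stk */
    bar(sp1, sb_load64(G_OFF), sp);    /* pass g and the address of stk */
    return sb_load64(sp);              /* read stk back */
}
```

With this hypothetical `bar`, calling `sb_init()` and then `foo(SP_INIT)` reads g at offset 0, lets `bar` update the slot at offset 8, and returns that slot.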

Our SFI transformation enforces the isolation security policy stipulating that all memory accesses are performed within the sandbox sb, but only at the Cminor level. Indeed, this holds because the semantics gets stuck (i.e., the semantics *goes wrong*) whenever the program performs an access outside the bounds of the sandbox. As explained earlier, the compiler is free to translate this into an insecure program that would escape the sandbox at runtime. To get a formal security guarantee, it is necessary to further transform the Cminor program to rule out any behaviour that *goes wrong*, i.e., to ensure Property 2. Given the numerous undefined behaviours of the C language, ruling out any *going-wrong* behaviour may seem a daunting task. In general, this requires ensuring both memory safety and control-flow integrity. The following two sections describe how we can exploit the SFI transformation and the knowledge that all memory accesses are inside the sandbox to ensure both memory safety and control-flow integrity.

# 4 Memory-Safe Masking

For SFI, memory safety is obtained by making sure that every memory access is performed inside the sandbox. Starting from an analysis of the standard SFI solution, we present our own design which satisfies the additional requirements of being compliant with the semantic restrictions of CompCert and with a strict interpretation of the C standard.

#### 4.1 Standard SFI Masking of Addresses

Standard SFI transformations ensure memory safety by masking memory accesses. The gist of it is to allocate a sandbox *sb* of size 2<sup>k</sup> at a 2<sup>k</sup>-aligned memory address, say &sb = tag × 2<sup>k</sup>. Under those constraints, enforcing that an address A is within the bounds of the sandbox can essentially be done by replacing the high-address bits by those of *tag*. Using bitwise operations, this can be done by the expression (A & (2<sup>k</sup> − 1)) | tag × 2<sup>k</sup>, where & is the bitwise *and* and | is the bitwise *or*. More visually, this can be written

$$(A \mathbin{\&} \underbrace{1 \cdots 1}_{k}) \mid tag\,\underbrace{0 \cdots 0}_{k}.$$

At binary level, this masking transformation is defined and the cost is modest: two bitwise operations. However, this masking operation has no well-defined C semantics. This is also the case for the semantics of CompCert and in particular for the Cminor language. The reason is twofold: bitwise operations over pointer values return **undef**, and concrete addresses (e.g. *tag* × 2<sup>k</sup>) are not pointers for CompCert, where pointers are represented by a block and an offset (see Fig. 2).
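As an illustration of the binary-level view only, the standard mask can be written over plain 64-bit integers, where both operations are well-defined. The function name `sfi_mask` is hypothetical, and the sketch deliberately works on integers rather than C pointers, which is precisely the step that has no defined semantics in Cminor.

```c
#include <stdint.h>

/* Force the integer address a into [tag * 2^k, (tag + 1) * 2^k).
   Assumes 0 < k < 64 and that tag fits in the remaining 64 - k bits. */
static uint64_t sfi_mask(uint64_t a, uint64_t tag, unsigned k) {
    uint64_t low = a & ((UINT64_C(1) << k) - 1);  /* keep the k low bits */
    return low | (tag << k);                      /* install the sandbox tag */
}
```

An address already inside the sandbox is left unchanged, while any other address is redirected to the location with the same k low bits inside the sandbox.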

#### 4.2 Specialised Masking for 32-Bit Sandboxes

For 32-bit sandboxes, there exists a variant of the sandboxing primitive which has the advantages (1) that the sandbox address does not need to be aligned; (2) that the cost of masking may be reduced to a single instruction. In its simplest form, the masking primitive is defined by

$$\&sb + (A - \&sb)_{64\to32\to64}$$

where &sb is the symbolic address of the sandbox. The subtraction of &sb extracts the offset of the pointer and the double (unsigned) cast 64 → 32 → 64 has the effect of truncating the offset to a 32-bit quantity that is therefore within the bounds of a 32-bit sandbox. At first sight, this masking is less efficient than the standard masking but it is efficient for typical address computations which require both displacement and scaling (e.g. A = t + k + k′ ∗ i<sub>32→64</sub> where t is a 64-bit address, k and k′ are constants and i is a 32-bit integer). Assuming that each cast or arithmetic operation is mapped to a single instruction<sup>1</sup>, the masked address A can be computed using 8 instructions: 4 instructions for computing the address A and 4 more for the sandboxing primitive. Using simple properties of modular arithmetic, it is possible to distribute the 64 → 32 cast over addition and multiplication to obtain the following equivalent formulation of the sandboxed address:

$$\&sb + A'_{32\to 64} \quad \text{with} \quad A' = t_{64\to 32} + c_1 + c_2 * i$$

where c<sub>1</sub> and c<sub>2</sub> are compile-time constants: c<sub>1</sub> = (k − &sb)<sub>64→32</sub> and c<sub>2</sub> = k′<sub>64→32</sub>. Using this formulation, the address A still requires 4 instructions but the cost of the sandboxing is reduced to 2 instructions, making it on par with the standard sandboxing. On x86, 32-bit registers are just zero-extended 64-bit registers. Therefore, the cast A′<sub>32→64</sub> is actually redundant and the overhead induced by the sandboxing is reduced to a single instruction. Our experiments (see Sect. 8.2) validate the practical advantage of this encoding.
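The equivalence of the two formulations can be checked on integers. The sketch below assumes that addresses are modelled as `uint64_t` values and that `uint32_t` arithmetic wraps modulo 2<sup>32</sup>; the function names are hypothetical, `kp` stands for the second constant of the address computation, and c<sub>1</sub> and c<sub>2</sub> are computed at run time here although the text treats them as compile-time constants.

```c
#include <stdint.h>

/* Direct form: &sb + (A - &sb) truncated to 32 bits, then zero-extended. */
static uint64_t mask32_direct(uint64_t sb, uint64_t a) {
    return sb + (uint64_t)(uint32_t)(a - sb);
}

/* Typical address with displacement and scaling, computed in 64 bits. */
static uint64_t addr64(uint64_t t, uint64_t k, uint64_t kp, uint32_t i) {
    return t + k + kp * (uint64_t)i;
}

/* Fused form: the 64 -> 32 cast is distributed over + and *. */
static uint64_t mask32_fused(uint64_t sb, uint64_t t,
                             uint64_t k, uint64_t kp, uint32_t i) {
    uint32_t c1 = (uint32_t)(k - sb);         /* constant c1 = (k - &sb) cast to 32 bits */
    uint32_t c2 = (uint32_t)kp;               /* constant c2 = kp cast to 32 bits */
    uint32_t a32 = (uint32_t)t + c1 + c2 * i; /* 32-bit arithmetic, wraps mod 2^32 */
    return sb + (uint64_t)a32;                /* zero-extend and rebase */
}
```

For any inputs, `mask32_fused(sb, t, k, kp, i)` should agree with `mask32_direct(sb, addr64(t, k, kp, i))`, because truncation to 32 bits commutes with addition and multiplication.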

Still, as for the standard sandboxing, this sandboxing primitive has no semantics in CompCert due to the limitations of pointer arithmetic. As a consequence, the solution of Kroll *et al.* [16] does not give actual code for the masking primitive, but rather axiomatises its behaviour as an external function. This prevents optimisations such as common subexpression elimination or function inlining from happening and induces the cost of a function call for each memory access.

#### 4.3 Towards Well-Defined Pointer Arithmetic

To illustrate the limitations of pointer arithmetic, we examine the semantic behaviour of the standard sandboxing primitive (the specialised sandboxing primitive has similar issues). The standard sandboxing primitive can be written (A & (2<sup>k</sup> − 1)) | &sb where &sb is the address of the sandbox variable. If *sb* is allocated at runtime at address *tag* × 2<sup>k</sup> for some *tag*, this formulation is equivalent at binary level. Again, this heavily relies on pointer arithmetic that is undefined and on information about where the sandbox is linked at runtime.

<sup>1</sup> Some architectures have rich addressing modes allowing for more compact encodings.

Consider the alternative formulation (A & (2<sup>k</sup> − 1)) + &*sb* where the bitwise | is replaced by a +. This formulation has the advantage that incrementing a pointer, here *sb*, is well-defined (see Fig. 2). Since, on modern hardware, both addition and bitwise operations take a single cycle, the difference in efficiency should be negligible. Moreover, at least for x86, the addition can be compiled into the addressing mode.

Still, this does not solve our issue. To understand this, suppose that A is a pointer. In this case, the bitwise &, whose purpose is to extract the pointer offset, is still undefined. Therefore, the whole expression (A&(2<sup>k</sup>−1)) + &sb is undefined. Because dereferencing an undefined expression is a *going-wrong* behaviour, the compiled program may have an arbitrary runtime behaviour and escape the sandbox. A prerequisite for our masking primitive is therefore to ensure that the evaluation is defined i.e., different from **undef**. As all the semantic operators of CompCert are strict in **undef** (if any argument is **undef**, so is the result), a necessary condition is that A is not **undef**. As A can be obtained from any expression, a challenge is to ensure that every expression evaluates to a defined value. A particular difficulty is that the many undefined pointer operations (see Fig. 2) cannot be detected by runtime checks.

#### 4.4 Arithmetisation of the Heap

To tackle this challenge and ensure that every computation is defined, we propose an original and radical approach which ensures syntactically that pointers are neither stored in memory nor in local variables. As a result, the program is only manipulating integer values and memory addresses are only constructed by the sandboxing primitives. This approach implies, as a side-effect, that our previously undefined masking primitives are defined. Let *asb* be the runtime address of the symbolic address &sb of the sandbox. The masking of an address A can be written

$$A' + \& sb$$

where A′ is defined either by A′ = A & (2<sup>k</sup> − 1) or by A′ = (A − *asb*)<sub>64→32→64</sub>. As A is necessarily an integer, A′ is necessarily a defined integer and therefore A′ + &*sb* returns a defined pointer **ptr**(sb, o) that is necessarily inside the sandbox.

An additional subtlety is that memory accesses are indexed by a memory chunk κ which mandates an alignment constraint (e.g. the chunk i64 mandates an 8-byte aligned address). As a result, the masking primitive is parameterised by the chunk κ and the masking primitive for i64 is (A & *msk*<sub>i64</sub>) + &*sb* where *msk*<sub>i64</sub> = (2<sup>k−3</sup> − 1) × 2<sup>3</sup>.
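The chunk-indexed mask can be sketched as follows. Only the i64 instance is given in the text; generalising it to a chunk of 2<sup>a</sup> bytes as (2<sup>k−a</sup> − 1) × 2<sup>a</sup> is an assumption of this sketch, as are the names `chunk_mask` and `masked_offset`. The functions compute the integer offset only; the actual primitive then adds &sb.

```c
#include <stdint.h>

/* Mask for a chunk of 2^log2_size bytes in a sandbox of 2^k bytes:
   (2^(k - log2_size) - 1) * 2^log2_size. Assumes log2_size < k < 64. */
static uint64_t chunk_mask(unsigned k, unsigned log2_size) {
    return ((UINT64_C(1) << (k - log2_size)) - 1) << log2_size;
}

/* Offset that is both aligned for the chunk and within the sandbox bounds. */
static uint64_t masked_offset(uint64_t a, unsigned k, unsigned log2_size) {
    return a & chunk_mask(k, log2_size);
}
```

For k = 16 and an 8-byte chunk the mask is 0xFFF8, so the largest masked offset is 0xFFF8 and the 8-byte access ends exactly at the end of the sandbox.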

Only computing over numeric values is facilitated by the fact that the sandboxed program is only manipulating pointers relative to a single object, the sandbox. Therefore, a solution could be to only compute with pointer offsets. This is not totally satisfactory because the null pointer (i.e., 0) would be indistinguishable from the base pointer **ptr**(*sb*, 0). Instead, we use the integer asb that is the integer runtime address of the sandbox (i.e., we have asb = &sb) and perform the following transformation t over program expressions.

$$\begin{array}{lll} t(\&sb) &= asb\\ t(c) &= c \text{ for } c \in \{i32, i64, f32, f64\}\\ t(\rhd e) &= \blacksquare\, t(e) \\ t(e_1 \mathbin{\square} e_2) &= t(e_1) \mathbin{\blacksquare} t(e_2) \\ t([e]_\kappa) &= [msk_\kappa(t(e))]_\kappa \end{array}$$

The transformed operators (written ■ above) ensure that, if the expressions are well-typed, they never return the **undef** value. Typical operators that need such a treatment include division, modulus, and bitwise shifts. We transform expressions so that they evaluate to an arbitrary value when their original semantics is undefined. For example, we transform the left-shift operations on 32-bit integers so that the resulting expression always has a shift amount less than 32.

Similarly, we transform divisions and modulus in the following way, to rule out the undefined cases of division by zero and signed division of MIN\_SIGNED by -1:

$$a / b \;\leadsto\; (a + (a \mathrel{==} \mathtt{MIN\_SIGNED} \mathbin{\&} b \mathrel{==} -1)) \,/\, (b + (b \mathrel{==} 0))$$

We can prove that the resulting division expression is always defined. Most of the other expressions are always defined and do not need further transformations.
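The two rewritings can be tried out in C, where the same cases are undefined. This is a sketch with hypothetical function names: `safe_div32` follows the division formula above, whereas masking the shift amount with 31 in `safe_shl32` is an assumption, since the text only states that the resulting shift amount is less than 32.

```c
#include <stdint.h>

/* Total signed division: INT32_MIN / -1 becomes (INT32_MIN + 1) / -1,
   and x / 0 becomes x / 1. Every other case is unchanged. */
static int32_t safe_div32(int32_t a, int32_t b) {
    int32_t num = a + ((a == INT32_MIN) & (b == -1));
    int32_t den = b + (b == 0);
    return num / den;
}

/* Total left shift on 32-bit values: the amount is forced below 32.
   Assumption: the amount is masked with 31 (one possible choice). */
static uint32_t safe_shl32(uint32_t a, uint32_t n) {
    return a << (n & 31u);
}
```

In the formerly undefined cases the result is arbitrary but defined, which is all the safety argument requires.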

# 5 Enforcement of Control-Flow Integrity

Correct sandboxing of code requires some degree of control-flow integrity. Existing SFI implementations enforce a weak form of control-flow integrity which only ensures that jumps are aligned and within a sandbox of code. This is achieved by inserting a masking operation before indirect jumps, that will mask the target address to ensure that the jump is within the sandbox. Additional padding with no-ops is inserted to ensure that all the instructions are indeed aligned [30,37,38]. We enforce a stronger, more traditional, form of control-flow integrity where any control-flow transfer has a well-defined Cminor semantics.

#### 5.1 Relaxation of the Cminor SFI Property

Intraprocedural control-flow integrity is ensured by simple syntactic checks. For instance, they ensure that a **goto** *lb* has a corresponding label *lb* and that an **exit** n has at least n enclosing blocks. The semantics of Cminor prescribes that function calls and returns necessarily match. For this to still hold at the assembly level where the return address is explicitly stored in the stack frame, it is sufficient to prove that the Cminor program has no *going-wrong* behaviour. To ensure control-flow integrity, the only remaining issue is due to indirect calls through function pointers. Our control-flow integrity counter-measure implements software trampolines and ensures that an indirect call with signature σ can only be resolved by a function pointer towards a function with signature σ.

For this purpose, the existing Cminor SFI security policy i.e., Property 1, which rules out any memory access outside the sandbox is too restrictive. As we shall see, the implementation of trampolines necessitates controlled memory reads, outside the sandbox, within compiler-generated variables. To accommodate for this extension, we propose a slightly relaxed SFI security property which, in addition to memory accesses inside the sandbox, authorises other memory reads in read-only regions.

*Property 3.* A Cminor program is secure if all its memory accesses are within either the sandbox block *sb* or some read-only memory.

This relaxed property still ensures the integrity of the runtime because all memory writes are confined to the sandbox. Note that Property 3 and Property 1 are equivalent if the trusted runtime library has no read-only memory. This can be achieved at modest cost by slightly modifying the source code to remove the C type qualifier const, which instructs the compiler that the memory is read-only.

#### 5.2 Control-Flow Integrity of Indirect Calls

In Sect. 4, we have left aside the presence of function pointers. They actually fit perfectly our strategy of encoding pointers by integers. In this case, each function pointer is encoded as an index and the trampoline code translates the index into a valid function pointer.

Consider a function f of signature σ and suppose that the function pointer &f is compiled into the index i. The reverse mapping from indexes to function pointers is obtained from a compiler-generated array variable A<sub>σ</sub> such that A<sub>σ</sub>[i] = &f. The array variable A<sub>σ</sub> is made of all the function pointers with signature σ. The array variable is also padded with a default function pointer such that its length is a power of two. At the call site, the instruction e(e<sub>1</sub>, ..., e<sub>n</sub>)<sup>σ</sup> is transformed into [*te* & *msk*<sub>σ</sub> + &*A*<sub>σ</sub>](*te*<sub>1</sub>, ..., *te*<sub>n</sub>)<sup>σ</sup> where *te*, *te*<sub>1</sub>, ..., *te*<sub>n</sub> are transformed expressions such that all memory accesses are masked and *msk*<sub>σ</sub> is the binary mask ensuring that the index *te* is within the bounds of the variable A<sub>σ</sub>. In our actual implementation, we optimise direct calls and in this case bypass the trampoline. Therefore, when the expression e is a constant pointer &f to an existing function with signature σ, we generate directly (&f)(*te*<sub>1</sub>, ..., *te*<sub>n</sub>). As a result, only C code using indirect calls goes through the trampoline code.
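The trampoline pattern can be illustrated in C for a single signature. Everything concrete in this sketch is hypothetical: the signature `int (int, int)`, the functions `add`, `sub`, `mul`, the default entry `dflt` and the names `A_sigma` and `MSK_SIGMA`. It shows only the index masking; a read-only array of function pointers plays the role of the compiler-generated variable.

```c
#include <stdint.h>

typedef int (*binop_fn)(int, int);     /* one signature sigma (hypothetical) */

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }
static int dflt(int a, int b) { (void)a; (void)b; return 0; }  /* padding entry */

/* All functions of signature sigma, padded to a power-of-two length. */
static const binop_fn A_sigma[4] = { add, sub, mul, dflt };
#define MSK_SIGMA 3u                   /* array length minus one */

/* Indirect call: the integer te is masked before indexing the array, so the
   callee is always some function of signature sigma. */
static int call_indirect(uint64_t te, int x, int y) {
    return A_sigma[te & MSK_SIGMA](x, y);
}
```

An out-of-range index such as 6 is folded back into the array (here onto `mul`), so a corrupted function index can select the wrong function of the right signature but can never leave the table.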

Though our implementation only exploits the relaxation of Property 3 for the sake of trampolines, a more aggressive implementation could sometimes avoid relocating read-only memory inside the sandbox. This could have a positive impact on optimisations which exploit the immutability of read-only memory.

### 6 Safety and Security Proofs

We next give an overview of our fully verified Coq proof of security and safety.

#### 6.1 Security Proof

Property 3 is an informal formulation of our security property that is formally stated as a Cminor instrumented semantics. This semantics mimics the Cminor semantics with the exception that memory accesses are restricted: a memory read is either performed within the sandbox or in a read-only memory region; a memory write is necessarily performed within the sandbox.

The goal of the security proof is to show that all the memory accesses abide by the restrictions of the instrumented semantics. This is stated by Theorem 2 which establishes that for a transformed program *tp*, no behaviour of the standard Cminor semantics gets stuck for the instrumented Cminor semantics.

Theorem 2 (Security). *For any transformed program tp, every behaviour of tp in the standard semantics of* Cminor *is also a behaviour of tp in the instrumented semantics.*

The proof is based on the standard technique of forward simulation that is used in CompCert to ensure the preservation of semantics by compiler passes. Here, the forward simulation has the distinctive feature of relating the same (transformed) program equipped with a standard and an instrumented semantics. Since the only difference between the two semantics is that memory accesses must be secure, the crux of the proof lies in the correctness of the masking primitive, as stated in the following lemma.

Lemma 1. *For any masked expression* e*, if* e *evaluates to some pointer* **ptr**(b, o)*, then* b *is the block of the sandbox i.e., sb.*

The proof relies on the definition of the masking primitive: a masked expression e is of the form e′ + &*sb*. Since &*sb* evaluates to the pointer **ptr**(*sb*, 0), if the whole expression evaluates to a pointer **ptr**(b, o), then necessarily b = *sb*.

#### 6.2 Safety Proof

In order to benefit from CompCert's semantic preservation theorem and transport our security proof to the compiled assembly program, we must also prove that the sandboxed program is safe, i.e., it never gets *stuck*. We address all the going-wrong behaviours that we enumerated in Sect. 2.1. The well-formedness properties of a program (calling only defined functions, accessing only defined variables, jumping only to defined labels, exiting from no more blocks than currently enclose the statement) are checked statically and make the transformation fail if they are violated. Next, the memory accesses require the addresses to be valid and adequately aligned: our masking operation ensures that this is always the case. Then, the evaluation of expressions must always be defined: this has mostly been dealt with by the arithmetisation of the heap (Sect. 4.4). Finally, function calls should always be performed with the appropriate number of well-typed arguments. This is easy to check statically for direct function calls, but requires trampolines (as described in Sect. 5.2) for indirect function calls. The following sandbox invariant encapsulates all these conditions.

Definition 1 (Sandbox Invariant). *A state* S *of program* P *satisfies the sandbox invariant if the following conditions are satisfied:*


Properties 1, 2, 3 are ensured by a set of syntactic checks over the bodies of all the functions of the program. Property 4 is enforced by our function transformation which inserts assignments that explicitly initialise all declared local variables. Property 5 is ensured by construction of the arrays for function pointers. All these properties can be established solely on the program body and do not change during the execution of the program. By contrast, Property 6 cannot be checked statically and depends on the state of the program at each point.

Safe Evaluation of Expressions. A necessary condition for the safe evaluation of expressions is that the program is well typed. CompCert does not generate these type guarantees so we have integrated a verified (simple) type-inference algorithm for Cminor programs. Type-checking alone is not sufficient to rule out undefined behaviours of C operators, but together with the transformations explained in Sect. 4.4, we prove the following lemma about the evaluation of transformed expressions.

Lemma 2 (Safe evaluation of expressions). *In a memory state and a well-typed environment for local variables containing only defined numerical values, the transformation of any well-typed expression* e *evaluates to a defined numerical value.*

Lemma 2 follows directly from the properties of our expression transformation.

Safety of Calls through Trampolines. As mentioned in Sect. 5, we implement software trampolines to secure function calls through function pointers. To ensure the safety of indirect function calls, we maintain a map *smap* from function signatures to the corresponding array identifier and the length of this array. The proof of safety relies on the fact that for every function f of signature σ present in a program, we have *smap*(σ) = (A<sub>σ</sub>, l<sub>σ</sub>) such that all offsets lower than l<sub>σ</sub> in A<sub>σ</sub> contain a pointer to a function of signature σ. The safety proof of indirect calls itself is not hard, but we need to set up this signature map and establish invariants relating it to the global environment of the program.

Safety Theorem. Considering the invariants defined in Definition 1, we prove Lemma 3 which is our main technical result.

Lemma 3 (Safety). *For any* Cminor *program state* S *that satisfies the invariants, either* S *is a final state or there exists a sequence of steps from* S *to some* S′ *such that* S′ *also satisfies the invariants.*

A subtlety of the proof is that at function entry, the local variables carry the value **undef** and therefore the sandbox invariant only holds after they have been initialised by a sequence of assignments (see Property 4 of Definition 1).

Using Lemma 3, we can show Property 2, in the form of Theorem 3.

Theorem 3 (Safety of the transformation). *All behaviours of the transformed program are well-defined, i.e., not wrong.*

*Proof.* A going-wrong behaviour occurs precisely when a state is reached from which no further step can be taken, though it is not a final state. Lemma 3, together with a proof that the initial state of the transformed program satisfies the invariants, tells us that no such reachable state exists, concluding the proof.

As a result, we benefit from CompCert's semantic preservation theorem and can transport the security proof down to the assembly program.

Theorem 4 (Security of the compiled program). *Let* p *be a transformed* Cminor *program. If* p *compiles into the assembly program tp, then tp is secure.*

The proof uses Corollary 1 and Theorem 2 to conclude that the behaviours of *tp* are the same as those of p, and hence secure.

# 7 SFI Runtime and Library

Our modified CompCert compiler, CompCertSfi, takes as input a C program unit in the form of a list of C files. Each C file is first compiled down to the Cminor language using the existing passes of the CompCert compiler. Then, all the Cminor programs are syntactically linked [14] together to form the program unit to be isolated inside the sandbox. CompCertSfi comes with a lightweight runtime and generic support for interfacing with a trusted library (e.g. a libC). An originality of our approach is that the runtime uses a standard program loader. Moreover, the runtime gets some of its configuration through compiler-generated variables.

#### 7.1 Loading the SFI Application

The sandboxed code is linked with our runtime library by a linker script which specifies where to load at runtime the *sb* variable, viewed as the data segment. The compiler also emits a sandbox configuration map which contains the symbolic address of the sandbox, its numeric value at runtime, the total size of the sandbox and the range of addresses reserved for global variables.

Our runtime code is executed before starting the sandboxed main function. It first checks that the sandbox is properly linked according to the sandbox configuration map, sets the shadow-stack pointer and initialises the sandbox heap using our sandbox-aware implementation of malloc based on ptmalloc3<sup>2</sup>.

By construction, our runtime stack is free of buffer overruns. Yet, if the recursion is too deep, the stack may overflow. Therefore, the runtime inserts an unmapped guard page at the bottom of the stack and intercepts the segmentation fault. This protection suffices provided that the size of each function stack frame does not exceed a page, which can be checked at compile time. Finally, after copying its arguments inside the sandbox, the runtime calls the main function of the sandboxed application.

#### 7.2 Monitoring Calls to the Runtime Library

The runtime library is trusted and therefore part of the TCB. To ensure isolation, each call towards the runtime library is monitored to check the validity of the arguments. For this purpose, a call to a library function, say foo, is renamed in the object file into a call to a function sb\_foo which sanitises its arguments before actually calling the function foo. The verifications are library specific but usually straightforward to implement. For stdio, the FILE structures are allocated by the runtime outside of the sandbox. Hence, the returned FILE\* cannot be dereferenced to corrupt the FILE structure. To prevent the sandboxed program from forging FILE\* pointers, the runtime maintains at all times the set of valid FILE\* values. For variadic functions, *e.g.* printf, we statically compile the format into a sequence of safe primitive calls. (We reject programs using formats computed at runtime.) For functions in string, we check beforehand that the range of memory accesses is within the range of the sandbox. We also allow callbacks and therefore a runtime function may take a function pointer as argument. To ensure that the function is valid, the runtime uses the trampoline programming pattern presented in Sect. 5.2.
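A monitored wrapper for a string function could look as follows. This is a sketch under stated assumptions, not the CompCertSfi runtime: the sandbox size, the helper `sb_range_ok`, the convention that the sandboxed code passes sandbox offsets, and the choice to return -1 when the check fails are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SB_SIZE 4096u                  /* hypothetical sandbox size */
static unsigned char sb[SB_SIZE];      /* the sandbox block */

/* Non-zero iff [off, off + len) lies inside the sandbox. The comparison is
   written so that off + len is never computed and cannot overflow. */
static int sb_range_ok(uint64_t off, uint64_t len) {
    return off <= SB_SIZE && len <= SB_SIZE - off;
}

/* Monitored memset: the range is validated before the real function runs.
   Returns 0 on success and -1 if the range would escape the sandbox. */
static int sb_memset(uint64_t dst_off, int c, uint64_t len) {
    if (!sb_range_ok(dst_off, len))
        return -1;
    memset(sb + dst_off, c, (size_t)len);
    return 0;
}
```

The same range check can guard any library function whose memory footprint is determined by its arguments before the call.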

<sup>2</sup> http://www.malloc.de/malloc/ptmalloc3-current.tar.gz.

#### 7.3 Communication via Global Variables

Programs may not only communicate *via* function calls but also directly *via* global variables. For the libC, this includes e.g. stdout or errno. To ensure isolation, CompCertSfi relocates those variables inside the sandbox but also generates a global variable map which is an array variable of the form

$$\{\&n_1, o_1, \dots, \&n_i, o_i, \dots, \&n_m, o_m\}$$

where &n<sub>i</sub> is the symbolic address of a global variable and o<sub>i</sub> is its offset in the sandbox. Using this information, the runtime has the ability to synchronise the values of the variables inside and outside the sandbox. For example, at program startup, the value of stdout (a stream pointer) is copied inside the sandbox at the relevant offset. This allows the sandboxed program to call stdio functions but protects the integrity of the stream. For errno, it is the responsibility of each runtime library call to synchronise the value of errno in the sandbox.
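A runtime could consume such a map roughly as sketched below. The names `gvar_entry`, `gvar_map`, `gvar_sync_in` and `host_errno` are hypothetical, the toy sandbox size and the offset 16 are arbitrary, and the explicit size field is an addition of this sketch (the map in the text only lists addresses and offsets).

```c
#include <stddef.h>
#include <string.h>

static unsigned char sb[256];          /* toy sandbox */

/* One map entry: address of the variable outside the sandbox and its offset
   inside. The size field is an assumption of this sketch. */
struct gvar_entry {
    void *outside;
    size_t offset;
    size_t size;
};

static int host_errno;                 /* stands in for a library global */

static const struct gvar_entry gvar_map[] = {
    { &host_errno, 16, sizeof host_errno },
};
#define GVAR_COUNT (sizeof gvar_map / sizeof gvar_map[0])

/* Copy the current outside values to their reserved sandbox offsets. */
static void gvar_sync_in(void) {
    for (size_t i = 0; i < GVAR_COUNT; i++)
        memcpy(sb + gvar_map[i].offset, gvar_map[i].outside, gvar_map[i].size);
}
```

A wrapper such as the hypothetical `sb_foo` of Sect. 7.2 would call `gvar_sync_in()` after the real library call, so that the sandboxed program observes the updated value at offset 16.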

# 8 Experiments

We have evaluated our PSFI approach over the CompCert benchmark suite and a port of Quake. All the experiments have been carried out on a quad-core Intel 6600U laptop at 2.6 GHz with 16 GB of RAM running Linux Fedora 27. For Quake, we explain how to adapt the code to our runtime library and verify the absence of noticeable slowdown. For the other benchmarks, we make a more detailed performance evaluation and compare CompCertSfi with CompCert, gcc and clang, but also with the state-of-the-art (P)NaCl implementation of SFI. In our experiments, all the benchmarks are ordered by increasing running time. Moreover, for computing a runtime overhead, the running time is obtained by taking the harmonic mean of 3 consecutive runs.
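
For concreteness, the aggregation step can be sketched as follows. The function names are ours, and the definition of overhead as a ratio of aggregated times minus one is an assumption about how the percentages are derived; the timings used in any call are placeholders, not measurements from the paper.

```python
from statistics import harmonic_mean


def running_time(runs):
    """Aggregate consecutive timings (in seconds) by their harmonic mean."""
    return harmonic_mean(runs)


def overhead(instrumented_runs, baseline_runs):
    """Relative overhead: 0.10 means 10% slower than the baseline."""
    return running_time(instrumented_runs) / running_time(baseline_runs) - 1.0
```

The harmonic mean is never larger than the arithmetic mean, so a single slow run weighs less on the aggregate than it would with a plain average.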

#### 8.1 Porting Quake

Quake engines come in various flavours and we use the tyr-quake<sup>3</sup> implementation linking with Xlib. The port requires the addition of several functions to our runtime library from Xlib and the libC. Most of them are not problematic and require little or no modification. For instance, the getopt function, which is used to parse command-line options, uses the global variables optarg, optind, opterr, and optopt. As explained in Sect. 7.3, the runtime library copies the values of these variables at reserved places inside the sandbox.

Other functions, *e.g.* gethostbyname, allocate memory on their own and return a pointer to this piece of data which is therefore not accessible to the sandboxed code. For the specific case of gethostbyname, the library provides the function gethostbyname\_r which, instead of allocating memory, takes as argument a data-structure that is filled by the function. In our case, we pass as argument a sandbox allocated piece of memory. This does not solve our problem entirely as inner pointers may still point outside the sandbox. To cope with this issue, we perform a deep copy of the relevant piece of data inside the sandbox.

A last issue is that the video memory is shared between the application and the X server using the system call shmat. Fortunately, the libC provides the relevant flags to bind shared memory at a specific address. Hence, we were able to allocate it inside the sandbox, thus allowing seamless communication with the X server. After these modifications, the sandboxed Quake runs without noticeable slowdown, which is encouraging and an indication of the good overall performance of our sandboxing technique. In the following, we complement this with a more precise runtime evaluation for the CompCert benchmarks.

<sup>3</sup> https://disenchant.net/git/tyrquake.git.

#### 8.2 PSFI Overhead: Impact of Sandboxing Primitives

Next, we compare the efficiency of a standard masking primitive (Sect. 4.1) with a specialised version for 32-bit sandboxes (Sect. 4.2).

Figure 4 shows the overhead of the standard sandboxing primitive with respect to the specialised sandboxing primitive. There are 6 benchmarks for which the overhead incurred by the standard sandboxing is above 10%, reaching 40% for 2 benchmarks. These cases illustrate the significant performance advantage that is sometimes obtained by the specialised sandboxing. For some benchmarks, the standard sandboxing outperforms our optimised sandboxing. Yet when it does, it is by a very small margin (below 3%). Overall, for the vast majority of our benchmarks, the specialised sandboxing primitive is very competitive.

In Sect. 4.1, we gave theoretical arguments for the advantage of the specialised sandboxing. Another argument comes from the fact that the specialised sandboxing is easier to optimise. First, note that the standard and the specialised sandboxing primitives both use a bitwise mask but for different purposes. For the standard primitive, it is used to enforce that the pointer is within the sandbox bounds but also to enforce alignment constraints. For the specialised primitive, it is only used to enforce alignment constraints. Using the existing CompCert dataflow framework, we have implemented an alignment analysis that is quite effective at removing redundant alignment masks. To enable more optimisations, we make alignment constraints explicit in the Cminor program (e.g. by specifying that function arguments of a pointer type are necessarily aligned). Thus, our experimental results are explained by both the theoretical advantages given in Sect. 4.2 and the effectiveness of our alignment analysis.
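
The difference between the two uses of the mask can be made concrete with a deliberately simplified model. The sketch below assumes a sandbox of 2<sup>32</sup> bytes whose base address is aligned on 2<sup>32</sup>; the base address, the function names and the `known_aligned` flag are hypothetical, and the definitions of Sects. 4.1 and 4.2 remain the reference for the actual primitives.

```python
SANDBOX_BITS = 32                      # assumed sandbox size: 2**32 bytes
SANDBOX_BASE = 0x7F00_0000_0000        # hypothetical base, aligned on 2**32


def mask_standard(addr, align):
    """One mask keeps the low 32 bits and clears the low alignment bits."""
    mask = ((1 << SANDBOX_BITS) - 1) & ~(align - 1)
    return SANDBOX_BASE | (addr & mask)


def mask_specialised(offset32, align, known_aligned=False):
    """Bounds follow from the 32-bit width; the mask only enforces alignment."""
    assert 0 <= offset32 < (1 << SANDBOX_BITS)
    if not known_aligned:              # an alignment analysis may skip this step
        offset32 &= ~(align - 1)
    return SANDBOX_BASE + offset32
```

In the specialised variant the masking step disappears entirely whenever the offset is statically known to be aligned, whereas the standard variant must always mask to stay within bounds.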

Fig. 4. Overhead of standard w.r.t specialised sandboxing

#### 8.3 PSFI Overhead: Impact of Compiler Back-End

As a second experiment, we evaluate the overhead of our PSFI transformation for various compilers: CompCert, gcc and clang. CompCert is a *moderately optimising compiler* and the benchmarks run significantly faster using gcc and clang. In Fig. 5, the baseline is given by the minimum of the execution times of the three compilers without PSFI instrumentation. The black bar is the overhead of a compiler (e.g. CompCert), with respect to the baseline and the grey bar is the overhead of the same compiler but with the PSFI transformation (e.g. CompCertSfi). In order to use gcc and clang, we implement a trusted decompiler from our secured Cminor programs to Clight, a subset of C in CompCert. These Clight programs are then compiled with gcc or clang.
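
The construction of the bars can be sketched as follows, assuming that an overhead is the ratio of a running time to the baseline, minus one. The function name and the dictionary-based interface are ours.

```python
def figure_bars(plain, sfi):
    """Return {compiler: (black, grey)} overheads relative to the fastest plain build.

    plain and sfi map a compiler name to a running time in seconds,
    without and with the PSFI transformation respectively.
    """
    baseline = min(plain.values())     # best uninstrumented time of the three
    return {
        name: (plain[name] / baseline - 1.0, sfi[name] / baseline - 1.0)
        for name in plain
    }
```

With this convention the fastest uninstrumented compiler always has a black bar of height zero, and every other bar is read relative to it.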

For a fair comparison, we should compare programs for which we actually have a reasonable security guarantee. We have a formal proof of security and safety (see Sect. 6) for the sandboxed Cminor program, and we are confident that our syntax-directed decompiler preserves this property. For CompCert, this would suffice to preserve the security of the compiled Clight code, but this is not the case for gcc and clang because of semantic discrepancies between the compilers. To limit this risk, we have set the compiler flags to instruct gcc and clang to adhere to the specificities of the CompCert semantics: signed integer overflow is defined and wraps around (flag -fwrapv), strict aliasing is irrelevant (flag -fno-strict-aliasing), and floating-point arithmetic is strictly IEEE 754 compliant (flags -frounding-math and -fsignaling-nans). We also instruct the compilers to ignore any knowledge about the C library (-fno-builtin).

Our experimental results are shown in Fig. 5. In Fig. 5a, we have the overhead of CompCert and CompCertSfi. The overhead of CompCert over gcc and clang is expected and corroborates existing results<sup>4</sup>. For 10% of the benchmarks, the overhead of CompCertSfi over CompCert is negligible and sometimes the PSFI transformation even improves performance. Those are programs for which the PSFI transformation introduces few masking operations, if any. For 41% of the benchmarks, the overhead is below 10% and can be considered, for most applications, a reasonable efficiency/security trade-off. For all the other benchmarks except binarytrees and vmach, the overhead is below 25%. The two remaining benchmarks have a significant overhead, reaching 82% for binarytrees. This corresponds to programs which are memory intensive and where sandboxing cannot be optimised.

In Fig. 5b and c, we perform the same experiments but with gcc and clang. The results have some similarities but also have visible differences. For about 60% of the benchmarks the overhead is below 20%. Moreover, for both compilers, the average overhead is similar: 22% for gccSfi and 24% for clangSfi. Yet, on average gccSfi does a better job at optimising our benchmarks and beats clangSfi for about 75% of the benchmarks. For the rest of the benchmarks, we observe a significant overhead, up to 20%, indicating that the PSFI transformation hinders certain aggressive optimisations. The results also seem to indicate that optimisations are fragile as the overhead is not always consistent across compilers. The case of the integr benchmark is particularly striking because it runs with negligible overhead for clangSfi but exhibits the worst-case overhead for gccSfi. The integr program uses a function pointer inside a loop and we suspect that gccSfi, unlike clangSfi, fails to optimise the program due to the inserted trampoline code. Though less striking, the benchmarks fftw and raytracer follow the opposite trend; these are programs where the overhead of clangSfi is much higher than that of gccSfi.

<sup>4</sup> http://compcert.inria.fr/compcert-C.html#perfs.

Fig. 5. Overhead of PSFI: CompCert, clang, gcc, (P)NaCl

#### 8.4 PSFI Versus (P)NaCl

We also compare our compiler-based SFI approach with (P)NaCl [30], which to our knowledge is one of the most mature implementations of SFI. Figure 5d shows the overhead of CompCertSfi, gccSfi and clangSfi with respect to (P)NaCl. The baseline is given by the best among NaCl and PNaCl. The best of clangSfi and gccSfi is given in dark grey and CompCertSfi is given in light grey.

We first analyse the results of CompCertSfi. Our benchmarks are ordered by increasing runtime. The first 5 benchmarks have a runtime below one second. They are not representative of the performance of both approaches but only illustrate the fact that (P)NaCl has a startup penalty due to the verification of the binary and the setup of the sandbox. The overhead peaks above 75% for two programs (i.e., fib and integr). As the PSFI transformation keeps fib unmodified and only inserts a trampoline call in integr, these programs only highlight the limited optimisations performed by CompCert. Of the remaining benchmarks, 40% run faster or at a similar speed with CompCertSfi. For those benchmarks, the average overhead of CompCertSfi w.r.t. (P)NaCl is around 9%. Except for a few programs whose overhead skyrockets due to CompCert not being specialised for speed, we can say that CompCertSfi performance is comparable to (P)NaCl, with programs running faster on both sides and a large number having similar results.

We also matched gccSfi/clangSfi against (P)NaCl to compare the impact on performance of more aggressive optimisations. Here 60% of the programs are faster with gccSfi/clangSfi. Among the remaining programs, lzw and chomp are programs for which the (P)NaCl code runs faster than the optimised gcc/clang code without the PSFI transformation. As (P)NaCl is based on clang, more investigation is needed to understand this paradox, which may be explained by code running outside the sandbox, *i.e.* the trusted runtime library. Among the remaining benchmarks, binarytrees and lists still show a noticeable overhead. Those are recursive micro-benchmarks for which our PSFI is costly (see Fig. 5). For lists, 99% of the time is spent in a tight loop where only a single address is masked. For binarytrees, 70% of the time is spent in the runtime code of malloc and free, and therefore this highlights the fact that our implementation is less efficient than the (P)NaCl counterpart. Overall these results indicate that our implementation of SFI is competitive with (P)NaCl, given similar compilers. Furthermore, speed can be improved with more sandbox-dedicated optimisations; these would be harder for (P)NaCl to check.

# 9 Related Work

Since Wahbe *et al.* [35] proposed their initial technique for SFI, there have been a number of proposals for efficiently confining untrusted software to a memory sandbox (see [23, 24, 31, 32, 34, 37, 39]). One of the most prominent is Google's Native Client (NaCl) [37], which provides an infrastructure for executing untrusted native code in a web browser. NaCl was specifically targeted at executing computation-intensive applications without incurring a performance penalty. Certain features (in particular self-modifying code) were ruled out. These restrictions were addressed in a subsequent work [3].

RockSalt [24] is an SFI verifier for x86 code which has been developed and formally verified with the proof assistant Coq. The major contribution of RockSalt is to provide a formal model of the x86 architecture, from which it is possible to extract a decoder for a subset of the very rich set of x86 instructions, and build a verifier for the NaCl sandbox policy. Their experiments show that the formally verified checker performs marginally better than the NaCl verifier. In comparison, our approach avoids the complexities of the x86 instruction set by relying on the CompCert compiler back-end to produce binaries whose adherence to the sandbox policy is guaranteed by a combination of a sandbox verification at a higher level (Cminor) and CompCert's correctness theorem.

ARMor [39] uses the binary rewriter Diablo [28] to implement SFI for ARM processors. Using an untrusted program analysis, a proof of SFI safety is automatically constructed using the HOL theorem prover. ARMor was tested with some programs of the MiBench benchmark [11], namely BitCount and StringSearch. These programs required 2.5 and 8 h respectively to prove the memory safety and control-flow integrity of the executables, which means that the approach is not practically viable as it is.

Kroll *et al.* [16] proposed PSFI as an alternative methodology to the standard, verification-based SFI. In PSFI, the sandbox is built by inserting the necessary masking instructions during compilation. This means that the correctness of the transformation can be argued at an intermediate stage in the compilation where the program representation retains a high-level structure. Our work extends the seminal proposal in a number of ways that we detail below. Unlike Kroll *et al.*, we exclude from the TCB the masking primitive and the trampoline mechanism for calling external functions. In our implementation, these crucial components are written entirely in Cminor and proved correct without introducing trusted, unproved code. Kroll *et al.* sketch a proof of safety but do not identify the issue of pointer arithmetic. To sidestep the semantic limitations of pointer arithmetic, we introduce a compile-time encoding of pointers as integers. This transformation is instrumental for our Coq-verified proof of safety, which itself is mandatory to transfer security down to assembly.

Since the seminal work of Norrish [27], several works have proposed formal semantics of the C language [8, 12, 15]. All these share the limitations of CompCert with respect to pointer arithmetic. Recent works specifically aim at providing a more defined semantics for pointers. The proposal of Besson *et al.* [4] is able to cope with most existing low-level pointer manipulations and has been ported to CompCert [5, 6]. Yet it nonetheless has limitations, and the design of our PSFI transformation would not benefit from the increased expressiveness. The semantics of Kang *et al.* [14] is more permissive because, after a cast, a pointer is indistinguishable from an integer value. To our knowledge, their semantics has not been ported to the CompCert compiler. Our SFI transformation has the advantage of being compatible with the existing semantics of CompCert, with the caveat that pointers need to be explicitly compiled into integers.

# 10 Conclusion

We have presented CompCertSfi, a formally verified implementation of Software Fault Isolation based on the CompCert compiler. Our approach provides security guarantees at runtime when the source code may be malicious or has security vulnerabilities but the build process is trusted. This is typically the case when a final product is built using code originating from multiple third parties. Our work shows that it is possible to perform security-enhancing compilation that is both formally verified and competitive with existing approaches in terms of efficiency. CompCertSfi does not rely on *a posteriori* binary verification for guaranteeing security, and hence has a reduced TCB compared to traditional SFI solutions. The reduction in TCB is obtained through a formal, machine-checked proof of the fact that the security guaranteed by our SFI transformation in the compiler front-end, still holds at the assembly level. Key to achieving this property has been to fine-tune the transformation (and in particular its pointer manipulations) to ensure that the secured program has a well-defined semantics.

The impact of SFI has been evaluated on a series of benchmarks, showing that the transformed code can in a few cases be more efficient, and that the average runtime overhead incurred is about 9%. We have evaluated the impact of back-end optimisation on the transformed code on three different compilers. The gains vary, with clang being more efficient than CompCert and gcc, and CompCert being slightly more efficient than gcc. The experiments show that CompCertSfi combined with an aggressive back-end optimiser can sometimes achieve performance superior to Native Client implementations. In addition, there is still room for further optimisation of the generated code. We have observed that existing optimisations are sometimes hindered by our SFI transformation, so we would gain by performing more optimisations before the SFI transformation. We also intend to investigate optimisations for removing redundant sandboxing operations and in particular hoisting sandboxing outside loops.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Fixing Incremental Computation: Derivatives of Fixpoints, and the Recursive Semantics of Datalog**

Mario Alvarez-Picallo<sup>1(✉)</sup>, Alex Eyers-Taylor<sup>2</sup>, Michael Peyton Jones<sup>2(✉)</sup>, and C.-H. Luke Ong<sup>1</sup>

> <sup>1</sup> University of Oxford, Oxford, UK {mario.alvarez-picallo,luke.ong}@cs.ox.ac.uk <sup>2</sup> Semmle Ltd., Oxford, UK alexet@semmle.com, me@michaelpj.com

**Abstract.** Incremental computation has recently been studied using the concepts of *change structures* and *derivatives* of programs, where the derivative of a function allows updating the output of the function based on a change to its input. We generalise change structures to *change actions*, and study their algebraic properties. We develop change actions for common structures in computer science, including directed-complete partial orders and Boolean algebras. We then show how to compute derivatives of fixpoints. This allows us to perform incremental evaluation and maintenance of recursively defined functions, with particular application to generalised Datalog programs. Moreover, unlike previous results, our techniques are *modular* in that they are easy to apply both to variants of Datalog and to other programming languages.

**Keywords:** Incremental computation · Datalog · Semantics · Fixpoints

# **1 Introduction**

Consider the following classic Datalog program<sup>1</sup>, which computes the transitive closure of an edge relation e:

$$\begin{aligned} tc(x,y) &\leftarrow e(x,y) \\ tc(x,y) &\leftarrow e(x,z) \land tc(z,y) \end{aligned}$$

The semantics of Datalog tells us that the denotation of this program is the least fixpoint of the rule tc. Kleene's fixpoint Theorem tells us that we can compute this fixpoint by repeatedly applying the rule until the output stops changing, starting from the empty relation. For example, supposing that e = {(1, 2),(2, 3),(3, 4)}, we get the following evaluation trace:

<sup>1</sup> See [1, part D] for an introduction to Datalog.

© The Author(s) 2019

L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 525–552, 2019. https://doi.org/10.1007/978-3-030-17184-1\_19


At this point we have reached a fixpoint, and so we are done.

However, this process is quite wasteful. We deduced the fact (1, 2) at every iteration, even though we had already deduced it in the first iteration. Indeed, for a chain of n such edges we will deduce O(n<sup>2</sup>) facts along the way.
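
The naive strategy described above can be reproduced with a few lines of Python. This is an illustrative sketch with names of our choosing (`step`, `naive`), not part of any Datalog engine; relations are represented as sets of pairs.

```python
def step(e, tc):
    """One application of the tc rule: e(x,y), or e(x,z) and tc(z,y)."""
    return set(e) | {(x, y) for (x, z) in e for (w, y) in tc if w == z}


def naive(e):
    """Iterate the rule from the empty relation; return (fixpoint, trace)."""
    tc, trace = set(), []
    while True:
        new = step(e, tc)
        trace.append(sorted(new))      # every fact deduced in this iteration
        if new == tc:
            return tc, trace
        tc = new


e = {(1, 2), (2, 3), (3, 4)}
tc, trace = naive(e)
for i, facts in enumerate(trace, start=1):
    print(i, facts)
```

On this input the loop performs four iterations, deducing 3, 5, 6 and 6 facts respectively, and the fact (1, 2) is deduced again in every one of them.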

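The standard improvement to this evaluation strategy is known as "semi-naive" evaluation (see [1, section 13.1]), where we transform the program into a *delta* program with two parts: a *delta* rule, which computes the new facts at each iteration, and an *accumulator* rule, which accumulates the deltas to compute the final result.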


In this case our delta rule is simple: we only get new transitive edges at iteration n + 1 if we can deduce them from transitive edges we deduced at iteration n.

$$\begin{aligned} \Delta t c\_0(x, y) &\leftarrow e(x, y) \\ \Delta t c\_{i+1}(x, y) &\leftarrow e(x, z) \land \Delta t c\_i(z, y) \\ t c\_0(x, y) &\leftarrow \Delta t c\_0(x, y) \\ t c\_{i+1}(x, y) &\leftarrow t c\_i(x, y) \lor \Delta t c\_{i+1}(x, y) \end{aligned}$$


This is much better—we have turned a quadratic computation into a linear one. The delta transformation is a kind of *incremental computation*: at each stage we compute the changes in the rule given the previous changes to its inputs.
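
The delta program can be executed directly. The sketch below follows the four rules above, with one deviation that we label explicitly: already-known facts are subtracted from each new delta, which is our addition to guarantee termination on cyclic edge relations. The function name `semi_naive` is ours.

```python
def semi_naive(e):
    """Evaluate tc with the delta program; return (tc, list of deltas)."""
    delta = set(e)                     # delta tc_0(x, y) <- e(x, y)
    tc = set(delta)                    # tc_0(x, y) <- delta tc_0(x, y)
    deltas = [sorted(delta)]
    while delta:
        # delta tc_{i+1}(x, y) <- e(x, z) and delta tc_i(z, y)
        derived = {(x, y) for (x, z) in e for (w, y) in delta if w == z}
        delta = derived - tc           # assumption: drop facts already known
        tc |= delta                    # tc_{i+1} <- tc_i or delta tc_{i+1}
        deltas.append(sorted(delta))
    return tc, deltas
```

On the chain {(1, 2), (2, 3), (3, 4)} the successive deltas contain 3, 2, 1 and 0 facts, so each of the six facts of tc is deduced exactly once.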

But the delta rule translation works only for traditional Datalog. It is common to liberalise the formula syntax with additional features, such as disjunction, existential quantification, negation, and aggregation.<sup>2</sup> This allows us to write programs like the following, where we compute whether all the nodes in a subtree given by child have some property p:

$$treeP(x) \leftarrow p(x) \land \neg \exists y. (child(x, y) \land \neg treeP(y))$$

<sup>2</sup> See, for example, LogiQL [26,32], Datomic [18], Souffle [38,42], and DES [36], which between them have all of these features and more. We do not here explore supporting extensions to the syntax of rule *heads*, although as long as this can be given a denotational semantics in a similar style our techniques should be applicable.

The body of this predicate amounts to recursion through a *universal* quantifier (encoded as ¬∃¬). We would like to be able to use semi-naive evaluation for this rule too, but the standard definition of the semi-naive transformation is not well defined for the extended program syntax, and it is unclear how to extend it (and the correctness proof) to handle such cases.

It is possible, however, to write a delta program for treeP by hand; indeed, here is a definition for the delta predicate (the accumulator is as before):<sup>3</sup>

$$\begin{aligned} \Delta\_{i+1}treeP(x) &\leftarrow p(x) \\ &\land \exists y. (child(x, y) \land \Delta\_i treeP(y)) \\ &\land \neg \exists y. (child(x, y) \land \neg treeP\_i(y)) \end{aligned}$$

This is a *correct* delta program (in that using it to iteratively compute treeP gives the right answer), but it is not *precise* because it derives some facts repeatedly. We will show how to construct correct delta programs generally using a program transformation, and show how we have some freedom to optimize within a range of possible alternatives to improve precision or ease evaluation.
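
The correctness claim can be checked on small inputs by running the hand-written delta predicate next to a naive evaluation of treeP. In the sketch below the representation (a set of nodes, a dictionary from a node to its children, a set of nodes satisfying p) and the function names are ours. The base case, which seeds the iteration with the facts derivable from the empty relation, is an assumption, since only the step rule is given above.

```python
def all_children_in(x, child, s):
    """not exists y. (child(x, y) and not s(y))"""
    return all(y in s for y in child.get(x, ()))


def tree_p_naive(nodes, child, p):
    """Iterate the treeP rule from the empty relation until nothing changes."""
    tree_p = set()
    while True:
        new = {x for x in nodes if x in p and all_children_in(x, child, tree_p)}
        if new == tree_p:
            return tree_p
        tree_p = new


def tree_p_delta(nodes, child, p):
    """Iterate the hand-written delta predicate with the usual accumulator."""
    # Assumed base case: nodes satisfying p that have no children at all.
    delta = {x for x in nodes if x in p and not child.get(x)}
    tree_p = set(delta)
    while delta:
        delta = {
            x for x in nodes
            if x in p
            and any(y in delta for y in child.get(x, ()))   # a child is new
            and all_children_in(x, child, tree_p)           # all children known
        }
        tree_p |= delta
    return tree_p
```

Both functions compute the same set on the small trees we tried; the sketch makes no attempt to measure how often a fact is re-derived.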

Handling extended Datalog is of more than theoretical interest—the research in this paper was carried out at Semmle, which makes heavy use of a commercial Datalog implementation to implement large-scale static program analysis [7,37, 39,40]. Semmle's implementation includes parity-stratified negation<sup>4</sup>, recursive aggregates [34], and other non-standard features, so we are faced with a dilemma: either abandon the new language features, or abandon incremental computation.

We can tell a similar story about *maintenance* of Datalog programs. Maintenance means updating the results of the program when its inputs change, for example, updating the value of tc given a change to e. Again, this is a kind of incremental computation, and there are known solutions for traditional Datalog [25], but these break down when the language is extended.

There is a piece of folkloric knowledge in the Datalog community that hints at a solution: the semi-naive translation of a rule corresponds to the *derivative* of that rule [8,9, section 3.2.2]. The idea of performing incremental computation using derivatives has been studied recently by Cai et al. [14], who give an account using *change structures*. They use this to provide a framework for incrementally evaluating lambda calculus programs.

<sup>3</sup> This rule should be read as: we can newly deduce that x is in treeP if x satisfies the predicate, and we have newly deduced that one of its children is in treeP, and we currently believe that all of its children are in treeP.

<sup>4</sup> Parity-stratified negation means that recursive calls must appear under an even number of negations. This ensures that the rule remains monotone, so the least fixpoint still exists.

However, Cai et al.'s work isn't directly applicable to Datalog: the tricky parts of Datalog's semantics are recursive definitions and the need for *fixpoints*, so we need some additional theory to tell us how to handle incremental evaluation and maintenance of fixpoint computations.

This paper aims to bridge that gap by providing a solid semantic foundation for the incremental computation of Datalog, and other recursive programs, in terms of changes and differentiable functions.

*Contributions.* We start by generalizing change structures to *change actions* (Sect. 2). Change actions are simpler and weaker than change structures, while still providing enough structure to handle incremental computation, and have fruitful interactions with a variety of structures (Sects. 3 and 6.1).

We then show how change actions can be used to perform incremental evaluation and maintenance of non-recursive program semantics, using the formula semantics of generalized Datalog as our primary example (Sect. 4). Moreover, the structure of the approach is modular, and can accommodate arbitrary additional formula constructs (Sect. 4.3).

We also provide a method of incrementally computing and maintaining fixpoints (Sect. 6.2). We use this to perform incremental evaluation and maintenance of *recursive* program semantics, including generalized recursive Datalog (Sect. 7). This provides, to the best of our knowledge, the world's first incremental evaluation and maintenance mechanism for Datalog that can handle negation, disjunction, and existential quantification.

We have omitted the proofs from this paper. Most of the results have routine proofs, but the proofs of the more substantial results (especially those in Sect. 6.2) are included in an extended report [3], along with some extended worked examples, and additional material on the precision of derivatives.

# **2 Change Actions and Derivatives**

Incremental computation requires understanding how values *change*. For example, we can change an integer by adding a natural to it. Abstractly, we have a set of values (the integers), and a set of changes (the naturals) which we can "apply" to a value (by addition) to get a new value.

This kind of structure is well-known—it is a set action. It is also very natural to want to combine changes sequentially, and if we do this then we find ourselves with a monoid action.

Using monoid actions for changes gives us a reason to think that change actions are an adequate representation of changes: any subset of A → A which is closed under composition can be represented as a monoid action on A, so we are able to capture all of these as change actions.

### **2.1 Change Actions**

**Definition 1.** *A* change action *is a tuple:*

$$\hat{A} := (A, \Delta A, \oplus\_A)$$

*where* A *is a set,* ΔA *is a monoid, and* ⊕<sub>A</sub> : A × ΔA → A *is a monoid action on* A*.*<sup>5</sup>

*We will call* A *the base set, and* ΔA *the* change set *of the change action. We will use* · *for the monoid operation of* ΔA*, and* **0** *for its identity element. When there is no risk of confusion, we will simply write* ⊕ *for* ⊕<sub>A</sub>*.*

**Examples.** A typical example of a change action is (A<sup>∗</sup>, A<sup>∗</sup>, ++) where A<sup>∗</sup> is the set of finite words (or lists) of A. Here we represent changes to a word made by concatenating another word onto it. The changes themselves can be combined using ++ as the monoid operation with the empty word as the identity, and this is a monoid action: (a ++ b) ++ c = a ++ (b ++ c).

This is a very common case: any monoid (A, ·, **0**) can be seen as a change action (A,(A, ·, **0**), ·). Many practical change actions can be constructed in this way. In particular, for any change action (A, ΔA, ⊕), (ΔA, ΔA, ·) is also a change action. This means that we do not have to do any extra work to talk about changes to changes—we can always take ΔΔA = ΔA (although there may be other change actions available).
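
These definitions translate almost literally into code. The following Python sketch is one possible encoding, with names of our choosing (`ChangeAction`, `from_monoid`, `is_action_on`); the base set is left implicit and the laws are only tested on finite samples, so this is a sanity check rather than a proof.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ChangeAction:
    """A change action (A, ΔA, ⊕); the base set A is left implicit."""
    zero: Any                              # identity of the change monoid
    combine: Callable[[Any, Any], Any]     # monoid operation on changes
    apply: Callable[[Any, Any], Any]       # the action ⊕ : A × ΔA → A


def from_monoid(zero, op):
    """Any monoid (A, ·, 0) viewed as the change action (A, (A, ·, 0), ·)."""
    return ChangeAction(zero=zero, combine=op, apply=op)


def is_action_on(ca, values, changes):
    """Check the monoid action laws on finite samples of values and changes."""
    identity = all(ca.apply(a, ca.zero) == a for a in values)
    compat = all(
        ca.apply(a, ca.combine(d1, d2)) == ca.apply(ca.apply(a, d1), d2)
        for a in values for d1 in changes for d2 in changes
    )
    return identity and compat


words = from_monoid("", lambda a, b: a + b)          # (A*, A*, ++)
ints_by_nats = ChangeAction(0, lambda m, n: m + n, lambda z, n: z + n)
```

For instance, `is_action_on(words, ["", "ab"], ["", "c", "de"])` exercises the associativity law quoted above on a handful of words.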

Three examples of change actions are of particular interest to us. First, whenever L is a Boolean algebra, we can give it the change actions (L, L, ∨) and (L, L, ∧), as well as a combination of these (see Sect. 3.2). Second, the natural numbers with addition have a change action ℕ̂ := (ℕ, ℕ, +), which will prove useful during inductive proofs.

Another interesting example of change actions is *semiautomata*. A semiautomaton is a triple (Q, Σ, T), where Q is a set of states, Σ is a (non-empty) finite input alphabet and T : Q × Σ → Q is a transition function. Every semiautomaton corresponds to a change action (Q, Σ<sup>∗</sup>, T<sup>∗</sup>) on the free monoid Σ<sup>∗</sup> over Σ, with T<sup>∗</sup> being the free extension of T. Conversely, every change action Â whose change set ΔA is freely generated by a finite set corresponds to a semiautomaton.
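
The free extension is simply a left fold of the transition function over the input word. The sketch below shows this for a hypothetical two-state semiautomaton of our own devising, which tracks the parity of the number of occurrences of the letter a.

```python
from functools import reduce


def free_extension(transition):
    """Extend T : Q × Σ → Q to T* : Q × Σ* → Q by folding over the word."""
    return lambda state, word: reduce(transition, word, state)


# Hypothetical two-state semiautomaton tracking the parity of the number of 'a's.
def parity_step(state, symbol):
    return (state + 1) % 2 if symbol == "a" else state


apply_word = free_extension(parity_step)
```

Applying the word u ++ v from a state gives the same result as applying u and then v, which is exactly the monoid action law for (Q, Σ<sup>∗</sup>, T<sup>∗</sup>).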

Other recurring examples of change actions are:


These are particularly relevant because they are, in a sense, the "smallest" and "largest" change actions that can be imposed on an arbitrary set A.

Many other notions in computer science can be understood naturally in terms of change actions, *e.g.* databases and database updates, files and diffs, Git repositories and commits, even video compression algorithms that encode a frame as a series of changes to the previous frame.

<sup>5</sup> Why not just work with monoid actions? The reason is that while the category of monoid actions and the category of change actions have the same objects, they have different morphisms. See Sect. 8.1 for further discussion.

#### **2.2 Derivatives**

When we do incremental computation we are usually trying to save ourselves some work. We have an expensive function f : A → B, which we've evaluated at some point a. Now we are interested in evaluating f after some change δa to a, but ideally we want to avoid actually computing f(a ⊕ δa) directly.

A solution to this problem is a function f′ : A × ΔA → ΔB, which given a and δa tells us how to change f(a) to f(a ⊕ δa). We call this a *derivative* of a function.

**Definition 2.** *Let* Â *and* B̂ *be change actions. A* derivative *of a function* f : A → B *is a function* f′ : A × ΔA → ΔB *such that*

$$f(a \oplus\_A \delta a) = f(a) \oplus\_B f'(a, \delta a).$$

*A function which has a derivative is* differentiable*, and we will write* Â → B̂ *for the set of differentiable functions between* A *and* B*.*<sup>6</sup>

Derivatives need not be unique in general, so we will speak of "a" derivative. Functions into "thin" change actions (those where a ⊕ δa = a ⊕ δb implies δa = δb) have unique derivatives, but many change actions are not thin. For example, (𝒫(ℕ), 𝒫(ℕ), ∩) is not thin because {0} ∩ {1} = {0} ∩ {2}.
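
A small example may help to fix ideas. Take finite sets of integers with union as the change action on both sides, and the function that squares every element of a set. The sketch below, whose names are ours, gives two different derivatives of this function and a generic check of the derivative condition on finite samples.

```python
def image(f):
    """Lift f to finite sets: S ↦ {f(x) | x ∈ S}."""
    return lambda s: frozenset(f(x) for x in s)


square_all = image(lambda x: x * x)


def square_all_derivative(s, ds):
    """A derivative with respect to union: only the added elements are processed."""
    return square_all(ds)


def recompute_derivative(s, ds):
    """Another derivative of the same function, which recomputes everything."""
    return square_all(s | ds)


def is_derivative(f, df, apply_a, apply_b, points, changes):
    """Check f(a ⊕ δa) == f(a) ⊕ f'(a, δa) on finite samples."""
    return all(
        f(apply_a(a, da)) == apply_b(f(a), df(a, da))
        for a in points for da in changes
    )
```

Both candidates satisfy the condition of Definition 2, yet they return different changes on some inputs, which illustrates why one speaks of "a" derivative; the first one is the one worth using, since it touches only the elements of the change.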

Derivatives capture the structure of incremental computation, but there are important operational considerations that affect whether using them for computation actually saves us any work. As we will see in a moment (Proposition 1), for many change actions we will have the option of picking the "worst" derivative, which merely computes f(a ⊕ δa) directly and then works out the change that maps f(a) to this new value. While this is formally a derivative, using it certainly does not save us any work! We will be concerned with both the possibility of constructing correct derivatives (Sects. 3.2 and 6.2 in particular), and also in giving ourselves a range of derivatives to choose from so that we can soundly optimize for operational value.

For our Datalog case study, we aim to cash out the folkloric idea that incremental computation functions via a derivative. We will construct a derivative of the semantics of Datalog in stages: first the non-recursive formula semantics (Sect. 4); and later the full, recursive, semantics (Sect. 7).

#### **2.3 Useful Facts About Change Actions and Derivatives**

**The Chain Rule.** The derivative of a function can be computed compositionally, because derivatives satisfy the standard chain rule.

<sup>6</sup> Note that we do not require that f′(a, δa · δb) = f′(a, δa) · f′(a ⊕ δa, δb) nor that f′(a, **0**) = **0**. These are natural conditions, and all the derivatives we have studied also satisfy them, but none of the results in this paper require them to hold.

**Theorem 1 (The Chain Rule).** *Let* f : Aˆ → Bˆ*,* g : Bˆ → Cˆ *be differentiable functions. Then* g ◦ f *is also differentiable, with a derivative given by*

$$(g \circ f)'(x, \delta x) = g'(f(x), f'(x, \delta x))$$

*or, in curried form*

$$(g \circ f)'(x) = g'(f(x)) \circ f'(x)$$
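The chain rule can be exercised directly. The sketch below again assumes the integer change action (Z, Z, +); the functions `f`, `g` and the helper `compose_derivative` are hypothetical names introduced for this example.

```python
def f(x):
    return x * x

def df(x, dx):
    # A derivative of f under (Z, Z, +).
    return 2 * x * dx + dx * dx

def g(y):
    return 3 * y + 1

def dg(y, dy):
    # A derivative of g under (Z, Z, +); it does not depend on y.
    return 3 * dy

def compose_derivative(f, df, dg):
    """Build (g ∘ f)'(x, δx) = g'(f(x), f'(x, δx)) from the two derivatives."""
    def d(x, dx):
        return dg(f(x), df(x, dx))
    return d

d_gf = compose_derivative(f, df, dg)

# Check the derivative condition for g ∘ f on a finite sample.
for x in range(-4, 5):
    for dx in range(-4, 5):
        assert g(f(x + dx)) == g(f(x)) + d_gf(x, dx)
```

Note that `d_gf` never evaluates `g` itself: it only needs `f`, a derivative of `f`, and a derivative of `g`.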

**Complete change actions and minus operators.** Complete change actions are an important class of change actions, because they have changes between *any* two values in the base set.

**Definition 3.** *A change action is* complete *if for any* a, b ∈ A*, there is a change* δa ∈ ΔA *such that* a ⊕ δa = b*.*

Complete change actions have convenient "minus operators" that allow us to compute the difference between two values.

**Definition 4.** *A* minus operator *is a function* ⊖ : A × A → ΔA *such that* a ⊕ (b ⊖ a) = b *for all* a, b ∈ A*.*

**Proposition 1.** *Given a minus operator* ⊖*, and a function* f*, let*

$$f'\_{\ominus}(a, \delta a) := f(a \oplus \delta a) \ominus f(a)$$

*Then* f′<sub>⊖</sub> *is a derivative for* f*.*
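The construction in Proposition 1 is easy to write down generically. The following sketch assumes the integers with addition as the change action and subtraction as the minus operator; `derivative_from_minus` and `cube` are hypothetical names for this example.

```python
def oplus(a, da):
    return a + da

def minus(b, a):
    # b ⊖ a: a change taking a to b, so that a ⊕ (b ⊖ a) == b.
    return b - a

def derivative_from_minus(f):
    """The derivative of Proposition 1: recompute f, then take the difference."""
    def f_prime(a, da):
        return minus(f(oplus(a, da)), f(a))
    return f_prime

def cube(x):
    return x ** 3

d_cube = derivative_from_minus(cube)

# It satisfies the derivative condition, but only by calling cube twice.
for a in range(-3, 4):
    for da in range(-3, 4):
        assert cube(oplus(a, da)) == oplus(cube(a), d_cube(a, da))
```

This is the "worst" derivative discussed in Sect. 2.2: always available for a complete change action, but it performs the full recomputation it was meant to avoid.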

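**Proposition 2.** *Let* Aˆ *be a change action. Then the following are equivalent:*

- Aˆ *is complete.*
- *There is a minus operator on* Aˆ*.*
- *For any change action* Bˆ*, all functions* f : B → A *are differentiable.*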

This last property is of the utmost importance, since we are often concerned with the differentiability of functions.

**Products and sums.** Given change actions on sets A and B, the question immediately arises of whether there are change actions on their Cartesian product A × B or disjoint union A + B. While there are many candidates, there is a clear "natural" choice for both.

**Proposition 3 (Products).** *Let* Aˆ = (A, ΔA, ⊕<sub>A</sub>) *and* Bˆ = (B, ΔB, ⊕<sub>B</sub>) *be change actions.*

*Then* Aˆ × Bˆ := (A × B, ΔA × ΔB, ⊕<sub>A×B</sub>) *is a change action, where* ⊕<sub>A×B</sub> *is defined by:*

$$(a,b)\oplus\_{A\times B}(\delta a,\delta b) := (a\oplus\_A \delta a, b\oplus\_B \delta b),$$

*The projection maps* π<sub>1</sub>*,* π<sub>2</sub> *are differentiable with respect to it. Furthermore, a function* f : A × B → C *is differentiable from* Aˆ × Bˆ *into* Cˆ *if and only if, for every fixed* a ∈ A *and* b ∈ B*, the partially applied functions*

$$\begin{aligned} f(a, \cdot) &: B \to C \\ f(\cdot, b) &: A \to C \end{aligned}$$

*are differentiable.*

Whenever f : A × B → C is differentiable, we will sometimes use ∂<sub>1</sub>f and ∂<sub>2</sub>f to refer to derivatives of the partially applied versions, i.e. if f′<sub>a</sub> : B × ΔB → ΔC and f′<sub>b</sub> : A × ΔA → ΔC refer to derivatives for f(a, ·) and f(·, b) respectively, then

$$\begin{aligned} \partial\_1 f: A \times \Delta A \times B &\to \Delta C \\ \partial\_1 f(a, \delta a, b) &:= f\_b'(a, \delta a) \\ \partial\_2 f: A \times B \times \Delta B &\to \Delta C \\ \partial\_2 f(a, b, \delta b) &:= f\_a'(b, \delta b) \end{aligned}$$
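To make the partial derivatives concrete, here is a small sketch for integer multiplication, assuming both factors carry the change action (Z, Z, +). The names `d1_mul`, `d2_mul` and `d_mul` are hypothetical; the way `d_mul` combines the two partials (change the first argument, then the second) is one possible choice for this particular action, not a construction taken from the text.

```python
def oplus_prod(pair, change):
    # Product change action: apply each component change separately.
    (a, b), (da, db) = pair, change
    return (a + da, b + db)

def mul(a, b):
    return a * b

def d1_mul(a, da, b):
    # ∂1 mul: derivative of mul(., b) for fixed b.
    return da * b

def d2_mul(a, b, db):
    # ∂2 mul: derivative of mul(a, .) for fixed a.
    return a * db

def d_mul(pair, change):
    # Assumed combination: first change a, then change b at the updated a.
    (a, b), (da, db) = pair, change
    return d1_mul(a, da, b) + d2_mul(a + da, b, db)

for a in range(-3, 4):
    for b in range(-3, 4):
        for da in range(-2, 3):
            for db in range(-2, 3):
                new_a, new_b = oplus_prod((a, b), (da, db))
                assert mul(new_a, b) == mul(a, b) + d1_mul(a, da, b)
                assert mul(a, new_b) == mul(a, b) + d2_mul(a, b, db)
                assert mul(new_a, new_b) == mul(a, b) + d_mul((a, b), (da, db))
```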

**Proposition 4 (Disjoint unions).** *Let* Aˆ = (A, ΔA, ⊕<sub>A</sub>) *and* Bˆ = (B, ΔB, ⊕<sub>B</sub>) *be change actions.*

*Then* Aˆ + Bˆ := (A + B, ΔA × ΔB, ⊕<sub>+</sub>) *is a change action, where* ⊕<sub>+</sub> *is defined as:*

$$\begin{aligned} \iota\_1 a \oplus\_+ (\delta a, \delta b) &:= \iota\_1 (a \oplus\_A \delta a) \\ \iota\_2 b \oplus\_+ (\delta a, \delta b) &:= \iota\_2 (b \oplus\_B \delta b) \end{aligned}$$

*The injection maps* ι<sub>1</sub>*,* ι<sub>2</sub> *are differentiable with respect to* Aˆ + Bˆ*. Furthermore, whenever* Cˆ *is a change action and* f : A → C*,* g : B → C *are differentiable, then so is* [f, g]*.*
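A minimal sketch of the disjoint-union change action, assuming both summands are integers under (Z, Z, +) and representing A + B by tagged pairs. The tags `"inl"` and `"inr"` and all function names are assumptions made for this example.

```python
def inl(a):
    return ("inl", a)

def inr(b):
    return ("inr", b)

def oplus_sum(value, change):
    # A change is a pair (δa, δb); only the component matching the tag is used.
    tag, x = value
    da, db = change
    return inl(x + da) if tag == "inl" else inr(x + db)

def copair(f, g):
    """The copairing [f, g] : A + B -> C."""
    def h(value):
        tag, x = value
        return f(x) if tag == "inl" else g(x)
    return h

def copair_derivative(df, dg):
    """A derivative for [f, g], dispatching on the tag of the base point."""
    def dh(value, change):
        tag, x = value
        da, db = change
        return df(x, da) if tag == "inl" else dg(x, db)
    return dh

double = lambda a: 2 * a
d_double = lambda a, da: 2 * da
square = lambda b: b * b
d_square = lambda b, db: 2 * b * db + db * db

h = copair(double, square)
dh = copair_derivative(d_double, d_square)

for v in [inl(n) for n in range(-3, 4)] + [inr(n) for n in range(-3, 4)]:
    for change in [(da, db) for da in range(-2, 3) for db in range(-2, 3)]:
        assert h(oplus_sum(v, change)) == h(v) + dh(v, change)
```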

#### **2.4 Comparing Change Actions**

Much like topological spaces, we can compare change actions on the same base set according to coarseness. This is useful since differentiability of functions between change actions is characterized entirely by the coarseness of the actions.

**Definition 5.** *Let* Aˆ<sub>1</sub> *and* Aˆ<sub>2</sub> *be change actions on* A*. We say that* Aˆ<sub>1</sub> *is* coarser *than* Aˆ<sub>2</sub> *(or that* Aˆ<sub>2</sub> *is* finer *than* Aˆ<sub>1</sub>*) whenever for every* x ∈ A *and change* δa<sub>1</sub> ∈ ΔA<sub>1</sub>*, there is a change* δa<sub>2</sub> ∈ ΔA<sub>2</sub> *such that* x ⊕<sub>A<sub>1</sub></sub> δa<sub>1</sub> = x ⊕<sub>A<sub>2</sub></sub> δa<sub>2</sub>*.*

*We will write* Aˆ<sub>1</sub> ≤ Aˆ<sub>2</sub> *whenever* Aˆ<sub>1</sub> *is coarser than* Aˆ<sub>2</sub>*. If* Aˆ<sub>1</sub> *is both finer and coarser than* Aˆ<sub>2</sub>*, we will say that* Aˆ<sub>1</sub> *and* Aˆ<sub>2</sub> *are equivalent.*

The relation ≤ defines a preorder (but not a partial order) on the set of all change actions over a fixed set A. Least and greatest elements do exist up to equivalence, and correspond respectively to the empty change action Aˆ<sub>⊥</sub> and any complete change action, such as the full change action Aˆ<sub>⊤</sub>, defined in Sect. 2.1.

**Proposition 5.** *Let* Aˆ<sub>2</sub> ≤ Aˆ<sub>1</sub>*,* Bˆ<sub>1</sub> ≤ Bˆ<sub>2</sub> *be change actions, and suppose the function* f : A → B *is differentiable as a function from* Aˆ<sub>1</sub> *into* Bˆ<sub>1</sub>*. Then* f *is differentiable as a function from* Aˆ<sub>2</sub> *into* Bˆ<sub>2</sub>*.*

A consequence of this fact is that whenever two change actions are equivalent they can be used interchangeably without affecting which functions are differentiable. One last parallel with topology is the following result, which establishes a simple criterion for when a change action is coarser than another:

**Proposition 6.** *Let* Aˆ<sub>1</sub>*,* Aˆ<sub>2</sub> *be change actions on* A*. Then* Aˆ<sub>1</sub> *is coarser than* Aˆ<sub>2</sub> *if and only if the identity function* id : A → A *is differentiable from* Aˆ<sub>1</sub> *to* Aˆ<sub>2</sub>*.*
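On finite data, coarseness and the criterion of Proposition 6 can be checked by brute force. The sketch below assumes the base set {0, 1, 2, 3} with two hypothetical change actions that share the saturating addition `sat_add`: `STEPS` allows every increment, while `RESET` only allows "do nothing" or "jump to the top".

```python
POINTS = range(4)
STEPS = [0, 1, 2, 3]   # assumed change set, monoid under saturating addition
RESET = [0, 3]         # a submonoid of STEPS: identity, or jump to the top

def sat_add(x, d):
    return min(x + d, 3)

def is_coarser(changes1, changes2):
    """Definition 5: every change of action 1 is matched, pointwise, by action 2."""
    return all(
        any(sat_add(x, d1) == sat_add(x, d2) for d2 in changes2)
        for x in POINTS
        for d1 in changes1
    )

def identity_derivative(changes1, changes2):
    """Try to tabulate a derivative of id from action 1 to action 2.

    Returns a dict (x, δ1) -> δ2, or None if some change cannot be matched.
    """
    table = {}
    for x in POINTS:
        for d1 in changes1:
            matches = [d2 for d2 in changes2 if sat_add(x, d2) == sat_add(x, d1)]
            if not matches:
                return None
            table[(x, d1)] = matches[0]
    return table

assert is_coarser(RESET, STEPS)
assert not is_coarser(STEPS, RESET)
# Proposition 6: coarseness coincides with differentiability of the identity.
assert identity_derivative(RESET, STEPS) is not None
assert identity_derivative(STEPS, RESET) is None
```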

# **3 Posets and Boolean Algebras**

The semantic domain of Datalog is a complete Boolean algebra, and so our next step is to construct a good change action for Boolean algebras. Along the way, we will consider change actions over posets, which let us *approximate* derivatives; this will turn out to be very important in practice.

#### **3.1 Posets**

Ordered sets give us a constrained class of functions: monotone functions. We can define *ordered* change actions, which are those that are well-behaved with respect to the order on the underlying set.<sup>7</sup>

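**Definition 6.** *A change action* Aˆ *is* ordered *if*

- A *and* ΔA *are posets.*
- ⊕ *is monotone as a map from* A × ΔA *to* A*.*
- · *is monotone as a map from* ΔA × ΔA *to* ΔA*.*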

In fact, any change action whose base set is a poset induces a partial order on the corresponding change set:

**Definition 7.** δa ≤<sub>Δ</sub> δb *iff for all* a ∈ A *it is the case that* a ⊕ δa ≤ a ⊕ δb*.*

**Proposition 7.** *Let* Aˆ *be a change action on a set* A *equipped with a partial order* ≤ *such that* ⊕ *is monotone in its first argument. Then* Aˆ *is an ordered change action when* ΔA *is equipped with the partial order* ≤<sub>Δ</sub>*.*

In what follows, we will extend the partial order ≤<sub>Δ</sub> on some change set ΔB pointwise to functions from some A into ΔB. This pointwise order interacts nicely with derivatives, in that it gives us the following lemma:

<sup>7</sup> If we were giving a presentation that was generic in the base category, then this would simply be the definition of being a change action in the category of posets and monotone maps.

**Theorem 2 (Sandwich lemma).** *Let* Aˆ *be a change action, and* Bˆ *be an ordered change action, and let* f : A → B *and* g : A × ΔA → ΔB *be functions. If* f<sub>↑</sub> *and* f<sub>↓</sub> *are derivatives for* f *such that*

$$f\_{\downarrow} \leq\_{\Delta} g \leq\_{\Delta} f\_{\uparrow}$$

*then* g *is a derivative for* f*.*

If unique minimal and maximal derivatives exist, then this gives us a characterisation of all the derivatives for a function.

**Theorem 3.** *Let* Aˆ *and* Bˆ *be change actions, with* Bˆ *ordered, and let* f : A → B *be a function. If there exist* f<sub>↓↓</sub> *and* f<sub>↑↑</sub> *which are unique minimal and maximal derivatives of* f*, respectively, then the derivatives of* f *are precisely the functions* f′ *such that*

$$f\_{\downarrow\downarrow} \leq\_{\Delta} f' \leq\_{\Delta} f\_{\uparrow\uparrow}$$

This theorem gives us the leeway that we need when trying to pick a derivative: we can pick out the bounds, and that tells us how much "wiggle room" we have above and below.

#### **3.2 Boolean Algebras**

Complete Boolean algebras are a particularly nice domain for change actions because they have a negation operator. This is very helpful for computing differences, and indeed Boolean algebras have a complete change action.

**Proposition 8 (Boolean algebra change actions).** *Let* L *be a complete Boolean algebra. Define*

$$\hat{L}\_{\bowtie} := (L, L \bowtie L, \oplus\_{\bowtie})$$

*where*

$$\begin{aligned} L \bowtie L &:= \{ (a, b) \in L \times L \mid a \wedge b = \bot \} \\ a \oplus\_{\bowtie} (p, q) &:= (a \vee p) \wedge \neg q \\ (p,q)\cdot (r,s) &:= ((p\wedge\neg s)\vee r, (q\wedge\neg r)\vee s) \end{aligned}$$

*with identity element* (⊥, ⊥)*.*

*Then* Lˆ<sub>⋈</sub> *is a complete change action on* L*.*

We can think of Lˆ<sub>⋈</sub> as tracking changes as pairs of "upwards" and "downwards" changes, where the monoid action simply applies one after the other, with an adjustment to make sure that the components remain disjoint.<sup>8</sup> For example, in the powerset Boolean algebra P(N), a change to {1, 2} might consist of *adding* {3} and *removing* {1}, producing {2, 3}. In P(N)ˆ<sub>⋈</sub> this would be represented as {1, 2} ⊕ ({3}, {1}) = {2, 3}.

<sup>8</sup> The intuition that Lˆ<sub>⋈</sub> is made up of an "upwards" and a "downwards" change action glued together can in fact be made precise, but the specifics are outside the scope of this paper.
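The powerset example can be replayed in a few lines. The sketch below models a finite powerset Boolean algebra with Python `frozenset`s and a change as a disjoint pair `(up, down)`; the function names and the small universe `{0, 1, 2}` used for the exhaustive check are assumptions made for this example.

```python
from itertools import combinations

def oplus(a, change):
    # a ⊕ (p, q) = (a ∨ p) ∧ ¬q: add the "up" part, then remove the "down" part.
    up, down = change
    return (a | up) - down

def compose(c1, c2):
    # (p, q) · (r, s) = ((p ∧ ¬s) ∨ r, (q ∧ ¬r) ∨ s): c1 first, then c2.
    (p, q), (r, s) = c1, c2
    return ((p - s) | r, (q - r) | s)

def subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# The example from the text: {1, 2} ⊕ ({3}, {1}) = {2, 3}.
assert oplus(frozenset({1, 2}), (frozenset({3}), frozenset({1}))) == frozenset({2, 3})

# Exhaustive check over a tiny universe that composition keeps the two
# components disjoint and agrees with applying the changes one after the other.
SUBSETS = subsets({0, 1, 2})
CHANGES = [(p, q) for p in SUBSETS for q in SUBSETS if not (p & q)]
for a in SUBSETS:
    for c1 in CHANGES:
        for c2 in CHANGES:
            up, down = compose(c1, c2)
            assert not (up & down)
            assert oplus(oplus(a, c1), c2) == oplus(a, compose(c1, c2))
```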

Boolean algebras also have unique maximal and minimal derivatives, under the usual partial order based on implication. The change set is, as usual, given the change partial order, which in this case corresponds to the natural order on L × L<sup>op</sup>.

**Proposition 9.** *Let* L *be a complete Boolean algebra with the* Lˆ<sub>⋈</sub> *change action, and* f : A → L *be a function. Then, the following are minus operators:*

$$\begin{aligned} a \ominus\_{\perp} b &= (a \wedge \neg b, \neg a) \\ a \ominus\_{\top} b &= (a, b \wedge \neg a) \end{aligned}$$

*Additionally,* f′<sub>⊖<sub>⊥</sub></sub> *and* f′<sub>⊖<sub>⊤</sub></sub> *define unique least and greatest derivatives for* f*.*

Theorem 3 then gives us bounds for all the derivatives on Boolean algebras:

**Corollary 1.** *Let* L *be a complete Boolean algebra with the corresponding change action* Lˆ<sub>⋈</sub>*,* Aˆ *be an arbitrary change action, and* f : A → L *be a function. Then the derivatives of* f *are precisely those functions* f′ : A × ΔA → ΔL *such that*

$$f'\_{\ominus\_\perp} \le\_\Delta f' \le\_\Delta f'\_{\ominus\_\top}$$

This makes Theorem 3 actually usable in practice, since we have concrete definitions for our bounds (which we will make use of in Sect. 4.2).
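The bounds of Corollary 1 can be checked exhaustively on a small powerset. The sketch below assumes the universe `U = {0, 1, 2}`, represents changes as disjoint pairs of `frozenset`s, and orders them componentwise with the second component reversed (the natural order on L × L<sup>op</sup>). The test function `f` and the candidate derivative `precise` are hypothetical and serve only as illustration.

```python
from itertools import combinations

U = frozenset(range(3))  # small universe, so the checks can be exhaustive

def subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def oplus(a, change):
    up, down = change
    return (a | up) - down

def minus_bot(a, b):
    # a ⊖⊥ b = (a ∧ ¬b, ¬a): add as little and remove as much as possible.
    return (a - b, U - a)

def minus_top(a, b):
    # a ⊖⊤ b = (a, b ∧ ¬a): add as much and remove as little as possible.
    return (a, b - a)

def leq_change(c1, c2):
    # Change order on disjoint pairs: up-parts grow, down-parts shrink.
    (p, q), (r, s) = c1, c2
    return p <= r and s <= q

def f(x):
    # Arbitrary test subject on P(U); not taken from the text.
    return frozenset((n + 1) % 3 for n in x if n != 2)

def precise(a, change):
    # Candidate derivative: exactly the elements gained and lost.
    old, new = f(a), f(oplus(a, change))
    return (new - old, old - new)

SUBSETS = subsets(U)
CHANGES = [(p, q) for p in SUBSETS for q in SUBSETS if not (p & q)]

for a in SUBSETS:
    for change in CHANGES:
        old, new = f(a), f(oplus(a, change))
        lo, hi = minus_bot(new, old), minus_top(new, old)
        mid = precise(a, change)
        # Both bounds are changes from old to new ...
        assert oplus(old, lo) == new and oplus(old, hi) == new
        # ... and anything sandwiched between them is one as well.
        assert leq_change(lo, mid) and leq_change(mid, hi)
        assert oplus(old, mid) == new
```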

# **4 Derivatives for Non-recursive Datalog**

We now want to apply the theory we have developed to the specific case of the semantics of Datalog. Giving a differentiable semantics for Datalog will lead us to a strategy for performing incremental evaluation and maintenance of Datalog programs. To begin with, we will restrict ourselves to the non-recursive fragment of the language—the formulae that make up the right hand sides of Datalog rules. We will tackle the full program semantics in a later section, once we know how to handle fixpoints.

Although the techniques we are using should work for any language, Datalog provides a non-trivial case study where the need for incremental computation is real and pressing, as we saw in Sect. 1.

#### **4.1 Semantics of Datalog Formulae**

Datalog is usually given a logical semantics where formulae are interpreted as first-order logic predicates and the semantics of a program is the set of models of its constituent predicates. We will instead give a simple denotational semantics (as is typical when working with fixpoints, see e.g. [17]) that treats a Datalog formula as directly denoting a relation, i.e. a set of named tuples, with variables ranging over a finite schema.

**Definition 8.** *A* schema Γ *is a finite set of names. A* named tuple *over* Γ *is an assignment of a value* v<sub>i</sub> *for each name* x<sub>i</sub> *in* Γ*. Given disjoint schemata* Γ = {x<sub>1</sub>, …, x<sub>n</sub>} *and* Σ = {y<sub>1</sub>, …, y<sub>m</sub>}*, the* selection function σ<sub>Γ</sub> *is defined as*

$$\sigma\_{\Gamma}(\{x\_1 \mapsto v\_1, \ldots, x\_n \mapsto v\_n, y\_1 \mapsto w\_1, \ldots, y\_m \mapsto w\_m\}) := \{x\_1 \mapsto v\_1, \ldots, x\_n \mapsto v\_n\}$$

*i.e.* σ<sub>Γ</sub> *restricts a named tuple over* Γ ∪ Σ *into a tuple over* Γ *with the same values for the names in* Γ*. We denote the elementwise extension of* σ<sub>Γ</sub> *to sets of tuples also as* σ<sub>Γ</sub>*.*
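As a small illustration of Definition 8, the sketch below represents a named tuple as a Python `dict` from names to values and a relation as a list of such dicts. The representation and the function names `select` and `select_relation` are assumptions made for this example.

```python
def select(gamma, named_tuple):
    """σ_Γ on one named tuple: keep only the names in gamma."""
    return {name: named_tuple[name] for name in gamma}

def select_relation(gamma, relation):
    """Elementwise extension of σ_Γ to a set of tuples.

    Tuples are frozen into sets of (name, value) pairs so that tuples which
    become equal after the restriction are merged, as in a set of tuples.
    """
    return {frozenset(select(gamma, t).items()) for t in relation}

relation = [
    {"x": 1, "y": 10},
    {"x": 1, "y": 20},
    {"x": 2, "y": 10},
]

assert select({"x"}, {"x": 1, "y": 10}) == {"x": 1}
# Two of the three tuples agree on x, so the result has two elements.
assert select_relation({"x"}, relation) == {
    frozenset({("x", 1)}),
    frozenset({("x", 2)}),
}
```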

We will adopt the usual closed-world assumption to give a denotation to negation.

**Definition 9.** *For any schema* Γ*, there exists a universal relation* 𝒰<sub>Γ</sub>*. Negation on relations can then be defined as*

$$\neg R := \mathcal{U}\_{\Gamma} \setminus R$$

This makes **Rel**<sub>Γ</sub>, the set of all subsets of 𝒰<sub>Γ</sub>, a complete Boolean algebra.

**Definition 10.** *A Datalog formula* T *whose free term variables are contained in* Γ *denotes a function from* **Rel**<sup>n</sup><sub>Γ</sub> *to* **Rel**<sub>Γ</sub>*.*

$$\llbracket . \rrbracket\_{\Gamma} : \mathbf{Formula} \to \mathbf{Rel}\_{\Gamma}^{n} \to \mathbf{Rel}\_{\Gamma}$$

*If* ℛ = (ℛ<sub>1</sub>, …, ℛ<sub>n</sub>) *is a choice of a relation* ℛ<sub>i</sub> *for each of the relation variables* R<sub>i</sub>*,* ⟦T⟧<sub>Γ</sub>(ℛ) *is inductively defined according to the rules in Fig. 1.*

**Fig. 1.** Formula semantics for Datalog

Since **Rel**<sub>Γ</sub> is a complete Boolean algebra, and so is **Rel**<sup>n</sup><sub>Γ</sub>, ⟦T⟧<sub>Γ</sub> is a function between complete Boolean algebras. For brevity, we will often leave the schema implicit, as it is clear from the context.

#### **4.2 Differentiability of Datalog Formula Semantics**

In order to actually perform our incremental computation, we first need to provide a concrete derivative for the semantics of Datalog formulae. Of course, since ⟦T⟧<sub>Γ</sub> is a function between the complete Boolean algebras **Rel**<sup>n</sup><sub>Γ</sub> and **Rel**<sub>Γ</sub>, and we know that the corresponding change actions **Rel**ˆ<sup>n</sup><sub>Γ</sub> and **Rel**ˆ<sub>Γ</sub> are complete, this guarantees the existence of a derivative for ⟦T⟧.

**Fig. 2.** Upwards and downwards formula derivatives for Datalog

Unfortunately, this does not necessarily provide us with an *efficient* derivative for ⟦T⟧. The derivatives that we know how to compute (Corollary 1) rely on computing f(a ⊕ δa) itself, which is the very thing we were trying to avoid computing!

Of course, given a concrete definition of a derivative we can simplify this expression and hopefully make it easier to compute. But we also know from Corollary 1 that *any* function bounded by f′<sub>⊖<sub>⊥</sub></sub> and f′<sub>⊖<sub>⊤</sub></sub> is a valid derivative, and we can therefore optimize anywhere within that range to make a trade-off between ease of computation and precision.<sup>9</sup>

There is also the question of how to compute the derivative. Since the change set for **Rel** is a subset of **Rel** × **Rel**, it is possible and indeed very natural to compute the two components via a pair of Datalog formulae, which allows us to reuse an existing Datalog formula evaluator. Indeed, if this process is occurring in an optimizing compiler, the derivative formulae can themselves be optimized. This is very beneficial in practice, since the initial formulae may be quite complex.

This does give us additional constraints that the derivative formulae must satisfy: for example, we need to be able to evaluate them; and we may wish to pick formulae that will be easy or cheap for our evaluation engine to compute, even if they compute a less precise derivative.

The upshot of these considerations is that the optimal choice of derivatives is likely to be quite dependent on the precise variant of Datalog being evaluated, and the specifics of the evaluation engine. Here is one possibility, which is the one used at Semmle.

<sup>9</sup> The idea of using an approximation to the precise derivative, and a soundness condition, appears in Bancilhon [9].

**A concrete Datalog formula derivative.** In Fig. 2, we define a "symbolic" derivative operator as a pair of mutually recursive functions, Δ and ∇, which turn a Datalog formula T into new formulae that compute the upwards and downwards parts of the derivative, respectively. Our definition uses an auxiliary function, X, which computes the "neXt" value of a term by applying the upwards and downwards derivatives. As is typical for a derivative, the new formulae will have additional free relation variables for the upwards and downwards derivatives of the free relation variables of T, denoted as ΔR and ∇R respectively. Evaluating the formula as a derivative means evaluating it as a normal Datalog formula with the new relation variables set to the input relation changes.

While the definitions mostly exhibit the dualities we would expect between corresponding operators, there are a few asymmetries to explain.

The asymmetry between the cases for Δ(T ∨ U) and ∇(T ∧ U) is for operational reasons. The symmetrical version of Δ(T ∨U) is (Δ(T)∧¬U)∨(Δ(U)∧¬T) (which is also precise). The reason we omit the negated conjuncts is simply that they are costly to compute and not especially helpful to our evaluation engine.

The asymmetry between the cases for ∃ is because our dialect of Datalog does not have a primitive universal quantifier. If we did have one, the cases for ∃ would be dual to the corresponding cases for ∀.

**Theorem 4 (Concrete Datalog formula derivatives).** *Let* Δ*,* ∇*,* X : Formula → Formula *be mutually recursive functions defined by structural induction as in Fig. 2.*

*Then* Δ(T) *and* ∇(T) *are disjoint, and for any schema* Γ *and any Datalog formula* T *whose free term variables are contained in* Γ*,* ⟦T⟧′<sub>Γ</sub> := (⟦Δ(T)⟧<sub>Γ</sub>, ⟦∇(T)⟧<sub>Γ</sub>) *is a derivative for* ⟦T⟧<sub>Γ</sub>*.*

We can give a derivative for our treeP predicate by mechanically applying the recursive functions defined in Fig. 2.

$$\begin{aligned} &\Delta(treeP(x)) \\ &= p(x) \land \exists y. (child(x,y) \land \Delta(treeP(y))) \land \neg \exists y. (child(x,y) \land \neg \mathsf{X}(treeP(y))) \end{aligned}$$

$$\nabla(treeP(x)) = p(x) \land \exists y. (child(x,y) \land \nabla(treeP(y)))$$

The upwards difference in particular is not especially easy to compute. If we naively compute it, the third conjunct requires us to recompute the whole of the recursive part. However, the second conjunct gives us a guard: if it is empty then the whole formula will be too, so we only need to evaluate the third conjunct if the second conjunct is non-empty, i.e. if there is *some* change in the body of the existential.

This shows that our derivatives aren't a panacea: it is simply *hard* to compute downwards differences for ∃ (and, equivalently, upwards differences for ∀) because we must check that there is no other way of deriving the same facts.<sup>10</sup> However, we can still avoid the re-evaluation in many cases, and the inefficiency is local to this subformula.

<sup>10</sup> The "support" data structures introduced by [25] are an attempt to avoid this issue by tracking the number of derivations of each tuple.

#### **4.3 Extensions to Datalog**

Our formulation of Datalog formula semantics and derivatives is generic and modular, so it is easy to extend the language with new formula constructs: all we need to do is add cases for Δ and ∇.

In fact, because we are using a complete change action, we can *always* do this by using the maximal or minimal derivative. This justifies our claim that we can support *arbitrary* additional formula constructs: although the maximal and minimal derivatives are likely to be impractical, having them available as options means that we will never be completely stymied.

This is important in practice: here is a real example from Semmle's variant of Datalog, which includes a kind of aggregate with well-defined recursive semantics. Aggregates have the form

$$r = \text{agg}(p)(vs \mid T \mid U)$$

where agg refers to an aggregation function (such as "sum" or "min"), vs is a sequence of variables, p and r are variables, T is a formula possibly mentioning vs, and U is a formula possibly mentioning vs and p. The full details can be found in Moor and Baars [34], but for example this allows us to write

$$\begin{aligned} height(n,h) \leftarrow \neg \exists c. (child(n,c)) \land h &= 0\\ \lor \exists h'. (h' = \max(p)(c \mid child(n,c) \mid height(c,p)) \land h &= h' + 1) \end{aligned}$$

which recursively computes the height of a node in a tree.

Here is an upwards derivative for an aggregate formula:

$$\Delta(r = \text{agg}(p)(vs \mid T \mid U)) := \exists v s. (T \land \Delta U) \land r = \text{agg}(p)(vs \mid T \mid U)$$

While this isn't a precise derivative, it is still substantially cheaper than reevaluating the whole subformula, as the first conjunct acts as a guard, allowing us to skip the second conjunct when U has not changed.

# **5 Changes on Functions**

So far we have defined change actions for the kinds of things that typically make up *data*, but we would also like to have change actions on *functions*. This would allow us to define derivatives for higher-order languages (where functions are first-class); and for semantic operators like fixpoint operators **fix** : (A → A) → A, which also operate on functions.

Function spaces, however, differ from products and disjoint unions in that there is no obvious "best" change action on A → B. Therefore, instead of trying to define a single choice of change action, we will pick out subsets of function spaces which have "well-behaved" change actions.

**Definition 11 (Functional Change Action).** *Given change actions* Aˆ *and* Bˆ *and a set* U ⊆ A → B*, a change action* Uˆ = (U, ΔU, ⊕<sub>U</sub>) *is* functional *whenever the evaluation map* ev : U × A → B *is differentiable, that is to say, whenever there exists a function* ev′ : (U × A) × (ΔU × ΔA) → ΔB *such that:*

$$(f \oplus\_U \delta f)(a \oplus\_A \delta a) = f(a) \oplus\_B \text{ev}'((f, a), (\delta f, \delta a))$$

*We will write* Uˆ ⊆ Aˆ ⇒ Bˆ *whenever* U ⊆ A → B *and* Uˆ *is functional.*

There are two reasons why functional change actions are usually associated with a *subset* U ⊆ A → B. Firstly, it allows us to restrict ourselves to spaces of monotone or continuous functions. But more importantly, functional change actions are necessarily made up of differentiable functions, and thus a functional change action may not exist for the entire function space A → B.

**Proposition 10.** *Let* Uˆ ⊆ Aˆ ⇒ Bˆ *be a functional change action. Then every* f ∈ U *is differentiable, with a derivative* f′ *given by:*

$$f'(x, \delta x) = \text{ev}'((f, x), (\mathbf{0}, \delta x))$$

#### **5.1 Pointwise Functional Change Actions**

Even if we restrict ourselves to the differentiable functions between Aˆ and Bˆ it is hard to find a concrete functional change action for this set. Fortunately, in many important cases there is a simple change action on the set of differentiable functions.

**Definition 12 (Pointwise functional change action).** *Let* Aˆ *and* Bˆ *be change actions. The* pointwise functional change action Aˆ ⇒<sub>pt</sub> Bˆ*, when it is defined, is given by* (Aˆ → Bˆ, A → ΔB, ⊕<sub>→</sub>)*, with the monoid structure* (A → ΔB, ·<sub>→</sub>, **0**<sub>→</sub>) *and the action* ⊕<sub>→</sub> *defined by:*

$$\begin{aligned} (f \oplus\_{\rightarrow} \delta f)(x) &:= f(x) \oplus\_{B} \delta f(x) \\ (\delta f \cdot\_{\rightarrow} \delta g)(x) &:= \delta f(x) \cdot\_{B} \delta g(x) \\ \mathbf{0}\_{\rightarrow}(x) &:= \mathbf{0}\_{B} \end{aligned}$$

That is, a change is given pointwise, mapping each point in the domain to a change in the codomain.

The above definition is not always well-typed, since given f : Aˆ → Bˆ and δf : A → ΔB there is no guarantee that f ⊕<sub>→</sub> δf is differentiable. We present two sufficient criteria that guarantee this.

**Theorem 5.** *Let* Aˆ *and* Bˆ *be change actions, and suppose that* Bˆ *satisfies one of the following conditions: – The change action* ΔB-


*Then the pointwise functional change action* (Aˆ → Bˆ, A → ΔB, ⊕<sub>→</sub>) *is well defined.*<sup>11</sup>

As a direct consequence of this, it follows that whenever L is a Boolean algebra (and hence has a complete change action), the pointwise functional change action Aˆ ⇒<sub>pt</sub> Lˆ is well-defined.

Pointwise functional change actions are functional in the sense of Definition 11. Moreover, the derivative of the evaluation map is quite easy to compute.

**Proposition 11 (Derivatives of the evaluation map).** *Let* Aˆ *and* Bˆ *be change actions such that the pointwise functional change action* Aˆ ⇒<sub>pt</sub> Bˆ *is well defined, and let* f : Aˆ → Bˆ*,* a ∈ A*,* δa ∈ ΔA*,* δf ∈ A → ΔB*.*

*Then the following are both derivatives of the evaluation map:*

$$\begin{aligned} \text{ev}\_1'((f,a),(\delta f,\delta a)) &:= f'(a,\delta a) \cdot \delta f(a \oplus \delta a) \\ \text{ev}\_2'((f,a),(\delta f,\delta a)) &:= \delta f(a) \cdot (f \oplus \delta f)'(a,\delta a) \end{aligned}$$
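Both formulas can be tried out on integer-valued functions. The sketch below assumes the change action (Z, Z, +) on both domain and codomain, so a function change is itself a function Z → Z that is added pointwise. The helper `derivative` uses the difference-based derivative of Proposition 1 as a stand-in for f′; all names are hypothetical.

```python
def derivative(h):
    # Stand-in derivative under (Z, Z, +), obtained from the minus operator.
    return lambda a, da: h(a + da) - h(a)

def oplus_fn(f, df):
    # Pointwise action: (f ⊕ δf)(x) = f(x) ⊕ δf(x).
    return lambda x: f(x) + df(x)

def ev1(f, a, df, da):
    # ev'_1: first move the argument using f', then apply δf at the new point.
    return derivative(f)(a, da) + df(a + da)

def ev2(f, a, df, da):
    # ev'_2: first apply δf at the old point, then move the argument using (f ⊕ δf)'.
    return df(a) + derivative(oplus_fn(f, df))(a, da)

square = lambda x: x * x
bump = lambda x: 3 * x + 1   # a function change δf

for a in range(-4, 5):
    for da in range(-4, 5):
        expected = oplus_fn(square, bump)(a + da)
        assert square(a) + ev1(square, a, bump, da) == expected
        assert square(a) + ev2(square, a, bump, da) == expected
```

In this commutative example the two derivatives coincide numerically; in a non-commutative change monoid they may be different changes with the same effect.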

A functional change action merely tells us that a derivative of the evaluation map exists; a pointwise change action actually gives us a definition of it. In practice, this means that we will only be able to use the results in Sect. 6.2 (incremental computation and derivatives of fixpoints) when we have pointwise change actions, or where we have some other way of computing a derivative of the evaluation map.

# **6 Directed-Complete Partial Orders and Fixpoints**

Directed-complete partial orders (dcpos) equipped with a least element are an important class of posets. They allow us to take *fixpoints* of (Scott-)continuous maps, which is important for interpreting recursion in program semantics.

#### **6.1 Dcpos**

As before, we can define change actions on dcpos, rather than sets, as change actions whose base and change sets are endowed with a dcpo structure, and where the monoid operation and action are (Scott-)continuous.

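**Definition 13.** *A change action* Aˆ *is* continuous *if*

- A *and* ΔA *are dcpos.*
- ⊕ *is Scott-continuous as a map from* A × ΔA *to* A*.*
- · *is Scott-continuous as a map from* ΔA × ΔA *to* ΔA*.*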

<sup>11</sup> Either of these conditions is enough to guarantee that the pointwise functional change action is well defined, but it can be the case that Bˆ satisfies neither and yet pointwise change actions into Bˆ do exist. A precise account of when pointwise functional change actions exist is outside the scope of this paper.

Unlike for posets, the change order ≤<sub>Δ</sub> does *not*, in general, induce a dcpo on ΔA. As a counterexample, consider the change action (N̄, N, +), where N̄ denotes the dcpo of natural numbers extended with positive infinity.

A key example of a continuous change action is the Lˆ change action on Boolean algebras.

**Proposition 12 (Boolean algebra continuity).** *Let* L *be a Boolean algebra. Then* Lˆ *is a continuous change action.*

For a general overview of results in domain theory and dcpos, we refer the reader to an introductory work such as [2], but we state here some specific results that we shall be using, such as the following, whose proof can be found in [2, Lemma 3.2.6]:

**Proposition 13.** *A function* f : A × B → C *is continuous iff it is continuous in each variable separately.*

It is a well-known result in standard calculus that the limit of an absolutely convergent sequence of differentiable functions {f<sub>i</sub>} is itself differentiable, and its derivative is equal to the limit of the derivatives of the f<sub>i</sub>. A consequence of Proposition 13 is the following analogous result:

**Corollary 2.** *Let* Aˆ *and* Bˆ *be change actions, with* Bˆ *continuous, and let* {f<sub>i</sub>} *and* {f′<sub>i</sub>} *be* I*-indexed directed sets of functions in* A → B *and* A × ΔA → ΔB *respectively.*

*Then, if for every* i ∈ I *it is the case that* f′<sub>i</sub> *is a derivative of* f<sub>i</sub>*, then* ⨆<sub>i∈I</sub> f′<sub>i</sub> *is a derivative of* ⨆<sub>i∈I</sub> f<sub>i</sub>*.*

#### **6.2 Fixpoints**

Fixpoints appear frequently in the semantics of languages with recursion. If we can give a generic account of how to compute fixpoints using change actions, then this gives us a compositional way of extending a derivative for the nonrecursive semantics of a language to a derivative that can also handle recursion. We will later apply this technique to create a derivative for the semantics of full recursive Datalog (Sect. 7.2).

**Iteration functions.** Over directed-complete partial orders we can define a least fixpoint operator **lfp** in terms of the iteration function **iter**:

$$\begin{aligned} \textbf{iter}: (A \to A) \times \mathbb{N} &\to A\\ \textbf{iter}(f, 0) &:= \bot \\ \textbf{iter}(f, n) &:= f^n(\bot) \\ \textbf{lfp}: (A \to A) &\to A \\ \textbf{lfp}(f) &:= \bigsqcup\_{n \in \mathbb{N}} \textbf{iter}(f, n) \end{aligned}$$
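On a finite lattice the chain of iterates stabilises after finitely many steps, so the join over all n can be computed by iterating until nothing changes. The sketch below assumes the powerset of a small node set as the lattice and a hypothetical monotone `step` function computing one round of reachability from node 0; `iter_fn` is named that way only to avoid shadowing Python's built-in `iter`.

```python
EDGES = {0: [1], 1: [2], 2: [0], 3: [4]}   # hypothetical example graph
BOTTOM = frozenset()

def step(reached):
    # Monotone: node 0 is always reached, plus successors of reached nodes.
    return frozenset({0}) | frozenset(m for n in reached for m in EDGES.get(n, []))

def iter_fn(f, n):
    """iter(f, n) = f^n(⊥)."""
    x = BOTTOM
    for _ in range(n):
        x = f(x)
    return x

def lfp(f):
    """Least fixpoint as the limit of the iterates; assumes the chain stabilises."""
    n = 0
    while iter_fn(f, n + 1) != iter_fn(f, n):
        n += 1
    return iter_fn(f, n)

assert iter_fn(step, 0) == BOTTOM
assert iter_fn(step, 2) == frozenset({0, 1})
assert lfp(step) == frozenset({0, 1, 2})
```

Recomputing every iterate from scratch, as `lfp` does here, is exactly the cost that the incremental scheme in the rest of this section is designed to avoid.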

The iteration function is the basis for all the results in this section: we can take a partial derivative with respect to n, and this will give us a way to get to the next iteration incrementally; and we can take the partial derivative with respect to f, and this will give us a way to get from iterating f to iterating f ⊕ δf.

**Incremental computation of fixpoints.** The following theorems provide a generalization of semi-naive evaluation to any differentiable function over a continuous change action. Throughout this section we will assume that we have a continuous change action Aˆ, and any reference to the change action Nˆ will refer to the monoidal change action on the naturals defined in Sect. 2.1.

Since we are trying to incrementalize the iterative step, we start by taking the partial derivative of **iter** with respect to n.

**Proposition 14 (Derivative of the iteration map with respect to** n**).** *Let* $\hat{A}$ *be a complete change action and let* $f : A \to A$ *be a differentiable function. Then* **iter** *is differentiable with respect to its second argument, and a partial derivative is given by:*

$$\begin{aligned} &\partial\_2 \mathtt{iter}: (A \to A) \times \mathbb{N} \times \Delta \mathbb{N} \to \Delta A \\ &\partial\_2 \mathtt{iter}(f, \mathbf{0}, m) := \mathtt{iter}(f, m) \ominus \mathtt{iter}(f, 0) \\ &\partial\_2 \mathtt{iter}(f, n+1, m) := f'(\mathtt{iter}(f, n), \partial\_2 \mathtt{iter}(f, n, m)) \end{aligned}$$

By using the following recurrence relation, we can then compute ∂2**iter** along with **iter** simultaneously:

$$\begin{aligned} \mathbf{recur}\_f &: A \times \Delta A \to A \times \Delta A \\ \mathbf{recur}\_f(\bot, \bot) &:= (\bot, f(\bot) \ominus \bot) \\ \mathbf{recur}\_f(a, \delta a) &:= (a \oplus \delta a, f'(a, \delta a)) \end{aligned}$$

Which has the property that

$$\mathbf{recur}\_f^n(\bot,\bot) = (\mathbf{iter}(f,n), \partial\_2 \mathbf{iter}(f,n,1))$$

This gives us a way to compute a fixpoint incrementally, by adding successive changes to an accumulator until we reach it. This is exactly how semi-naive evaluation works: you compute the delta relation and the accumulator simultaneously, adding the delta into the accumulator at each stage until it becomes the final output.
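A minimal Haskell sketch of this accumulator-and-delta scheme, specialised to transitive closure, is given below. It assumes a simplified change action on relations in which a change is a set of tuples to insert and ⊕ is set union; the names `step`, `step'`, `recur` and `semiNaive` are illustrative choices. Under that assumption `step'` is a derivative of `step`, since `step e (a ∪ da) = step e a ∪ compose e da`.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Rel = Set (Int, Int)

-- Relational composition: (x, z) whenever (x, y) is in r and (y, z) is in s.
compose :: Rel -> Rel -> Rel
compose r s =
  Set.fromList
    [(x, z) | (x, y) <- Set.toList r, (y', z) <- Set.toList s, y == y']

-- One step of transitive closure over the edge relation e: f(X) = E ∪ (E ; X).
step :: Rel -> Rel -> Rel
step e x = e `Set.union` compose e x

-- A derivative of (step e) when changes are insertions and ⊕ is union.
step' :: Rel -> Rel -> Rel -> Rel
step' e _a da = compose e da

-- The recurrence: fold the delta into the accumulator, then derive the next delta.
recur :: Rel -> (Rel, Rel) -> (Rel, Rel)
recur e (a, da) = (a `Set.union` da, step' e a da)

-- Start from (⊥, f(⊥) ⊖ ⊥) and stop once the delta adds nothing new.
semiNaive :: Rel -> Rel
semiNaive e = go (Set.empty, step e Set.empty)
  where
    go (a, da)
      | da `Set.isSubsetOf` a = a
      | otherwise = go (recur e (a, da))

-- Naive evaluation, for comparison: recompute the full step every round.
naive :: Rel -> Rel
naive e = go Set.empty
  where
    go x = let x' = step e x in if x' == x then x else go x'
```

Each round of `semiNaive` joins the edge relation only against the delta rather than against the whole accumulator, which is where the savings over `naive` come from. Practical engines typically also remove already-known tuples from the delta; that refinement is omitted here.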

**Theorem 6 (Incremental computation of least fixpoints).** *Let* $\hat{A}$ *be a complete, continuous change action, and let* $f : \hat{A} \to \hat{A}$ *be continuous and differentiable. Then* $\mathbf{lfp}(f) = \bigsqcup\_{n \in \mathbb{N}} \pi\_1(\mathbf{recur}\_f^n(\bot, \bot))$*.*<sup>12</sup>

<sup>12</sup> Note that we have *not* taken the fixpoint of $\mathbf{recur}\_f$, since it is not continuous.

**Derivatives of fixpoints.** In the previous section we have shown how to use derivatives to compute fixpoints more efficiently, but we also want to take the derivative of the fixpoint operator itself. A typical use case for this is where we have calculated some fixpoint

$$F\_E := \mathbf{fix}(\lambda X.F(E, X)),$$

then update the parameter E with some change δE and wish to compute the new value of the fixpoint, i.e.

$$F\_{E \oplus \delta E} := \mathbf{fix}(\lambda X.F(E \oplus \delta E, X)),$$

This can be seen as applying a change to the *function* whose fixpoint we are taking. We go from computing the fixpoint of $F(E, -)$ to computing the fixpoint of $F(E \oplus \delta E, -)$. If we have a pointwise functional change action then we can express this change as a function giving the change at each point, that is:

$$
\lambda X.F(E \oplus \delta E, X) \ominus F(E, X).$$

In Datalog this would allow us to update a recursively defined relation given an update to one of its non-recursive dependencies, or the extensional database. For example, we might want to take the transitive closure relation and update it by changing the edge relation e.
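As a worked instance, consider the one-step operator for transitive closure, $F(E, X) = E \cup (E \circ X)$, where $\circ$ is relational composition. Assume, purely for illustration, a change action on relations whose changes are sets of tuples to insert, so that $\oplus$ is union; this is a simplification and not necessarily the change action used for Datalog elsewhere in the paper. Since composition distributes over union:

$$\begin{aligned} F(E \cup \delta E, X) &= (E \cup \delta E) \cup ((E \cup \delta E) \circ X) \\ &= (E \cup (E \circ X)) \cup (\delta E \cup (\delta E \circ X)) \\ &= F(E, X) \oplus (\delta E \cup (\delta E \circ X)) \end{aligned}$$

Hence $\lambda X.\, \delta E \cup (\delta E \circ X)$ is one valid pointwise change taking $F(E, -)$ to $F(E \oplus \delta E, -)$ under these assumptions.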

However, computing these examples would require us to provide a derivative for the fixpoint operator **fix**: we want to know how the resulting fixpoint changes given a change to its input function.

**Definition 14 (Derivatives of fixpoints).** *Let* $\hat{A}$ *be a change action, let* $\hat{U} \subseteq \hat{A} \Rightarrow \hat{A}$ *be a functional change action (not necessarily pointwise) and suppose* $\mathbf{fix}\_U$ *and* $\mathbf{fix}\_{\Delta A}$ *are fixpoint operators for endofunctions on* $U$ *and* $\Delta A$ *respectively.*

*Then we define*

$$\begin{aligned} \mathsf{adjust} &: U \times \Delta U \to (\Delta A \to \Delta A) \\ \mathsf{adjust}(f, \delta f) &:= \lambda \,\delta a. \mathrm{ev}'((f, \mathbf{fix}\_U(f)), (\delta f, \delta a)) \\ \mathsf{fix}'\_U &: U \times \Delta U \to \Delta A \\ \mathsf{fix}'\_U(f, \delta f) &:= \mathsf{fix}\_{\Delta A}(\mathsf{adjust}(f, \delta f)) \end{aligned}$$

The suggestively named $\mathbf{fix}'\_U$ will in fact turn out to be a derivative, albeit only for *least* fixpoints. The appearance of $\mathrm{ev}'$, a derivative of the evaluation map, in the definition of **adjust** is also no coincidence: as evaluating a fixpoint consists of many steps of applying the evaluation map, so computing the derivative of a fixpoint consists of many steps of applying the derivative of the evaluation map.<sup>13</sup>

<sup>13</sup> Perhaps surprisingly, the authors first discovered an expanded version of this formula, and it was only later that we realised the remarkable connection to $\mathrm{ev}'$.

Since **lfp** is characterized as the limit of a chain of functions, Corollary 2 suggests a way to compute its derivative. It suffices to find a derivative $\mathbf{iter}'\_n$ of each iteration map such that the resulting set $\{\mathbf{iter}'\_n \mid n \in \mathbb{N}\}$ is directed, which will entail that $\bigsqcup\_{n \in \mathbb{N}} \mathbf{iter}'\_n$ is a derivative of **lfp**.

These correspond to the first partial derivative of **iter**—this time with respect to f. While we are differentiating with respect to f, we are still going to need to define our derivatives inductively in terms of n.

**Proposition 15 (Derivative of the iteration map with respect to** f**). iter** *is differentiable with respect to its first argument and a derivative is given by:*

$$\begin{aligned} &\partial\_1 \text{iter}: (A \to A) \times \Delta (A \to A) \times \mathbb{N} \to \Delta A \\ &\partial\_1 \text{iter}(f, \delta f, \mathbf{0}) := \perp\_{\Delta A} \\ &\partial\_1 \text{iter}(f, \delta f, n+1) := \text{ev}'((f, \text{iter}(f, n)), (\delta f, \partial\_1 \text{iter}(f, \delta f, n))) \end{aligned}$$

As before, we can now compute ∂1**iter** together with **iter** by mutual recursion.<sup>14</sup>

$$\begin{aligned} \mathbf{recur}\_{f, \delta f} &: A \times \Delta A \to A \times \Delta A \\ \mathbf{recur}\_{f, \delta f}(a, \delta a) &:= (f(a), \mathrm{ev}'((f, a), (\delta f, \delta a))) \end{aligned}$$

Which has the property that

$$\mathbf{recur}\_{f, \delta f}^{n}(\bot, \bot) = (\mathbf{iter}(f, n), \partial\_1 \mathbf{iter}(f, \delta f, n)).$$

This indeed provides us with a function whose limit we can take. If we do so we will discover that it is exactly $\mathbf{lfp}'$ (defined as in Definition 14), showing that $\mathbf{lfp}'$ is a true derivative.

**Theorem 7 (Derivatives of least fixpoint operators).** *Let*


*Then* $\mathbf{lfp}'$ *is a derivative of* **lfp***.*

Computing this derivative still requires computing a fixpoint—over the change lattice—but this may still be significantly less expensive than recomputing the full new fixpoint.
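To make the shape of this computation concrete, here is a hedged Haskell sketch for transitive closure, assuming a simplified change action in which changes are sets of inserted tuples and ⊕ is union. For $F(E, X) = E \cup (E \circ X)$ and an insertion `de` into the edge relation, `adjust` below is one valid choice of the derivative of evaluation at the old fixpoint `a`; the function names and this particular choice are assumptions made for illustration.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Rel = Set (Int, Int)

-- Relational composition: (x, z) whenever (x, y) is in r and (y, z) is in s.
compose :: Rel -> Rel -> Rel
compose r s =
  Set.fromList
    [(x, z) | (x, y) <- Set.toList r, (y', z) <- Set.toList s, y == y']

-- Least fixpoint on finite relations, iterating from the empty relation.
lfp :: (Rel -> Rel) -> Rel
lfp f = go Set.empty
  where
    go x = let x' = f x in if x' == x then x else go x'

-- Transitive closure as the least fixpoint of X |-> E ∪ (E ; X).
tc :: Rel -> Rel
tc e = lfp (\x -> e `Set.union` compose e x)

-- Change to one application of the step function when the edges grow by de
-- and the argument grows from a to (a ∪ da), where a is the old fixpoint.
adjust :: Rel -> Rel -> Rel -> Rel -> Rel
adjust e de a da =
  Set.unions [de, compose de (a `Set.union` da), compose e da]

-- Change to the fixpoint: itself a least fixpoint, but over changes.
tc' :: Rel -> Rel -> Rel -> Rel
tc' e de a = lfp (adjust e de a)
```

Given `a = tc e`, the updated closure is `a` unioned with `tc' e de a`. The inner fixpoint only ever joins against `de` and the growing change set, so it can touch far fewer tuples than recomputing `tc` on the enlarged edge relation.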

<sup>14</sup> In fact, the recursion here is not *mutual*: the first component does not depend on the second. However, writing it in this way makes it amenable to computation by fixpoint, and we will in fact be able to avoid the recomputation of $\mathbf{iter}\_n$ when we show that it is equivalent to $\mathbf{lfp}'$.

# **7 Derivatives for Recursive Datalog**

Given the non-recursive semantics for a language, we can extend it to handle recursive definitions using fixpoints. Section 6.2 lets us extend our derivative for the non-recursive semantics to a derivative for the recursive semantics, as well as letting us compute the fixpoints themselves incrementally.

Again, we will demonstrate the technique with Datalog, although the approach is generic.

#### **7.1 Semantics of Datalog Programs**

First of all, we define the usual "immediate consequence operator" which computes "one step" of our program semantics.

**Definition 15.** *Given a program* $P = (P\_1, \ldots, P\_n)$*, where* $P\_i$ *is a predicate with schema* $\Gamma\_i$*, the* immediate consequence operator $I : \mathbf{Rel}^n \to \mathbf{Rel}^n$ *is defined as follows:*

$$I(R\_1, \ldots, R\_n) = ([\![P\_1]\!]\_{\Gamma\_1}(R\_1, \ldots, R\_n), \ldots, [\![P\_n]\!]\_{\Gamma\_n}(R\_1, \ldots, R\_n))$$

That is, given a value for the program, we pass in all the relations to the denotation of each predicate, to get a new tuple of relations.

**Definition 16.** *The semantics of a program* $P$ *is defined to be*

$$[\![P]\!] := \mathbf{lfp}(I)$$

*and may be calculated by iterative application of* $I$ *to* $\bot$ *until fixpoint is reached.*

Whether or not this program semantics exists will depend on whether the fixpoint exists. Typically this is ensured by constraining the program such that I is monotone (or, in the context of a dcpo, continuous). We do not require monotonicity to apply Theorem 6 (and hence we can incrementally compute fixpoints that happen to exist even though the generating function is not monotonic), but it is required to apply Theorem 7.
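The immediate consequence operator can be sketched in Haskell for a small hypothetical program with two intensional predicates, `path(x, z) :- edge(x, z).`, `path(x, z) :- edge(x, y), path(y, z).` and `cyclic(x) :- path(x, x).`. The program, the `edge` facts and the representation of tuples as lists are all assumptions made for the example.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- A relation is a set of tuples; tuples are lists so arities may differ.
type Rel = Set [Int]

-- Hypothetical extensional facts.
edge :: Rel
edge = Set.fromList [[1, 2], [2, 3], [3, 1], [4, 5]]

-- Denotation of `path`, given current values for (path, cyclic).
pathDen :: (Rel, Rel) -> Rel
pathDen (path, _) =
  edge `Set.union`
    Set.fromList
      [[x, z] | [x, y] <- Set.toList edge, [y', z] <- Set.toList path, y == y']

-- Denotation of `cyclic`, given current values for (path, cyclic).
cyclicDen :: (Rel, Rel) -> Rel
cyclicDen (path, _) = Set.fromList [[x] | [x, y] <- Set.toList path, x == y]

-- Immediate consequence operator: every denotation sees all relations.
immediate :: (Rel, Rel) -> (Rel, Rel)
immediate rs = (pathDen rs, cyclicDen rs)

-- Program semantics: iterate from bottom until nothing changes.
semantics :: (Rel, Rel)
semantics = go (Set.empty, Set.empty)
  where
    go rs = let rs' = immediate rs in if rs' == rs then rs else go rs'
```

Both denotations here are monotone, so the iteration reaches the least fixpoint; for this data the `cyclic` component ends up containing exactly the nodes 1, 2 and 3.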

#### **7.2 Incremental Evaluation of Datalog**

We can easily extend a derivative for the formula semantics to a derivative for the immediate consequence operator I. Putting this together with the results from Sect. 6.2, we have now created *modular* proofs for the two main results, which allows us to preserve them in the face of changes to the underlying language.

**Corollary 3.** *Datalog program semantics can be evaluated incrementally.*

**Corollary 4.** *Datalog program semantics can be incrementally maintained with changes to relations.*

Note that our approach makes no particular distinction between changes to the *extensional* relations (adding or removing facts), and changes to the *intensional* relations (changing the definition). The latter simply amounts to a change to the denotation of that relation, which can be incrementally propagated in exactly the same way as we would propagate a change to the extensional relations.

# **8 Related Work**

#### **8.1 Change Actions and Incremental Computation**

**Change structures.** The seminal paper in this area is Cai et al. [14]. We deviate from that excellent paper in three regards: the inclusion of minus operators, the nature of function changes, and the use of dependent types.

We have omitted minus operators from our definition because there are many interesting change actions that are not complete and so cannot have a minus operator. Where we can find a change structure with a minus operator, often we are forced to use unwieldy representations for change sets, and Cai et al. cite this as their reason for using a dependent type of changes. For example, the monoidal change actions on sets and lists are clearly useful for incremental computation on streams, yet they do not admit minus operators—instead, one would be forced to work with e.g. multisets admitting negative arities, as Cai et al. do.

Our function changes (when well behaved) correspond to what Cai et al. call *pointwise differences* (see [14, section 2.2]). As they point out, you can reconstruct their function changes from pointwise changes and derivatives, so the two formulations are equivalent.

The equivalence of our presentations means that our work should be compatible with their Incremental Lambda Calculus (see [14, section 3]). The derivatives we give in Sect. 4.2 are more or less a "change semantics" for Datalog (see [14, section 3.5]).

**S-acts.** S-acts (i.e. the category of monoid actions on sets) and their categorical structure have received a fair amount of attention over the years (Kilp, Knauer, and Mikhalev [30] is a good overview). However, there is a key difference between change actions considered as a category (**CAct**) and the category of S-acts (**SAct**): the objects of **SAct** all maintain the same monoid structure, whereas we are interested in changing both the base set *and* the structure of the action.

**Derivatives of fixpoints.** Arntzenius [5] gives a derivative operator for fixpoints based on the framework in Cai et al. [14]. However, since we have different notions of function changes, the result is inapplicable as stated. In addition, we require a somewhat different set of conditions; in particular, we do not require our changes to always be increasing.

#### **8.2 Datalog**

**Incremental evaluation.** The earliest interpretation of semi-naive evaluation as a derivative appears in Bancilhon [8]. The idea of using an approximate derivative and the requisite soundness condition appears as a throwaway comment in Bancilhon and Ramakrishnan [9, section 3.2.2], and it would appear that nobody has since developed that approach.

As far as we know, traditional semi-naive is the state of the art in incremental, bottom-up, Datalog evaluation, and there are no strategies that accommodate additional language features such as parity-stratified negation and aggregates.

**Incremental maintenance.** There is existing literature on incremental maintenance of relational algebra expressions.

Griffin, Libkin, and Trickey [24] following Qian and Wiederhold [35] compute differences with both an "upwards" and a "downwards" component, and produce a set of rules that look quite similar to those we derive in Theorem 4. However, our presentation is significantly more generic, handles recursive expressions, and works on set semantics rather than bag semantics.<sup>15</sup>

Several approaches [25,27]—most notably DReD—remove facts until one can start applying the rules again to reach the new fixpoint. Given a good way of deciding what facts to remove this can be quite efficient. However, such techniques tend to be tightly coupled to the domain. Although we know of no theoretical reason why either approach should give superior performance when both are applicable, an empirical investigation of this could prove interesting.

Other approaches [19,43] consider only restricted subsets of Datalog, or incur other substantial constraints.

**Embedding Datalog.** Datafun (Arntzenius and Krishnaswami [6]) is a functional programming language that embeds Datalog, allowing significant improvements in genericity, such as the use of higher-order functions. Since we have directly defined a change action and derivative operator for Datalog, our work could be used as a "plugin" in the sense of Cai et al., allowing Datafun to compute its internal fixpoints incrementally, but also allowing Datafun expressions to be fully incrementally maintained.

In a different direction, Cathcart Burn, Ong, and Ramsay [15] have proposed *higher-order constrained Horn clauses* (HoCHC), a new class of constraints for the automatic verification of higher-order programs. HoCHC may be viewed as a higher-order extension of Datalog. Change actions can be readily applied to organise an efficient semi-naive method for solving HoCHC systems.

#### **8.3 Differential** *λ***-calculus**

Another setting where derivatives of arbitrary higher-order programs have been studied is the *differential* λ*-calculus* [20,21]. This is a higher-order, simply-typed λ-calculus which allows for computing the derivative of a function, in a similar way to the notion of derivative in Cai's work and the present paper.

<sup>15</sup> The same approach of finding derivatives would work with bag semantics, although unfortunately the Boolean algebra structure is missing.

While there are clear similarities between the two systems, the most important difference is the properties of the derivatives themselves: in the differential λ-calculus, derivatives are guaranteed to be linear in their second argument, whereas in our approach derivatives do not have this restriction but are instead required to satisfy a strong relation to the function that is being differentiated (see Definition 2).

Families of denotational models for the differential λ-calculus have been studied in depth [12,13,16,29], and the relationship between these and change actions is the subject of ongoing work.

#### **8.4 Higher-Order Automatic Differentiation**

Automatic differentiation [23] is a technique that allows for efficiently computing the derivative of arbitrary programs, with applications in probabilistic modeling [31] and machine learning [10] among other areas. In recent times, this technique has been successfully applied to higher-order languages [11,41]. While some approaches have been suggested [28,33], a general theoretical framework for this technique is still a matter of open research.

To this purpose, some authors have proposed the incremental λ-calculus as a foundational framework on which models of automatic differentiation can be based [28]. We believe our change actions are better suited to this purpose than the incremental λ-calculus, since one can easily give them a synthetic differential geometric reading (by interpreting $\hat{A}$ as a Euclidean module and $\Delta A$ as its corresponding spectrum, for example).

### **9 Conclusions and Future Work**

We have presented change actions and their properties, and used them to provide novel, compositional, strategies for incrementally evaluating and maintaining recursive functions, in particular the semantics of Datalog.

The main avenue for future theoretical work is the categorical structure of change actions. This has begun to be explored by the authors in [4], where change actions are generalized to arbitrary Cartesian base categories and a construction is provided to obtain "canonical" Cartesian closed categories of change actions and differentiable maps.

We hope that these generalizations would allow us to extend the theory of change actions towards other classes of models, such as synthetic differential geometry and domain theory. Some early results in [4] also indicate a connection between 2-categories and change actions which has yet to be fully mapped.

The compositional nature of these techniques suggests that an approach like that used in [22] could be used for an even more generic approach to automatic differentiation.

In addition, there is plenty of scope for practical application of the techniques given here to languages other than Datalog.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Incremental *λ*-Calculus in Cache-Transfer Style: Static Memoization by Program Transformation

Paolo G. Giarrusso<sup>1</sup>, Yann Régis-Gianas<sup>2</sup>, and Philipp Schuster<sup>3</sup>

> <sup>1</sup> LAMP—EPFL, Lausanne, Switzerland <sup>2</sup> IRIF, University of Paris Diderot, Inria, Paris, France <sup>3</sup> University of Tübingen, Tübingen, Germany

Abstract. Incremental computation requires propagating changes and reusing intermediate results of base computations. Derivatives, as produced by static differentiation [7], propagate changes but do not reuse intermediate results, leading to wasteful recomputation. As a solution, we introduce conversion to *Cache-Transfer-Style*, an additional program transformation producing purely incremental functional programs that create and maintain nested tuples of intermediate results. To prove CTS conversion correct, we extend the correctness proof of static differentiation from STLC to untyped λ-calculus via *step-indexed logical relations*, and prove sound the additional transformation via simulation theorems.

To show ILC-based languages can improve performance relative to from-scratch recomputation, and that CTS conversion can extend its applicability, we perform an initial performance case study. We provide derivatives of primitives for operations on collections and incrementalize selected example programs using those primitives, confirming expected asymptotic speedups.

# 1 Introduction

After computing a base output from some base input, we often need to produce updated outputs corresponding to updated inputs. Instead of rerunning the same *base program* on the updated input, incremental computation transforms the input change to an output change, potentially reducing asymptotic time complexity and significantly improving efficiency, especially for computations running on large data sets.

Incremental λ-Calculus (ILC) [7] is a recent framework for *higher-order* incremental computation. ILC represents changes from a base value *v*<sub>1</sub> to an updated value *v*<sub>2</sub> as a first-class *change value* *dv*. Since functions are first-class values, change values include *function changes*.

ILC also statically transforms *base programs* to *incremental programs* or *derivatives*, that are functions mapping input changes to output changes. Incremental language designers can then provide their language with (higher-order) primitives (with their derivatives) that efficiently encapsulate incrementalizable computation skeletons (such as tree-shaped folds), and ILC will incrementalize higher-order programs written in terms of these primitives.

Alas, ILC only incrementalizes efficiently *self-maintainable computations* [7, Sect. 4.3], that is, computations whose output changes can be computed using only input changes, but not the inputs themselves [11]. Few computations are self-maintainable: for instance, mapping self-maintainable functions on a sequence is self-maintainable, but dividing numbers is not! We elaborate on this problem in Sect. 2.1. In this paper, we extend ILC to non-self-maintainable computations. To this end, we must enable derivatives to reuse intermediate results created by the base computation.

Many incrementalization approaches remember intermediate results through dynamic memoization: they typically use hashtables to memoize function results, or dynamic dependence graphs [1] to remember a computation trace. However, looking up intermediate results in such dynamic data structures has a runtime cost that is hard to optimize; and reasoning on dynamic dependence graphs and computation traces is often complex. Instead, ILC produces purely functional programs, suitable for further optimizations and equational reasoning.

To that end, we replace dynamic memoization with *static memoization*: following Liu and Teitelbaum [20], we transform programs to *cache-transfer style (CTS)*. A CTS function outputs its primary result along with *caches* of intermediate results. These caches are just nested tuples whose structure is derived from code, and accessing them does not involve looking up keys depending on inputs. Instead, intermediate results can be fetched from these tuples using statically known locations. To integrate CTS with ILC, we extend differentiation to produce *CTS derivatives*: these can extract from caches any intermediate results they need, and produce updated caches for the next computation step.

The correctness proof of static differentiation in CTS is challenging. First, we must show a forward simulation relation between two triples of reduction traces (the first triple being made of the source base evaluation, the source updated evaluation and the source derivative evaluation; the second triple being made of the corresponding CTS-translated evaluations). Dealing with six distinct evaluation environments at the same time was error prone on paper and for this reason, we conducted the proof using Coq [26]. Second, the simulation relation must not only track values but also caches, which are only partially updated while in the middle of the evaluation of derivatives. Finally, we study the translation for an untyped λ-calculus, while previous ILC correctness proofs were restricted to simply-typed λ-calculus. Hence, we define which changes are valid via a *logical relation* and show its *fundamental property*. Being in an untyped setting, our logical relation is not indexed by types, but *step-indexed*. We study an untyped language, but our work also applies to the erasure of typed languages. Formalizing a type-preserving translation is left for future work because giving a type to CTS programs is challenging, as we shall explain.

In addition to the correctness proof, we present preliminary experimental results from three case studies. We obtain efficient incremental programs even on non-self-maintainable functions.

We present our contributions as follows. First, we summarize ILC and illustrate the need to extend it to remember intermediate results via CTS (Sect. 2). Second, in our mechanized formalization (Sect. 3), we give a novel proof of correctness for ILC differentiation for untyped λ-calculus, based on step-indexed logical relations (Sect. 3.4). Third, building on top of ILC differentiation, we show how to transform untyped higher-order programs to CTS (Sect. 3.5) and we show that CTS functions and derivatives *simulate* correctly their non-CTS counterparts (Sect. 3.7). Finally, in our case studies (Sect. 4), we compare the performance of the generated code to the base programs. Section 4.4 discusses limitations and future work. Section 5 discusses related work and Sect. 6 concludes. Our mechanized proof in Coq, the case study material, and the extended version of this paper with appendixes are available online at https://github.com/yurug/cts.

### 2 ILC and CTS Primer

In this section we exemplify ILC by applying it on an average function, show why the resulting incremental program is asymptotically inefficient, and use CTS conversion and differentiation to incrementalize our example efficiently and speed it up asymptotically (as confirmed by benchmarks in Sect. 4.1). Further examples in Sect. 4 apply CTS to higher-order programs and suggest that CTS enables incrementalizing efficiently some core database primitives such as joins.

#### 2.1 Incrementalizing *average* via ILC

Our example computes the average of a bag of numbers. After computing the *base output* *y*<sub>1</sub> of the average function on the *base input* bag *xs*<sub>1</sub>, we want to update the output in response to a stream of updates to the input bag. Here and throughout the paper, we contrast *base* vs *updated* inputs, outputs, values, computations, and so on. For simplicity, we assume we have two *updated inputs* *xs*<sub>2</sub> and *xs*<sub>3</sub> and want to compute two *updated outputs* *y*<sub>2</sub> and *y*<sub>3</sub>. We express this program in Haskell as follows:

```
average :: Bag Z → Z
average xs = let s = sum xs; n = length xs; r = div s n in r
average3 = let y1 = average xs1; y2 = average xs2; y3 = average xs3
             in (y1, y2, y3)
```
To compute the updated outputs *y*<sub>2</sub> and *y*<sub>3</sub> in *average*<sub>3</sub> faster, we try using ILC. For that, we assume that we receive not only updated inputs *xs*<sub>2</sub> and *xs*<sub>3</sub> but also *input change* *dxs*<sub>1</sub> from *xs*<sub>1</sub> to *xs*<sub>2</sub> and input change *dxs*<sub>2</sub> from *xs*<sub>2</sub> to *xs*<sub>3</sub>. A change *dx* from *x*<sub>1</sub> to *x*<sub>2</sub> describes the changes from base value *x*<sub>1</sub> to updated value *x*<sub>2</sub>, so that *x*<sub>2</sub> can be computed via the *update operator* ⊕ as *x*<sub>1</sub> ⊕ *dx*. A nil change **0**<sub>*x*</sub> is a change from base value *x* to updated value *x* itself.
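One possible concrete reading of changes, ⊕ and nil changes is sketched below. It assumes integers change by an additive difference and bags are maps from elements to signed multiplicities whose changes are again such maps; this representation and the names `oplusInt`, `oplusBag`, `nilInt` and `nilBag` are illustrative assumptions rather than the paper's definitions.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- A bag maps each element to a signed multiplicity (zero entries are dropped).
type Bag = Map Integer Integer

-- A change to a bag is a bag of multiplicity differences.
type BagChange = Map Integer Integer

-- Update operator on integers: a change is the difference to add.
oplusInt :: Integer -> Integer -> Integer
oplusInt x dx = x + dx

-- Update operator on bags: add multiplicities pointwise.
oplusBag :: Bag -> BagChange -> Bag
oplusBag xs dxs = Map.filter (/= 0) (Map.unionWith (+) xs dxs)

-- Nil changes leave every base value unchanged.
nilInt :: Integer
nilInt = 0

nilBag :: BagChange
nilBag = Map.empty
```

For example, applying the change that removes both copies of 2 and inserts one 3 to the bag containing one 1 and two 2s yields the bag containing one 1 and one 3.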

ILC differentiation automatically transforms the *average* function to its derivative *daverage* :: *Bag* ℤ → Δ(*Bag* ℤ) → Δℤ. A derivative maps input changes to output changes: here, *dy*<sub>1</sub> = *daverage xs*<sub>1</sub> *dxs*<sub>1</sub> is a change from base output *y*<sub>1</sub> = *average xs*<sub>1</sub> to updated output *y*<sub>2</sub> = *average xs*<sub>2</sub>, hence *y*<sub>2</sub> = *y*<sub>1</sub> ⊕ *dy*<sub>1</sub>.

Thanks to *daverage*'s correctness, we can rewrite *average*<sub>3</sub> to avoid expensive calls to *average* on updated inputs and use *daverage* instead:

```
incrementalAverage3 :: (Z, Z, Z)
incrementalAverage3 =
  let y1 = average xs1; dy1 = daverage xs1 dxs1
      y2 = y1 ⊕ dy1; dy2 = daverage xs2 dxs2
      y3 = y2 ⊕ dy2
  in (y1, y2, y3)
```
In general, the value of a function *f* :: *A* → *B* can also change from a base value *f*<sub>1</sub> to an updated value *f*<sub>2</sub>, mainly when *f* is a closure over changing data. In that case, the change from base output *f*<sub>1</sub> *x*<sub>1</sub> to updated output *f*<sub>2</sub> *x*<sub>2</sub> is given by *df x*<sub>1</sub> *dx*, where *df* :: *A* → Δ*A* → Δ*B* is now a *function change* from *f*<sub>1</sub> to *f*<sub>2</sub>. Above, *average* exemplifies the special case where *f*<sub>1</sub> = *f*<sub>2</sub> = *f*: then the function change *df* is a nil change, and *df x*<sub>1</sub> *dx* is a change from *f*<sub>1</sub> *x*<sub>1</sub> = *f x*<sub>1</sub> to *f*<sub>2</sub> *x*<sub>2</sub> = *f x*<sub>2</sub>. That is, a nil function change for *f* is a derivative of *f*.

#### 2.2 Self-maintainability and Efficiency of Derivatives

Alas, derivatives are efficient only if they are *self-maintainable*, and *daverage* is not, so *incrementalAverage*<sub>3</sub> is no faster than *average*<sub>3</sub>! Consider the result of differentiating *average*:

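$$\begin{array}{l} \mathit{daverage} :: \mathit{Bag}\ \mathbb{Z} \to \Delta(\mathit{Bag}\ \mathbb{Z}) \to \Delta \mathbb{Z} \\ \mathit{daverage}\ \mathit{xs}\ \mathit{dxs} = \mathbf{let}\ s = \mathit{sum}\ \mathit{xs};\ \mathit{ds} = \mathit{dsum}\ \mathit{xs}\ \mathit{dxs}; \\ \qquad n = \mathit{length}\ \mathit{xs};\ \mathit{dn} = \mathit{dlength}\ \mathit{xs}\ \mathit{dxs}; \\ \qquad r = \mathit{div}\ s\ n;\ \mathit{dr} = \mathit{ddiv}\ s\ \mathit{ds}\ n\ \mathit{dn} \\ \quad \mathbf{in}\ \mathit{dr} \end{array}$$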

Just like *average* combines *sum*, *length*, and *div*, its derivative *daverage* combines those functions and their derivatives. *daverage* recomputes base intermediate results *s*, *n* and *r* exactly as done in *average*, because they might be needed as base inputs of derivatives. Since *r* is unused, its recomputation can be dropped during later optimizations, but expensive intermediate results *s* and *n* are used by *ddiv*:

$$\begin{array}{l} ddiv :: \mathbb{Z} \to \Delta \mathbb{Z} \to \mathbb{Z} \to \Delta \mathbb{Z} \to \Delta \mathbb{Z} \\ ddiv \; a \; da \; b \; db = \operatorname{div} \left(a \oplus da\right) \left(b \oplus db\right) - \operatorname{div} \; a \; b \end{array}$$

Function *ddiv* computes the difference between the updated and the original result, so it needs its base inputs *a* and *b*. Hence, *daverage* must recompute *s* and *n* and will be slower than *average*!

Typically, ILC derivatives are only efficient if they are *self-maintainable*: a self-maintainable derivative does not inspect its base inputs, but only its change inputs, so recomputation of its base inputs can be elided. Cai et al. [7] leave efficient support for non-self-maintainable derivatives for future work.
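The contrast can be made concrete with a small Haskell sketch. It assumes bags are maps from elements to signed multiplicities, bag changes are such maps as well, and integer changes are additive differences; these representations and the name `sumBag` are assumptions for illustration. Under them `dsum` never inspects its base input, whereas `ddiv` cannot avoid it.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Bags and bag changes: element mapped to a signed multiplicity.
type Bag = Map Integer Integer

-- Apply a change to a bag by adding multiplicities pointwise.
oplusBag :: Bag -> Bag -> Bag
oplusBag xs dxs = Map.filter (/= 0) (Map.unionWith (+) xs dxs)

-- Sum of a bag, counting each element with its multiplicity.
sumBag :: Bag -> Integer
sumBag = Map.foldrWithKey (\x k acc -> x * k + acc) 0

-- Self-maintainable: the output change depends only on the input change,
-- so the base input xs never needs to be recomputed or even available.
dsum :: Bag -> Bag -> Integer
dsum _xs dxs = sumBag dxs

-- Not self-maintainable: the output change depends on the base inputs a, b.
ddiv :: Integer -> Integer -> Integer -> Integer -> Integer
ddiv a da b db = div (a + da) (b + db) - div a b
```

The same input change can yield different output changes for `ddiv` depending on the base values: with `da = 1` and `db = 0`, the base inputs 9 and 3 give an output change of 0, while the base inputs 8 and 3 give 1.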

But this problem is fixable: executing *daverage xs dxs* will compute exactly the same *s* and *n* as executing *average xs*, so to avoid recomputation we must simply save *s* and *n* and reuse them. Hence, we CTS-convert each function *f* to a *CTS function fC* and a *CTS derivative dfC* : CTS function *fC* produces, together with its final result, a *cache* containing intermediate results, that the caller must pass to CTS derivative *dfC* .

CTS-converting our example produces the following code, which requires no wasteful recomputation.

```
-- Cache of average: each intermediate result followed by the cache of its subcall.
type AverageC = (Z, SumC, Z, LengthC, Z, DivC)
averageC :: Bag Z → (Z, AverageC)
averageC xs =
  let (s, cs1) = sumC xs; (n, cn1) = lengthC xs; (r, cr1) = divC s n
  in (r, (s, cs1, n, cn1, r, cr1))
-- The CTS derivative reads s, n and r from the cache instead of recomputing them.
daverageC :: Bag Z → Δ(Bag Z) → AverageC → (ΔZ, AverageC)
daverageC xs dxs (s, cs1, n, cn1, r, cr1) =
  let (ds, cs2) = dsumC xs dxs cs1
      (dn, cn2) = dlengthC xs dxs cn1
      (dr, cr2) = ddivC s ds n dn cr1
  in (dr, ((s ⊕ ds), cs2, (n ⊕ dn), cn2, (r ⊕ dr), cr2))
```
For each function *f*, we introduce a type *FC* for its cache, such that a CTS function *fC* has type $A \to (B, \mathit{FC})$ and CTS derivative *dfC* has type $A \to \Delta A \to \mathit{FC} \to (\Delta B, \mathit{FC})$. Crucially, CTS derivatives like *daverageC* must return an updated cache to ensure correct incrementalization, so that application of further changes works correctly. In general, if $(y_1, c_1) = \mathit{fC}\ x_1$ and $(dy, c_2) = \mathit{dfC}\ x_1\ dx\ c_1$, then $(y_1 \oplus dy, c_2)$ must equal the result of the base function *fC* applied to the updated input $x_1 \oplus dx$, that is $(y_1 \oplus dy, c_2) = \mathit{fC}\ (x_1 \oplus dx)$.
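The cache invariant can be checked on a runnable simplification of the example. The sketch below assumes non-empty list-based bags, insertion-only bag changes, additive integer changes, and a flattened cache that stores only *s*, *n* and *r* (the sub-caches of *sumC*, *lengthC* and *divC* are dropped). All of these are assumptions made for illustration, and `cacheInvariantHolds` is a hypothetical helper name.

```haskell
-- Illustrative assumptions: bags are non-empty lists, a bag change is a list
-- of inserted elements, and integer changes are additive differences.
type Bag  = [Integer]
type DBag = [Integer]
type DZ   = Integer

-- Flattened cache holding the intermediate results s, n and r.
type AverageC = (Integer, Integer, Integer)

averageC :: Bag -> (Integer, AverageC)
averageC xs =
  let s = sum xs
      n = fromIntegral (length xs)
      r = div s n
  in (r, (s, n, r))

-- The base input xs is never inspected: s, n and r come from the cache.
daverageC :: Bag -> DBag -> AverageC -> (DZ, AverageC)
daverageC _xs dxs (s, n, r) =
  let ds = sum dxs
      dn = fromIntegral (length dxs)
      s' = s + ds
      n' = n + dn
      dr = div s' n' - r
  in (dr, (s', n', r + dr))

-- Checks that (y1 `oplus` dy, c2) equals fC (x1 `oplus` dx) for one change.
cacheInvariantHolds :: Bag -> DBag -> Bool
cacheInvariantHolds xs dxs =
  let (y1, c1) = averageC xs
      (dy, c2) = daverageC xs dxs c1
  in (y1 + dy, c2) == averageC (xs ++ dxs)
```

Returning the updated cache is what lets a second change be applied with `daverageC` again, without ever rerunning `averageC`.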

For CTS-converted functions, the cache type *FC* is a tuple of intermediate results and caches of subcalls. For primitive functions like *div*, the cache type *DivC* could contain information needed for efficient computation of output changes. In the case of *div*, no additional information is needed. The definition of *divC* uses *div* and produces an empty cache, and the definition of *ddivC* follows the earlier definition for *ddiv*, except that we now pass along an empty cache.

```
data DivC = DivC
divC :: Z → Z → (Z, DivC)
divC a b = (div a b, DivC)
ddivC :: Z → ΔZ → Z → ΔZ → DivC → (ΔZ, DivC)
ddivC a da b db DivC = (div (a ⊕ da) (b ⊕ db) − div a b, DivC)
```
Finally, we can rewrite *average3* to incrementally compute $y_2$ and $y_3$:

```
ctsIncrementalAverage3 :: (Z, Z, Z)
ctsIncrementalAverage3 =
  let (y1, c1) = averageC xs1; (dy1, c2) = daverageC xs1 dxs1 c1
      y2 = y1 ⊕ dy1; (dy2, c3) = daverageC xs2 dxs2 c2
      y3 = y2 ⊕ dy2
  in (y1, y2, y3)
```
Since functions of the same type translate to CTS functions of different types, in a higher-order language CTS translation is not always type-preserving; however, this is not a problem for our case studies (Sect. 4); Sect. 4.1 shows how to map such functions, and we return to this problem in Sect. 4.4.

# 3 Formalization

We now formalize CTS-differentiation for an untyped Turing-complete λ-calculus, and formally prove it sound with respect to differentiation. We also give a novel proof of correctness for differentiation itself, since we cannot simply adapt Cai et al. [7]'s proof to the new syntax: our language is untyped and Turing-complete, while Cai et al. [7]'s proof assumed a strongly normalizing simply-typed λ-calculus and relied on its naive set-theoretic denotational semantics. Our entire formalization is mechanized using Coq [26]. For reasons of space, some details are deferred to the appendix.


Fig. 1. Our language λ*<sup>L</sup>* of lambda-lifted programs. Tuples can be nullary.

*Transformations.* We introduce and prove sound three term transformations, namely differentiation, CTS translation and CTS differentiation, that take a function to its corresponding (non-CTS) derivative, CTS function and CTS derivative. Each CTS function produces a base output and a cache from a base input, while each CTS derivative produces an output change and an updated cache from an input, an input change and a base cache.

*Proof technique.* To show soundness, we prove that CTS functions and derivatives simulate respectively non-CTS functions and derivatives. In turn, we formalize (non-CTS) differentiation as well, and we prove differentiation sound with respect to non-incremental evaluation. Overall, this shows that CTS functions and derivatives are sound relative to non-incremental evaluation. Our presentation proceeds in the converse order: first, we present differentiation, formulated as a variant of Cai et al. [7]'s definition; then, we study CTS differentiation.

By using logical relations, we significantly simplify the setup of Cai et al. [7]. To handle an untyped language, we employ *step-indexed* logical relations. Besides, we conduct our development with big-step operational semantics because that choice simplifies the correctness proof for CTS conversion. Using big-step semantics for a Turing-complete language restricts us to terminating computations. But that is not a problem: to show incrementalization is correct, we need only consider computations that terminate on both old and new inputs, following Acar et al. [3] (we compare with their work in Sect. 5).

*Structure of the formalization.* Section 3.1 introduces the syntax of the language λ*<sup>L</sup>* we consider in this development, and introduces its four sublanguages λ*AL*, λ*IAL*, λ*CAL* and λ*ICAL*. Section 3.2 presents the syntax and the semantics of λ*AL*, the source language for our transformations. Section 3.3 defines differentiation and its target language λ*IAL*, and Sect. 3.4 proves differentiation correct. Section 3.5 defines CTS conversion, comprising CTS translation and CTS differentiation, and their target languages λ*CAL* and λ*ICAL*. Section 3.6 presents the semantics of λ*CAL*. Finally, Sect. 3.7 proves CTS conversion correct.

*Notations.* We write $\overline{X}$ for a sequence $X_1, \ldots, X_m$ of X of some unspecified length.

#### 3.1 Syntax for *λ<sup>L</sup>*

*A superlanguage.* To simplify our transformations, we require input programs to have been lambda-lifted [15] and converted to A'-normal form (A'NF). Lambda-lifted programs are convenient because they allow us to avoid a specific treatment for free variables in transformations. A'NF is a minor variant of ANF [24], where every result is bound to a variable before use; unlike ANF, we also bind the result of the tail call. Thus, every result can be stored in a cache by CTS conversion and reused later (as described in Sect. 2). This requirement is not onerous: A'NF is a minimal variant of ANF, and lambda-lifting and ANF conversion are routine in compilers for functional languages. Most examples we show are in this form.

In contrast, our transformation's outputs are lambda-lifted but not in A'NF. For instance, we restrict base functions to take exactly one argument, a base input. As shown in Sect. 2.1, CTS functions take instead two arguments (a base input and a cache), and CTS derivatives take three arguments (an input, an input change, and a cache). We could normalize transformation outputs to inhabit the source language and follow the same invariants, but this would complicate our proofs for little benefit. Hence, we do not *prescribe* transformation outputs to satisfy the same invariants, and we rather *describe* transformation outputs through separate grammars.

As a result of this design choice, we consider languages for base programs, derivatives, CTS programs and CTS derivatives. In our Coq mechanization, we formalize those as four separate languages, saving us many proof steps to check the validity of required structural invariants. For simplicity, in this paper we define a single language called λ*<sup>L</sup>* (for λ-Lifted). This language satisfies invariants common to all these languages (including some of the A'NF invariants). Then, we define *sublanguages* of λ*L*. We describe the semantics of λ*<sup>L</sup>* informally, and we only formalize the semantics of its sublanguages.

*Syntax for terms.* The λ*<sup>L</sup>* language is a relatively conventional lambda-lifted λ-calculus with a limited form of pattern matching on tuples. The syntax for terms and values is presented in Fig. 1. We separate terms and values in two distinct syntactic classes because we use big-step operational semantics. Our **let**-bindings are non-recursive as usual, and support shadowing. Terms cannot contain λ-expressions directly, but only refer to closures through the environment, and similarly for literals and primitives; we elaborate on this in Sect. 3.2. We do not introduce case expressions, but only bindings that destructure tuples, both in **let**-bindings and λ-expressions of closures. Our semantics does not assign meaning to match failures, but pattern matches are only used in generated programs and our correctness proofs ensure that the matches always succeed. We allow tuples to contain terms of form $x \oplus dx$, which update base values $x$ with changes in $dx$, because A'NF-converting these updates is not necessary to the transformations. We often inspect the result of a function call "$f\ x$", which is not a valid term in our syntax. Hence, we write "$@(f, x)$" as syntactic sugar for "**let** $y = f\ x$ **in** $y$" with $y$ chosen fresh.

*Syntax for closed values.* A closed value is either a closure, a tuple of values, a literal, a primitive, a nil change for a primitive or a replacement change. A closure is a pair of an evaluation environment E and a λ-abstraction closed with respect to E. The set of available literals is left abstract. It may contain usual first-order literals like integers. We also leave abstract the primitives **p** like if-then-else or projections of tuple components. Each primitive **p** comes with a nil change, which is its derivative as explained in Sect. 2. A change value can also represent a replacement by some closed value av. Replacement changes are not produced by static differentiation but are useful for clients of derivatives: we include them in the formalization to make sure that they are not incompatible with our system. As usual, environments E map variables to closed values.

*Sublanguages of* λ*L*. The source language for all our transformations is a sublanguage of λ*<sup>L</sup>* named λ*AL*, where A stands for A'NF. To each transformation we associate a target language, which matches the transformation image. The target language for CTS conversion is named λ*CAL*, where "C" stands for CTS. The target languages of differentiation and CTS differentiation are called, respectively, λ*IAL* and λ*ICAL*, where the "I" stands for incremental.

#### 3.2 The Source Language *λAL*

We show the syntax of λ*AL* in Fig. 2. As said above, λ*AL* is a sublanguage of λ*<sup>L</sup>* denoting lambda-lifted base terms in A'NF. With no loss of generality, we assume that all bound variables in λ*AL* programs and closures are distinct. The step-indexed big-step semantics (Fig. 3) for base terms is defined by the judgment written $E \vdash t \Downarrow_n v$ (where $n$ can be omitted) and pronounced "Under environment $E$, base term $t$ evaluates to closed value $v$ in $n$ steps." Intuitively, our step-indexes count the number of "nodes" of a big-step derivation.<sup>1</sup> As they are relatively standard, we defer the explanations of these rules to Appendix B.
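To make the judgment concrete, here is a minimal sketch of a step-counting evaluator for A'NF base terms. The data types, the single `AddPair` primitive, and the exact step counts (one node per variable lookup, one extra node per **let**) are assumptions chosen for illustration; they are not taken from the Coq mechanization. Evaluation returns `Nothing` on unbound variables or ill-formed calls, and may loop on diverging programs.

```haskell
-- Base terms in A'NF.
data Term
  = Var String                          -- x
  | LetCall String String String Term   -- let y = f x in t
  | LetTuple String [String] Term       -- let y = (x1, ..., xn) in t
  deriving (Eq, Show)

-- Hypothetical primitive set, for illustration only.
data PrimOp = AddPair
  deriving (Eq, Show)

data Value
  = Closure Env String Term             -- environment, parameter, body
  | Tuple [Value]
  | Lit Integer
  | Prim PrimOp
  deriving (Eq, Show)

-- The most recent binding comes first, so lookup implements shadowing.
type Env = [(String, Value)]

applyPrim :: PrimOp -> Value -> Maybe Value
applyPrim AddPair (Tuple [Lit a, Lit b]) = Just (Lit (a + b))
applyPrim _ _ = Nothing

-- Returns the result together with the number of derivation nodes.
eval :: Env -> Term -> Maybe (Value, Int)
eval env (Var x) = do
  v <- lookup x env
  pure (v, 1)
eval env (LetTuple y xs t) = do
  vs <- mapM (\x -> lookup x env) xs
  (v, n) <- eval ((y, Tuple vs) : env) t
  pure (v, n + 1)
eval env (LetCall y f x t) = do
  fv  <- lookup f env
  arg <- lookup x env
  case fv of
    Prim p -> do
      vy <- applyPrim p arg
      (v, n) <- eval ((y, vy) : env) t
      pure (v, n + 1)
    Closure envF param body -> do
      (vy, m) <- eval ((param, arg) : envF) body
      (v, n)  <- eval ((y, vy) : env) t
      pure (v, m + n + 1)
    _ -> Nothing
```

Note that literals and primitives are reached only through the environment, mirroring the restriction that terms cannot mention them directly.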

$$\begin{array}{l} \textbf{Term differentiation} \quad \boxed{dt = \mathcal{D}^{\iota}(t)} \\ \mathcal{D}^{\iota}(x) = dx \\ \mathcal{D}^{\iota}(\mathbf{let}\ y = f\ x\ \mathbf{in}\ t) = \mathbf{let}\ y = f\ x,\ dy = df\ x\ dx\ \mathbf{in}\ \mathcal{D}^{\iota}(t) \\ \mathcal{D}^{\iota}(\mathbf{let}\ y = (\overline{x})\ \mathbf{in}\ t) = \mathbf{let}\ y = (\overline{x}),\ dy = (\overline{dx})\ \mathbf{in}\ \mathcal{D}^{\iota}(t) \\[4pt] \textbf{Value differentiation} \quad \boxed{dv = \mathcal{D}^{\iota}(v)} \\ \mathcal{D}^{\iota}(E_f[\lambda x.\ t]) = \mathcal{D}^{\iota}(E_f)[\lambda x\ dx.\ \mathcal{D}^{\iota}(t)] \\ \mathcal{D}^{\iota}((\overline{v})) = (\overline{\mathcal{D}^{\iota}(v)}) \\ \mathcal{D}^{\iota}(\ell) = \mathit{nil}\ \ell \\ \mathcal{D}^{\iota}(\mathbf{p}) = 0_{\mathbf{p}} \end{array}$$

$$\begin{array}{l} \textbf{Environment differentiation} \quad \boxed{dE = \mathcal{D}^{\iota}(E)} \\ \mathcal{D}^{\iota}(\bullet) = \bullet \\ \mathcal{D}^{\iota}(E; x = v) = \mathcal{D}^{\iota}(E); x = v; dx = \mathcal{D}^{\iota}(v) \\[4pt] \textbf{Base/updated environment} \quad \boxed{E = \lfloor dE \rfloor_i} \\ \lfloor \bullet \rfloor_i = \bullet \qquad i = 1, 2 \\ \lfloor dE; x = v; dx = dv \rfloor_i = \lfloor dE \rfloor_i; x = v' \quad \text{where } v' = v \text{ if } i = 1 \text{ and } v' = v \oplus dv \text{ if } i = 2 \end{array}$$

$$\begin{array}{rcl} dt & ::= & \mathbf{let}\ y = f\ x,\ dy = df\ x\ dx\ \mathbf{in}\ dt \\ & \mid & \mathbf{let}\ y = (\overline{x}),\ dy = (\overline{dx})\ \mathbf{in}\ dt \\ & \mid & dx \end{array}$$

Fig. 2. Static differentiation $\mathcal{D}^{\iota}(\text{–})$; syntax of its target language λ*IAL*, tailored to the output of differentiation; syntax of its source language λ*AL*. We assume that in λ*IAL* the same **let** binds both $y$ and $dy$ and that α-renaming preserves this invariant. We also define the *base environment* $\lfloor dE \rfloor_1$ and the *updated environment* $\lfloor dE \rfloor_2$ of a change environment $dE$.

[Figure 3: inference rules of the big-step semantics, including [SVar], [STuple] and [SPrimitiveCall].]

Fig. 3. Step-indexed big-step semantics for base terms of source language λ*AL*.

<sup>1</sup> It is more common to count instead small-step evaluation steps [3,4], but our choice simplifies some proofs and makes a minor difference in others.

*Expressiveness.* A closure in the base environment can be used to represent a top-level definition. Since environment entries can point to primitives, we need no syntax to directly represent calls of primitives in the syntax of base terms. To encode in our syntax a program with top-level definitions and a term to be evaluated representing the entry point, one can produce a term $t$ representing the entry point together with an environment $E$ containing as values any top-level definitions, primitives and literals used in the program. Semi-formally, given an environment $E_0$ mentioning needed primitives and literals, and a list of top-level function definitions $D = \overline{f = \lambda x.\ t}$ defined in terms of $E_0$, we can produce a base environment $E = \mathcal{L}(D)$, with $\mathcal{L}$ defined by:

$$\mathcal{L}(\bullet) = E_0 \text{ and } \mathcal{L}(D, f = \lambda x.\ t) = E, f = E[\lambda x.\ t] \text{ where } \mathcal{L}(D) = E$$

Correspondingly, we extend all our term transformations to values and environments to transform such encoded top-level definitions.

Our mechanization can encode n-ary functions "$\lambda(x_1, x_2, \ldots, x_n).\ t$" through unary functions that accept tuples; we encode partial application using a **curry** primitive such that, essentially, **curry** $f\ x\ y = f\ (x, y)$; suspended partial applications are represented as closures. This encoding does not support currying efficiently; we further discuss this limitation in Sect. 4.4.

Control operators, like recursion combinators or branching, can be introduced as primitive operations as well. If the branching condition changes, expressing the output change in general requires replacement changes. Similarly to branching we can add tagged unions.

To check the assertions of the last two paragraphs, the Coq development contains the definition of a **curry** primitive as well as a primitive for a fixpoint combinator, allowing general recursion and recursive data structures as well.

#### 3.3 Static Differentiation from *λAL* to *λIAL*

Previous work [7] defines static differentiation for simply-typed λ-calculus terms. Figure 2 transposes differentiation as a transformation from λ*AL* to λ*IAL* and defines λ*IAL*'s syntax.

Differentiating a base term $t$ produces a change term $\mathcal{D}^{\iota}(t)$, its *derivative*. Differentiating final result variable $x$ produces its change variable $dx$. Differentiation copies each binding of an intermediate result $y$ to the output and adds a new binding for its change $dy$. If $y$ is bound to tuple $(\overline{x})$, then $dy$ will be bound to the change tuple $(\overline{dx})$. If $y$ is bound to function application "$f\ x$", then $dy$ will be bound to the application of function change $df$ to input $x$ and its change $dx$. We explain differentiation of environments $\mathcal{D}^{\iota}(E)$ later in this section.
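The three cases of term differentiation translate almost directly into a recursive function. The sketch below is one possible encoding: the constructor names, the `Call` helper type, and the convention of naming the change of `x` by prefixing `d` are assumptions made for illustration (the prefix convention relies on bound variables being distinct and not already starting with a clashing name).

```haskell
-- Base terms in A'NF.
data Term
  = Var String
  | LetCall String String String Term   -- let y = f x in t
  | LetTuple String [String] Term       -- let y = (x1, ..., xn) in t
  deriving (Eq, Show)

-- A call: a function name applied to argument names.
data Call = Call String [String]
  deriving (Eq, Show)

-- Change terms: each let binds a result and its change together.
data DTerm
  = DVar String                                      -- dx
  | DLetCall String Call String Call DTerm           -- let y = f x, dy = df x dx in dt
  | DLetTuple String [String] String [String] DTerm  -- let y = (xs), dy = (dxs) in dt
  deriving (Eq, Show)

-- Assumed naming convention: the change variable of x is named "dx".
d :: String -> String
d x = 'd' : x

derive :: Term -> DTerm
derive (Var x) = DVar (d x)
derive (LetCall y f x t) =
  DLetCall y (Call f [x]) (d y) (Call (d f) [x, d x]) (derive t)
derive (LetTuple y xs t) =
  DLetTuple y xs (d y) (map d xs) (derive t)
```

Every base binding is copied into the output, which is exactly why evaluating the result recomputes all intermediate results of the base term.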


Evaluating $\mathcal{D}^{\iota}(t)$ recomputes all intermediate results computed by $t$. This recomputation will be avoided through cache-transfer style in Sect. 3.5. A comparison with the original static differentiation [7] can be found in Appendix A.

*Semantics for* λ*IAL*. We move on to define how λ*IAL* change terms evaluate to change values. We start with the necessary definitions and operations on changes: *change values dv*, *change environments dE*, and the *update operator* ⊕.

Closed change values *dv* are particular λ*<sup>L</sup>* values av. They are either a closure change, a tuple change, a literal change, a replacement change or a primitive nil change. A closure change is a closure containing a change environment *dE* and a λ-abstraction expecting a value and a change value as arguments to evaluate a change term into an output change value. An evaluation environment *dE* follows the same structure as **let**-bindings of change terms: it binds variables to closed values and each variable *x* is immediately followed by a binding for its associated change variable *dx*. As with **let**-bindings of change terms, α-renamings in an environment *dE* must rename *dx* into *dy* if *x* is renamed into *y*. We define the *update operator* ⊕ to update a value with a change. This operator is a partial function written "$v \oplus dv$", defined as follows:

$$\begin{array}{rcl} v_1 \oplus {!v_2} & = & v_2 \\ \ell \oplus d\ell & = & \delta_\oplus(\ell, d\ell) \\ E[\lambda x.\ t] \oplus dE[\lambda x\ dx.\ dt] & = & (E \oplus dE)[\lambda x.\ t] \\ (v_1, \dots, v_n) \oplus (dv_1, \dots, dv_n) & = & (v_1 \oplus dv_1, \dots, v_n \oplus dv_n) \\ \mathbf{p} \oplus 0_\mathbf{p} & = & \mathbf{p} \end{array}$$

where $(E; x = v) \oplus (dE; x = v; dx = dv) = ((E \oplus dE); x = (v \oplus dv))$.

Replacement changes can be used to update all values (literals, tuples, primitives and closures), while tuple changes can only update tuples, literal changes can only update literals, primitive nil changes can only update primitives and closure changes can only update closures. A replacement change overrides the current value $v$ with a new one $v'$. On literals, ⊕ is defined via some interpretation function $\delta_\oplus$, which takes a literal and a literal change to produce an updated literal. Change update for a closure ignores $dt$ instead of computing something like $dE[t \oplus dt]$. This may seem surprising, but we only need ⊕ to behave well for valid changes (as shown by Theorem 3.1): for valid closure changes, $dt$ must anyway behave similarly to $\mathcal{D}^{\iota}(t)$, which Cai et al. [7] show to be a nil change. Hence, $t \oplus \mathcal{D}^{\iota}(t)$ and $t \oplus dt$ both behave like $t$, so ⊕ can ignore $dt$ and only consider environment updates. This definition also avoids having to modify terms at runtime, which would be difficult to implement safely. We could also implement $f \oplus df$ as a function that invokes both $f$ and $df$ on its argument, as done by Cai et al. [7], but we believe that would be less efficient when ⊕ is used at runtime. As we discuss in Sect. 3.4, we restrict validity to avoid this runtime overhead.
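The partiality of ⊕ and its treatment of closures can be sketched as a function into `Maybe`. In this illustration, literals are integers with additive literal changes (standing in for the abstract interpretation function on literals), primitives are identified by name, closure bodies are opaque strings, and a closure change carries only its change environment since its body is ignored anyway. All of these encodings are assumptions, not the definitions of the mechanization.

```haskell
data Value
  = Lit Integer
  | Tuple [Value]
  | Prim String
  | Closure Env String          -- environment plus an opaque lambda body
  deriving (Eq, Show)

data Change
  = Replace Value               -- replacement change
  | DLit Integer                -- literal change, assumed additive
  | DTuple [Change]
  | NilPrim String              -- nil change of a primitive
  | DClosure DEnv               -- closure change; its body is omitted
  deriving (Eq, Show)

type Env  = [(String, Value)]
-- Each entry pairs a variable with its base value and its change.
type DEnv = [(String, Value, Change)]

-- Partial update operator: Nothing when value and change do not match.
oplus :: Value -> Change -> Maybe Value
oplus _ (Replace v2) = Just v2
oplus (Lit l) (DLit dl) = Just (Lit (l + dl))
oplus (Tuple vs) (DTuple dvs)
  | length vs == length dvs = Tuple <$> sequence (zipWith oplus vs dvs)
oplus (Prim p) (NilPrim q)
  | p == q = Just (Prim p)
oplus (Closure env body) (DClosure denv) =
  (\env' -> Closure env' body) <$> oplusEnv env denv
oplus _ _ = Nothing

-- Pointwise update of an environment; the closure body is left untouched.
oplusEnv :: Env -> DEnv -> Maybe Env
oplusEnv [] [] = Just []
oplusEnv ((x, v) : env) ((x', _, dv) : denv)
  | x == x' = do
      v'   <- oplus v dv
      env' <- oplusEnv env denv
      Just ((x, v') : env')
oplusEnv _ _ = Nothing
```

Only the captured environment of a closure is updated, so no term is ever rewritten at runtime.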

Having given these definitions, we show in Fig. 4 a step-indexed big-step semantics for change terms, defined through judgment $dE \vdash dt \Downarrow_n dv$ (where $n$ can be omitted). This judgment is pronounced "Under the environment $dE$, the change term $dt$ evaluates into the closed change value $dv$ in $n$ steps." Rules [SDVar] and [SDTuple] are unsurprising. To evaluate function calls in **let**-bindings "**let** $y = f\ x$, $dy = df\ x\ dx$ **in** $dt$" we have three rules, depending on the shape of $dE(df)$. These rules all recompute the value $v_y$ of $y$ in the original environment, but compute differently the change $dy$ to $y$. If $dE(df)$ replaces the value of $f$, [SDReplaceCall] recomputes $v'_y = f\ x$ from scratch in the new environment, and binds $dy$ to $!v'_y$ when evaluating the **let** body. If $dE(df)$ is the nil change for primitive **p**, [SDPrimitiveNil] computes $dy$ by running **p**'s derivative through function Δ**p**(–). If $dE(df)$ is a closure change, [SDClosureChange] invokes it normally to compute its change $dv_y$. As we show, if the closure change is valid, its body behaves like $f$'s derivative, hence incrementalizes $f$ correctly.

Closure changes with non-nil environment changes represent partial application of derivatives to non-nil changes; for instance, if $f$ takes a pair and $dx$ is a non-nil change, $0_{\mathbf{curry}}\ f\ df\ x\ dx$ constructs a closure change containing $dx$, using the derivative of **curry** mentioned in Sect. 3.2. In general, such closure changes do not arise from the rules we show, only from derivatives of primitives.

#### 3.4 A New Soundness Proof for Static Differentiation

In this section, we show that static differentiation is sound (Theorem 3.3) and that Eq. (1) holds:

$$f \ a_2 = f \ a_1 \oplus \mathcal{D}^\iota(f) \ a_1 \ da \tag{1}$$

whenever $da$ is a valid change from $a_1$ to $a_2$ (as defined later). One might want to prove this equation assuming only that $a_1 \oplus da = a_2$, but this is false in general. A direct proof by induction on terms fails in the case for application (ultimately because $f_1 \oplus df = f_2$ and $a_1 \oplus da = a_2$ do not imply that $f_1\ a_1 \oplus df\ a_1\ da = f_2\ a_2$). As usual, this can be fixed by introducing a logical relation. We call ours *validity*: a function change is valid if it turns valid input changes into valid output changes.
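As a small worked instance of Eq. (1), assume (purely for illustration) integer inputs whose changes are additive differences, so that $a_1 \oplus da = a_1 + da$, and take $f\ a = a \cdot a$ with the hand-written derivative $df\ a\ da = 2 \cdot a \cdot da + da \cdot da$. Neither the function nor this change representation comes from the formalization. Then:

$$\begin{array}{rcl} f\ a_1 \oplus df\ a_1\ da & = & a_1 \cdot a_1 + (2 \cdot a_1 \cdot da + da \cdot da) \\ & = & (a_1 + da) \cdot (a_1 + da) \\ & = & f\ (a_1 \oplus da) \;=\; f\ a_2 \end{array}$$

Here $df$ inspects its base input $a_1$, so it is not self-maintainable in the sense of Sect. 2.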


Fig. 5. Step-indexed validity, through judgments for values and for terms.

Static differentiation is only sound on input changes that are *valid*. Cai et al. [7] show soundness for a strongly normalizing simply-typed λ-calculus using denotational semantics. Using an operational semantics, we generalize this result to an untyped and Turing-complete language, so we must turn to a *step-indexed* logical relation [3,4].

*Validity as a step-indexed logical relation.* We say that "$dv$ is a valid change from $v_1$ to $v_2$, up to $k$ steps" and write

> $dv \triangleright_k v_1 \to v_2$

to mean that $dv$ is a change from $v_1$ to $v_2$ and that $dv$ is a *valid* description of the differences between $v_1$ and $v_2$, with validity tested with up to $k$ steps. This relation *approximates* validity; if a change $dv$ is valid at all approximations, it is simply valid (between $v_1$ and $v_2$); we then write $dv \triangleright v_1 \to v_2$ (omitting the step-index $k$) to mean that validity holds at all step-indexes. We similarly omit step-indexes $k$ from other step-indexed relations when they hold for all $k$.

To justify this intuition of validity, we show that a valid change from $v_1$ to $v_2$ goes indeed from $v_1$ to $v_2$ (Theorem 3.1), and that if a change is valid up to $k$ steps, it is also valid up to fewer steps (Lemma 3.2).

#### Theorem 3.1 (⊕ agrees with validity)

*If $dv \triangleright_k v_1 \to v_2$ holds for all $k > 0$, then $v_1 \oplus dv = v_2$.*

#### Lemma 3.2 (Downward-closure)

*If $N \geq n$, then $dv \triangleright_N v_1 \to v_2$ implies $dv \triangleright_n v_1 \to v_2$.*

Crucially, Theorem 3.1 enables (a) computing $v_2$ from a valid change and its source, and (b) showing Eq. (1) through validity. As discussed, ⊕ ignores changes to closure bodies to be faster, which is only sound if those changes are nil; to ensure Theorem 3.1 still holds, validity on closure changes must be adapted accordingly and forbid non-nil changes to closure bodies. This choice, while unusual, does not affect our results: if input changes do not modify closure bodies, intermediate changes will not modify closure bodies either. Logical relation experts might regard this as a domain-specific invariant we add to our relation. Alternatives are discussed by Giarrusso [10, Appendix C].

As usual with step-indexing, validity is defined by well-founded induction over naturals ordered by <; to show well-foundedness we observe that evaluation always takes at least one step.

Validity for values, terms and environments is formally defined by cases in Fig. 5. First, a literal change $d\ell$ is a valid change from $\ell$ to $\ell \oplus d\ell = \delta_\oplus(\ell, d\ell)$. Since the function $\delta_\oplus$ is partial, the relation only holds for the literal changes $d\ell$ which are valid changes for $\ell$. Second, a replacement change $!v_2$ is always a valid change from any value $v_1$ to $v_2$. Third, a primitive nil change is a valid change between any primitive and itself. Fourth, a tuple change is valid up to step $n$, if each of its components is valid up to any step strictly less than $n$. Fifth, we define validity for closure changes. Roughly speaking, this statement means that a closure change is valid if (i) its environment change $dE$ is valid for the original closure environment $E_1$ and for the new closure environment $E_2$; and (ii) when applied to related values, the closure *bodies* $t$ are related by $dt$, as defined by the auxiliary judgment $(dE \vdash dt) \triangleright_n (E_1 \vdash t_1) \to (E_2 \vdash t_2)$ for validity between terms under related environments (defined in Appendix C). As usual with step-indexed logical relations, in the definition for this judgment about terms, the number $k$ of steps required to evaluate the term $t_1$ is subtracted from the number of steps $n$ that can be used to relate the outcomes of the term evaluations.

*Soundness of differentiation.* We can state a soundness theorem for differentiation without mentioning step-indexes; thanks to this theorem, we can compute the updated result $v_2$ not by rerunning a computation, but by updating the base result $v_1$ with the result change $dv$ that we compute through a derivative on the input change. A corollary shows Eq. (1).

Theorem 3.3 (Soundness of differentiation in λ*AL*). *If $dE$ is a valid change environment from base environment $E_1$ to updated environment $E_2$, that is $dE \triangleright E_1 \to E_2$, and if $t$ converges both in the base and updated environment, that is $E_1 \vdash t \Downarrow v_1$ and $E_2 \vdash t \Downarrow v_2$, then $\mathcal{D}^{\iota}(t)$ evaluates under the change environment $dE$ to a valid change $dv$ between base result $v_1$ and updated result $v_2$, that is $dE \vdash \mathcal{D}^{\iota}(t) \Downarrow dv$, $dv \triangleright v_1 \to v_2$ and $v_1 \oplus dv = v_2$.*

We must first show that derivatives map input changes valid up to k steps to output changes valid up to k steps, that is, the *fundamental property* of our step-indexed logical relation:

Lemma 3.4 (Fundamental Property). *For each $n$, if $dE \triangleright_n E_1 \to E_2$ then $(dE \vdash \mathcal{D}^{\iota}(t)) \triangleright_n (E_1 \vdash t) \to (E_2 \vdash t)$.*

Fig. 6. Cache-Transfer Style translation and syntax of its target language λ*CAL*.

#### 3.5 CTS Conversion

Figures 6 and 7 define both the syntax of λ*CAL* and λ*ICAL* and CTS conversion. The latter comprises CTS differentiation $\mathcal{D}(\text{–})$, from λ*AL* to λ*ICAL*, and CTS translation $\mathcal{T}(\text{–})$, from λ*AL* to λ*CAL*.

*Syntax definitions for the target languages* λ*CAL* and λ*ICAL*. Terms of λ*CAL* follow again λ-lifted A'NF, like λ*AL*, except that a **let**-binding for a function application "$f\ x$" now binds an extra *cache identifier* $c^{y}_{fx}$ besides output $y$. Cache identifiers have non-standard syntax: each can be seen as a triple that refers to the value identifiers $f$, $x$ and $y$. Hence, an α-renaming of one of these three identifiers must refresh the cache identifier accordingly. Result terms explicitly return cache $C$ through syntax $(x, C)$. Caches are encoded through nested tuples, but they are in fact a tree-like data structure that is isomorphic to an execution trace. This trace contains both immediate values and the execution traces of nested function calls.

The syntax for λ*ICAL* matches the image of the CTS derivative and witnesses the CTS discipline followed by the derivatives: to determine $dy$, the derivative of $f$ evaluated at point $x$ with change $dx$ expects the cache produced by evaluating $y$ in the base term. The derivative returns the updated cache which contains the intermediate results that would be gathered by the evaluation of $f\ (x \oplus dx)$. The result term of every change term returns the computed change and a cache update $dC$, where each value identifier $x$ of the input cache is updated with its corresponding change $dx$.

$$\begin{array}{rcl} \mathcal{T}((\overline{dv})) & = & (\overline{\mathcal{T}(dv)}) \\ \mathcal{T}(dE[\lambda x\ dx.\ \mathcal{D}^{\iota}(t)]) & = & \mathcal{T}(dE)[\lambda x\ dx\ \mathcal{C}(t).\ \mathcal{D}_t(t)] \\ \mathcal{T}(!v) & = & !\mathcal{T}(v) \\ \mathcal{T}(d\ell) & = & d\ell \\ \mathcal{T}(0_{\mathbf{p}}) & = & 0_{\mathbf{p}} \end{array}$$

$$dC ::= (dC, c^{y}_{fx}) \mid (dC, x \oplus dx)$$

Fig. 7. CTS differentiation and syntax of its target language λ*ICAL*. Beware $\mathcal{T}(dE[\lambda x\ dx.\ \mathcal{D}^{\iota}(t)])$ applies a left-inverse of $\mathcal{D}^{\iota}(t)$ during pattern matching.

*CTS conversion and differentiation.* These translations use two auxiliary functions: 𝒞(t), which computes the cache term of a λ*AL* term t, and 𝒰(t), which computes the cache update of t's derivative.

CTS translation on terms, 𝒯<sub>t</sub>(t′), accepts as inputs a *global* term t and a subterm t′ of t. In tail position (t′ = *x*), the translation generates code to return both the result *x* and the cache 𝒞(t) of the global term t. When the transformation visits **let**-bindings, it outputs extra bindings for caches *c*<sup>*y*</sup><sub>*fx*</sub> on function calls and visits the **let**-body.

Similarly to 𝒯<sub>t</sub>(t′), CTS derivation 𝒟<sub>t</sub>(t′) accepts a global term t and a subterm t′ of t. In tail position, the translation returns both the result change *dx* and the cache update 𝒰(t). On **let**-bindings, it *does not* output bindings for *y* but for *dy*; it outputs extra bindings for *c*<sup>*y*</sup><sub>*fx*</sub> as in the previous case and visits the **let**-body.

To handle function definitions, we transform the base environment E through 𝒯(E) and 𝒯(𝒟<sup>ι</sup>(E)) (translations of environments are done pointwise, see Appendix D). Since 𝒟<sup>ι</sup>(E) includes E, we describe 𝒯(𝒟<sup>ι</sup>(E)) to also cover 𝒯(E). Overall, 𝒯(𝒟<sup>ι</sup>(E)) CTS-converts each source closure *f* = E[λ*x*. t] to a CTS-translated function, with body 𝒯<sub>t</sub>(t), and to the CTS derivative *df* of *f*. This CTS derivative pattern matches on its input cache using cache pattern 𝒞(t). That way, we make sure that the shape of the cache expected by *df* is consistent with the shape of the cache produced by *f*. The body of derivative *df* is computed by CTS-deriving *f*'s body via 𝒟<sub>t</sub>(t).
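
To make the shape of these translations concrete, the following hand-written sketch shows what a CTS function and its CTS derivative could look like for the term `f x = let y = sq x; z = sq y in z`. It is not output of the transformation described here: the names `sqC`, `dsqC`, `fC`, `dfC` and `FCache`, and the choice of integers with additive changes, are assumptions made for illustration.

```haskell
-- Illustrative sketch only: integers with additive changes (x ⊕ dx = x + dx).
type DInt = Int

-- A primitive in CTS: returns its result and a (here trivial) cache.
sqC :: Int -> (Int, ())
sqC x = (x * x, ())

-- Its CTS derivative: takes base input, input change and cache,
-- and returns the output change and the updated cache.
dsqC :: Int -> DInt -> () -> (DInt, ())
dsqC x dx () = (2 * x * dx + dx * dx, ())

-- Cache of f as nested tuples: each intermediate result with the cache of its call.
type FCache = ((Int, ()), (Int, ()))

-- CTS version of f: every let-binding for a call also binds a cache.
fC :: Int -> (Int, FCache)
fC x =
  let (y, cy) = sqC x
      (z, cz) = sqC y
  in (z, ((y, cy), (z, cz)))

-- CTS derivative of f: pattern matches on the cache produced by fC, binds
-- changes instead of results, and returns the updated cache.
dfC :: Int -> DInt -> FCache -> (DInt, FCache)
dfC x dx ((y, cy), (z, cz)) =
  let (dy, cy') = dsqC x dx cy
      (dz, cz') = dsqC y dy cz   -- needs the base value y, read from the cache
  in (dz, ((y + dy, cy'), (z + dz, cz')))
```

Note that `dfC` never recomputes `y`: the second call to `dsqC` reads it from the cache, and the returned cache has the same shape as the one `fC` would produce on the updated input.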

#### 3.6 Semantics of *λCAL* and *λICAL*

An evaluation environment F of λ*CAL* contains both values and cache values. Values V resemble λ*AL* values *v*, cache values V<sub>c</sub> match cache terms C and change values *dV* match λ*IAL* change values *dv*. Evaluation environments *dF* for change terms must also bind change values, so functions in change closures take not just a base input *x* and an input change *dx*, like in λ*IAL*, but also an input cache C. By abuse of notation, we reuse the same syntax C to both deconstruct and construct caches.

Base terms of the language are evaluated using a conventional big-step semantics, consisting of two judgments. Judgment "F ⊢ M ⇓ (V, V<sub>c</sub>)" is read "Under evaluation environment F, base term M evaluates to value V and cache V<sub>c</sub>". The semantics follows that of λ*AL*; since terms include extra code to produce and carry caches along the computation, the semantics evaluates that code as well. For space reasons, we defer semantic rules to Appendix E. Auxiliary judgment "F ⊢ C ⇓ V<sub>c</sub>" evaluates cache terms into cache values: it traverses a cache term and looks up the environment for the values to be cached.

Change terms of λ*ICAL* are also evaluated using a big-step semantics, which resembles the semantics of λ*IAL* and λ*CAL*. Unlike in those semantics, cache updates (*dC*, *x* ⊕ *dx*) are evaluated using the ⊕ operator (overloaded on λ*CAL* values and λ*ICAL* changes). For lack of space, the rules are deferred to Appendix E. This semantics relies on three judgments. Judgment "*dF* ⊢ *dM* ⇓ (*dV*, V<sub>c</sub>)" is read "Under evaluation environment *dF*, change term *dM* evaluates to change value *dV* and updated cache V<sub>c</sub>". The first auxiliary judgment "*dF* ⊢ *dC* ⇓ V<sub>c</sub>" defines evaluation of cache update terms. The final auxiliary judgment "V<sub>c</sub> ∼ C → *dF*" describes a limited form of pattern matching used by CTS derivatives: namely, how a cache pattern C matches a cache value V<sub>c</sub> to produce a change environment *dF*.

#### 3.7 Soundness of CTS Conversion

The proof is based on a simulation in lock-step, but two subtle points emerge. First, we must relate λ*AL* environments that do not contain caches, with λ*CAL* environments that do. Second, while evaluating CTS derivatives, the evaluation environment mixes caches from the base computation and updated caches computed by the derivatives.

Theorem 3.7 follows because differentiation is sound (Theorem 3.3) and evaluation commutes with CTS conversion; this last point requires two lemmas. First, CTS translation of base terms commutes with our semantics:

#### Lemma 3.5 (Commutation for base evaluations)

*For all* E, t *and v, if* E ⊢ t ⇓ *v, there exists* V<sub>c</sub> *such that* 𝒯(E) ⊢ 𝒯<sub>t</sub>(t) ⇓ (𝒯(*v*), V<sub>c</sub>)*.*

Second, we need a corresponding lemma for CTS translation of differentiation results: intuitively, evaluating a derivative and CTS translating the resulting change value must give the same result as evaluating the CTS derivative. But to formalize this, we must specify which environments are used for evaluation, and this requires two technicalities.

Assume derivative 𝒟<sup>ι</sup>(t) evaluates correctly in some environment *dE*. Evaluating CTS derivative 𝒟<sub>t</sub>(t) requires cache values from the base computation, but they are not in 𝒯(*dE*)! Therefore, we must introduce a judgment to complete a CTS-translated environment with the appropriate caches (see Appendix F).

Next, consider evaluating a change term of the form *dM* = C[*dM*′], where C is a standard single-hole change-term context, that is, for λ*ICAL*, a sequence of **let**-bindings. When evaluating *dM*, we eventually evaluate *dM*′ in a change environment *dF* updated by C: the change environment *dF* contains both the updated caches coming from the evaluation of C and the caches coming from the base computation (which will be updated by the evaluation of *dM*′). Again, a new judgment, given in Appendix F, is required to model this process.

With these two judgments, we can state the second key lemma: evaluation of derivatives commutes with evaluation of CTS derivatives. We give an informal version of this lemma here; the formal version can be found in Appendix F.

#### Lemma 3.6 (Commutation for derivatives evaluation)

*If the evaluation of* 𝒟<sup>ι</sup>(t) *leads to an environment dE*<sub>0</sub> *when it reaches the differentiated context* 𝒟<sup>ι</sup>(C) *where* t = C[t′]*, and if the CTS conversion of* t *under this environment completed with base (resp. changed) caches evaluates into a base value* 𝒯(*v*) *(resp. a changed value* 𝒯(*v*′)*) and a base cache value* V<sub>c</sub> *(resp. an updated cache value* V′<sub>c</sub>*), then under an environment containing the caches already updated by the evaluation of* 𝒟<sup>ι</sup>(C) *and the base caches to be updated, the CTS derivative of* t *evaluates to* 𝒯(*dv*) *such that v* ⊕ *dv* = *v*′ *and to the updated cache* V′<sub>c</sub>*.*

Finally, we can state soundness of CTS differentiation. This theorem says that CTS derivatives not only produce valid changes for incrementalization but that they also correctly consume and update caches.

#### Theorem 3.7 (Soundness of CTS differentiation)

*If the following hypotheses hold:*


*then there exist dv,* V<sub>c</sub>*,* V′<sub>c</sub> *and* F<sub>0</sub> *such that:*

*1.* 𝒯(E) ⊢ 𝒯<sub>t</sub>(t) ⇓ (𝒯(*v*), V<sub>c</sub>)

*2.* 𝒯(E′) ⊢ 𝒯<sub>t</sub>(t) ⇓ (𝒯(*v*′), V′<sub>c</sub>)

*3.* 𝒞(t) ∼ V<sub>c</sub> → F<sub>0</sub>

*4.* 𝒯(*dE*); F<sub>0</sub> ⊢ 𝒟<sub>t</sub>(t) ⇓ (𝒯(*dv*), V′<sub>c</sub>)

*5.* *v* ⊕ *dv* = *v*′

#### 4 Incrementalization Case Studies

In this section, we investigate two questions: whether our transformations can target a typed language like Haskell and whether automatically transformed programs can perform well. We implement by hand primitives on sequences, bags and maps in Haskell. The input terms in all case studies are written in a deep embedding of λ*AL* into Haskell. The transformations generate Haskell code that uses our primitives and their derivatives.

We run the transformations on three case studies: a computation of the average value of a bag of integers, a nested loop over two sequences, and a more involved example inspired by Koch et al. [17]'s work on incrementalizing database queries. For each case study, we make sure that results are consistent between from scratch recomputation and incremental evaluation; we measure the execution time for from scratch recomputation and incremental computation as well as the space consumption of caches. We obtain efficient incremental programs, that is, ones for which incremental computation is faster than from scratch recomputation. The measurements indicate that we do get the expected asymptotic improvement in time of incremental computation over from scratch recomputation by a linear factor, while the caches grow by a similar linear factor.

Our benchmarks were compiled by GHC 8.2.2 and run on a 2.20 GHz hexa core Intel(R) Xeon(R) CPU E5-2420 v2 with 32 GB of RAM running Ubuntu 14.04. We use the *criterion* [21] benchmarking library.

#### 4.1 Averaging Bags of Integers

Section 2.1 motivates our transformation with a running example of computing the average over a bag of integers. We represent bags as maps from elements to (possibly negative) multiplicities. Earlier work [7,17] represents bag changes as bags of removed and added elements. We use a different representation of bag changes that takes advantage of the changes to elements and provide primitives on bags and their derivatives. The CTS variant of *map*, which we call *mapC*, takes a function *fC* in CTS and a bag *as* and produces a bag and a cache. For each invocation of *fC*, and therefore for each distinct element in *as*, the cache stores the result of *fC* of type *b* and the cache of type *c*.

Inspired by Rossberg et al. [23], all higher-order functions (and typically, also their caches) are parametric over cache types of their function arguments. Here, functions *mapC* and *dmapC* and cache type *MapC* are parametric over the cache type *c* of *fC* and *dfC* .

```
map :: (a → b) → Bag a → Bag b
data MapC a b c = MapC (Map a (b, c))
mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)
dmapC :: (a → (b, c)) → (a → Δa → c → (Δb, c)) → Bag a → Δ(Bag a) →
  MapC a b c → (Δ(Bag b), MapC a b c)
```
We wrote the *length* and *sum* functions used in our benchmarks in terms of primitives *map* and *foldGroup* and had their CTS function and CTS derivative generated automatically.
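
As an illustration of the `mapC` signature above, here is a minimal sketch of how such a primitive could be written on top of `Data.Map`. It is not the implementation used in our benchmarks: representing `Bag a` as `Map a Int` follows the description in the text, but the body of `mapC` is an assumption.

```haskell
import qualified Data.Map.Strict as Map

-- Bags as maps from elements to (possibly negative) multiplicities.
type Bag a = Map.Map a Int

-- The cache stores, for each distinct input element, the result of fC
-- together with the cache returned by that invocation.
newtype MapC a b c = MapC (Map.Map a (b, c))

mapC :: Ord b => (a -> (b, c)) -> Bag a -> (Bag b, MapC a b c)
mapC fC as =
  let -- One invocation of fC per distinct element of the input bag.
      perElem = Map.mapWithKey (\a _ -> fC a) as
      -- perElem and as have the same keys, so their elements line up in
      -- key order; results that coincide have their multiplicities added.
      out = Map.fromListWith (+)
              (zip (map fst (Map.elems perElem)) (Map.elems as))
  in (out, MapC perElem)
```

A derivative in the style of `dmapC` would then look up the cached pair for each changed element instead of calling `fC` again.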

We evaluate whether we can produce an updated result with *daverageC* shown in Sect. 2.1 faster than by from scratch recomputation with *average*. We expect the speedup of *daverageC* to depend on the size of the input bag *n*. We fix an input bag of size *n* as the bag containing the numbers from 1 to *n*. We define a change that inserts the integer 1 into the bag. To measure execution time of from scratch recomputation, we apply *average* to the input bag updated with the change. To measure execution time of the CTS function *averageC* , we apply *averageC* to the input bag updated with the change. To measure execution time of the CTS derivative *daverageC* , we apply *daverageC* to the input bag, the change and the cache produced by *averageC* when applied to the input bag. In all three cases we ensure that all results and caches are fully forced so as to not hide any computational cost behind laziness.
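
The three measured computations can be sketched by hand as follows. This is a simplified stand-in, not the automatically generated code: the names `applyChange`, `bagSum`, `bagLength` and `AvgCache` are hypothetical, bag changes are assumed to be maps of multiplicity deltas, and the cache holds only the sum and the length, whereas the generated cache is larger.

```haskell
import qualified Data.Map.Strict as Map

type Bag  = Map.Map Int Int   -- element -> multiplicity
type DBag = Map.Map Int Int   -- assumed change representation: multiplicity deltas

-- Update a bag with a change, dropping elements whose multiplicity becomes 0.
applyChange :: Bag -> DBag -> Bag
applyChange b db = Map.filter (/= 0) (Map.unionWith (+) b db)

bagSum, bagLength :: Bag -> Int
bagSum    = Map.foldrWithKey (\x n acc -> x * n + acc) 0
bagLength = sum . Map.elems

-- From scratch computation (undefined on the empty bag, as div by 0).
average :: Bag -> Int
average b = bagSum b `div` bagLength b

type AvgCache = (Int, Int)    -- (sum, length)

-- CTS function: result plus cache.
averageC :: Bag -> (Int, AvgCache)
averageC b =
  let s = bagSum b
      n = bagLength b
  in (s `div` n, (s, n))

-- CTS derivative: takes base input, change and cache; only traverses the change.
daverageC :: Bag -> DBag -> AvgCache -> (Int, AvgCache)
daverageC _ db (s, n) =
  let s' = s + bagSum db
      n' = n + bagLength db
  in (s' `div` n' - s `div` n, (s', n'))
```

With this setup, the from scratch measurement corresponds to `average (applyChange b db)`, and the incremental one to `daverageC b db c` where `c` is the cache returned by `averageC b`.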

Fig. 8. Benchmark results for *average* and *totalPrice*

The plot in Fig. 8a shows execution time versus the size *n* of the base input. To produce the base result and cache, the CTS transformed function *averageC* takes longer than the original *average* function takes to produce just the result. Producing the updated result incrementally is slower than from scratch recomputation for small input sizes, but because of the difference in time complexity becomes faster as the input size grows. The size of the cache grows linearly with the size of the input, which is not optimal for this example. We leave optimizing the space usage of examples like this to future work.

#### 4.2 Nested Loops over Two Sequences

Next, we consider CTS differentiation on a higher-order example. To incrementalize this example efficiently, we have to enable detecting nil function changes at runtime by representing function changes as closures that can be inspected by incremental programs. Our example here is the Cartesian product of two sequences computed in terms of functions *map* and *concat*.

*cartesianProduct* :: *Sequence a* → *Sequence b* → *Sequence* (*a*, *b*)

*cartesianProduct xs ys* = *concatMap* (λ*x* → *map* (λ*y* → (*x*, *y*)) *ys*) *xs*

*concatMap* :: (*a* → *Sequence b*) → *Sequence a* → *Sequence b*

*concatMap f xs* = *concat* (*map f xs*)

We implemented incremental sequences and related primitives following Firsov and Jeltsch [9]: our change operations and first-order operations (such as *concat*) reuse their implementation. On the other hand, we must extend higher-order operations such as *map* to handle non-nil function changes and caching. A correct and efficient CTS derivative *dmapC* has to work differently depending on whether the given function change is nil or not: for a non-nil function change it has to go over the input sequence; for a nil function change it has to avoid that.

Cai et al. [7] use static analysis to conservatively approximate nil function changes as changes to terms that are closed in the original program. But in this example the function argument (λ*y* → (*x*, *y*)) to *map* in *cartesianProduct* is not a closed term. It is, however, crucial for the asymptotic improvement that we avoid looping over the inner sequence when the change to the free variable *x* in the change environment is **0**<sub>*x*</sub>.

To enable runtime nil change detection, we apply closure conversion to the original program and explicitly construct closures and changes to closures. While the only valid change for closed functions is their nil change, for closures we can have non-nil function changes. A function change *df* , represented as a closure change, is nil exactly when all changes it closes over are nil.

We represent closed functions and closures as variants of the same type. Correspondingly we represent changes to a closed function and changes to a closure as variants of the same type of function changes. We inspect this representation at runtime to find out if a function change is a nil change.

**data** *Fun a b c* **where**

*Closed* :: (*a* → (*b*, *c*)) → *Fun a b c*

*Closure* :: (*e* → *a* → (*b*, *c*)) → *e* → *Fun a b c*

**data** Δ(*Fun a b c*) **where**

*DClosed* :: (*a* → Δ*a* → *c* → (Δ*b*, *c*)) → Δ(*Fun a b c*)

*DClosure* :: (*e* → Δ*e* → *a* → Δ*a* → *c* → (Δ*b*, *c*)) → *e* → Δ*e* → Δ(*Fun a b c*)
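
The runtime check for nil function changes can be sketched in plain GHC Haskell as follows. This is an approximation of the declarations above, not our actual code: the paper's Δ is a type-level operator, which is replaced here by explicit type parameters `da` and `db`, and the class `NilChange`, the type `DInt` and the functions `isNilFun` and `pairChange` are hypothetical names introduced for the example.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical class: change types that can report whether a change is nil.
class NilChange d where
  isNil :: d -> Bool

-- Additive integer changes; the nil change is 0.
newtype DInt = DInt Int

instance NilChange DInt where
  isNil (DInt d) = d == 0

-- Function changes. The environment type e and its change type de are
-- hidden, but de must support the nil check.
data DFun a da b db c where
  DClosed  :: (a -> da -> c -> (db, c)) -> DFun a da b db c
  DClosure :: NilChange de
           => (e -> de -> a -> da -> c -> (db, c)) -> e -> de -> DFun a da b db c

-- A change to a closed function is always nil; a closure change is nil
-- exactly when the change to its environment is nil.
isNilFun :: DFun a da b db c -> Bool
isNilFun (DClosed _)       = True
isNilFun (DClosure _ _ de) = isNil de

-- Change to the closure (\y -> (x, y)) from cartesianProduct, closing over x.
pairChange :: Int -> DInt -> DFun Int DInt (Int, Int) (DInt, DInt) ()
pairChange x dx = DClosure (\_ dx' _ dy c -> ((dx', dy), c)) x dx
```

A derivative such as *dmapC* could branch on `isNilFun` and skip the traversal of the input sequence when it returns `True`.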

We use the same benchmark setup as in the benchmark for the average computation on bags. The input of size *n* is a pair of sequences (*xs*, *ys*). Each sequence initially contains the integers from 1 to *n*. Updating the result in reaction to a change *dxs* to the outer sequence *xs* takes less time than updating the result in reaction to a change *dys* to the inner sequence *ys*. While a change to the outer sequence *xs* results in an easily located change in the output sequence, a change to the inner sequence *ys* results in a change that needs a lot more calculation to find the elements it affects. We benchmark changes to the outer sequence *xs* and the inner sequence *ys* separately, where the change to one sequence is the insertion of a single integer 1 at position 1 and the change to the other one is the nil change.

Fig. 9. Benchmark results for *cartesianProduct*

Figure 9 shows execution time versus input size. In this example, again, preparing the cache takes longer than from scratch recomputation alone. The speedup of incremental computation over from scratch recomputation increases with the size of the base input sequences because of the difference in time complexity. Eventually we do get speedups for both kinds of changes (to the inner and to the outer sequence), but for changes to the outer sequence we get a speedup earlier, at a smaller input size. The size of the cache grows superlinearly in this example.

#### 4.3 Indexed Joins of Two Bags

Our goal is to show that we can compose primitive functions into larger and more complex programs and apply CTS differentiation to get a fast incremental program. We use an example inspired by the DBToaster literature [17]. In this example we have a bag of orders and a bag of line items. An order is a pair of an order key and an exchange rate. A line item is a pair of an order key and a price. We build an index mapping each order key to the sum of all exchange rates of the orders with this key and an index from order key to the sum of the prices of all line items with this key. We then merge the two maps by key, multiplying corresponding sums of exchange rates and sums of prices. We compute the total price of the orders and line items as the sum of those products.

**type** *Order* = (ℤ, ℤ)

**type** *LineItem* = (ℤ, ℤ)

*totalPrice* :: *Bag Order* → *Bag LineItem* → ℤ

*totalPrice orders lineItems* = **let**

*orderIndex* = *groupBy fst orders*

*orderSumIndex* = *Map*.*map* (*Bag*.*foldMapGroup snd*) *orderIndex*

*lineItemIndex* = *groupBy fst lineItems*

*lineItemSumIndex* = *Map*.*map* (*Bag*.*foldMapGroup snd*) *lineItemIndex*

*merged* = *Map*.*merge orderSumIndex lineItemSumIndex*

*total* = *Map*.*foldMapGroup multiply merged*

**in** *total*

*groupBy* :: (*a* → *k*) → *Bag a* → *Map k* (*Bag a*)

*groupBy keyOf bag* = *Bag*.*foldMapGroup* (λ*a* → *Map*.*singleton* (*keyOf a*) (*Bag*.*singleton a*)) *bag*
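
For readers who want to run the non-incremental computation, the following self-contained sketch reproduces it with `Data.Map` from the containers library. It is a simplification, not the benchmarked program: the helper `sumIndex` fuses *groupBy* with the per-key sum, bags are assumed to be maps from elements to multiplicities, and merging is assumed to keep only keys present in both indexes.

```haskell
import qualified Data.Map.Strict as Map

type Bag a    = Map.Map a Int   -- element -> multiplicity
type Order    = (Int, Int)      -- (order key, exchange rate)
type LineItem = (Int, Int)      -- (order key, price)

-- Index from key to the sum of the second components, weighted by multiplicity.
sumIndex :: Bag (Int, Int) -> Map.Map Int Int
sumIndex bag =
  Map.fromListWith (+) [ (k, v * n) | ((k, v), n) <- Map.toList bag ]

totalPrice :: Bag Order -> Bag LineItem -> Int
totalPrice orders lineItems =
  let orderSumIndex    = sumIndex orders
      lineItemSumIndex = sumIndex lineItems
      -- Merge by key, multiplying corresponding sums.
      merged           = Map.intersectionWith (*) orderSumIndex lineItemSumIndex
  in sum (Map.elems merged)
```

In an incremental version, `orderSumIndex` and `lineItemSumIndex` are exactly the intermediate results that would have to be cached and updated.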

Unlike DBToaster, we assume our program is already transformed to explicitly use indexes, as above. Because our indexes are maps, we implemented a change structure, CTS primitives and their CTS derivatives for maps.

To build the indexes, we use a *groupBy* function built from primitive functions *foldMapGroup* on bags and *singleton* for bags and maps respectively. The CTS function *groupByC* and the CTS derivative *dgroupByC* are automatically generated. While computing the indexes with *groupBy* is self-maintainable, merging them is not. We need to cache and incrementally update the intermediately created indexes to avoid recomputing them.

We evaluate the performance in the same way we did in the other case studies. The input of size *n* is a pair of bags where both contain the pairs (*i*, *i*) for *i* between 1 and *n*. The change is an insertion of the order (1, 1) into the orders bag. For sufficiently large inputs, our CTS derivative of the original program produces updated results much faster than from scratch recomputation, again because of a difference in time complexity as indicated by Fig. 8b. The size of the cache grows linearly with the size of the input in this example. This is unavoidable, because we need to keep the indexes.

#### 4.4 Limitations and Future Work

*Typing of CTS programs.* Functions of the same type *f*<sub>1</sub>, *f*<sub>2</sub> :: *A* → *B* can be transformed to CTS functions *f*<sub>1</sub> :: *A* → (*B*, *C*<sub>1</sub>), *f*<sub>2</sub> :: *A* → (*B*, *C*<sub>2</sub>) with different cache types *C*<sub>1</sub>, *C*<sub>2</sub>, since cache types depend on the implementation. This heterogeneous typing of translated functions poses difficult typing issues, e.g. what is the translated type of a *list* (*A* → *B*)? We cannot hide cache types behind existential quantifiers because they would be too abstract for derivatives, which only work on very specific cache types. We can fix this problem with some runtime overhead by using a single type *Cache*, defined as a tagged union of all cache types, or maybe with more sophisticated type systems (like first-class translucent sums, open existentials or Typed Adapton's refinement types [12]) that could track cache types precisely.

In any case, we believe that these machineries would add a lot of complexity without helping much with the proof of correctness. Indeed, the simulation relation is more handy here because it maintains a global invariant about the whole evaluations (typically the consistency of cache types between base computations and derivatives), not many local invariants about values as types would.

One might wonder why caches could not be totally hidden from the programmer by embedding them in the derivatives themselves; or in other words, why we did not simply translate functions of type A → B into functions of type A → B × (ΔA → ΔB). We tried this as well; but unlike automatic differentiation, we must remember and update caches according to input changes (especially when receiving a sequence of such changes as in Sect. 2.1). Returning the updated cache to the caller works; we tried closing over the caches in the derivative, but this ultimately fails (because we could receive function changes to the original function, but those would need access to such caches).

*Comprehensive performance evaluation.* This paper focuses on theory and we leave benchmarking in comparison to other implementations of incremental computation to future work. The examples in our case study were rather simple (except perhaps for the indexed join). Nevertheless, the results were encouraging and we expect them to carry over to more complex examples, but not to all programs. A comparison to other work would also include a comparison of space usage for auxiliary data structures, in our case the caches.

*Cache pruning via absence analysis.* To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [19] performs this transformation on CTS programs by using *absence analysis*, which was later extended to higher-order languages by Sergey et al. [25]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that the analysis could remove unused caches or inputs, if it is extended to *not* treat caches as part of the output.

*Unary vs n-ary abstraction.* We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions such as *div* :: ℤ → ℤ → ℤ. Naively transforming such a curried function to CTS would produce a function *divC* of type ℤ → (ℤ → (ℤ, *DivC*<sub>2</sub>), *DivC*<sub>1</sub>) with *DivC*<sub>1</sub> = (), which adds excessive overhead. In Sect. 2 and our evaluation we use curried functions and never need to use this naive encoding, but only because we always invoke functions of known arity.
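
The difference between the two encodings can be seen in the following sketch. The cache types are taken to be unit, and the name `divKnownArity` is hypothetical; the point is only the shape of the types.

```haskell
type DivC1 = ()
type DivC2 = ()

-- Naive CTS encoding of curried div: the partial application itself
-- returns a pair of a CTS function and an (empty) cache.
divC :: Int -> (Int -> (Int, DivC2), DivC1)
divC x = (\y -> (x `div` y, ()), ())

-- Encoding usable when the arity is known at the call site:
-- a single cache is returned after both arguments are supplied.
divKnownArity :: Int -> Int -> (Int, DivC2)
divKnownArity x y = (x `div` y, ())
```

A caller of `divC` has to destructure one pair per argument, for instance `fst (fst (divC 7) 2)`, whereas `divKnownArity 7 2` produces the result and its cache in one step.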

# 5 Related Work

*Cache-transfer-style.* Liu [19]'s work has been the fundamental inspiration to this work, but her approach has no correctness proof and is restricted to a first-order untyped language. Moreover, while the idea of cache-transfer-style is similar, it's unclear if her approach to incrementalization would extend to higher-order programs. Firsov and Jeltsch [9] also approach incrementalization by code transformation, but their approach does not deal with changes to functions. Instead of transforming functions written in terms of primitives, they provide combinators to write CTS functions and derivatives together. On the other hand, they extend their approach to support mutable caches, while restricting to immutable ones as we do might lead to a logarithmic slowdown.

*Finite differencing.* Incremental computation on collections or databases by finite differencing has a long tradition [6,22]. The most recent and impressive line of work is the one on DBToaster [16,17], which is a highly efficient approach to incrementalize queries over bags by combining iterated finite differencing with other program transformations. They show asymptotic speedups both in theory and through experimental evaluations. Changes are only allowed for datatypes that form groups (such as bags or certain maps), but not for instance for lists or sets. Similar ideas were recently extended to higher-order and nested computation [18], though only for datatypes that can be turned into groups. Koch et al. [18] emphasize that iterated differentiation is necessary to obtain efficient derivatives; however, ANF conversion and remembering intermediate results appear to address the same problem, similarly to the field of automatic differentiation [27].

*Logical relations.* To study correctness of incremental programs we use a logical relation among base values *v*<sub>1</sub>, updated values *v*<sub>2</sub> and changes *dv*. To define a logical relation for an untyped λ-calculus we use a *step-indexed* logical relation, following Ahmed [4], Appel and McAllester [5]; in particular, our definitions are closest to the ones by Acar et al. [3], who also work with an untyped language, big-step semantics and (a different form of) incremental computation. However, they do not consider first-class changes. Technically, we use environments rather than substitution, and index our big-step semantics differently.

*Dynamic incrementalization.* The approaches to incremental computation with the widest applicability are in the family of self-adjusting computation [1,2], including its descendant Adapton [14]. These approaches incrementalize programs by combining memoization and change propagation: after creating a trace of base computations, updated inputs are compared with old ones in O(1) to find corresponding outputs, which are updated to account for input modifications. Compared to self-adjusting computation, Adapton only updates results that are demanded. As usual, incrementalization is not efficient on arbitrary programs, but only on programs designed so that input changes produce small changes to the computation trace; refinement type systems have been designed to assist in this task [8,12]. To identify matching inputs, Nominal Adapton [13] replaces input comparisons by pointer equality with first-class labels, enabling more reuse.

# 6 Conclusion

We have presented a program transformation which turns a functional program into its derivative and efficiently shares redundant computations between them thanks to a statically computed cache.

Although our first practical case studies show promising results, this paper focused on putting CTS differentiation on solid theoretical ground. For the moment, we have only scratched the surface of the incrementalization opportunities opened by CTS primitives and their CTS derivatives: in our opinion, exploring the design space for cache data structures will lead to interesting new results in purely functional incremental programming.

Acknowledgments. We are grateful to anonymous reviewers: they made important suggestions to help us improve our technical presentation. We also thank Cai Yufei, Tillmann Rendel, Lourdes del Carmen González Huesca, Klaus Ostermann, Sebastian Erdweg for helpful discussions on this project. This work was partially supported by DFG project 282458149 and by SNF grant No. 200021\_166154.

# References



# Concurrency and Distribution

# **Asynchronous Timed Session Types: From Duality to Time-Sensitive Processes**

Laura Bocchi<sup>1(✉)</sup>, Maurizio Murgia<sup>1,4</sup>, Vasco Thudichum Vasconcelos<sup>2</sup>, and Nobuko Yoshida<sup>3</sup>

<sup>1</sup> University of Kent, Canterbury, UK l.bocchi@kent.ac.uk

<sup>2</sup> LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal

<sup>3</sup> Imperial College London, London, UK

<sup>4</sup> University of Cagliari, Cagliari, Italy

**Abstract.** We present a behavioural typing system for a higher-order timed calculus using session types to model timed protocols. Behavioural typing ensures that processes in the calculus perform actions in the time-windows prescribed by their protocols. We introduce duality and subtyping for timed asynchronous session types. Our notion of duality allows typing a larger class of processes with respect to previous proposals. Subtyping is critical for the precision of our typing system, especially in the presence of session delegation. The composition of dual (timed asynchronous) types enjoys progress when using an urgent receive semantics, in which receive actions are executed as soon as the expected message is available. Our calculus increases the modelling power of extant calculi on timed sessions, adding a blocking receive primitive with timeout and a primitive that consumes an arbitrary amount of time in a given range.

**Keywords:** Session types · Timers · Duality · π-calculus

# **1 Introduction**

Time is at the basis of many real-life protocols. These include common client-server interactions, for example, *"An SMTP server SHOULD have a timeout of at least 5 minutes while it is awaiting the next command from the sender"* [22]. By protocol, we mean application-level specifications of interaction patterns (via message passing) among distributed applications. An extensive literature offers theories and tools for formal analysis of timed protocols, modelled for instance as timed automata [3,26,34] or Message Sequence Charts [2]. These works allow reasoning about the properties of *protocols*, defined as formal models. Recent work, based on session types, focuses on the relationship between time-sensitive protocols, modelled as timed extensions of session types, and their implementations abstracted as *processes* in some timed calculus. The relationship between protocols and processes is given in terms of static behavioural typing [12,15] or run-time monitoring [6,7,30] of processes against types. Existing work on timed session types [7,12,15,30] is based on simple abstractions for processes which do not capture time-sensitive primitives such as blocking (as well as non-blocking) receive primitives with timeout and time-consuming actions with variable, yet bounded, duration. This paper provides a theory of asynchronous timed session types for a calculus that features these two primitives. We focus on the asynchronous scenario, as modern distributed systems (e.g., web) are often based on asynchronous communications via FIFO channels [4,33]. The link between protocols and processes is given in terms of static behavioural typing, checking for punctuality of interactions with respect to protocol prescriptions. Unlike previous work on asynchronous timed session types [12], our type system can check processes against protocols that are *not wait-free*. In wait-free protocols, the time-windows for corresponding send and receive actions have an empty intersection. We illustrate wait-freedom using a protocol modelled as two timed session types, each owning a set of clocks (with no shared clocks between types).

This work has been partially supported by EPSRC EP/N035372/1, EP/K011715/1, EP/N027833/1, EP/K034413/1, EP/L00058X/1, EP/N028201/1, Aut. Reg. of Sardinia projects *Sardcoin* and *Smart collaborative engineering*, FCT through project Confident PTDC/EEI-CTP/4503/2014 and the LASIGE Research Unit UID/CEC/00408/2019. We thank Julien Lange for his advice and comments.

$$S_{\mathtt{C}} = {!\mathtt{Command}}(x < 5, \{x\}).S'_{\mathtt{C}} \qquad S_{\mathtt{S}} = {?\mathtt{Command}}(y < 5, \{y\}).S'_{\mathtt{S}} \tag{1}$$

The protocol in (1) involves a client S<sub>C</sub> with a clock x, and a server S<sub>S</sub> with a clock y (with both x and y initially set to 0). Following the protocol, the client must send a message of type Command within 5 min, reset x, and continue as S′<sub>C</sub>. Dually, the server must be ready to receive a command with a timeout of 5 min, reset y, and continue as S′<sub>S</sub>. The model in (1) is *not wait-free*: the intersection of the time-windows for the send and receive actions is non-empty (the time-windows actually coincide). The protocol in (2), where the server must wait until after the client's deadline to read the message, is wait-free.

$${!\mathtt{Command}}(x < 5, \{x\}).S''_{\mathtt{C}} \qquad {?\mathtt{Command}}(y = 5, \{y\}).S''_{\mathtt{S}} \tag{2}$$

Patterns like the one in (1) are common (e.g., the SMTP fragment mentioned at the beginning of this introduction) but, unfortunately, they are *not wait-free*, hence ruled out in previous work [12]. Arguably, (2) is an impractical wait-free variant of (1): the client must always wait for at least 5 min to have the message read, no matter how early this message was sent. The definition of protocols for our typing system (which allows for *not wait-free* protocols) is based on a notion of *asynchronous timed duality*, and on a subtyping relation that provides accuracy of typing, especially in the case of channel passing.
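As an informal illustration of wait-freedom, the following Python sketch checks whether the time-windows of a send and its matching receive intersect. It is a simplification made for this example only: guards are assumed to be intervals over a single clock, both clocks are assumed to start at 0 on a common timeline, and the names `Window`, `overlaps` and `is_wait_free` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Times at which an action may fire, as an interval over one clock."""
    lo: float
    hi: float
    lo_closed: bool = True
    hi_closed: bool = True


def overlaps(a: Window, b: Window) -> bool:
    """Return True if the two windows share at least one instant."""
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    # A bound of the intersection is closed only if every window
    # that attains that bound includes it.
    lo_closed = all(w.lo_closed for w in (a, b) if w.lo == lo)
    hi_closed = all(w.hi_closed for w in (a, b) if w.hi == hi)
    return lo < hi or (lo == hi and lo_closed and hi_closed)


def is_wait_free(send: Window, receive: Window) -> bool:
    """Wait-free: the send and receive windows have an empty intersection."""
    return not overlaps(send, receive)


send_window = Window(0, 5, hi_closed=False)    # x < 5, as in (1) and (2)
receive_in_1 = Window(0, 5, hi_closed=False)   # y < 5, as in (1)
receive_in_2 = Window(5, 5)                    # y = 5, as in (2)

print(is_wait_free(send_window, receive_in_1))  # False: (1) is not wait-free
print(is_wait_free(send_window, receive_in_2))  # True: (2) is wait-free
```

The check only compares one send with one receive; it says nothing about the continuations of the two types.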

*Asynchronous timed duality.* In the untimed scenario, each session type has one unique *dual* that is obtained by changing the polarities of the actions (send vs. receive, and selection vs. branching). For example, the dual of a session type S that sends an integer and then receives a string is a session type S̄ that receives an integer and then sends a string.

$$S = {!\mathtt{Int}}.{?\mathtt{String}} \qquad \overline{S} = {?\mathtt{Int}}.{!\mathtt{String}}$$
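Untimed duality can be sketched as a polarity flip applied along the type. The Python fragment below is an illustration under simplifying assumptions: it covers only send and receive actions (no selection, branching or recursion), and the class names `End` and `Action` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class End:
    """The terminated session type."""


@dataclass(frozen=True)
class Action:
    polarity: str          # "!" for send, "?" for receive
    sort: str              # payload sort, e.g. "Int"
    cont: "SessionType"    # continuation


SessionType = Union[End, Action]


def dual(s):
    """Flip every send into a receive and vice versa."""
    if isinstance(s, End):
        return s
    flipped = "?" if s.polarity == "!" else "!"
    return Action(flipped, s.sort, dual(s.cont))


def show(s):
    """Render a type in the notation used in the text."""
    if isinstance(s, End):
        return "end"
    return f"{s.polarity}{s.sort}.{show(s.cont)}"


S = Action("!", "Int", Action("?", "String", End()))
print(show(S))         # !Int.?String.end
print(show(dual(S)))   # ?Int.!String.end
```

Applying `dual` twice returns the original type, which reflects the uniqueness of the dual in the untimed setting.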

Duality characterises well-behaved systems: the behaviour described by the composition of dual types has no communication mismatches (e.g., unexpected messages, or messages with values of unexpected types) nor deadlocks. In the timed scenario, this is no longer true. Consider a timed extension of session types (using the model of time in timed automata [3]), and of (untimed) duality so that dual send/receive actions have equivalent time constraints and resets. The example below shows a timed type S with its dual S̄, where S owns clock x, and S̄ owns clock y (with x and y initially set to 0):

$$S = {!\mathtt{Int}}(x \leqslant 1, x).{?\mathtt{String}}(x \leqslant 2) \qquad \overline{S} = {?\mathtt{Int}}(y \leqslant 1, y).{!\mathtt{String}}(y \leqslant 2)$$

Here S sends an integer at any time satisfying x ≤ 1, and then resets x. After that, S receives a string at any time satisfying x ≤ 2. The timed dual of S is obtained by keeping the same time constraints (and renaming the clock to make it clear that clocks are not shared). To illustrate our point, we use the semantics from timed session types [12], borrowed from Communicating Timed Automata [23]. This semantics is *separated*, in the sense that only time actions may 'take time', while all other actions (e.g., communications) are instantaneous.<sup>1</sup> The aforementioned semantics allows for the following execution of S | S̄:

$$\begin{array}{llll}
S \mid \overline{S} & \xrightarrow{0.4}\xrightarrow{!\mathtt{Int}} & {?\mathtt{String}}(x \leqslant 2) \mid \overline{S} & \text{(clock values: } x = 0,\ y = 0.4\text{)} \\
& \xrightarrow{0.6}\xrightarrow{?\mathtt{Int}} & {?\mathtt{String}}(x \leqslant 2) \mid {!\mathtt{String}}(y \leqslant 2) & \text{(clock values: } x = 0.6,\ y = 0\text{)} \\
& \xrightarrow{2}\xrightarrow{!\mathtt{String}} & & \text{(clock values: } x = 2.6,\ y = 2\text{)}
\end{array}$$

where: (i) the system makes a time step of 0.4, then S sends the integer and resets x, yielding a state where x = 0 and y = 0.4; (ii) the system makes a time step of 0.6, then S̄ receives the integer and resets y, yielding a state where x = 0.6 and y = 0; (iii) the system makes a time step of 2, then the continuation of S̄ sends the string, when y = 2 and x = 2.6. In (iii), the string was sent too late: constraint x ≤ 2 of the receiving endpoint is now unsatisfiable. The system cannot do any further legal step, and is stuck.

*Urgent receive semantics.* The example above shows that, in the timed asynchronous scenario, the straightforward extension of duality to the timed scenario does not necessarily characterise well-behaved communications. We argue, however, that the execution of S | S̄, in particular the time reduction with label 0.6, does not reflect the semantics of most common receive primitives.

<sup>1</sup> Separated semantics can describe situations where actions have an associated duration.

In fact, most mainstream programming languages implement *urgent receive* semantics for receive actions. We call a semantics *urgent receive* when receive actions are executed as soon as the expected message is available, given that the guard of that action is satisfied. Conversely, *non-urgent receive* semantics allows receive actions to fire at any time satisfying the time constraint, as long as the message is in the queue. The aforementioned reduction with label 0.6 is permitted by non-urgent receive semantics such as the one in [23], since it defers the reception of the integer despite the integer being ready for reception and the guard (y ≤ 1) being satisfied, but not by urgent receive semantics. Urgent receive semantics allows, instead, the following execution for S | S̄:

$$\begin{array}{llll}
S \mid \overline{S} & \xrightarrow{0.4}\xrightarrow{!\mathtt{Int}} & {?\mathtt{String}}(x \leqslant 2) \mid \overline{S} & \text{(clock values: } x = 0,\ y = 0.4\text{)} \\
& \xrightarrow{?\mathtt{Int}} & {?\mathtt{String}}(x \leqslant 2) \mid {!\mathtt{String}}(y \leqslant 2) & \text{(clock values: } x = 0,\ y = 0\text{)} \\
& \xrightarrow{2}\xrightarrow{!\mathtt{String}} & {?\mathtt{String}}(x \leqslant 2) & \text{(clock values: } x = 2,\ y = 2\text{)}
\end{array}$$

If S sends the integer when x = 0.4, then S̄ must receive the integer immediately, when y = 0.4. At this point, both endpoints reset their respective clocks, and the communication will continue in sync. Urgent receive primitives are common; some examples are the non-blocking WaitFreeReadQueue.read() and blocking WaitFreeReadQueue.waitForData() of Real-Time Java [13], and the receive primitives in Erlang and Golang. *Urgent receive semantics make interactions "more synchronous" but still as asynchronous as real-life programs*.
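The two executions of S | S̄ can be replayed with a few lines of Python. The sketch below is only a hand-written trace of this one example, not an implementation of either semantics: the function name `replay` and its parameter `receive_delay` (how long the reception of the integer is deferred once it is in the queue) are hypothetical, and exact rationals are used to avoid floating-point noise.

```python
from fractions import Fraction


def replay(receive_delay):
    """Replay S | S-bar and report whether the string arrives on time.

    Returns (string_on_time, x, y) at the moment the string is sent.
    """
    x = y = Fraction(0)

    # S lets 0.4 time units pass, sends Int under x <= 1, and resets x.
    x, y = x + Fraction("0.4"), y + Fraction("0.4")
    if not x <= 1:
        raise ValueError("!Int fired outside its guard")
    x = Fraction(0)

    # S-bar receives Int under y <= 1 after receive_delay, and resets y.
    x, y = x + receive_delay, y + receive_delay
    if not y <= 1:
        raise ValueError("?Int fired outside its guard")
    y = Fraction(0)

    # S-bar lets 2 time units pass and sends String under y <= 2.
    x, y = x + 2, y + 2
    if not y <= 2:
        raise ValueError("!String fired outside its guard")

    # The string is on time only if the receiver's guard x <= 2 still holds.
    return x <= 2, x, y


non_urgent = replay(Fraction("0.6"))   # reception deferred by 0.6
urgent = replay(Fraction(0))           # reception as soon as possible
print(non_urgent)   # on time: False, with x = 13/5 and y = 2
print(urgent)       # on time: True, with x = 2 and y = 2
```

With a deferred reception the clocks drift apart by 0.6 and the string misses its deadline; with an immediate reception both clocks are reset at the same instant and stay aligned.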

*A calculus for timed asynchronous processes.* Our calculus features two time-sensitive primitives. The first is a parametric receive operation a<sup>n</sup>(b).P on a channel a, with a timeout n that can be ∞ or any number in ℝ<sub>≥0</sub>. The parametric receive captures a range of receive primitives: non-blocking (n = 0), blocking without timeout (n = ∞), or blocking with timeout (n ∈ ℝ<sub>>0</sub>). The second primitive is a time-consuming action, delay(δ).P, where δ is a constraint expressing the time-window for the time consumed by that action. Delay processes model primitives like Thread.sleep(n) in real-time Java [13] or, more generally, any time-consuming action, with δ being an estimation of the delay of computation.

Processes in our calculus abstract implementations of protocols given as pairs of dual types. Consider the processes below.

$$P_C = \mathtt{delay}(x < 3).\,\overline{a}\,\mathtt{HELO}.P'_C \qquad P_S = \mathtt{delay}(x = 5).\,a^0(b).P'_S \qquad Q_S = a^5(b).Q'_S$$

Processes abiding by the protocols in (2) could be as follows: P<sub>C</sub> for the client S<sub>C</sub>, and P<sub>S</sub> for the server S<sub>S</sub>. The client process P<sub>C</sub> performs a time-consuming action for up to 3 min, then sends command HELO to the server, and continues as P′<sub>C</sub>. The server process P<sub>S</sub> sleeps for exactly 5 min, receives the message immediately (without blocking), and continues as P′<sub>S</sub>. A process for the protocol in (1) could instead be the parallel composition of P<sub>C</sub>, again for the client, and Q<sub>S</sub> for the server. Process Q<sub>S</sub> uses a blocking primitive with timeout; the server now blocks on the receive action with a timeout of 5 min, and continues as Q′<sub>S</sub> as soon as a message is received. The blocking receive primitive with timeout is crucial to model processes typed against protocols one can express with asynchronous timed duality, in particular those that are not wait-free.
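The three instances of the parametric receive have close counterparts in mainstream libraries. As a rough analogy (not a semantics for the calculus), the Python sketch below maps a timeout n onto the standard `queue.Queue` API; the wrapper `receive` and its convention of returning `None` on timeout are assumptions of this example, and timeouts are in seconds rather than minutes.

```python
import math
import queue


def receive(channel, timeout):
    """Analogue of a^n(b): return the next message, or None on timeout.

    timeout == 0        non-blocking receive
    timeout == math.inf blocking receive without timeout
    otherwise           blocking receive that gives up after `timeout` seconds
    """
    try:
        if timeout == 0:
            return channel.get_nowait()
        if timeout == math.inf:
            return channel.get()
        return channel.get(timeout=timeout)
    except queue.Empty:
        return None


a = queue.Queue()
print(receive(a, 0))       # nothing queued yet, so the non-blocking receive fails
a.put("HELO")
print(receive(a, 5))       # returns at once: the message is already available
print(receive(a, 0.01))    # queue is empty again, so this times out
```

Note that `Queue.get` returns as soon as a message is available rather than waiting for the timeout to elapse, which is the urgent behaviour discussed above.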

*A type system for timed asynchronous processes.* The relationship between types and processes in our calculus is given as a typing system. Well-typed processes are ensured to communicate at the times prescribed by their types. This result is given via Subject Reduction (Theorem 4), establishing that well-typedness is preserved by reduction. In our timed scenario, Subject Reduction holds under *receive liveness*, an assumption on the interaction structure of processes. This assumption is orthogonal to time. To characterise the interaction structures of a timed process we erase timing information from that process (*time erasure*). Receive liveness requires that, whenever a time-erased process is waiting for a message, the corresponding message is eventually provided by the rest of the system. While receive liveness is not needed for Subject Reduction in untimed systems [21], it is required for timed processes. This reflects the natural intuition that if an untimed process violates progress, then its timed counterpart may miss deadlines. Notably, we can rely on existing behavioural checking techniques from the untimed setting to ensure receive liveness [17].

Receive liveness is not required for Subject Reduction in a related work on asynchronous timed session types [12]. The dissimilarity in the assumptions is only apparent; it derives from differences in the two semantics for processes. When our processes cannot proceed correctly (e.g., in case of missed deadlines) they reduce to a failed state, whereas the processes in [12] become stuck (indicating violation of progress).

*Synopsis.* In Sect. 2 we introduce the syntax and the formation rules for asynchronous timed session types. In Sect. 3, we give a modular Labelled Transition System (LTS) for types in isolation (Sect. 3.1) and for compositions of types (Sect. 3.3). The subtyping relation is given in Sect. 3.2 and motivated in Example 8, after introducing the typing rules. We introduce timed asynchronous duality and its properties in Sect. 4. Remarkably, the composition of dual timed asynchronous types enjoys progress when using an urgent receive semantics (Theorem 1). Section 5 presents a calculus for timed processes and Sect. 6 introduces its typing system. The properties of our typing system—Subject Reduction (Theorem 4) and Time Safety (Theorem 5)—are introduced in Sect. 7. Conclusions and related works are in Sect. 8. Proofs and additional material can be found in the online report [11].

# **2 Asynchronous Timed Session Types**

*Clocks and predicates.* We use the model of time from timed automata [3]. Let 𝕏 be a finite set of clocks, let x<sub>1</sub>, ..., x<sub>n</sub> range over clocks, and let each clock take values in ℝ<sub>≥0</sub>. Let t<sub>1</sub>, ..., t<sub>n</sub> range over non-negative real numbers and n<sub>1</sub>, ..., n<sub>n</sub> range over non-negative rationals. The set G(𝕏) of predicates over 𝕏 is defined by the following grammar.

$$\delta ::= \mathtt{true} \mid x > n \mid x = n \mid x - y > n \mid x - y = n \mid \neg \delta \mid \delta_1 \land \delta_2 \qquad \text{where} \quad x, y \in \mathbb{X}$$

We derive false, <, ≤, and ≥ in the standard way. Predicates of the form x − y > n and x − y = n are called *diagonal* predicates; in these cases we assume x ≠ y. Notation *cn*(δ) stands for the set of clocks in δ.

*Clock valuation and resets.* A clock valuation ν : 𝕏 → ℝ<sub>≥0</sub> returns the time of the clocks in 𝕏. We write ν + t for the valuation mapping all x ∈ 𝕏 to ν(x) + t, ν<sub>0</sub> for the initial valuation (mapping all clocks to 0), and, more generally, ν<sub>t</sub> for the valuation mapping all clocks to t. Let ν ⊨ δ denote that δ is satisfied by ν. A reset predicate λ over 𝕏 is a subset of 𝕏. When λ is ∅ then no reset occurs, otherwise the assignment for each x ∈ λ is set to 0. We write ν[λ ↦ 0] for the clock assignment that is like ν everywhere except that it assigns 0 to all clocks in λ.
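The definitions above translate directly into code. The following Python sketch is an illustration only: valuations are dictionaries from clock names to numbers, guards follow the grammar of predicates as tagged tuples, and the tags (`"gt"`, `"eq"`, `"dgt"`, `"deq"`) and function names are hypothetical choices made for this example.

```python
def advance(nu, t):
    """The valuation nu + t: every clock moves forward by t."""
    return {x: v + t for x, v in nu.items()}


def reset(nu, lam):
    """The valuation nu[lam -> 0]: clocks in lam are set to 0."""
    return {x: (0 if x in lam else v) for x, v in nu.items()}


def sat(nu, d):
    """Decide whether valuation nu satisfies guard d."""
    tag = d[0]
    if tag == "true":
        return True
    if tag == "gt":                      # x > n
        return nu[d[1]] > d[2]
    if tag == "eq":                      # x = n
        return nu[d[1]] == d[2]
    if tag == "dgt":                     # x - y > n (diagonal)
        return nu[d[1]] - nu[d[2]] > d[3]
    if tag == "deq":                     # x - y = n (diagonal)
        return nu[d[1]] - nu[d[2]] == d[3]
    if tag == "not":
        return not sat(nu, d[1])
    if tag == "and":
        return sat(nu, d[1]) and sat(nu, d[2])
    raise ValueError(f"unknown guard: {d!r}")


def le(x, n):
    """Derived predicate x <= n, encoded as not (x > n)."""
    return ("not", ("gt", x, n))


nu0 = {"x": 0, "y": 0}                   # initial valuation
nu = reset(advance(nu0, 3), {"x"})       # let 3 time units pass, then reset x
print(nu)                                # {'x': 0, 'y': 3}
print(sat(nu, ("and", le("x", 3), ("dgt", "y", "x", 2))))   # True
```

The derived predicates < and ≥ can be encoded in the same way as `le`, using negation and conjunction.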

*Types.* Timed session types, hereafter just types, have the following syntax:

$$\begin{array}{lcl}
T & ::= & (\delta, S) \mid \mathtt{Nat} \mid \mathtt{Bool} \mid \dots \\
S & ::= & {!T}(\delta, \lambda).S \mid {?T}(\delta, \lambda).S \mid \oplus\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I} \mid \&\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I} \mid \mu\alpha.S \mid \alpha \mid \mathtt{end}
\end{array}$$

Sorts T include base types (Nat, Bool, etc.), and sessions (δ, S). Messages of type (δ, S) allow a participant involved in a session to delegate the remaining behaviour S; upon delegation the sender will no longer participate in the delegated session and the receiver will execute the protocol described by S under any clock assignment satisfying δ. We denote the set of types with T.

Type !T(δ, λ).S models a *send action* of a payload with sort T. The sending action is allowed at any time that satisfies the guard δ. The clocks in λ are reset upon sending. Type ?T(δ, λ).S models the dual *receive action* of a payload with sort T. The receiving types require the endpoint to be ready to receive the message in the precise time window specified by the guard.

Type ⊕{l<sub>i</sub>(δ<sub>i</sub>, λ<sub>i</sub>) : S<sub>i</sub>}<sub>i∈I</sub> is a *select action*: the party chooses a branch i ∈ I, where I is a finite set of indices, selects the label l<sub>i</sub>, and continues as prescribed by S<sub>i</sub>. Each branch is annotated with a guard δ<sub>i</sub> and reset λ<sub>i</sub>. A branch j can be selected at any time allowed by δ<sub>j</sub>. The dual type is &{l<sub>i</sub>(δ<sub>i</sub>, λ<sub>i</sub>) : S<sub>i</sub>}<sub>i∈I</sub> for *branching actions*. Each branch is annotated with a guard and a reset. The endpoint must be ready to receive the label for j at any time allowed by δ<sub>j</sub> (or until another branch is selected).

Recursive type μα.S associates a *type variable* α to a recursion body S. We assume that type variables are guarded in the standard way (i.e., they only occur under actions or branches). We let A denote the set of type variables.

Type end models successful termination.

#### **2.1 Type Formation**

The grammar for types allows one to generate types that are not implementable in practice, such as the one shown in Example 1.

*Example 1 (Junk-types).* Consider S in (3) under initial clock valuation ν0.

$$S = {?T}(x < 5, \emptyset).{!T}(x < 2, \emptyset).\mathtt{end} \tag{3}$$

The specified endpoint must be ready to receive a message in the time-window between 0 and 5 time units, as we evaluate x < 5 in ν<sub>0</sub>. Assume that this receive action happens when x = 3, yielding a new state in which: (i) the clock valuation maps x to 3, and (ii) the endpoint must perform a send action while x < 2. Evidently, (ii) is no longer possible in the new clock valuation, as x < 2 is now unsatisfiable. We could amend (3) in several ways: (a) by resetting x after the receive action; (b) by restricting the guard of the receive action (e.g., x < 2 instead of x < 5); or (c) by relaxing the guard of the send action. All these amendments would, however, yield a different type.

In the remainder of this section we introduce formation rules to rule out junk types such as the one in Example 1 and characterise types that are well-formed. Intuitively, well-formed types allow, at any point, to perform some action in the present time or at some point in the future, unless the type is end.

*Judgments.* The formation rules for types are defined on judgments of the form

$$A; \delta \vdash S$$

where A is an environment assigning type variables to guards, and δ is a guard in G(𝕏). A is used as an invariant to form recursive types. Guard δ collects the possible 'pasts' from which the next action in S could be executed (unless S is end). We use notation ↓δ (the past of δ) for a guard δ′ such that ν ⊨ δ′ if and only if ∃t : ν + t ⊨ δ. For example, ↓(1 ≤ x ≤ 2) = x ≤ 2 and ↓(x ≥ 3) = true. Similarly, we use the notation δ[λ ↦ 0] to denote a guard in which all clocks in λ are reset. For example, (x ≤ 3 ∧ y ≤ 2)[x ↦ 0] = (x = 0 ∧ y ≤ 2). We use the notation δ<sub>1</sub> ⊆ δ<sub>2</sub> whenever, for all ν, ν ⊨ δ<sub>1</sub> ⟹ ν ⊨ δ<sub>2</sub>. The past and reset of a guard can be inferred algorithmically, and ⊆ is decidable [8].

$$\begin{array}{c}
\dfrac{}{A;\ \mathtt{true} \vdash \mathtt{end}}\ [\text{end}]
\qquad
\dfrac{\circ \in \{!, ?\} \quad A;\ \gamma \vdash S \quad \delta[\lambda \mapsto 0] \subseteq \gamma \quad T\ \text{base type}}{A;\ {\downarrow}\delta \vdash \circ T(\delta, \lambda).S}\ [\text{interact}]
\\[2ex]
\dfrac{\circ \in \{!, ?\} \quad A;\ \gamma \vdash S \quad \delta[\lambda \mapsto 0] \subseteq \gamma \quad T = (\delta', S') \quad \emptyset;\ \gamma' \vdash S' \quad \delta' \subseteq \gamma'}{A;\ {\downarrow}\delta \vdash \circ T(\delta, \lambda).S}\ [\text{delegate}]
\\[2ex]
\dfrac{\star \in \{\oplus, \&\} \quad \forall i \in I \quad A;\ \gamma_i \vdash S_i \quad \delta_i[\lambda_i \mapsto 0] \subseteq \gamma_i}{A;\ {\downarrow}\bigvee_{i \in I} \delta_i \vdash \star\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}}\ [\text{choice}]
\\[2ex]
\dfrac{A, \alpha : \delta;\ \delta \vdash S}{A;\ \delta \vdash \mu\alpha.S}\ [\text{rec}]
\qquad
\dfrac{}{A, \alpha : \delta;\ \delta \vdash \alpha}\ [\text{var}]
\end{array}$$

Rule [end] states that the terminated type is well-formed against any A. The guard of the judgement is true since end is a final state (as end has no continuation, morally, the constraint of its continuation is always satisfiable). Rule [interact] ensures that the past of the current action δ entails the past of the subsequent action γ (considering resets if necessary): this rules out types in which the subsequent action can only be performed in the past. Rules [end] and [interact] are illustrated by the three examples below.

*Example 2.* The judgment below shows a type being *discarded* after an application of rule [interact] :

$$\emptyset;\ x \leqslant 3 \nvdash {?\mathtt{Nat}}(1 \leqslant x \leqslant 3, \emptyset).{!\mathtt{Nat}}(1 \leqslant x \leqslant 2, \emptyset).\mathtt{end} \tag{4}$$

The premise of [interact] would be δ ⊆ ↓γ, which does not hold for δ = 1 ≤ x ≤ 3 and ↓γ = x ≤ 2. This means that guard (1 ≤ x ≤ 3, ∅) of the first action may lead to a state in which guard 1 ≤ x ≤ 2 for the subsequent action is unsatisfiable. If we amend the type in (4) by adding a reset in the first action, we obtain a well-formed type. We show its formation below, where for simplicity we omit obvious preconditions like Nat base type, etc.

$$\dfrac{\dfrac{\dfrac{}{\emptyset;\ \mathtt{true} \vdash \mathtt{end}}\ [\text{end}] \qquad 1 \leqslant x \leqslant 2 \subseteq \mathtt{true}}{\emptyset;\ x \leqslant 2 \vdash {!\mathtt{Nat}}(1 \leqslant x \leqslant 2, \emptyset).\mathtt{end}}\ [\text{interact}] \qquad x = 0 \subseteq x \leqslant 2}{\emptyset;\ x \leqslant 3 \vdash {?\mathtt{Nat}}(1 \leqslant x \leqslant 3, \{x\}).{!\mathtt{Nat}}(1 \leqslant x \leqslant 2, \emptyset).\mathtt{end}}\ [\text{interact}]$$

Rule [delegate] behaves as [interact], with two additional premises on the delegated session: (1) S′ needs to be well-formed, and (2) the guard of the next action in S′ needs to be satisfiable with respect to δ′. Guard δ′ is used to ensure a correspondence between the state of the delegating endpoint and that of the receiving endpoint. Rule [choice] is similar to [interact] but requires that there is at least one viable branch (this is accomplished by considering the weaker past ↓⋁<sub>i∈I</sub> δ<sub>i</sub>) and checking each branch for formation. Rules [rec] and [var] are for recursive types and variables, respectively. In [rec] the guard δ can be easily computed by taking the past of the next action in S (or the disjunction if S is a branching or selection). An algorithm for deciding type formation can be found in [11].
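To make rules [end] and [interact] concrete, the Python sketch below checks formation for a very restricted class of types: chains of send/receive actions with base-type payloads, a single clock x, and guards of the form lo ≤ x ≤ hi. It is an illustration under these assumptions and not the algorithm of [11]; all function names are hypothetical.

```python
import math

TRUE = (0, math.inf)   # the guard true, as the interval 0 <= x < infinity


def past(guard):
    """Past of lo <= x <= hi: all values of x from which the guard is reachable."""
    return (0, guard[1])


def reset(guard, resets_x):
    """Guard after the action: x = 0 if x is reset, unchanged otherwise."""
    return (0, 0) if resets_x else guard


def entails(g1, g2):
    """g1 is included in g2: every value satisfying g1 also satisfies g2."""
    return g2[0] <= g1[0] and g1[1] <= g2[1]


def formation_guard(actions):
    """Return delta such that the chain of actions, followed by end, is
    well-formed under judgment guard delta; None if [interact] fails.

    actions: list of (guard, resets_x) pairs, first action first.
    """
    gamma = TRUE                                   # rule [end]
    for guard, resets_x in reversed(actions):
        if not entails(reset(guard, resets_x), gamma):
            return None                            # premise of [interact] fails
        gamma = past(guard)                        # conclusion of [interact]
    return gamma


# The type in (4): ?Nat(1 <= x <= 3, {}).!Nat(1 <= x <= 2, {}).end
print(formation_guard([((1, 3), False), ((1, 2), False)]))   # None
# The amended type, which resets x in the first action
print(formation_guard([((1, 3), True), ((1, 2), False)]))    # (0, 3)
```

The second result, the interval 0 ≤ x ≤ 3, corresponds to the guard x ≤ 3 in the conclusion of the derivation shown above.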

**Definition 1 (Well-formed types).** *We say that* S *is well-formed against clock valuation* ν *if* ∅; δ ⊢ S *and* ν ⊨ δ*, for some guard* δ*. We say that* S *is well-formed if it is well-formed against* ν<sub>0</sub>*.*

We will tacitly assume types are well-formed, unless otherwise specified. The intuition of well-formedness is that if A; δ ⊢ S then S can be run (using the types semantics given in Sect. 3) under any clock valuation ν such that ν ⊨ δ. In the sequel, we take (well-formed) types equi-recursively [31].

# **3 Asynchronous Session Types Semantics and Subtyping**

We give a compositional semantics of types. First, we focus on types in isolation from their environment and from their queues, which we call *simple type configurations*. Next we define subtyping for simple type configurations. Finally, we consider systems (i.e., composition of types communicating via queues).

$$\begin{array}{c}
\dfrac{\nu \models \delta}{(\nu, {!T}(\delta, \lambda).S) \xrightarrow{!T} (\nu[\lambda \mapsto 0], S)}\ [\text{snd}]
\qquad
\dfrac{\nu \models \delta}{(\nu, {?T}(\delta, \lambda).S) \xrightarrow{?T} (\nu[\lambda \mapsto 0], S)}\ [\text{rcv}]
\\[2ex]
\dfrac{\nu \models \delta_j \quad j \in I}{(\nu, \oplus\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}) \xrightarrow{!l_j} (\nu[\lambda_j \mapsto 0], S_j)}\ [\text{sel}]
\\[2ex]
\dfrac{\nu \models \delta_j \quad j \in I}{(\nu, \&\{l_i(\delta_i, \lambda_i) : S_i\}_{i \in I}) \xrightarrow{?l_j} (\nu[\lambda_j \mapsto 0], S_j)}\ [\text{bra}]
\\[2ex]
\dfrac{(\nu, S[\mu\alpha.S/\alpha]) \xrightarrow{\ell} (\nu', S')}{(\nu, \mu\alpha.S) \xrightarrow{\ell} (\nu', S')}\ [\text{rec}]
\qquad
(\nu, S) \xrightarrow{t} (\nu + t, S)\ [\text{time}]
\end{array}$$

**Fig. 1.** LTS for simple type configurations

#### **3.1 Types in Isolation**

The behaviour of *simple type configurations* is described by the Labelled Transition System (LTS) on pairs (ν, S) over (V × S), where clock valuation ν gives the values of clocks in a specific state. The LTS is defined over the following labels

$$\ell ::= {!m} \mid {?m} \mid t \mid \tau \qquad \qquad m ::= T \mid l$$

Label !m denotes an output action of message m and ?m an input action of m. A message m can be a sort T (that can be either a higher order message (δ, S) or base type), or a branching label l. The LTS for single types is defined as the least relation satisfying the rules in Fig. 1. Rules [snd], [rcv], [sel], and [bra] can only happen if the constraint of the next action is satisfied in the current clock valuation. Rule [rec] unfolds recursive types, and [time] always lets time elapse.

Let **s**, **s**′, **s**<sub>i</sub> (i ∈ ℕ) range over simple type configurations (ν, S). We write $\mathbf{s} \xrightarrow{\ell}$ when there exists **s**′ such that $\mathbf{s} \xrightarrow{\ell} \mathbf{s}'$, and write $\mathbf{s} \xrightarrow{t\,\ell}$ for $\mathbf{s} \xrightarrow{t}\xrightarrow{\ell}$.
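The rules for send, receive and time can be animated with a small interpreter. The Python sketch below is an illustration restricted to rules [snd], [rcv] and [time] (no selection, branching or recursion); guards are arbitrary Python predicates over valuations, and the names `Act` and `step` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Act:
    polarity: str                    # "!" or "?"
    sort: str                        # payload sort
    guard: Callable[[dict], bool]    # the guard delta, as a predicate on valuations
    resets: FrozenSet[str]           # the reset set lambda
    cont: Optional["Act"]            # continuation; None stands for end


def step(nu, s, label):
    """One transition of the simple type configuration (nu, s)."""
    if isinstance(label, (int, float)):            # rule [time]
        if label < 0:
            raise ValueError("time cannot flow backwards")
        return {x: v + label for x, v in nu.items()}, s
    if s is None:
        raise ValueError("end has no actions")
    if label != s.polarity + s.sort:
        raise ValueError(f"{label} is not the next action")
    if not s.guard(nu):                            # premise of [snd] / [rcv]
        raise ValueError(f"guard of {label} not satisfied in {nu}")
    return {x: (0 if x in s.resets else v) for x, v in nu.items()}, s.cont


# ?Nat(1 <= x <= 3, {x}).!Nat(1 <= x <= 2, {}).end
send_nat = Act("!", "Nat", lambda nu: 1 <= nu["x"] <= 2, frozenset(), None)
amended = Act("?", "Nat", lambda nu: 1 <= nu["x"] <= 3, frozenset({"x"}), send_nat)

config = ({"x": 0}, amended)
for label in (2, "?Nat", 1.5, "!Nat"):
    config = step(*config, label)
print(config)   # ({'x': 1.5}, None)
```

Trying to fire `?Nat` directly from the initial valuation raises an error, since the guard 1 ≤ x ≤ 3 does not hold when x = 0.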

#### **3.2 Asynchronous Timed Subtyping**

We define subtyping as a partial relation on simple type configurations. As in other subtyping relations for session types, we consider send and receive actions dually [14,16,19]. Our subtyping relation is covariant on output actions and contravariant on input actions, similarly to that of [14]. In this way, our subtyping S <: S′ captures the intuition that a process well-typed against S can be safely substituted with a process well-typed against S′. Definition 2 introduces a notation that is useful in the rest of this section.

**Definition 2 (Future enabled send/receive).** *Action* ℓ *is future enabled in* **s** *if* $\exists t : \mathbf{s} \xrightarrow{t\,\ell}$*. We write* $\mathbf{s} \stackrel{!}{\Rightarrow}$ *(resp.* $\mathbf{s} \stackrel{?}{\Rightarrow}$*) if there exists a sending action* !m *(resp. a receiving action* ?m*) that is future enabled in* **s***.*

As is common in session types, the communication structure does not allow for mixed choices: the grammar of types enforces choices to be either all input (branching actions), or output (selection actions). From this fact it follows that, given **s**, reductions $\mathbf{s} \stackrel{!}{\Rightarrow}$ and $\mathbf{s} \stackrel{?}{\Rightarrow}$ cannot hold simultaneously.

**Definition 3 (Timed Type Simulation).** *Fix* **s**<sub>1</sub> = (ν<sub>1</sub>, S<sub>1</sub>) *and* **s**<sub>2</sub> = (ν<sub>2</sub>, S<sub>2</sub>)*. A relation* R ∈ (V × S)<sup>2</sup> *is a* timed type simulation *if* (**s**<sub>1</sub>, **s**<sub>2</sub>) ∈ R *implies the following conditions:*

1. *S<sub>1</sub> = end implies S<sub>2</sub> = end*
2. $\mathbf{s}_1 \xrightarrow{t\,!m_1} \mathbf{s}'_1$ *implies* $\exists \mathbf{s}'_2, m_2 : \mathbf{s}_2 \xrightarrow{t\,!m_2} \mathbf{s}'_2$*, (m<sub>2</sub>, m<sub>1</sub>) ∈ S, (**s**′<sub>1</sub>, **s**′<sub>2</sub>) ∈ R*
3. $\mathbf{s}_2 \xrightarrow{t\,?m_2} \mathbf{s}'_2$ *implies* $\exists \mathbf{s}'_1, m_1 : \mathbf{s}_1 \xrightarrow{t\,?m_1} \mathbf{s}'_1$*, (m<sub>1</sub>, m<sub>2</sub>) ∈ S, (**s**′<sub>1</sub>, **s**′<sub>2</sub>) ∈ R*
4. $\mathbf{s}_1 \stackrel{?}{\Rightarrow}$ *implies* $\mathbf{s}_2 \stackrel{?}{\Rightarrow}$ *and* $\mathbf{s}_2 \stackrel{!}{\Rightarrow}$ *implies* $\mathbf{s}_1 \stackrel{!}{\Rightarrow}$

*where* S *is the following extension of* R *to messages: (1)* (T, T′) ∈ S *if* T *and* T′ *are base types, and* T′ *is a subtype of* T *by sorts subtyping, e.g.,* (int, nat) ∈ S*; (2)* (l, l) ∈ S*; (3)* ((δ<sub>1</sub>, S<sub>1</sub>), (δ<sub>2</sub>, S<sub>2</sub>)) ∈ S*, if* ∀ν<sub>1</sub> ⊨ δ<sub>1</sub> ∃ν<sub>2</sub> ⊨ δ<sub>2</sub> : ((ν<sub>1</sub>, S<sub>1</sub>), (ν<sub>2</sub>, S<sub>2</sub>)) ∈ R *and* ∀ν<sub>2</sub> ⊨ δ<sub>2</sub> ∃ν<sub>1</sub> ⊨ δ<sub>1</sub> : ((ν<sub>1</sub>, S<sub>1</sub>), (ν<sub>2</sub>, S<sub>2</sub>)) ∈ R*.*

Intuitively, if (**s**<sub>1</sub>, **s**<sub>2</sub>) ∈ R then any environment that can safely interact with **s**<sub>2</sub> can do so with **s**<sub>1</sub>. We write that **s**<sub>2</sub> simulates **s**<sub>1</sub> whenever **s**<sub>1</sub> and **s**<sub>2</sub> are in a timed type simulation. Below, **s**<sub>2</sub> simulates **s**<sub>1</sub>:

$$\mathbf{s}_1 = (\nu_0, {!\mathtt{nat}}(x < 5, \emptyset).\mathtt{end}) \qquad \mathbf{s}_2 = (\nu_0, {!\mathtt{int}}(x \leqslant 10, \emptyset).\mathtt{end})$$

Conversely, **s**<sub>1</sub> does not simulate **s**<sub>2</sub> because of condition (2). Precisely, **s**<sub>2</sub> can make a transition $\mathbf{s}_2 \xrightarrow{10\ !\mathtt{int}}$ that cannot be matched by **s**<sub>1</sub> for two reasons: guard x < 5 is no longer satisfiable when x = 10, and (nat, int) ∉ S since int is not a subtype of nat. For receive actions, instead, we could substitute **s** with **s**′ if **s**′ had at least the receiving capabilities of **s**. Condition (4) in Definition 3 rules out relations that include, e.g., ((ν, ?T(true, ∅).end), (ν, !T(true, ∅).end)).
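For the two configurations above, condition (2) of Definition 3 reduces to a window inclusion plus a check on sorts. The Python sketch below spells this out under strong assumptions: both types consist of a single send followed by end, both start from ν<sub>0</sub>, guards are upper bounds on one clock, and the sort subtyping table `SORT_SUBTYPE` is an assumed fragment in which nat is a subtype of int.

```python
# (sub, super) pairs of the assumed sort subtyping: nat is a subtype of int.
SORT_SUBTYPE = {("nat", "nat"), ("int", "int"), ("nat", "int")}


def window_included(g1, g2):
    """Guards are (bound, strict) pairs: x < bound if strict, x <= bound otherwise.

    Starting from x = 0, is every instant allowed by g1 also allowed by g2?
    """
    (b1, strict1), (b2, strict2) = g1, g2
    return b1 < b2 or (b1 == b2 and (strict1 or not strict2))


def send_is_simulated(s1, s2):
    """Condition (2) of Definition 3 for types of the form !sort(guard, {}).end."""
    (sort1, guard1), (sort2, guard2) = s1, s2
    # Every timed send of s1 must be matched by s2 at the same time, and the
    # pair (m2, m1) must be in S, i.e. sort1 must be a subtype of sort2.
    return window_included(guard1, guard2) and (sort1, sort2) in SORT_SUBTYPE


s1 = ("nat", (5, True))      # !nat(x < 5, {}).end
s2 = ("int", (10, False))    # !int(x <= 10, {}).end

print(send_is_simulated(s1, s2))   # True: s2 simulates s1
print(send_is_simulated(s2, s1))   # False: s1 does not simulate s2
```

Conditions (1) and (4) hold trivially here, because both types are a single send action followed by end.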

*Live simple type configurations.* In our subtyping definition we are interested in simple type configurations that are not stuck. Consider the example below:

$$(\nu, {!\mathtt{Int}}(x \leqslant 10, \emptyset).\mathtt{end}) \tag{5}$$

The simple type configuration in (5) would not be stuck if ν = ν<sub>0</sub>, but would be stuck for any ν = ν′[x ↦ 10]. Definition 4 gives a formal definition of simple type configurations that are not stuck, i.e., that are *live*.

**Definition 4 (Live simple type configuration).** *A simple configuration* (ν, S) *is said* live *if:*

$$S = \mathtt{end} \quad \text{or} \quad \exists t, m : (\nu, S) \xrightarrow{t\,\circ m} \quad (\circ \in \{!, ?\})$$

Observe that for all well-formed S, (ν0, S) is live.
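Liveness of a configuration whose next action has an interval guard can be decided by a simple comparison. The Python sketch below assumes a single clock x and a guard of the form lo ≤ x ≤ hi for the next action; the function name `is_live` and the encoding of end as `None` are choices made for this example.

```python
def is_live(x, s):
    """Decide liveness of (nu, S), where x is the value nu assigns to the clock.

    s is None for end, or (lo, hi) when the next action has guard lo <= x <= hi.
    """
    if s is None:
        return True
    lo, hi = s
    # Some delay t >= 0 with lo <= x + t <= hi exists exactly when the
    # guard is non-empty and its upper bound has not been passed yet.
    return lo <= hi and x <= hi


send_int = (0, 10)               # next action !Int(x <= 10, {}), as in (5)
print(is_live(0, send_int))      # True: from the initial valuation
print(is_live(12, send_int))     # False: the deadline has already passed
print(is_live(12, None))         # True: end is always live
```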

*Subtyping for simple type configurations.* We can now define subtyping for simple type configurations and state its decidability.

**Definition 5 (Subtyping).** **s**<sub>1</sub> *is a subtype of* **s**<sub>2</sub>*, written* **s**<sub>1</sub> <: **s**<sub>2</sub>*, if there exists a timed type simulation* R on live simple type configurations *such that* (**s**<sub>1</sub>, **s**<sub>2</sub>) ∈ R*. We write* S<sub>1</sub> <: S<sub>2</sub> *when* (ν<sub>0</sub>, S<sub>1</sub>) <: (ν<sub>0</sub>, S<sub>2</sub>)*. Abusing the notation, we write* m <: m′ *iff there exists* S *such that* (m, m′) ∈ S*.*

Subtyping has been shown to be decidable in the untimed setting [19] and in the timed first order setting [6]. In [6], decidability is shown through a reduction to model checking of timed automata networks. The result in [6] can be extended to higher-order messages using the techniques in [3], based on finite representations (called regions) of possibly infinite sets of clock valuations.

**Proposition 1 (Decidability of subtyping).** *Checking if* (δ<sub>1</sub>, S<sub>1</sub>) <: (δ<sub>2</sub>, S<sub>2</sub>) *is decidable.*

#### **3.3 Types with Queues, and Their Composition**

As interactions are asynchronous, the behaviour of types must capture the states in which messages are in transit. To do this, we extend simple type configurations with queues. A *configuration* **S** is a triple (ν, S, M) where ν is a clock valuation, S is a type and M is an unbounded FIFO queue of the following form:

$$\mathtt{M} ::= \emptyset \mid m; \mathtt{M}$$

M contains the messages sent by the co-party of S and not yet received by S. We write M for M; ∅, and call (ν, S, M) *initial* if ν = ν<sub>0</sub> and M = ∅.

*Composing types.* Configurations are composed into *systems*. We write **S** | **S**′ for the parallel composition of the two configurations **S** and **S**′.

The labelled transition rules for systems are given in Fig. 2. Rule (snd) is for send actions. A send action can occur only if the time constraint of S is satisfied (by the premise, which uses either rule [snd] or [sel] in Fig. 1). Rule (que) models actions on queues: a queue is always ready to receive any message m. Rule (rcv) is for receive actions, where a message is read from the queue. A receive action can occur only if the time constraint of S is satisfied (by the premise, which uses either rule [rcv] or [bra] in Fig. 1). The message is removed from the head of the queue of the receiving configuration. The third clause in the premise uses the notion of subtyping (Definition 3) for basic sorts, labels, and higher-order messages. Rule (crcv) is the action of a configuration pulling a message off its queue. Rule (com) is for communication between a sending configuration and a buffer. Rule (ctime) lets time elapse in the same way for all configurations in a system.

**Fig. 2.** LTS for systems. We omit the symmetric rules of (crcv) and (csnd).

Rule (time) models time passing for single configurations. Time passing is subject to two constraints, expressed by the second and third conditions in the premise. Condition $(\nu, S) \overset{!}{\Rightarrow}$ requires the time action t to preserve the satisfiability of some send action. For example, in configuration $(\nu\_0, !T(x < 2, \emptyset).S, \emptyset)$, a transition with label 2 would *not* preserve any send action (hence would not be allowed), while a transition with label 1.8 would be allowed by condition $(\nu, S) \overset{!}{\Rightarrow}$. Condition $\forall t' < t : (\nu + t', S, \mathtt{M}) \not\xrightarrow{\tau}$ in the premise of rule (time) checks that there is no ready message to be received in the queue. This is to model urgency: when a configuration is in a receiving state and a message is in the queue, then the receiving action must happen without delay. For example, $(\nu\_0, ?T(x < 2, \emptyset).S, \emptyset)$ can make a transition with label 1, but $(\nu\_0, ?T(x < 2, \emptyset).S, m)$ cannot make any time transition.

Below we show two examples of system executions. Example 3 illustrates a good communication, thanks to urgency. Example 4 illustrates that, without an urgent semantics, the system in Example 3 gets stuck.

*Example 3 (A good communication).* Consider the following types:

$$S\_1 = !T(x \leqslant 1, x).?T(x \leqslant 2).\mathsf{end} \qquad S\_2 = ?T(y \leqslant 1, y).!T(y \leqslant 2).\mathsf{end}$$

System $(\nu[x \mapsto 0], S\_1, \emptyset) \mid (\nu[y \mapsto 0], S\_2, \emptyset)$ can make a time step with label 0.5 by (ctime), yielding the system in (6):

$$(\nu[x \mapsto 0.5], S\_1, \emptyset) \mid (\nu[y \mapsto 0.5], S\_2, \emptyset) \tag{6}$$

The system in (6) can move by a τ step thanks to (com): the left-hand side configuration makes a step with label !T by (snd) while the right-hand side configuration makes a step ?T by (que), yielding system (7) below.

$$(\nu[x \mapsto 0], ?T(x \leqslant 2).\mathsf{end}, \emptyset) \mid (\nu[y \mapsto 0.5], S\_2, T) \tag{7}$$

The right-hand side configuration in the system in (7) must *urgently* receive message T due to the third clause in the premise of rule (time). Hence, the only possible step forward for (7) is by (crcv) yielding the system in (8).

$$(\nu[x \mapsto 0], ?T(x \leqslant 2).\mathsf{end}, \emptyset) \mid (\nu[y \mapsto 0], !T(y \leqslant 2).\mathsf{end}, \emptyset) \tag{8}$$

*Example 4 (In absence of urgency).* Without urgency, the system in (7) from Example 3 may get stuck. Assume the third clause of rule (time) were removed: this would allow (7) to make a time step with label 0.5, followed by a step by (rcv) yielding the system in (9), where clock y is reset after the receive action.

$$(\nu[x \mapsto 0.5], ?T(x \leqslant 2).\mathsf{end}, \emptyset) \mid (\nu[y \mapsto 0], !T(y \leqslant 2).\mathsf{end}, \emptyset) \tag{9}$$

followed by a τ step by (com) reaching the following state:

$$(\nu[x \mapsto 2.5], ?T(x \leqslant 2).\mathsf{end}, T) \mid (\nu[y \mapsto 0], \mathsf{end}, \emptyset) \tag{10}$$

The message in the queue in (10) will never be received, as the guard x ≤ 2 is not satisfiable now or at any point in the future. This system is stuck. Instead, thanks to urgency, the clocks of the configurations of system (8) have been 'synchronised' after the receive action, preventing the system from getting stuck.
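The timing argument of Examples 3 and 4 can be replayed numerically. The following Python sketch is only an illustration and not part of the formal development: the function name `final_guard_holds` and its two parameters are hypothetical, and the guards and resets of S₁ and S₂ are hard-coded instead of implementing the LTS of Fig. 2.

```python
def final_guard_holds(receive_lag: float, reply_delay: float) -> bool:
    """Replay S1 | S2 and report whether the final ?T(x <= 2) can fire.

    receive_lag: time that passes between the arrival of the first message
                 in the queue of S2 and its receipt (0 under urgency).
    reply_delay: time S2 then waits before replying (must keep y <= 2).
    """
    x = y = 0.0

    # Both configurations let 0.5 time units pass, as in (6).
    x += 0.5
    y += 0.5
    assert x <= 1          # guard of !T(x <= 1, x)
    x = 0.0                # reset of x on send, as in (7)

    # The message waits in the queue of S2 for receive_lag time units.
    x += receive_lag
    y += receive_lag
    assert y <= 1          # guard of ?T(y <= 1, y)
    y = 0.0                # reset of y on receive

    # S2 waits reply_delay and sends under guard y <= 2.
    x += reply_delay
    y += reply_delay
    assert y <= 2          # guard of !T(y <= 2)

    # S1 must now receive: its guard x <= 2 has to hold.
    return x <= 2


print(final_guard_holds(receive_lag=0.0, reply_delay=2.0))  # urgent receive
print(final_guard_holds(receive_lag=0.5, reply_delay=2.0))  # delayed receive
```

With an urgent receive the two clocks are equal after the first exchange, so any reply allowed by y ≤ 2 also satisfies x ≤ 2. With a receive delayed by 0.5, the latest admissible reply arrives when x = 2.5, which is the stuck state (10).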

# **4 Timed Asynchronous Duality**

We introduce a timed extension of duality. As in untimed duality, we let each send/select action be complemented by a corresponding receive/branching action. Moreover, we require time constraints and resets to match.

**Definition 6 (Timed duality).** *The dual type* $\overline{S}$ *of* $S$ *is defined as follows:*

$$\begin{array}{lll} \overline{!T(\delta,\lambda).S} = {?T(\delta,\lambda)}.\overline{S} & \overline{?T(\delta,\lambda).S} = {!T(\delta,\lambda)}.\overline{S} & \overline{\mu\alpha.S} = \mu\alpha.\overline{S} \\ \overline{\oplus\{\mathbf{l}\_{i}(\delta\_{i},\lambda\_{i}):S\_{i}\}\_{i\in I}} = \&\{\mathbf{l}\_{i}(\delta\_{i},\lambda\_{i}):\overline{S\_{i}}\}\_{i\in I} & \overline{\alpha} = \alpha & \\ \overline{\&\{\mathbf{l}\_{i}(\delta\_{i},\lambda\_{i}):S\_{i}\}\_{i\in I}} = \oplus\{\mathbf{l}\_{i}(\delta\_{i},\lambda\_{i}):\overline{S\_{i}}\}\_{i\in I} & \overline{\mathsf{end}} = \mathsf{end} & \end{array}$$
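Definition 6 is a structural recursion that flips the direction of every action and leaves payloads, guards, and resets untouched. A minimal Python sketch of this recursion follows. The class names (`End`, `Var`, `Rec`, `Msg`, `Choice`) and the encoding of guards as opaque strings are assumptions made for illustration only; they are not notation from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class End:
    pass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Rec:
    var: str
    body: "SType"


@dataclass(frozen=True)
class Msg:
    send: bool                # True for !T(δ, λ).S, False for ?T(δ, λ).S
    payload: str              # T
    guard: str                # δ, kept as an opaque string in this sketch
    resets: Tuple[str, ...]   # λ
    cont: "SType"


@dataclass(frozen=True)
class Choice:
    select: bool              # True for ⊕ (select), False for & (branch)
    # each branch is (label, guard, resets, continuation)
    branches: Tuple[Tuple[str, str, Tuple[str, ...], "SType"], ...]


SType = Union[End, Var, Rec, Msg, Choice]


def dual(s: SType) -> SType:
    """Flip every send/receive and select/branch; keep guards and resets."""
    if isinstance(s, (End, Var)):
        return s
    if isinstance(s, Rec):
        return Rec(s.var, dual(s.body))
    if isinstance(s, Msg):
        return Msg(not s.send, s.payload, s.guard, s.resets, dual(s.cont))
    return Choice(
        not s.select,
        tuple((l, g, r, dual(k)) for (l, g, r, k) in s.branches),
    )


# S1 from Example 3: !T(x <= 1, x).?T(x <= 2).end
S1 = Msg(True, "T", "x <= 1", ("x",), Msg(False, "T", "x <= 2", (), End()))
print(dual(S1))
```

Applied to S₁ of Example 3, `dual` yields S₂ up to renaming clock x to y, and applying it twice returns the original type.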

Duality with urgent receive semantics enjoys the following properties: systems with dual types fulfil progress (Theorem 1); behaviour (resp. progress) of a system is preserved by the substitution of a type with a subtype (Theorem 2) (resp. Theorem 3). A system enjoys progress if it reaches states that are either final or that allow further communications, possibly after a delay. Recall that we assume types to be well-formed (cf. Definition 1): Theorems 1, 2, and 3 rely on this assumption.

**Definition 7 (Type progress).** *We say that a system* $(\nu, S, \mathtt{M})$ *is a* success *if* $S = \mathsf{end}$ *and* $\mathtt{M} = \emptyset$*. We say that* $\mathbf{S}\_1 \mid \mathbf{S}\_2$ *satisfies* progress *if:*

$$\mathbf{S}\_1 \mid \mathbf{S}\_2 \longrightarrow^{\*} \mathbf{S}'\_1 \mid \mathbf{S}'\_2 \implies \mathbf{S}'\_1 \text{ and } \mathbf{S}'\_2 \text{ are success, or } \exists t : \mathbf{S}'\_1 \mid \mathbf{S}'\_2 \xrightarrow{t\ \tau}$$

**Theorem 1 (Duality progress).** *System* $(\nu\_0, S, \emptyset) \mid (\nu\_0, \overline{S}, \emptyset)$ *enjoys progress.*

We show that subtyping does not introduce new behaviour, via the usual notion of timed simulation [1]. Let $\mathbf{c}$, $\mathbf{c}\_1$, $\mathbf{c}\_2$ range over systems. Fix $\mathbf{c}\_1 = (\nu^1\_1, S^1\_1, \mathtt{M}^1\_1) \mid (\nu^1\_2, S^1\_2, \mathtt{M}^1\_2)$, and $\mathbf{c}\_2 = (\nu^2\_1, S^2\_1, \mathtt{M}^2\_1) \mid (\nu^2\_2, S^2\_2, \mathtt{M}^2\_2)$. We say that a binary relation over systems preserves end if: $S^i\_1 = \mathsf{end} \wedge \mathtt{M}^i\_1 = \emptyset$ iff $S^i\_2 = \mathsf{end} \wedge \mathtt{M}^i\_2 = \emptyset$ for all $i \in \{1, 2\}$. Write $\mathbf{c}\_1 \lesssim \mathbf{c}\_2$ if $(\mathbf{c}\_1, \mathbf{c}\_2)$ are in a timed simulation that preserves end.

**Theorem 2 (Safe substitution).** *If* $S' <: \overline{S}$*, then* $(\nu\_0, S, \emptyset) \mid (\nu\_0, S', \emptyset) \lesssim (\nu\_0, S, \emptyset) \mid (\nu\_0, \overline{S}, \emptyset)$*.*

**Theorem 3 (Progressing substitution).** *If* $S' <: \overline{S}$*, then* $(\nu\_0, S, \emptyset) \mid (\nu\_0, S', \emptyset)$ *satisfies progress.*

# **5 A Calculus for Asynchronous Timed Processes**

We introduce our asynchronous calculus for timed processes. The calculus abstracts implementations that execute one or more sessions. We let P, P′, Q, … range over processes, X range over process variables, and define $n \in \mathbb{R}\_{\geqslant 0} \cup \{\infty\}$. We use the notation *a* for ordered sequences of channels or variables.


Process $\overline{a}\,v.P$ sends a value v on channel a and continues as P. Similarly, $a \lhd l.P$ sends a label l on channel a and continues as P. Process if v then P else Q behaves as either P or Q depending on the boolean value v. Process P | Q is the parallel composition of P and Q, and 0 is the idle process. def D in P is the standard recursive process: D is a declaration, and P is a process that may contain recursive calls. In recursive calls X⟨*a* ; *a*⟩ the first list of parameters has to be instantiated with values of ground types, while the second with channels. Recursive calls are instantiated with equations X(*a* ; *a*) in D. Process (νab)P is for scope restriction of endpoints a and b. Process ab : h is a queue with name ab (colloquially used to indicate that it contains messages in transit from a to b) and content h. (νab) binds endpoints a and b, and queues ab and ba in P.

There are two kinds of time-consuming processes: those performing a time-consuming action (e.g., method invocation, sleep), and those waiting to receive a message. We model the first kind of processes with delay(δ). P, and the second kind with $a^n(b).P$ (receive) and $a^n \rhd \{l\_i : P\_i\}\_{i \in I}$ (branching). In delay(δ). P, δ is a constraint like those defined for types, but on one single clock x. The name of the clock here is immaterial: clock x is used as a syntactic tool to define intervals for the time-consuming (delay) action. In this sense, assume x is bound in delay(δ). P. Process delay(δ). P consumes any amount of time t such that t is a solution of δ. For example, delay(x ≤ 3). P consumes any amount of time between 0 and 3 time units, then behaves as P. Process $a^n(b).P$ receives a message on channel a, instantiates b, and continues as P. Parameter n models different receive primitives: non-blocking (n = 0), blocking (n = ∞), and blocking with timeout ($n \in \mathbb{R}\_{\geqslant 0}$). If $n \in \mathbb{R}\_{\geqslant 0}$ and no message is in the queue, the process waits n time units before moving into a failed state. If n is set to ∞ the process models a blocking primitive without timeout. Branching process $a^n \rhd \{l\_i : P\_i\}\_{i \in I}$ is similar, but receives a label $l\_i$ and continues as $P\_i$.

Run-time processes are not written by programmers and only appear upon execution. Process failed is the process that has violated a time constraint. We say that P is a *failed state* if it has failed as a syntactic sub-term. Process delay(t). P delays for exactly t time units.

*Well-formed processes.* Sessions are modelled as processes of the following form

$$(\nu ab)(P \mid ab : h \mid ba : h')$$

where P is the process for endpoints a and b, ab is the queue for messages from a to b, and ba is the queue for messages from b to a. A process can have more than one ongoing session. For each, we expect that all necessary queues are present and well-placed. We ensure that queues are well-placed via a well-formedness property for processes (see [11] for an inductive definition). Well-formedness rules out processes of the following form:

$$(\nu ab)\ (a^n(c).(ba:h'\mid P)\mid Q\mid ab:h)\tag{11}$$

The process in (11) is not well-formed since queue ba for communications to endpoint a is not usable, as it is in the continuation of the receive action. Well-formedness of processes is necessary for our safety results. We check well-formedness orthogonally to the typing system for the sake of simpler typing rules. While well-formedness ensures the absence of misplaced queues, the presence of an appropriate pair of queues for every session is ensured by the typing rules.

*Session creation.* Usually well-formedness is ensured by construction, as sessions are created by a specific (synchronous) reduction rule [10,21]. This kind of session creation is cumbersome in the timed setting as it allows delays that are not captured by protocols, hence well-typed processes may miss deadlines. Other work on timed session types [12] avoids this problem by requiring that all session creations occur before any delay action. Our calculus allows sessions to be created at any point, even after delays. In (12) a session with endpoints c and d is created after a send action (assume P includes the queues for this new session).

$$(\nu ab) \left(\overline{a} \, v. \mathsf{delay}(x \leqslant 3). (\nu cd)(P) \mid Q \mid ab:h \mid ba:h' \right) \tag{12}$$

A process like the one in (12) may be thought of as a dynamic session creation that happens synchronously (as in [10,21]), but assuming that all participants are ready to engage without delays. Our approach yields a simplification to the calculus (syntax and reduction rules) and, yet, a more general treatment of session initiation than the work in [12].

$$\frac{P \rightharpoonup P'}{P \longrightarrow P'} \qquad \frac{P \rightsquigarrow P'}{P \longrightarrow P'} \tag{\text{Red1/Red2}}$$

$$a^n(c).P \mid ba: v \cdot h \quad \rightharpoonup \quad P[v/c] \mid ba: h \tag{\text{Rcv}}$$

$$a \lhd \mathbf{l}. P \mid ab : h \quad \rightharpoonup \quad P \mid ab : h \cdot \mathbf{l} \tag{\text{Sel}}$$

$$a^n \rhd \{ \mathbf{l}\_i : P\_i \}\_{i \in I} \mid ba : \mathbf{l}\_j \cdot h \quad \rightharpoonup \quad P\_j \mid ba : h \qquad (j \in I) \tag{\text{Bra}}$$

$$\frac{\models \delta[t/x]}{\mathsf{delay}(\delta).P \quad \rightharpoonup \quad \mathsf{delay}(t).P} \tag{\text{Det}}$$

$$\frac{P \rightharpoonup P'}{P \mid Q \rightharpoonup P' \mid Q} \qquad \frac{P \rightharpoonup P'}{\mathsf{def}\ D\ \mathsf{in}\ P \rightharpoonup \mathsf{def}\ D\ \mathsf{in}\ P'} \tag{\text{Par/Def}}$$

$$\mathsf{def}\ X(a'; b') = P'\ \mathsf{in}\ X\langle v; b\rangle \mid Q \quad \rightharpoonup \quad \mathsf{def}\ X(a'; b') = P'\ \mathsf{in}\ P'[v, b/a', b'] \mid Q \tag{\text{Rec}}$$

$$\frac{P \equiv P' \quad P' \rightharpoonup Q' \quad Q' \equiv Q}{P \rightharpoonup Q} \qquad \frac{P \rightharpoonup P'}{(\nu ab)P \rightharpoonup (\nu ab)P'} \tag{\text{AStr/AScope}}$$

$$\frac{P \equiv P' \quad P' \rightsquigarrow Q' \quad Q' \equiv Q}{P \rightsquigarrow Q} \qquad P \rightsquigarrow \Phi\_t(P) \tag{\text{TStr/Delay}}$$

**Fig. 3.** Reduction for processes (rule [IfF], symmetric for [IfT] is omitted).

**Fig. 4.** Time-passing function $\Phi\_t(P)$. Rule for $a^{t'} \rhd \{l\_i : P\_i\}\_{i \in I}$ is omitted for brevity. $\Phi\_t(P)$ is undefined in the remaining cases.

*Reduction for processes.* Processes are considered modulo structural equivalence, denoted by ≡, and defined by adding the following rule for delays to the standard ones [28]: delay(0). P ≡ P. Reduction rules for processes are given in Fig. 3. A reduction step ⟶ can happen because of either an instantaneous step ⇀ by [Red1] or a time-consuming step ⇝ by [Red2]. Rules [Send], [Rcv], [Sel], and [Bra] are the usual asynchronous communication rules. Rule [Det] models the random occurrence of a precise delay t, with t being a solution of δ. The other untimed rules, [IfT], [Par], [Def], [Rec], [AStr], and [AScope], are standard. Note that rule [Par] does not allow time passing, which is handled by rule [Delay]. Rule [TStr] is the timed version of [AStr].

Rule [Delay] applies a *time-passing* function $\Phi\_t$ (defined in Fig. 4) which distributes the delay t across all the parts of a process. $\Phi\_t(P)$ is a partial function: it is undefined if P can immediately make an urgent action, such as evaluation of expressions or output actions. If $\Phi\_t(P)$ is defined, it returns the process resulting from letting t time units elapse in P. $\Phi\_t(P)$ may return a failed state, if delay t makes a deadline in P expire. The definition of $\Phi\_t(P\_1 \mid P\_2)$ relies on two auxiliary functions: Wait(P) and NEQueue(P) (see [11] for the full definition). Wait(P) returns the set of channels on which P (or some syntactic sub-term of P) is waiting to receive a message/label. NEQueue(P) returns the set of endpoints with a non-empty inbound queue. For example, $\mathrm{Wait}(a^t(b).Q) = \mathrm{Wait}(a^t \rhd \{l\_i : P\_i\}\_{i \in I}) = \{a\}$ and $\mathrm{NEQueue}(ba : h) = \{a\}$ given that $h \neq \emptyset$. $\Phi\_t(P\_1 \mid P\_2)$ is defined only if no urgent action could immediately happen in $P\_1 \mid P\_2$. For example, $\Phi\_t(P\_1 \mid P\_2)$ is undefined for $P\_1 = a^t(b).Q$ and $P\_2 = ba : v$.
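The side condition on Wait and NEQueue can be pictured with a small Python sketch. It covers only this one source of urgency (a pending receive with a non-empty inbound queue), not outputs or expression evaluation, and it flattens a process into a list of top-level components. The names `Recv`, `Queue`, `wait`, `ne_queue`, and `delay_defined` are hypothetical, and the full definitions in [11] are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Recv:
    """A receive a^n(b).P or a branching on `channel`; continuation elided."""
    channel: str
    timeout: float


@dataclass
class Queue:
    """A queue `src dst : content`, holding messages in transit src -> dst."""
    src: str
    dst: str
    content: List[str] = field(default_factory=list)


Part = Union[Recv, Queue]


def wait(parts: List[Part]) -> Set[str]:
    """Channels on which some component is waiting to receive."""
    return {p.channel for p in parts if isinstance(p, Recv)}


def ne_queue(parts: List[Part]) -> Set[str]:
    """Endpoints whose inbound queue is non-empty."""
    return {p.dst for p in parts if isinstance(p, Queue) and p.content}


def delay_defined(parts: List[Part]) -> bool:
    """True if no receive is urgent, i.e. Wait and NEQueue are disjoint."""
    return not (wait(parts) & ne_queue(parts))


# P1 = a^t(b).Q and P2 = ba : v, the undefined case from the text.
print(delay_defined([Recv("a", 1.0), Queue("b", "a", ["v"])]))
# Same receive with an empty inbound queue: time may pass.
print(delay_defined([Recv("a", 1.0), Queue("b", "a", [])]))
```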

In the rest of this section we show the reductions of two processes: one with urgent actions (Example 5), and one to a failed state (Example 6). We omit processes that are immaterial for the illustration (e.g., unused queues).

*Example 5 (Urgency and undefined* $\Phi\_t$*).* We show the reduction of process $P = (\nu ab)(\overline{a}\,\text{'Hi'}.Q \mid ab : \emptyset \mid b^{10}(c).P')$ that has an urgent action. Process P can make the following reduction by [Send]:

$$P \quad \rightharpoonup \quad (\nu ab)(Q \mid ab : \text{'Hi'} \mid b^{10}(c). P')$$

At this point, to apply rule [Delay], say with t = 5, we need to apply the time-passing function as shown below:

$$\Phi\_5((\nu ab)(Q \mid ab : \text{'Hi'} \mid b^{10}(c).P')) = (\nu ab)(\Phi\_5(Q) \mid \Phi\_5(ab : \text{'Hi'} \mid b^{10}(c).P'))$$

which is undefined: $\Phi\_5(ab : \text{'Hi'} \mid b^{10}(c).P')$ is undefined because $\mathrm{Wait}(b^{10}(c).P') \cap \mathrm{NEQueue}(ab : \text{'Hi'}) = \{b\} \neq \emptyset$. Since $\Phi\_5$ is undefined here, rule [Delay] cannot be applied. Instead, the message in queue ab can be received by rule [Rcv]:

$$(\nu ab)(Q \mid ab : \text{'Hi'} \mid b^{10} (c). P') \quad \rightharpoonup \quad (\nu ab)(Q \mid ab : \emptyset \mid P'[\text{'Hi'}/c])$$

*Example 6 (An execution with failure).* We show a reduction to a failing state of a process with a non-blocking receive action (expecting a message immediately) composed with another process that sends a message after a delay.

$$\begin{array}{lll} & \mathsf{delay}(x = 3).\overline{a}\,\text{'Hi'}.Q \mid ab : \emptyset \mid b^{0}(c).P & \text{apply [Det]} \\ \longrightarrow & \mathsf{delay}(3).\overline{a}\,\text{'Hi'}.Q \mid ab : \emptyset \mid b^{0}(c).P = P' & \text{apply [Delay] with } t = 3 \\ \longrightarrow & \Phi\_{3}(P') & \end{array}$$

The application of the time-passing function to P′ yields a failing state (a message is not received in time) as shown below, where the second equality holds since $\mathrm{Wait}(b^0(c).P) \cap \mathrm{NEQueue}(ab : \emptyset) = \emptyset$:

$$\begin{array}{ll} & \Phi\_3(\mathsf{delay}(3).\overline{a}\,\text{'Hi'}.Q \mid b^0(c).P \mid ab : \emptyset) \\ = & \Phi\_3(\mathsf{delay}(3).\overline{a}\,\text{'Hi'}.Q) \mid \Phi\_3(b^0(c).P \mid ab : \emptyset) \\ = & \mathsf{delay}(0).\overline{a}\,\text{'Hi'}.Q \mid \mathsf{failed} \mid ab : \emptyset \end{array}$$
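The effect of time passing on the two kinds of time-consuming prefixes in this example can be sketched in Python. Fig. 4 is not reproduced in this section, so the two functions below are assumptions inferred from the examples (a timed receive with an empty queue either keeps waiting with a reduced timeout or fails; a precise delay is consumed). The names `pass_time_recv`, `pass_time_delay`, and `FAILED` are hypothetical.

```python
import math

FAILED = "failed"


def pass_time_recv(n: float, t: float):
    """Timeout left on a^n(b).P after t time units with an empty queue.

    Assumption: the process fails as soon as more than n time units pass.
    n may be math.inf for a blocking receive without timeout.
    """
    return n - t if t <= n else FAILED


def pass_time_delay(d: float, t: float) -> float:
    """Remaining delay of delay(d).P after t time units.

    Assumption: letting more than d time units pass is not allowed here,
    so the function is partial and raises in that case.
    """
    if t > d:
        raise ValueError("time-passing is undefined past the delay")
    return d - t


# Example 6: delay(3) becomes delay(0), and the non-blocking b^0(c).P fails.
print(pass_time_delay(3, 3), pass_time_recv(0, 3))
```

A receive with n = ∞ never fails under this reading, which matches the description of the blocking primitive without timeout.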

# **6 Typing for Asynchronous Timed Processes**

We validate programs against specifications using judgements of the form Γ ⊢ P ▷ Δ. Environments are defined as follows:

$$\begin{aligned} \Delta &::= \emptyset \mid \Delta, a: (\nu, S) \mid \Delta, ab: \mathtt{M} &\qquad \Theta &::= \emptyset \mid \Theta \cup \{\Delta\} \\ \Gamma &::= \emptyset \mid \Gamma, a: T \mid \Gamma, X: (\boldsymbol{T}; \Theta) \end{aligned}$$

Environment Δ is a session environment, used to keep track of the ongoing sessions. When Δ(a) = (ν, S), the process being validated is acting as a role in session a specified by S, and ν is the clock valuation describing a (virtual) time in which the next action in S may be executed. We write dom(Δ) for the set of variables and channels in Δ. Environment Γ maps variables a to sorts T and process variables X to pairs (*T* ; Θ), where *T* is a vector of sorts and Θ is a set of session environments. The mapping of process variables is used to type recursive processes: *T* is used to ensure well-typed instantiation of the recursion parameters, and Θ is used to model the set of possible scenarios when a new iteration begins.

*Notation, assumptions, and auxiliary definitions.* We write Δ + t for the session environment obtained by incrementing all clock valuations in the codomain of Δ by t.
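As a concrete reading of Δ + t, the Python sketch below shifts every clock of every endpoint entry by t. The dictionary encoding of session environments (endpoint name mapped to a pair of a valuation and an opaque type) and the function name `shift` are assumptions for illustration; queue entries of Δ carry no clocks and are left out.

```python
from typing import Dict, Tuple

Valuation = Dict[str, float]                     # clock name -> value
SessionEnv = Dict[str, Tuple[Valuation, str]]    # endpoint -> (ν, S)


def shift(delta: SessionEnv, t: float) -> SessionEnv:
    """Return Δ + t: every clock valuation in Δ advanced by t."""
    return {
        a: ({x: v + t for x, v in nu.items()}, s)
        for a, (nu, s) in delta.items()
    }


delta = {"a": ({"x": 0.5}, "S1"), "b": ({"y": 0.0}, "S2")}
print(shift(delta, 1.5))
```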

**Definition 8.** *We define the disjoint union* $A \uplus B$ *of sets of clocks* $A$ *and* $B$ *as:*

$$A \uplus B = \{ in\_l(x) \mid x \in A \} \cup \{ in\_r(x) \mid x \in B \}$$

*where* $in\_l$ *and* $in\_r$ *are one-to-one endofunctions on clocks and, for all* $x \in A$ *and* $y \in B$*,* $in\_l(x) \neq in\_r(y)$*. With an abuse of notation, we define the disjoint union of clock valuations* $\nu\_1, \nu\_2$*, in symbols* $\nu\_1 \uplus \nu\_2$*, as a clock valuation satisfying:*

$$\nu\_1 \uplus \nu\_2(in\_l(x)) = \nu\_1(x) \qquad \nu\_1 \uplus \nu\_2(in\_r(x)) = \nu\_2(x)$$

*We use the symbol* $\biguplus$ *for the iterated disjoint union.*

For a configuration (ν, S) we define val((ν, S)) = ν, and type((ν, S)) = S. We overload function val to session environments Δ as follows:

$$\mathtt{val}(\Delta) = \biguplus\_{a \in \text{dom}(\Delta)} \mathtt{val}(\Delta(a))$$
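One way to realise the injections of Definition 8 is to tag each clock with its origin, so that equally named clocks of different sessions never clash. The Python sketch below takes this route. Tagging with `"l"`/`"r"` and, for val(Δ), with the endpoint name is an assumption of this illustration, as are the function names.

```python
from typing import Dict, Tuple

Valuation = Dict[str, float]


def disjoint_union(nu1: Valuation, nu2: Valuation) -> Dict[Tuple[str, str], float]:
    """ν1 ⊎ ν2, with in_l(x) = ("l", x) and in_r(x) = ("r", x)."""
    out = {("l", x): v for x, v in nu1.items()}
    out.update({("r", x): v for x, v in nu2.items()})
    return out


def val(delta: Dict[str, Tuple[Valuation, str]]) -> Dict[Tuple[str, str], float]:
    """Iterated disjoint union of the valuations in Δ, tagged by endpoint."""
    return {(a, x): v for a, (nu, _s) in delta.items() for x, v in nu.items()}


# Both sessions use a clock named x; the union keeps them apart.
print(disjoint_union({"x": 1.0}, {"x": 2.0}))
print(val({"a": ({"x": 1.0}, "S1"), "b": ({"x": 2.0}, "S2")}))
```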

We require Θ to satisfy the following three conditions:


$$\{\nu \mid \nu \models \delta\} = \bigcup\_{\Delta \in \Theta} \mathtt{val}(\Delta).$$

The last condition ensures that Θ is finitely representable, and is key for decidability of type checking.

*Example 7.* We show some examples of Θ that do or do not satisfy the last requirement above. Let $S\_1 = {!T(x \leqslant 2)}.\mathsf{end}$ and $S\_2 = {!T(y \leqslant 2)}.\mathsf{end}$, and let:

$$\begin{array}{l} \Theta\_{1} = \{\Delta \mid \Delta(a) = (\nu\_{1}, S\_{1}) \land \Delta(b) = (\nu\_{2}, S\_{2}) \land \nu\_{1}(x) \leqslant 2 \land \nu\_{1}(x) = \nu\_{2}(y)\};\\ \Theta\_{2} = \{\Delta \mid \Delta(a) = (\nu\_{1}, S\_{1}) \land \Delta(b) = (\nu\_{2}, S\_{2}) \land \nu\_{1}(x) \leqslant \sqrt{2} \land \nu\_{1}(x) = \nu\_{2}(y)\};\\ \Theta\_{3} = \{\Delta \mid \Delta(a) = (\nu\_{1}, S\_{1}) \land \Delta(b) = (\nu\_{2}, S\_{2}) \land \nu\_{1}(x) + \nu\_{2}(y) = 2\}. \end{array}$$

We have that $\Theta\_1$ satisfies condition (3): let $\delta\_1 = x \leqslant 2 \wedge y - x = 0$. It is easy to see that $\{\nu \mid \nu \models \delta\_1\} = \bigcup\_{\Delta \in \Theta\_1} \mathtt{val}(\Delta)$. For $\Theta\_2$, a candidate proposition would be $\delta\_2 = x \leqslant \sqrt{2} \wedge y - x = 0$. However, $\delta\_2$ cannot be derived with the syntax of propositions, as $\sqrt{2}$ is irrational. Indeed, $\Theta\_2$ does not satisfy the condition. For $\Theta\_3$, let $\delta\_3 = x + y = 2$. Again, $\delta\_3$ is not a guard, as additive constraints of the form x + y = n are not allowed. Indeed, $\Theta\_3$ does not satisfy the condition either.

In the following, we write *a* : *T* for $a\_1 : T\_1, \ldots, a\_n : T\_n$ when *a* = $a\_1, \ldots, a\_n$ and *T* = $T\_1, \ldots, T\_n$ (assuming *a* and *T* have the same number of elements). Similarly for *b* : (*ν*, *S*). In the typing rules, we use a few auxiliary definitions: Definition 9 (t-reading Δ) checks if any ongoing session in Δ can perform an input action within a given timespan, and Definition 10 (Compatibility of configurations) extends the notion of duality to systems that are not in an initial state.

**Definition 9 (**t**-reading** Δ**).** *Session environment* Δ *is* t*-reading if there exist some* $a \in \mathrm{dom}(\Delta)$*,* $t' < t$ *and* $m$ *such that:* $\Delta(a) = (\nu, S) \wedge (\nu + t', S) \xrightarrow{?m}$*.*

Namely, Δ is t-reading if any of the open sessions in the mapping prescribes a read action within the time-frame between ν and ν + t. Definition 9 is used in the typing rules for time-consuming processes, namely [Vrcv], [Drcv], and [Del*t*], to 'disallow' derivations when an (urgent) receive may happen.
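For intuition, Definition 9 can be checked mechanically in a restricted setting. The Python sketch below assumes that each session has a single clock, that the next action is summarised by its direction and a closed interval guard lo ≤ x ≤ hi, and that a receive is enabled whenever its guard holds (the message m is ignored). These simplifications and the name `is_t_reading` are assumptions of this illustration, not part of the definition.

```python
from typing import Dict, Tuple

# endpoint -> (current clock value, "recv" or "send", guard lower, guard upper)
Env = Dict[str, Tuple[float, str, float, float]]


def is_t_reading(delta: Env, t: float) -> bool:
    """True if some session can receive at ν + t' for some 0 <= t' < t."""
    for nu, kind, lo, hi in delta.values():
        if kind != "recv":
            continue
        # Earliest instant in [ν, ν + t) that is not below the guard's lower bound.
        start = max(nu, lo)
        if start <= hi and start < nu + t:
            return True
    return False


print(is_t_reading({"a": (0.0, "recv", 0.0, 2.0)}, 1.0))   # receive enabled at once
print(is_t_reading({"a": (0.0, "recv", 3.0, 4.0)}, 1.0))   # guard opens too late
```

The strict comparison with ν + t reflects the condition t′ < t: a receive that becomes enabled exactly at ν + t does not make Δ t-reading.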

**Definition 10 (Compatibility of configurations).** *Configuration* $(\nu\_1, S\_1, \mathtt{M}\_1)$ *is compatible with* $(\nu\_2, S\_2, \mathtt{M}\_2)$*, written* $(\nu\_1, S\_1, \mathtt{M}\_1) \perp (\nu\_2, S\_2, \mathtt{M}\_2)$*, if:*


By condition (3) initial configurations are compatible when they include dual types, i.e., $(\nu\_0, S, \emptyset) \perp (\nu\_0, \overline{S}, \emptyset)$. By condition (2) two configurations may temporarily misalign as execution proceeds: one may have read a message from its queue, while the other has not, as long as the former is ready to receive it immediately. Thanks to the particular shape of type interactions, initial configurations of the form $(\nu\_0, S, \emptyset) \perp (\nu\_0, \overline{S}, \emptyset)$ will only reach systems, say $(\nu\_1, S\_1, \mathtt{M}\_1) \perp (\nu\_2, S\_2, \mathtt{M}\_2)$, in which at least one of $\mathtt{M}\_1$ and $\mathtt{M}\_2$ is empty. Condition (1) requires compatible configurations to satisfy this basic property.

*Typing rules.* The typing rules are given in Fig. 5. Rule [Vrcv] is for input processes. The first premise consists of two conditions requiring the time-span [ν, ν + n] in which the process can receive the message to *coincide* with δ:


The second premise of [Vrcv] requires the continuation P to be well-typed against the continuation of the type, for all possible session environments where the virtual time is somewhere in [ν, ν + n], and where the virtual valuation ν in the mapping of session a is reset according to λ. Rule [Drcv], for processes receiving delegated sessions, is like [Vrcv] except: (a) the continuation P is typed against a session environment *extended with the received session* S′, and (b) the clock valuation ν′ of the receiving session must satisfy δ′. Recall that by formation rules (Sect. 2.1) S′ is well-formed against all ν′ that satisfy δ′.
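The requirement that the receive window [ν, ν + n] coincide with δ can be made concrete for a single clock and a closed interval guard. In the Python sketch below, the guard is lo ≤ x ≤ hi and v is the current value of x. The window {t ≥ 0 | lo ≤ v + t ≤ hi} equals [0, n] exactly when the guard already holds at v and n is the time left until hi. The restriction to interval guards and the name `receive_window_matches` are assumptions of this illustration.

```python
import math


def receive_window_matches(v: float, lo: float, hi: float, n: float) -> bool:
    """Check that, for all t >= 0, (lo <= v + t <= hi) iff (t <= n)."""
    # t = 0 must satisfy the guard, and the guard must expire exactly after n.
    return lo <= v <= hi and n == hi - v


print(receive_window_matches(0.0, 0.0, 2.0, 2.0))   # timeout matches x <= 2
print(receive_window_matches(0.0, 0.0, 2.0, 1.0))   # timeout too short
print(receive_window_matches(0.0, 1.0, 2.0, 2.0))   # guard not yet enabled
```

Under this reading a blocking receive (n = ∞) matches only a guard without an upper bound, and a non-blocking receive (n = 0) matches only a guard that expires at the current instant.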

Rule [Vsend] is for output processes. Send actions are instantaneous, hence the current valuation ν of the type needs to satisfy δ. As customary, the continuation of the process needs to be well-typed against the continuation of the type (with ν being reset according to λ, and Γ extended with information on the sort of b). [Dsend] for delegation is similar but: (a) the delegated session is removed from the session environment (the process can no longer engage in the delegated session), and (b) valuation ν′ of the delegated session must satisfy guard δ′.

Rule [Delδ] checks that P is well-typed against all possible solutions of δ. Rule [Del*t*] shifts the virtual valuations in the session environment by t. This is like the corresponding rule in [12] but with the addition of the check that Δ is not t-reading, needed because of urgent semantics.

Rule [Res] is for processes with scopes.

<sup>2</sup> While not necessary for our safety results, this constraint simplifies our theory. Timing variations between types and programs are all handled in one place: rule [Subt].

Rule [Rec] is for recursive processes. The rule is as usual [21] except that we use a set of session environments Θ (instead of a single Δ) to capture a set of possible scenarios in which a recursion instance may start, which may have different clock valuations. Rule [Var] is also as expected except for the use of Θ.

Rules [Par] and [Subt] are straightforward.

*Example 8 (Typing with subtyping).* Subtyping substantially increases the power of our type system, in particular in the presence of channel passing. Intuitively, without subtyping, the type of any higher-order send action should be an equality constraint (e.g., x = 1) rather than a more general timeout (e.g., x < 1). We illustrate our point using P defined below:

$$\begin{aligned} P &= (\nu a\_1 b\_1)(\nu a\_2 b\_2)(P\_1 \mid P\_2 \mid P\_3 \mid Q) & P\_1 &= \mathsf{delay}(x \leqslant 1). \overline{a\_1} \, a\_2 \\ P\_2 &= b\_1^1(c). \, c^2(d) & P\_3 &= \mathsf{delay}(1 \leqslant x \land x \leqslant 2). \overline{b\_2} \, \mathsf{true} \end{aligned}$$

where Q contains empty queues of the involved endpoints. Intuitively, P proceeds as follows: (1) $P\_1$ sends channel $a\_2$ to $P\_2$ within one time unit, and terminates; (2) $P\_2$ reads the message as soon as it arrives, and listens for a message across the received channel ($a\_2$) for two time units; (3) $P\_3$ sends value true through channel $b\_2$ at a time between 1 and 2, unaware that it is now communicating with $P\_2$, and then terminates; (4) $P\_2$ reads the message immediately and terminates. See below for one possible reduction:

$$\begin{array}{l} P \longrightarrow^\* (\nu a\_1 b\_1) (\nu a\_2 b\_2) (\overline{a\_1}\, a\_2 \mid b\_1^0 (c). \ c^2 (d) \mid \mathsf{delay}(0 \leqslant x \land x \leqslant 1). \overline{b\_2} \,\mathsf{true} \mid Q) \\ \longrightarrow^\* (\nu a\_1 b\_1) (\nu a\_2 b\_2) (0 \mid a\_2^2 (d) \mid \mathsf{delay}(0.5). \overline{b\_2} \,\mathsf{true} \mid Q) \\ \longrightarrow (\nu a\_1 b\_1) (\nu a\_2 b\_2) (0 \mid a\_2^{1.5} (d) \mid \overline{b\_2} \,\mathsf{true} \mid Q) \\ \longrightarrow^\* (\nu a\_1 b\_1) (\nu a\_2 b\_2) (0 \mid 0 \mid 0 \mid Q) \end{array}$$

Although P executes correctly, the involved processes are well-typed against types that are not dual:

$$\vdash \quad P\_1 \rhd a\_1 : (\nu\_0, S\_1), a\_2 : (\nu\_0, S\_2) \quad \vdash \quad P\_2 \rhd b\_1 : (\nu\_0, S\_1') \quad \vdash \quad P\_3 \rhd b\_2 : (\nu\_0, \overline{S\_2})$$

for $S\_1 = {!(y \leqslant 1, S\_2)}(x \leqslant 1)$, $S\_2 = {?\mathsf{Bool}}(1 \leqslant y \wedge y \leqslant 2)$, $S'\_1 = {?(y = 0, S'\_2)}(x \leqslant 1)$. In order to type-check P, we need to apply rule [Res], requiring endpoints of the same session to have dual types. But clearly: $S'\_1 \neq \overline{S\_1}$. Without subtyping, P would not be well-typed. By subtyping, however, $(y \leqslant 1, S\_2) <: (y = 0, S'\_2)$ with $S'\_2 = {?\mathsf{Bool}}(y \leqslant 2).\mathsf{end}$, and then $S'\_1 <: \overline{S\_1}$. Thanks to the subtyping rule [Subt] we can derive $\vdash P\_2 \rhd b\_1 : (\nu\_0, \overline{S\_1})$ and, in turn, $\vdash P \rhd \emptyset$.

#### **7 Subject Reduction and Time Safety**

The main properties of our typing system are Subject Reduction and Time Safety. Time Safety ensures that the execution of well-typed processes will only

$$\frac{\begin{array}{c} \forall t:\ \nu + t \models \delta \iff t \leqslant n \\ \forall t \leqslant n:\ \Gamma, b : T \vdash P \rhd \Delta + t, a : (\nu + t\,[\lambda \mapsto 0], S) \quad \Delta \text{ not } t\text{-reading} \end{array}}{\Gamma \vdash a^n(b).P \rhd \Delta, a : (\nu, ?T(\delta, \lambda).S)} \qquad [\mathsf{Vrcv}]$$

$$\frac{\begin{array}{c} \forall t:\ \nu + t \models \delta \iff t \leqslant n \quad T = (\delta', S') \quad \nu' \models \delta' \\ \forall t \leqslant n:\ \Gamma \vdash P \rhd \Delta + t, a : (\nu + t\,[\lambda \mapsto 0], S), b : (\nu', S') \quad \Delta \text{ not } t\text{-reading} \end{array}}{\Gamma \vdash a^n(b).P \rhd \Delta, a : (\nu, ?T(\delta, \lambda).S)} \qquad [\mathsf{Drcv}]$$

$$\frac{\Gamma \vdash b : T \quad \nu \models \delta \quad \Gamma \vdash P \rhd \Delta, a : (\nu\,[\lambda \mapsto 0], S)}{\Gamma \vdash \overline{a}\, b.P \rhd \Delta, a : (\nu, !T(\delta, \lambda).S)} \qquad [\mathsf{Vsend}]$$

$$\frac{T = (\delta', S') \quad \nu' \models \delta' \quad \nu \models \delta \quad \Gamma \vdash P \rhd \Delta, a : (\nu\,[\lambda \mapsto 0], S)}{\Gamma \vdash \overline{a}\, b.P \rhd \Delta, a : (\nu, !T(\delta, \lambda).S), b : (\nu', S')} \qquad [\mathsf{Dsend}]$$

$$\frac{\forall t \in \delta:\ \Gamma \vdash \mathtt{delay}(t).P \rhd \Delta}{\Gamma \vdash \mathtt{delay}(\delta).P \rhd \Delta} \qquad \frac{\Gamma \vdash P \rhd \Delta + t \quad \Delta \text{ not } t\text{-reading}}{\Gamma \vdash \mathtt{delay}(t).P \rhd \Delta} \qquad [\mathsf{Del}\delta/\mathsf{Del}t]$$

$$\frac{(\nu\_1, S\_1, \mathbb{M}\_1) \bot (\nu\_2, S\_2, \mathbb{M}\_2) \quad \Gamma \vdash P \rhd \Delta, a : (\nu\_1, S\_1), b : (\nu\_2, S\_2), ba : \mathbb{M}\_1, ab : \mathbb{M}\_2}{\Gamma \vdash (\nu ab) P \rhd \Delta} \qquad [\mathsf{Res}]$$

$$\frac{\Delta \in \Theta \quad \forall i:\ \Gamma \vdash v\_i : T\_i}{\Gamma, X : \vec{T}; \Theta \vdash X\langle \vec{v}; \vec{b} \rangle \rhd \Delta} \qquad \frac{\Gamma \vdash P \rhd \Delta\_1 \quad \Gamma \vdash Q \rhd \Delta\_2}{\Gamma \vdash P \mid Q \rhd \Delta\_1, \Delta\_2} \qquad [\mathsf{Var}/\mathsf{Par}]$$

$$\frac{\forall (\nu, S) \in \Theta:\ \Gamma, \vec{a} : \vec{T}, X : \vec{T}; \Theta \vdash P \rhd \vec{b} : (\nu, S) \quad \Gamma, X : \vec{T}; \Theta \vdash Q \rhd \Delta}{\Gamma \vdash \mathtt{def}\ X(\vec{a}; \vec{b}) = P\ \mathtt{in}\ Q \rhd \Delta} \qquad [\mathsf{Rec}]$$

$$\frac{\Gamma \vdash P \rhd \Delta' \qquad \Delta' <: \Delta}{\Gamma \vdash P \rhd \Delta} \qquad \frac{\Gamma \vdash P \rhd \Delta}{\Gamma \vdash P \rhd \Delta, a : (\nu, \mathtt{end})} \qquad [\mathsf{Subt}/\mathsf{Weak}]$$

**Fig. 5.** Selected typing rules for processes

reach *fail-free* states. Recall that P is fail-free when none of its sub-terms is the process failed. Time Safety builds on a condition that is unrelated to time and instead concerns the structure of the process interactions. If an untimed process gets stuck due to mismatches in its communication structure, a timed process with the same communication structure may move to a failed state. Consider P below:

$$\begin{array}{l} P = (\nu ab)(\nu cd) \, Q \qquad R = ab : \emptyset \mid ba : \emptyset \mid cd : \emptyset \mid dc : \emptyset \\ Q = a^5(e) . \overline{d}\, e . 0 \mid c^5(e) . \overline{b}\, e . 0 \mid R \end{array} \tag{13}$$

P is well-typed: $\emptyset \vdash P \rhd a : (\nu\_0, S), b : (\nu\_0, \overline{S}), c : (\nu\_0, S), d : (\nu\_0, \overline{S})$ with $S = {?}\mathtt{Int}(x \leqslant 5, \emptyset).\mathtt{end}$. However, P can only make time steps and, when more than 5 time units elapse overall (e.g., 6 in the reduction below), P reaches a failed state due to a circular dependency between actions of sessions (νab) and (νcd):

$$P \quad \longrightarrow \quad \Phi\_6(Q) = (\nu ab)(\nu cd) \left(\mathtt{failed} \mid \mathtt{failed} \mid R\right)$$
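The effect of time passing over the two timed receives of Q can be sketched in Python. This is only an illustration under stated assumptions, not the paper's definition of Φ: the tuple encoding of processes and the name `elapse` are hypothetical, and the queues are assumed to stay empty for the whole interval. The only behaviour taken from the text is that a receive with timeout n fails once more than n time units go by with nothing to read.

```python
FAILED = ("failed",)

def elapse(proc, t):
    """Let t time units pass over a process blocked on a timed receive.

    proc is ("recv", endpoint, timeout); this encoding is hypothetical.
    The endpoint's queue is assumed to stay empty during the interval.
    """
    if proc == FAILED:
        return FAILED
    kind, endpoint, timeout = proc
    if kind != "recv":
        raise ValueError("this sketch only models timed receives")
    if t > timeout:
        return FAILED                      # deadline missed: the process fails
    return (kind, endpoint, timeout - t)   # still waiting, with less time left

# The two receives of Q in (13), both with timeout 5.
q = [("recv", "a", 5), ("recv", "c", 5)]
after_4 = [elapse(p, 4) for p in q]   # both still waiting, 1 time unit left
after_6 = [elapse(p, 6) for p in q]   # both have failed
```

With 6 time units, both components fail, which mirrors the reduction shown above; with 4 time units they would still be waiting.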

Our typing system does not check against such circularities across different interleaved sessions. This is common in work on untimed [21] and timed [12] session types. However, in the untimed scenario, progress for interleaved sessions can be guaranteed by means of additional checks on processes [17]. Time Safety builds on the results in [17] by using an assumption (receive liveness) on the underlying structure of the timed processes. This assumption is formally captured in Definition 11, which is based on an untimed variant of our calculus.

*The untimed calculus.* We define untimed processes, denoted by P̂, as processes obtained from the grammar given for timed processes (Sect. 5) without delays and failed processes. In untimed processes, time annotations of branching/receive processes are immaterial, hence omitted in the rest of the paper.

Given a (timed) process P, one can obtain its untimed counterpart by *erasing* delays and failed processes; we denote the result of such erasure on P by erase(P). The semantics of untimed processes is defined as the one for timed processes (Sect. 5), except that reduction rules [Delay], [TStr], and [Red2] are removed. Abusing the notation, we write P̂ ⟶ P̂′ when an untimed process P̂ moves to a state P̂′ using the semantics for untimed processes. The definitions of Wait(P̂) and NEQueue(P̂) can be derived from the definitions for timed processes in the straightforward way.

Definition 11 (receive liveness) formalises our assumption on the interaction structures of a process.

**Definition 11 (Receive liveness).** P̂ *is said to satisfy receive liveness (or is* live*, for short) if, for all* P̂′ *such that* P̂ ⟶<sup>∗</sup> P̂′*:*

$$
\hat{P}' \equiv (\nu ab)\hat{Q} \land a \in \mathsf{Wait}(\hat{Q}) \implies \exists \hat{Q}' : \hat{Q} \longrightarrow^\* \hat{Q}' \land a \in \mathsf{NEQueue}(\hat{Q}')
$$

In any reachable state P̂′ of a live untimed process P̂, if any endpoint a in P̂′ is waiting to receive a message (a ∈ Wait(Q̂)), then the overall process is able to reach a state Q̂′ where a can perform the receive action (a ∈ NEQueue(Q̂′)).

Consider process P in (13). The untimed process erase(P) is not live because Wait(erase(P)) = {a, c} and a, c ∉ NEQueue(erase(P)), since NEQueue(erase(P)) is the empty set. Syntactically, erase(P) is the same as P, but it does not have the same behaviour: P can only make time steps, reaching a failed process, while erase(P) is stuck, as untimed processes only make communication steps.
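Receive liveness can be checked by brute force on small finite examples. The Python sketch below is a simplified model built on assumptions that are not in the paper: processes are flat sequences of `send`/`recv` actions, message payloads are ignored, each endpoint has an inbox counter, and all function names are hypothetical. It follows the shape of Definition 11: for every reachable state and every endpoint waiting to receive, some further reachable state must have a non-empty queue for that endpoint.

```python
PEER = {"a": "b", "b": "a", "c": "d", "d": "c"}  # sessions (nu ab) and (nu cd)

def initial(*procs):
    """Build a state: process action sequences plus empty inboxes."""
    return (tuple(tuple(p) for p in procs),
            tuple((e, 0) for e in sorted(PEER)))

def steps(state):
    """All states reachable in one untimed communication step."""
    procs, queues = state
    inbox = dict(queues)
    for i, proc in enumerate(procs):
        if not proc:
            continue
        (kind, e), rest = proc[0], proc[1:]
        new_inbox = dict(inbox)
        if kind == "send":
            new_inbox[PEER[e]] += 1      # message lands in the peer's inbox
        elif inbox[e] > 0:
            new_inbox[e] -= 1            # receive consumes a queued message
        else:
            continue                     # receive on an empty inbox: blocked
        yield (procs[:i] + (rest,) + procs[i + 1:],
               tuple(sorted(new_inbox.items())))

def reachable(state):
    """Reflexive-transitive closure of steps, starting from state."""
    seen, todo = {state}, [state]
    while todo:
        for nxt in steps(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def wait(state):
    """Endpoints on which some process is about to receive."""
    return {p[0][1] for p in state[0] if p and p[0][0] == "recv"}

def ne_queue(state):
    """Endpoints whose inbox is non-empty."""
    return {e for e, n in state[1] if n > 0}

def is_live(state):
    return all(
        any(e in ne_queue(t) for t in reachable(s))
        for s in reachable(state)
        for e in wait(s)
    )

# erase(P) from (13): both components receive before they send.
stuck = initial([("recv", "a"), ("send", "d")], [("recv", "c"), ("send", "b")])
# A hypothetical variant in which the first component sends first.
fixed = initial([("send", "d"), ("recv", "a")], [("recv", "c"), ("send", "b")])
```

In this model `is_live(stuck)` is false, since both endpoints wait and no queue ever fills, while `is_live(fixed)` is true.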

*Properties.* Time Safety relies on Subject Reduction (Theorem 4), which establishes a relation (preserved by reduction) between well-typed processes and their types.

**Theorem 4 (Subject reduction for closed systems).** *Let* erase(P) *be live. If* ∅ ⊢ P ▷ ∅ *and* P ⟶ P′ *then* ∅ ⊢ P′ ▷ ∅*.*

Note that Subject Reduction assumes erase(P) to be live. For instance, the example of P in (13) is well-typed, but erase(P) is not live. The process can reduce to a failed state (as illustrated earlier in this section) that cannot be typed (failed processes are not well-typed). Time Safety establishes that well-typed processes only reduce to fail-free states.

**Theorem 5 (Time safety).** *If* erase(P) *is live,* ⊢ P ▷ ∅ *and* P ⟶<sup>∗</sup> P′*, then* P′ *is fail-free.*

Typing is decidable if one uses processes annotated with the following information: (1) scope restrictions (νab : S)P are annotated with the type S of the session for endpoint a (the type of b is implicitly assumed to be $\overline{S}$, and both endpoints are type-checked in the initial clock valuation ν<sub>0</sub>); (2) receive actions a<sup>n</sup>(b : T).P are annotated with the type T of the received message; (3) recursive definitions X(*a* : *T*; *a* : *S*, δ) = P are annotated with types for each parameter, and a guard modelling the state of the clocks. We call annotated programs those annotated processes derived without using productions marked as run-time (i.e., failed and delay(t).P), and where n in a<sup>n</sup>(b : T).P ranges over ℚ<sub>≥0</sub> ∪ {∞}.

**Proposition 2.** *Type checking for annotated programs is decidable.*

# **8 Conclusion and Related Work**

We introduced duality and subtyping relations for asynchronous timed session types. Unlike for untimed and timed synchronous [6] dualities, the composition of dual types does not enjoy progress in general. Compositions of asynchronous timed dual types enjoy progress *when using an urgent receive semantics*. We propose a behavioural typing system for a timed calculus that features non-blocking and blocking receive primitives (with and without timeout), and time-consuming primitives of arbitrary but constrained delays. The main properties of the typing system are Subject Reduction and Time Safety; both results rely on an assumption (receive liveness) on the underlying interaction structure of processes. In related work on timed session types [12], receive liveness is not required for Subject Reduction; this is because the processes in [12] block (rather than reaching a failed state) whenever they cannot progress correctly, hence, e.g., missed deadlines are regarded as progress violations. By explicitly capturing failures, our calculus paves the way for future work on combining static checking with run-time instrumentation to prevent or handle failures.

Asynchronous timed session types have been introduced in [12], in a multiparty setting, together with a timed π-calculus, and a type system. The direct extension of session types with time introduces unfeasible executions (i.e., types may get stuck), as we have shown in Example 1. [12] features a notion of feasibility for choreographies, which ensures that types enjoy progress. We ensure progress of types by formation and duality. The semantics of types in [12] is different from ours in that receive actions are not urgent. The work in [12] gives one extra condition on types (wait-freedom), because feasible types may still yield undesirable executions in well-typed processes. Thanks to our duality, subtyping, and calculus (in particular the blocking receive primitive with timeout) this condition is unnecessary in this work. As a result, our typing system allows for types that are *not wait-free*. By dropping wait-freedom, we can type a class of common real-world protocols in which processes may be ready to receive messages even before the final deadline of the corresponding senders. Remarkably, SMTP mentioned in the introduction is *not wait-free*. For some other aspects, our work is less general than the one in [12], as we consider binary sessions rather than multiparty sessions. A theory of timed multiparty asynchronous protocols that encompasses the protocols in [12] and those considered here is an interesting future direction. The work in [6] introduces a theory of synchronous timed session types, based on a decidable notion of compatibility, called *compliance*, that ensures progress of types, and is equivalent to synchronous timed duality and subtyping in a precise sense [6]. Our duality and subtyping are similar to those in [6], but apply to the asynchronous scenario. The work in [15] introduces a typed calculus based on temporal session types. The temporal modalities in [15] can be used as a discrete model of time. 
Timed session types, thanks to clocks and resets, are able to model complex timed dependencies that temporal session types do not seem able to capture. Other work studies models for asynchronous timed interactions, e.g., Communicating Timed Automata [23] (CTA), timed Message Sequence Charts [2], but not their relationships with processes. The work in [5] introduces a refinement for CTA, and presents a notion of urgency similar to the one used in this paper, also studied in preliminary form in [29].

Several timed calculi have been introduced outside the context of behavioural types. The work in [32] extends the π-calculus with time primitives inspired by CTA and is closer, in principle, to our types than to our processes. Another timed extension of the π-calculus with time-consuming actions has been applied to the analysis of the active times of processes [18]. Some works focus on specific aspects of timed behaviour, such as timeouts [9], transactions [24,27], and services [25]. Our calculus does not feature exception handlers, nor timed transactions. Our focus is on detecting time violations via static typing, so that a process only moves to fail-free states.

The calculi in [7,12,15] have been used in combination with session types. The calculus in [12] features a non-blocking receive primitive similar to our a<sup>0</sup>(b).P, but one that never fails (i.e., time is not allowed to flow if a process tries to read from an empty buffer, possibly leading to a stuck process rather than a failed state). The calculus in [7] features a blocking receive primitive without timeout, equivalent to our a<sup>∞</sup>(b).P. The calculus in [15] seems able to encode a non-blocking receive primitive like the one of [12] and a blocking receive primitive without timeout like our a<sup>∞</sup>(b).P. None of these works features blocking receive primitives with timeouts. Furthermore, existing works feature [7,12] or can encode [15] only precise delays, equivalent to delay(x = n).P. Such punctual predictions are often difficult to achieve. Arbitrary but constrained delays are closer abstractions of time-consuming programming primitives (and possibly, of predictions one can derive by cost analysis, e.g., [20]).

As to applications, timed session types have been used for run-time monitoring [7,30] and static checking [12]. A promising future direction is that of integrating static typing with run-time verification and enforcement, towards a theory of hybrid timed session types. In this context, extending our calculus with exception handlers [9,24,27] could allow an extension of the typing system that introduces run-time instrumentation to handle unexpected time failures.

# **References**



# **Manifest Deadlock-Freedom for Shared Session Types**

Stephanie Balzer<sup>1</sup>, Bernardo Toninho<sup>2</sup>, and Frank Pfenning<sup>1</sup>

<sup>1</sup> Carnegie Mellon University, Pittsburgh, USA, balzers@cs.cmu.edu

<sup>2</sup> NOVA LINCS, Universidade Nova de Lisboa, Lisbon, Portugal, btoninho@fct.unl.pt

**Abstract.** Shared session types generalize the Curry-Howard correspondence between intuitionistic linear logic and the session-typed π-calculus with adjoint modalities that mediate between linear and shared session types, giving rise to a programming model where shared channels must be used according to a locking discipline of acquire-release. While this generalization greatly increases the range of programs that can be written, the gain in expressiveness comes at the cost of deadlock-freedom, a property which holds for many linear session type systems. In this paper, we develop a type system for logically-shared sessions in which types capture not only the interactive behavior of processes but also constrain the order of resources (i.e., shared processes) they may acquire. This type-level information is then used to rule out cyclic dependencies among acquires and synchronization points, resulting in a system that ensures *deadlock-free communication* for well-typed processes in the presence of shared sessions, higher-order channel passing, and recursive processes. We illustrate our approach on a series of examples, showing that it rules out deadlocks in circular networks of both shared and linear recursive processes, while still being permissive enough to type concurrent implementations of shared imperative data structures as processes.

**Keywords:** Linear and shared session types · Deadlock-freedom

# **1 Introduction**

*Session types* [25–27] naturally describe the interaction protocols that arise amongst concurrent processes that communicate via message-passing. This typing discipline has been integrated (with varying static safety guarantees) into several mainstream languages such as Java [28,29], F# [43], Scala [49,50], Go [11] and Rust [33]. Session types moreover enjoy a logical correspondence between *linear logic* and the *session-typed* π*-calculus* [8,9,51,55]. Languages building on this correspondence [24,52,55] not only guarantee *session fidelity* (i.e., type preservation) but also *deadlock-freedom* (i.e., global progress). The latter is guaranteed even in the presence of interleaved sessions, which are often excluded from the deadlock-free fragments of traditional session-typed frameworks [20,26,27,53]. These logical session types, however, exclude programming scenarios that demand *sharing* of mutable resources (e.g., shared databases or shared output devices) instead of functional resource replication.

Supported by NSF Grant No. CCF-1718267: "Enriching Session Types for Practical Concurrent Programming" and NOVA LINCS (Ref. UID/CEC/04516/2019).

To increase their practicality, logical session types have been extended with *manifest sharing* [2]. In the resulting language, linear and shared sessions coexist, but the type system enforces that clients of shared sessions run in mutual exclusion of each other. This separation is achieved by enforcing an *acquire-release* policy, where a client of a shared session must first acquire the session before it can participate in it along a private linear channel. Conversely, when a client releases a session, it gives up its linear channel and only retains a shared reference to the session. Thus, sessions in the presence of manifest sharing can change, or *shift*, between shared and linear execution modes. At the type-level, the acquire-release policy manifests in a stratification of session types into linear and shared with adjoint modalities [5,47,48], connecting the two strata. Operationally, the modality shifting *up* from the linear to the shared layer translates into an *acquire* and the one shifting *down* from shared to linear into a *release*.

Manifest sharing greatly increases the range of programs that can be written because it recovers the expressiveness of the untyped asynchronous π-calculus [3] while maintaining session fidelity. As in the π-calculus, however, the gain in expressiveness comes at the cost of *deadlock-freedom*. An illustrative example is an implementation of the classical dining philosophers problem, shown in Fig. 1, using the language SILL<sub>S</sub> [2] that supports manifest sharing (in this setting we often equate a process with the session it offers along a distinguished channel). The code shows the process *fork_proc*, implementing a session of type sfork, and the processes *thinking* and *eating*, implementing sessions of type philosopher. We defer the details of the typing and the definition of the session types sfork and philosopher to Sect. 2 and focus on the programmatic working of the processes for now. For ease of reading, we typeset shared session types and variables denoting shared channel references in red.

A *fork_proc* process represents a fork that can be perpetually acquired and released. The actions accept and detach are the duals of acquire and release, respectively, allowing a process to accept an acquire by a client and to initiate a release by a client. Process *thinking* has two shared channel references as arguments, for the forks to the left and right of the philosopher, which the process tries to acquire. If the acquire succeeds, the process recurs as an *eating* philosopher with two (now) linear channel references of type lfork. Once a philosopher is done eating, it releases both forks and recurs as a *thinking* philosopher. Let's set a table for three philosophers that share three forks, all spawned as processes executing in parallel:

*f<sub>0</sub>* ← *fork_proc* ; *f<sub>1</sub>* ← *fork_proc* ; *f<sub>2</sub>* ← *fork_proc* ; *p<sub>0</sub>* ← *thinking* ← *f<sub>0</sub>*, *f<sub>1</sub>* ; *p<sub>1</sub>* ← *thinking* ← *f<sub>1</sub>*, *f<sub>2</sub>* ; *p<sub>2</sub>* ← *thinking* ← *f<sub>2</sub>*, *f<sub>0</sub>* ;


**Fig. 1.** Dining philosophers in SILL<sub>S</sub> [2].

Infamously, this configuration may deadlock because of the *circular* dependency between the acquires. We can break this cycle by changing the last spawn to *p<sub>2</sub>* ← *thinking* ← *f<sub>0</sub>*, *f<sub>2</sub>*, ensuring that forks are acquired in increasing order.
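The circular dependency, and the effect of the reordering, can be made concrete with a small cycle check in Python. This is a sketch under an assumption: each philosopher is taken to acquire its first argument before its second, so every spawn contributes one "acquired before" edge between forks. The function name `has_cycle` and the string encoding of forks are hypothetical.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, done
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:       # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

# One edge per philosopher: (fork acquired first, fork acquired second).
original = [("f0", "f1"), ("f1", "f2"), ("f2", "f0")]
reordered = [("f0", "f1"), ("f1", "f2"), ("f0", "f2")]
```

Here `has_cycle(original)` holds, while `has_cycle(reordered)` does not, matching the observation that acquiring forks in increasing order breaks the cycle.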

Perhaps surprisingly, cyclic dependencies between acquire requests are not the only source of deadlocks. Fig. 2 gives an example, defining the processes *owner* and *contester*, which both have a shared channel reference to a common resource that can be perpetually acquired and released. Both processes acquire the shared resource, but additionally exchange the message ping. More precisely, process *owner* spawns the process *contester*, acquires the shared resource, and only releases the resource after having received the message ping from the *contester*. Process *contester*, on the other hand, first attempts to acquire the resource and then sends the message ping to the owner. The program deadlocks if process *owner* acquires the resource first. In that case, process *owner* waits for process *contester* to send the message ping while process *contester* waits to acquire the resource held by process *owner*. We note that this deadlock arises in both synchronous and asynchronous semantics.

```
owner : {1 ← sres}
o ← owner ← sr =
  c ← contester ← sr ;
  lr ← acquire sr ;
  case c of
  | ping → wait c ;
           sr ← release lr ; close o

contester : {⊕{ping : 1} ← sres}
c ← contester ← sr =
  lr ← acquire sr ;
  c.ping ;
  sr ← release lr ;
  close c
```
**Fig. 2.** Circular dependencies among acquire and synchronization actions.
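The deadlock of Fig. 2 can be reproduced by exhaustively exploring the interleavings of a simplified model. The Python sketch below rests on assumptions that are not part of the paper: both processes are already running (the spawn is omitted), the ping is sent asynchronously, `wait` and `close` are dropped, and the action names are hypothetical.

```python
def explore(programs):
    """Enumerate all interleavings; return (deadlocked, finished) state sets.

    A state is (program counters, index of the lock holder or None,
    number of pings in flight).
    """
    start = (tuple(0 for _ in programs), None, 0)
    seen, todo = {start}, [start]
    deadlocked, finished = set(), set()
    while todo:
        state = todo.pop()
        pcs, holder, pings = state
        successors = []
        for i, prog in enumerate(programs):
            if pcs[i] == len(prog):
                continue                      # this process has terminated
            action = prog[pcs[i]]
            h, p = holder, pings
            if action == "acquire":
                if holder is not None:
                    continue                  # resource is held: blocked
                h = i
            elif action == "release":
                h = None
            elif action == "send_ping":
                p += 1
            elif action == "recv_ping":
                if pings == 0:
                    continue                  # nothing to receive: blocked
                p -= 1
            new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            successors.append((new_pcs, h, p))
        if not successors:
            done = all(pc == len(prog) for pc, prog in zip(pcs, programs))
            (finished if done else deadlocked).add(state)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return deadlocked, finished

owner = ["acquire", "recv_ping", "release"]
contester = ["acquire", "send_ping", "release"]
deadlocked, finished = explore([owner, contester])
```

In this model the only deadlocked state is the one in which the owner holds the resource and waits for the ping while the contester waits for the resource; the run in which the contester acquires first terminates.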

In this paper, we develop a type system for manifest sharing that rules out cycles between acquire requests and interdependencies between acquire requests and synchronization actions, detecting the two kinds of deadlocks explained above. In our type system, session types not only prescribe *when* resources must be acquired and released, but also the *range* of resources that may be acquired. To this end, we equip the type system with the notion of a *world*, an abstract value at which a process resides, and type processes relative to an acyclic *ordering* on worlds, akin to the partial-order based approaches of [34,37]. The contributions of this paper are:


This paper is structured as follows: Sect. 2 provides a short introduction to manifest sharing. Sect. 3 develops the type system and dynamics of the language SILL<sub>S<sup>+</sup></sub>. Sect. 4 illustrates the introduced concepts on an extended example. Sect. 5 discusses the meta-theoretical properties of SILL<sub>S<sup>+</sup></sub>, emphasizing progress. Sect. 6 compares with examples of related work and identifies future work. Sect. 7 discusses related work, and Sect. 8 concludes this paper.

# **2 Manifest Sharing**

In the previous section, we have already explored the programmatic workings of *manifest sharing* [2], which enforces an *acquire-release* policy on shared channel references. In this section, we clarify the typing of shared processes.

A key contribution of manifest sharing is not only to support acquire-release as a programming primitive but also to make it *manifest* in the type system. Generalizing the idea of type *stratification* [5,47,48], session types are partitioned into a linear and shared layer with two *adjoint modalities* connecting the layers:

$$\begin{array}{rcl} A\_{\mathsf{S}} & \triangleq & {\uparrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{L}}\\ A\_{\mathsf{L}}, B\_{\mathsf{L}} & \triangleq & A\_{\mathsf{L}} \otimes B\_{\mathsf{L}} \mid \oplus \{ \overline{l:A\_{\mathsf{L}}} \} \mid \& \{ \overline{l:A\_{\mathsf{L}}} \} \mid A\_{\mathsf{L}} \multimap B\_{\mathsf{L}} \mid \exists x: A\_{\mathsf{S}}. B\_{\mathsf{L}} \mid \Pi x: A\_{\mathsf{S}}. B\_{\mathsf{L}} \mid \mathbf{1} \mid {\downarrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{S}} \end{array}$$

In the linear layer, we get the standard connectives of intuitionistic linear logic (A<sub>L</sub> ⊗ B<sub>L</sub>, A<sub>L</sub> ⊸ B<sub>L</sub>, ⊕{l : A<sub>L</sub>}, &{l : A<sub>L</sub>}, and **1**). These connectives are extended with the modal operator ↓<sup>S</sup><sub>L</sub>A<sub>S</sub>, shifting *down* from the shared to the linear layer. Similarly, in the shared layer, we have the operator ↑<sup>S</sup><sub>L</sub>A<sub>L</sub>, shifting *up* from the linear to the shared layer. The former translates into a *release* (and, dually, detach), the latter into an *acquire* (and, dually, accept). As a result, we obtain a system in which session types prescribe all forms of communication, including the acquisition and release of shared processes.
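The stratification can be illustrated with a small Python encoding of the type grammar and a function that computes the layer of a type. This is a sketch only: the class names and the `layer` function are hypothetical, recursive types such as lfork and sfork are not modelled, and bound variables of the quantifiers are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class One:            # 1
    pass

@dataclass(frozen=True)
class Tensor:         # A_L (x) B_L
    left: object
    right: object

@dataclass(frozen=True)
class Lolli:          # A_L -o B_L
    left: object
    right: object

@dataclass(frozen=True)
class Choice:         # internal choice (+) if internal is True, else external choice &
    internal: bool
    branches: tuple   # pairs (label, linear type)

@dataclass(frozen=True)
class Quant:          # exists x:A_S. B_L if exists is True, else Pi x:A_S. B_L
    exists: bool
    shared: object
    body: object

@dataclass(frozen=True)
class Up:             # up-shift of a linear type: a shared type
    body: object

@dataclass(frozen=True)
class Down:           # down-shift of a shared type: a linear type
    body: object

def layer(t):
    """Return "L" or "S" for a well-stratified type; raise TypeError otherwise."""
    if isinstance(t, Up):
        _expect(t.body, "L")
        return "S"
    if isinstance(t, Down):
        _expect(t.body, "S")
    elif isinstance(t, (Tensor, Lolli)):
        _expect(t.left, "L")
        _expect(t.right, "L")
    elif isinstance(t, Choice):
        for _, branch in t.branches:
            _expect(branch, "L")
    elif isinstance(t, Quant):
        _expect(t.shared, "S")
        _expect(t.body, "L")
    elif not isinstance(t, One):
        raise TypeError(f"not a session type: {t!r}")
    return "L"

def _expect(t, wanted):
    if layer(t) != wanted:
        raise TypeError(f"{t!r} is not in layer {wanted}")
```

For instance, `layer(Up(One()))` is `"S"` and `layer(Down(Up(One())))` is `"L"`, whereas `Up(Up(One()))` is rejected because an up-shift must be applied to a linear type.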

Table 1 provides an overview of SILL<sub>S</sub>'s session types and their operational reading. Since SILL<sub>S</sub> is based on an intuitionistic interpretation of linear logic session types [8], types are expressed from the point of view of the *providing process*, with the channel along which the process provides the session behavior being characterized by its session type. This choice avoids the explicit duality operation present in original presentations of session types [25,26] and in those based on classical linear logic [55]. Table 1 lists the points of view of the *provider* and *client* of a given connective in the first and second lines, respectively. Moreover, Table 1 gives for each connective its session type before and after the message exchange, along with their respective process terms. We can see that the process terms of a provider and a client for a given connective come in matching pairs, indicating that the participants' views of the session change consistently. We use the subscripts L and S to distinguish between linear and shared channels, respectively.

**Table 1.** Session types in SILL<sub>S</sub> and their operational meaning.

We are now able to give the session types of the processes *fork_proc*, *thinking*, and *eating* defined in the previous section:

$$\mathsf{lfork} = {\downarrow}^{\mathsf{S}}\_{\mathsf{L}}\, \mathsf{sfork} \qquad \mathsf{sfork} = {\uparrow}^{\mathsf{S}}\_{\mathsf{L}}\, \mathsf{lfork} \qquad \mathsf{phil} = \mathbf{1}$$

The mutually recursive session types lfork and sfork represent a fork that can perpetually be acquired and released. We adopt an *equi-recursive* [14] interpretation for recursive session types, silently equating a recursive type with its unfolding and requiring types to be *contractive* [19].

We briefly discuss the typing and the dynamics of acquire-release. The typing and the dynamics of the residual linear connectives are standard, and we detail them in the context of SILL<sub>S<sup>+</sup></sub> (see Sect. 3). As is usual for an intuitionistic interpretation, each connective gives rise to a left and a right rule, denoting the use and provision, respectively, of a session of the given type:

$$\frac{\Gamma; \cdot \vdash P\_{x\_{\mathsf{L}}} :: (x\_{\mathsf{L}} : A\_{\mathsf{L}})}{\Gamma \vdash x\_{\mathsf{L}} \leftarrow \mathsf{accept}\ x\_{\mathsf{S}};\ P\_{x\_{\mathsf{L}}} :: (x\_{\mathsf{S}} : {\uparrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{L}})} \quad (\text{T-}{\uparrow}^{\mathsf{S}}\_{\mathsf{L}}\text{R})$$

$$\frac{\Gamma, x\_{\mathsf{S}} : {\uparrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{L}}; \Delta, x\_{\mathsf{L}} : A\_{\mathsf{L}} \vdash Q\_{x\_{\mathsf{L}}} :: (z\_{\mathsf{L}} : C\_{\mathsf{L}})}{\Gamma, x\_{\mathsf{S}} : {\uparrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{L}}; \Delta \vdash x\_{\mathsf{L}} \leftarrow \mathsf{acquire}\ x\_{\mathsf{S}};\ Q\_{x\_{\mathsf{L}}} :: (z\_{\mathsf{L}} : C\_{\mathsf{L}})} \quad (\text{T-}{\uparrow}^{\mathsf{S}}\_{\mathsf{L}}\text{L})$$

$$\frac{\Gamma \vdash P\_{x\_{\mathsf{S}}} :: (x\_{\mathsf{S}} : A\_{\mathsf{S}})}{\Gamma; \cdot \vdash x\_{\mathsf{S}} \leftarrow \mathsf{detach}\ x\_{\mathsf{L}};\ P\_{x\_{\mathsf{S}}} :: (x\_{\mathsf{L}} : {\downarrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{S}})} \quad (\text{T-}{\downarrow}^{\mathsf{S}}\_{\mathsf{L}}\text{R})$$

$$\frac{\Gamma, x\_{\mathsf{S}} : A\_{\mathsf{S}}; \Delta \vdash Q\_{x\_{\mathsf{S}}} :: (z\_{\mathsf{L}} : C\_{\mathsf{L}})}{\Gamma; \Delta, x\_{\mathsf{L}} : {\downarrow}^{\mathsf{S}}\_{\mathsf{L}} A\_{\mathsf{S}} \vdash x\_{\mathsf{S}} \leftarrow \mathsf{release}\ x\_{\mathsf{L}};\ Q\_{x\_{\mathsf{S}}} :: (z\_{\mathsf{L}} : C\_{\mathsf{L}})} \quad (\text{T-}{\downarrow}^{\mathsf{S}}\_{\mathsf{L}}\text{L})$$

The typing judgments Γ ⊢ P :: (x<sub>S</sub> : A<sub>S</sub>) and Γ; Δ ⊢ P :: (x<sub>L</sub> : A<sub>L</sub>) indicate that process P provides a session of type A along channel x, given the typing of the channels specified in typing contexts Γ (and Δ). Γ and Δ consist of hypotheses on the typing of shared and linear channels, respectively, where Γ is a structural and Δ a linear context. To allow for recursive process definitions, the typing judgment depends on a signature Σ that is populated with all process definitions prior to type-checking. The adjoint formulation precludes shared processes from depending on linear channel references [2,47], a restriction motivated by logic and referred to as the *independence principle* [47]. Thus, when a shared session accepts an acquire and shifts to linear, it starts with an empty linear context.

Operationally, the dynamics of SILL<sup>S</sup> is captured by *multiset rewriting rules* [12], which denote computation in terms of state transitions between configurations of processes. Multiset rewriting rules are local in that they only mention the parts of a configuration they rewrite. For acquire-release we have the following:

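$$\begin{array}{ll} (\text{D-}{\uparrow}^{\mathsf{S}}_{\mathsf{L}}) & \mathsf{proc}(a_{\mathsf{S}},\ x_{\mathsf{L}} \leftarrow \mathsf{accept}\ a_{\mathsf{S}};\ P_{x_{\mathsf{L}}}),\ \mathsf{proc}(c_{\mathsf{L}},\ x_{\mathsf{L}} \leftarrow \mathsf{acquire}\ a_{\mathsf{S}};\ Q_{x_{\mathsf{L}}}) \\ & \longrightarrow\ \mathsf{proc}(a_{\mathsf{L}},\ [a_{\mathsf{L}}/x_{\mathsf{L}}]\,P_{x_{\mathsf{L}}}),\ \mathsf{proc}(c_{\mathsf{L}},\ [a_{\mathsf{L}}/x_{\mathsf{L}}]\,Q_{x_{\mathsf{L}}}),\ \mathsf{unavail}(a_{\mathsf{S}}) \\[1ex] (\text{D-}{\downarrow}^{\mathsf{S}}_{\mathsf{L}}) & \mathsf{proc}(a_{\mathsf{L}},\ x_{\mathsf{S}} \leftarrow \mathsf{detach}\ a_{\mathsf{L}};\ P_{x_{\mathsf{S}}}),\ \mathsf{proc}(c_{\mathsf{L}},\ x_{\mathsf{S}} \leftarrow \mathsf{release}\ a_{\mathsf{L}};\ Q_{x_{\mathsf{S}}}),\ \mathsf{unavail}(a_{\mathsf{S}}) \\ & \longrightarrow\ \mathsf{proc}(a_{\mathsf{S}},\ [a_{\mathsf{S}}/x_{\mathsf{S}}]\,P_{x_{\mathsf{S}}}),\ \mathsf{proc}(c_{\mathsf{L}},\ [a_{\mathsf{S}}/x_{\mathsf{S}}]\,Q_{x_{\mathsf{S}}}) \end{array}$$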

Configuration states are defined by the predicates proc(c<sub>m</sub>, P) and unavail(a<sub>S</sub>). The former denotes a running process with process term P providing along channel c<sub>m</sub>; the latter acts as a placeholder for a shared process providing along channel a<sub>S</sub> that is currently not available. The above rules exploit the invariant that a process' providing channel a can appear at one of two modes, a linear one, a<sub>L</sub>, and a shared one, a<sub>S</sub>. While the process (i.e., the session) is linear, it provides along a<sub>L</sub>; while it is shared, it provides along a<sub>S</sub>. When a process shifts between modes, it switches between the two modes of its offering channel. The channel at the appropriate mode is substituted for the variables occurring in process terms.
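The two rewriting rules can be mimicked with a small Python sketch. It is only an illustration under simplifying assumptions: facts are plain tuples, a process term is a tuple of pending actions instead of a real process expression, variable substitution is omitted, and the names `step` and `run` are hypothetical helpers, not part of SILL<sub>S</sub>.

```python
from collections import Counter

# A configuration is a multiset of facts:
#   ("proc", channel, mode, term)  and  ("unavail", channel)
# where term is a tuple of pending actions (an assumption of this sketch).

def step(config):
    """Apply one enabled acquire or release rewrite; return the new
    configuration, or None if neither rule applies."""
    procs = [f for f in config if f[0] == "proc" and f[3]]
    for (_, a, mode_a, term_a) in procs:
        for (_, c, mode_c, term_c) in procs:
            provider = ("proc", a, mode_a, term_a)
            client = ("proc", c, mode_c, term_c)
            # Acquire: a shared provider accepts while a client acquires it.
            if (mode_a == "S" and term_a[0] == ("accept",)
                    and term_c[0] == ("acquire", a)):
                new = config - Counter([provider, client])
                new.update([("proc", a, "L", term_a[1:]),
                            ("proc", c, mode_c, term_c[1:]),
                            ("unavail", a)])
                return new
            # Release: a linear provider detaches while its client releases it.
            if (mode_a == "L" and term_a[0] == ("detach",)
                    and term_c[0] == ("release", a)
                    and config[("unavail", a)] > 0):
                new = config - Counter([provider, client, ("unavail", a)])
                new.update([("proc", a, "S", term_a[1:]),
                            ("proc", c, mode_c, term_c[1:])])
                return new
    return None

def run(config):
    """Rewrite until no rule applies and return all intermediate configurations."""
    trace = [config]
    while (nxt := step(trace[-1])) is not None:
        trace.append(nxt)
    return trace

waiting = (("acquire", "a"), ("release", "a"))
trace = run(Counter([
    ("proc", "a", "S", (("accept",), ("detach",)) * 2),  # shared provider
    ("proc", "c", "L", waiting),                         # first client
    ("proc", "d", "L", waiting),                         # second client
]))
```

While one client holds the provider, the `unavail` fact stands in for the shared channel and the other client's acquire cannot fire, which is the mutual exclusion that acquire-release is meant to provide.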

# **3 Manifest Deadlock-Freedom**

In this section, we introduce our language SILLS<sup>+</sup> , a session-typed language that supports sharing without deadlock. We focus on SILLS<sup>+</sup> 's type system and dynamics in this section and discuss its meta-theoretical properties in Sect. 5.

#### **3.1 Competition and Collaboration**

The introduction of acquire-release, to ensure that the multiple clients of a shared process interact with the process in mutual exclusion from each other, gives rise to an obvious source of deadlocks, as acquire-release effectively amounts to a locking discipline. The typical approach to prevent deadlocks in that case is to impose a partial order on the resources and to *"lock-up"*, i.e., to lock the resources in ascending order. We adopted this strategy in Sect. 1 (Fig. 1) to break the cyclic dependencies among the acquires in the dining philosophers.
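The effect of locking up can be seen in a toy simulation. The sketch below is an assumption-laden illustration, not the paper's semantics: resources are integers ordered by `<`, each process acquires its resources in the listed order and then releases them all, and `runs_to_completion` (a hypothetical helper) explores only one round-robin schedule.

```python
def runs_to_completion(plans):
    """Round-robin simulation. Each process acquires its resources in the
    listed order and releases all of them once it holds every one.
    Returns False if the system gets stuck under this schedule."""
    held = {}                      # resource -> index of the holding process
    progress = [0] * len(plans)    # how many resources each process holds
    done = [False] * len(plans)
    while not all(done):
        moved = False
        for i, plan in enumerate(plans):
            if done[i]:
                continue
            if progress[i] == len(plan):          # holds everything: release
                for r in plan:
                    del held[r]
                done[i] = True
                moved = True
            elif plan[progress[i]] not in held:   # next resource is free
                held[plan[progress[i]]] = i
                progress[i] += 1
                moved = True
        if not moved:
            return False                          # every process is blocked
    return True

# Each philosopher first takes the fork with its own index, then the next one.
naive = [(0, 1), (1, 2), (2, 0)]
# Locking up: every philosopher takes its two forks in ascending order.
ordered = [tuple(sorted(plan)) for plan in naive]

naive_ok = runs_to_completion(naive)
ordered_ok = runs_to_completion(ordered)
```

Under this schedule the naive plans block in a cycle after each philosopher has taken one fork, whereas the ascending plans all finish.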

In Sect. 1, however, we also considered another example (Fig. 2) and discovered that *cyclic acquisitions* are not the only source of deadlocks: deadlocks can also arise from *interdependent acquisitions and synchronizations*. In that example, we can prevent the deadlock by moving the acquire past the synchronization, in either of the two processes. Whereas in a purely linear session-typed system the sequencing of actions within a process does not affect other processes, the relative placement of acquire requests and synchronizations becomes relevant in a shared session-typed system.

Based on this observation, we can divide the processes in a shared-session discipline into *competitors* and *collaborators*. The former compete for a set of resources, whereas the latter do not overlap in the set of resources they acquire. For example, in the dining philosophers (Fig. 1), the philosophers *p<sup>0</sup>* , *p<sup>1</sup>* , and *p<sup>2</sup>* compete with each other for the set of forks *f<sup>0</sup>* , *f<sup>1</sup>* , and *f<sup>2</sup>* , whereas the process that spawns the philosophers and the forks collaborates with either of them.

Transferring this idea to the process graph that emerges at run-time, we note that competitors are siblings whereas collaborators stand in a parent-descendant relationship. We illustrate this outcome in Fig. 3, which shows a possible run-time process graph for the dining philosophers. Linear processes are depicted as solid black circles with a white identifier and shared processes are depicted as dotted filled violet circles with a black identifier. Linear channels are depicted as black lines, shared channel references as dotted violet lines with the arrow head pointing to the shared process being acquired<sup>1</sup>. The identifiers P<sub>0</sub>, P<sub>1</sub>, and P<sub>2</sub> stand for the three philosophers, F<sub>0</sub>, F<sub>1</sub>, and F<sub>2</sub> for the three forks, and T for the process that sets the table. The current run-time graph depicts the scenario in which P<sub>1</sub> is eating, while the other two philosophers are still thinking.

Embedded in the graph is a *tree* that arises from the linear processes and the linear channels connecting them. For any two nodes in this tree, the *parent* node denotes the *client* process and the *child* node the *providing* process. We note that the *independence principle* (see Sect. 2), which precludes shared processes from depending on linear channel references, guarantees that there exists exactly one tree in the process graph, with the linear main process as its root. The shape of the tree changes when new processes are spawned, linear channels exchanged (through ⊗ and ⊸), or shared processes acquired. For example, process P<sub>2</sub> could acquire the shared fork F<sub>0</sub>, which then becomes a linear child process of P<sub>2</sub>, should the acquire succeed. As indicated by the shared channel references, the sibling nodes P<sub>0</sub>, P<sub>1</sub>, and P<sub>2</sub> compete with each other for the nodes F<sub>0</sub>, F<sub>1</sub>, and F<sub>2</sub>, whereas the node T does not compete for any of the resources acquired by its *descendants* (including F<sub>1</sub> and F<sub>2</sub>). Our type system enforces this paradigm, as we discuss in the next section.

**Fig. 3.** Run-time process graph for dining philosophers (see Fig. 1).

<sup>1</sup> We have made sure to make the different concepts distinguishable in greyscale mode.

#### **3.2 Type System**

**Invariants.** Having identified the notions of *collaborators* and *competitors*, our type system must guarantee: *(i)* that collaborators acquire mutually disjoint sets of resources; *(ii)* that competitors employ a locking-up strategy for the resources they share; and, *(iii)* that competitors have released all acquired resources when synchronizing with other competitors. Invariant *(ii)* rules out cyclic acquisitions and invariants *(i)* and *(iii)* combined rule out interdependent acquisitions and synchronizations.

To express the high-level invariants above in our type system, we introduce the notion of a *world* – an abstract value that is equipped with a partial order – and associate such a world with every process. Programmers can *create* worlds, indicate the world at which a process resides at spawn time, and define an *order* on worlds. Moreover, we associate with each process a *range of worlds* that indicates the worlds of resources that the process may acquire. As a result, we obtain the following typing judgments:

$$\begin{array}{ll} \Psi;\ \Gamma \vdash P :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_k \uparrow_{\omega_l}^{\omega_n}]) & (\text{where } \Psi^{+} \text{ irreflexive}) \\ \Psi;\ \Gamma;\ \Phi;\ \Delta \vdash P :: (x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_k \uparrow_{\omega_l}^{\omega_n}]) & (\text{where } \Psi^{+} \text{ irreflexive}) \end{array}$$

The typing judgments reveal that we impose worlds at the *judgmental level*, resulting in a *hybrid system*, in which the adjoint modalities for acquire-release are complemented with world modalities that occur as *syntactic objects* in propositions [7]. We use the notation $x_m : A_m[\omega_k \uparrow_{\omega_l}^{\omega_n}]$ (where m stands for S or L) to associate worlds ω<sub>k</sub>, ω<sub>l</sub>, and ω<sub>n</sub> with a process that offers a session of type A<sub>m</sub> along channel x. World ω<sub>k</sub> denotes the world at which the process resides. We refer to this world as the *self* world. Worlds ω<sub>l</sub> and ω<sub>n</sub> indicate the range of worlds of resources that the process may acquire, with ω<sub>l</sub> denoting the *minimal (min)* world in this range and ω<sub>n</sub> the *maximal (max)* one.

Process terms are typed relative to the order specified in Ψ and the contexts Γ, Φ, and Δ. As in Sect. 2, Γ is a structural context consisting of hypotheses on the typing of variables bound to shared channel references, augmented with world annotations. We find it necessary to split the linear context "Δ" from Sect. 2 into the two disjoint contexts Φ and Δ, allowing us to separate channels that are possibly aliased (due to sharing) from those that are not, respectively. Both Φ and Δ consist of hypotheses on the typing of variables that are bound to linear channels, augmented with world annotations. Ψ is presupposed to be *acyclic* and defined as: Ψ ≜ · | Ψ′, ω<sub>k</sub> < ω<sub>l</sub> | Ψ′, ω<sub>o</sub>, where ω stands for a concrete *world* w or a *world variable* δ. We allow Ψ to contain single worlds, to support singletons as well as to accommodate world creation prior to order declaration. We define the transitive closure Ψ<sup>+</sup>, yielding a *strict partial order*, and the reflexive transitive closure Ψ<sup>*</sup>, yielding a *partial order*.
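The two closures can be computed directly. The following Python sketch assumes that an order Ψ is represented as a set of `(smaller, larger)` pairs of world names; the function names are hypothetical and the fixed-point iteration is deliberately naive.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` (naive fixed point)."""
    closure = set(pairs)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra

def is_strict_partial_order(pairs):
    """The transitive closure is a strict partial order iff it is irreflexive."""
    return all(a != b for (a, b) in transitive_closure(pairs))

def leq(pairs, a, b):
    """Membership in the reflexive transitive closure."""
    return a == b or (a, b) in transitive_closure(pairs)

psi = {("w1", "w2"), ("w2", "w3")}
cyclic = psi | {("w3", "w1")}
```

Here `("w1", "w3")` is in the closure of `psi` although it was never declared, and adding `w3 < w1` makes the closure reflexive on every world, so `cyclic` is rejected.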

The high-level invariants *(i)*, *(ii)*, and *(iii)* identified earlier naturally transcribe into the following invariants, which we impose on the typing judgments above. We use the notation x<sup>m</sup> ; P to denote a process term that currently executes an action along channel xm.


Invariants 1 and 2 ensure that, for any node in the tree, the acquired resources reside at smaller worlds than those acquired by any descendant. As a result, the two invariants guarantee high-level invariant *(i)*. Invariant 3, on the other hand, imposes a lock-up strategy on acquires and thus guarantees high-level invariant *(ii)*. To guarantee high-level invariant *(iii)*, we impose Invariant 4, which forces a process to release any acquired resources before communicating along its offering channel. Since sibling nodes cannot be directly connected by a linear channel, the only way for them to synchronize is through a common parent. Finally, to guarantee that world annotations are internally consistent, we require for each annotation $[\omega_k \uparrow_{\omega_l}^{\omega_n}]$ that ω<sub>k</sub> < ω<sub>l</sub> ≤ ω<sub>n</sub>.

**Rules.** We now present select process typing rules; a complete listing is provided in the companion technical report [4]. The only new rules with respect to the language SILL<sub>S</sub> [2] are those pertaining to world creation and order determination. These are extra-logical judgmental rules. We allow both linear and shared processes to create and relate worlds. Rules (T-NewL) and (T-NewS) create a new world w and make it available to the continuation Q<sub>w</sub>. Rules (T-OrdL) and (T-OrdS) relate two existing worlds, while preserving acyclicity of the order.

$$\begin{array}{c} \Psi, w;\ \Gamma;\ \Phi;\ \Delta \vdash Q_{w} :: (x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma;\ \Phi;\ \Delta \vdash w \leftarrow \mathsf{new\ world};\ Q_{w} :: (x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ \text{(T-NewL)}$$

$$\begin{array}{c} \Psi, w;\ \Gamma \vdash Q_{w} :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma \vdash w \leftarrow \mathsf{new\ world};\ Q_{w} :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ \text{(T-NewS)}$$

$$\begin{array}{c} \omega_p, \omega_r \in \Psi \qquad (\Psi, \omega_p < \omega_r)^{+} \text{ irreflexive} \qquad \Psi, \omega_p < \omega_r;\ \Gamma;\ \Phi;\ \Delta \vdash Q :: (x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma;\ \Phi;\ \Delta \vdash \omega_p < \omega_r;\ Q :: (x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ \text{(T-OrdL)}$$

$$\begin{array}{c} \omega_p, \omega_r \in \Psi \qquad (\Psi, \omega_p < \omega_r)^{+} \text{ irreflexive} \qquad \Psi, \omega_p < \omega_r;\ \Gamma \vdash Q :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma \vdash \omega_p < \omega_r;\ Q :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ \text{(T-OrdS)}$$
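The side conditions of these rules can be pictured as operations on a growing order. The class below is a hypothetical Python model, not part of the language: `new_world` plays the role of the world-creation rules and `relate` that of the order rules, rejecting a declaration whenever either world is unknown or the extended order would become cyclic. It uses a reachability search instead of recomputing the full transitive closure.

```python
import itertools

class WorldOrder:
    """Hypothetical model of an order that grows by world creation and by
    order declarations, staying acyclic throughout."""

    def __init__(self):
        self.worlds = set()
        self.succ = {}                    # world -> directly larger worlds
        self._fresh = itertools.count(1)

    def new_world(self):
        """Create a fresh, initially unrelated world."""
        w = f"w{next(self._fresh)}"
        self.worlds.add(w)
        self.succ[w] = set()
        return w

    def reaches(self, src, dst):
        """True if dst is reachable from src via one or more declared edges."""
        stack, seen = list(self.succ[src]), set()
        while stack:
            w = stack.pop()
            if w == dst:
                return True
            if w not in seen:
                seen.add(w)
                stack.extend(self.succ[w])
        return False

    def relate(self, p, r):
        """Declare p < r; both worlds must exist and no cycle may arise."""
        if p not in self.worlds or r not in self.worlds:
            raise ValueError("both worlds must already exist")
        if p == r or self.reaches(r, p):
            raise ValueError(f"{p} < {r} would make the order cyclic")
        self.succ[p].add(r)

order = WorldOrder()
w1, w2, w3 = order.new_world(), order.new_world(), order.new_world()
order.relate(w1, w2)
order.relate(w2, w3)
```

With `w1 < w2 < w3` declared, a subsequent `order.relate(w3, w1)` raises `ValueError`, mirroring the irreflexivity premise.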

We now consider the typing rule for acquire, which must explicitly enforce the various low-level invariants above. Since an acquire results in the addition of a new child node to the executing process, the rule can interfere with Invariants 1 and 2. The first two premises of the rule ensure that the two invariants are preserved. Moreover, the rule has to ensure that the acquiring process is locking-up (Invariant 3), which is achieved by the third premise.

$$\begin{array}{c} \Psi^{*} \vdash \omega_k \leq \omega_m \leq \omega_n \qquad \Psi^{+} \vdash \omega_n < \omega_u \qquad \forall y_{\mathsf{L}} : B_{\mathsf{L}}[\omega_l \uparrow_{\omega_p}^{\omega_r}] \in \Phi : \omega_l < \omega_m \\ \Psi;\ \Gamma, x_{\mathsf{S}} : {\uparrow}^{\mathsf{S}}_{\mathsf{L}} A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Phi, x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash Q_{x_{\mathsf{L}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma, x_{\mathsf{S}} : {\uparrow}^{\mathsf{S}}_{\mathsf{L}} A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Phi;\ \Delta \vdash x_{\mathsf{L}} \leftarrow \mathsf{acquire}\ x_{\mathsf{S}};\ Q_{x_{\mathsf{L}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}{\uparrow}^{\mathsf{S}}_{\mathsf{L}}\text{L})$$
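The three premises of the acquire rule can be read as a small checking function. The Python sketch below is an illustration under assumptions: an annotation is a hypothetical named tuple `Ann(own, lo, hi)` holding the self, min, and max worlds, the order is a set of `(smaller, larger)` pairs, and `acquired` lists the annotations of the channels currently in Φ.

```python
from collections import namedtuple

# Hypothetical encoding of a world annotation: self, min, and max world.
Ann = namedtuple("Ann", ["own", "lo", "hi"])

def closure(pairs):
    """Transitive closure of a set of (smaller, larger) pairs."""
    result = set(pairs)
    while True:
        extra = {(a, d) for (a, b) in result for (c, d) in result if b == c}
        if extra <= result:
            return result
        result |= extra

def acquire_ok(psi, acquirer, resource, acquired):
    """Check the three premises of the acquire rule."""
    plus = closure(psi)
    lt = lambda a, b: (a, b) in plus
    le = lambda a, b: a == b or lt(a, b)
    # Premise 1: the resource's self world lies within the acquirer's range.
    in_range = le(acquirer.lo, resource.own) and le(resource.own, acquirer.hi)
    # Premise 2: the acquirer's max is below the resource's min.
    below_child = lt(acquirer.hi, resource.lo)
    # Premise 3: locking up, i.e. all already acquired resources are smaller.
    locking_up = all(lt(y.own, resource.own) for y in acquired)
    return in_range and below_child and locking_up

psi = {("w0", "w1"), ("w1", "w2"), ("w2", "w3"), ("w3", "w4")}
client = Ann("w0", "w1", "w2")   # may acquire resources residing at w1..w2
r1 = Ann("w1", "w3", "w3")
r2 = Ann("w2", "w3", "w3")
r3 = Ann("w3", "w4", "w4")       # resides outside the client's range
```

Acquiring `r1` and then `r2` passes all premises; acquiring `r1` while `r2` is already held violates the third premise, and `r3` fails the first.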

The remaining shift rules are actually *unchanged* with respect to SILLS, modulo the world annotations. In particular, low-level Invariant 4 is already satisfied because the conclusion of rule (T-↑<sup>S</sup> <sup>L</sup><sup>R</sup>) does not have a context Φ and because the independence principle forces Φ to be empty in rule (T-↓<sup>S</sup> <sup>L</sup><sup>R</sup>).

$$\begin{array}{c} \Psi;\ \Gamma;\ \cdot\,;\ \cdot \vdash P_{x_{\mathsf{L}}} :: (x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma \vdash x_{\mathsf{L}} \leftarrow \mathsf{accept}\ x_{\mathsf{S}};\ P_{x_{\mathsf{L}}} :: (x_{\mathsf{S}} : {\uparrow}^{\mathsf{S}}_{\mathsf{L}} A_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ (\text{T-}{\uparrow}^{\mathsf{S}}_{\mathsf{L}}\text{R})$$

$$\begin{array}{c} \Psi;\ \Gamma, x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Phi;\ \Delta \vdash Q_{x_{\mathsf{S}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi, x_{\mathsf{L}} : {\downarrow}^{\mathsf{S}}_{\mathsf{L}} A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash x_{\mathsf{S}} \leftarrow \mathsf{release}\ x_{\mathsf{L}};\ Q_{x_{\mathsf{S}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}{\downarrow}^{\mathsf{S}}_{\mathsf{L}}\text{L})$$

$$\begin{array}{c} \Psi;\ \Gamma \vdash P_{x_{\mathsf{S}}} :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma;\ \cdot\,;\ \cdot \vdash x_{\mathsf{S}} \leftarrow \mathsf{detach}\ x_{\mathsf{L}};\ P_{x_{\mathsf{S}}} :: (x_{\mathsf{L}} : {\downarrow}^{\mathsf{S}}_{\mathsf{L}} A_{\mathsf{S}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ (\text{T-}{\downarrow}^{\mathsf{S}}_{\mathsf{L}}\text{R})$$

We now consider the linear connectives, starting with **1**. Rule (T-**1**L) reveals that only processes that have never been acquired may be terminated. This restriction is important to guarantee progress because existing clients of a shared process may wait indefinitely otherwise. We impose the restriction as a well-formedness condition on a session type, giving rise to a *strictly equi-synchronizing* session type. The notion of an *equi-synchronizing* session type [2] has been defined for SILL<sub>S</sub> and guarantees that a process that has been acquired at a type A<sub>S</sub> is released back to the type A<sub>S</sub>, should it ever be released. A *strictly* equi-synchronizing session type additionally requires that an acquired resource *must* be released. The corresponding rules can be found in [4]. Linearity enforces Invariant 4 in rule (T-**1**R), making sure that no linear channels are left behind.

$$\begin{array}{c} \Psi;\ \Gamma;\ \Phi;\ \Delta \vdash Q :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi;\ \Delta, x_{\mathsf{L}} : \mathbf{1}[\omega_m \uparrow_{\omega_u}^{\omega_v}] \vdash \mathsf{wait}\ x_{\mathsf{L}};\ Q :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}\mathbf{1}\text{L})$$

$$\begin{array}{c} \\ \hline \Psi;\ \Gamma;\ \cdot\,;\ \cdot \vdash \mathsf{close}\ x_{\mathsf{L}} :: (x_{\mathsf{L}} : \mathbf{1}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ (\text{T-}\mathbf{1}\text{R})$$

Next, we consider internal and external choice. Since internal and external choice cannot alter the linear process tree of a process graph, the rules are very similar to the ones in SILL<sub>S</sub>. The only differences are that we get two left rules for each connective and that the Φ-context of each right rule must be empty to satisfy Invariant 4. The former is merely due to the tracking of possibly aliased sessions in the Φ context. We only list the rules for internal choice; those for external choice are dual and can be found in [4].

$$\begin{array}{c} (\forall i)\quad \Psi;\ \Gamma;\ \Phi;\ \Delta, x_{\mathsf{L}} : A_{\mathsf{L}_i}[\omega_m \uparrow_{\omega_u}^{\omega_v}] \vdash Q_i :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi;\ \Delta, x_{\mathsf{L}} : \oplus\{\overline{l : A_{\mathsf{L}}}\}[\omega_m \uparrow_{\omega_u}^{\omega_v}] \vdash \mathsf{case}\ x_{\mathsf{L}}\ \mathsf{of}\ \overline{l \Rightarrow Q} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}\oplus\text{L}_1)$$

$$\begin{array}{c} (\forall i)\quad \Psi;\ \Gamma;\ \Phi, x_{\mathsf{L}} : A_{\mathsf{L}_i}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash Q_i :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi, x_{\mathsf{L}} : \oplus\{\overline{l : A_{\mathsf{L}}}\}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash \mathsf{case}\ x_{\mathsf{L}}\ \mathsf{of}\ \overline{l \Rightarrow Q} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}\oplus\text{L}_2)$$

$$\begin{array}{c} \Psi;\ \Gamma;\ \cdot\,;\ \Delta \vdash P :: (x_{\mathsf{L}} : A_{\mathsf{L}_h}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma;\ \cdot\,;\ \Delta \vdash x_{\mathsf{L}}.l_h;\ P :: (x_{\mathsf{L}} : \oplus\{\overline{l : A_{\mathsf{L}}}\}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ (\text{T-}\oplus\text{R})$$

More interesting are linear channel output and input, since these alter the linear process tree of a process graph. Moreover, additional world annotations are needed to indicate the worlds of the channel that is exchanged. For the latter we use the notation $@\omega_l \uparrow_{\omega_p}^{\omega_r}$, indicating that the exchanged channel has the worlds ω<sub>l</sub>, ω<sub>p</sub>, and ω<sub>r</sub> for self, min, and max, respectively. To account for induced changes in the process graph, the rules that type an input of a linear channel must guard against any disturbance of Invariants 1 and 2. Because the two invariants guarantee that parents do not overlap with their descendants in terms of acquired resources, they prevent any exchange of acquired channels. We thus restrict ⊗ and ⊸ to the exchange of channels that have not yet been acquired. This is not a limitation since, as we will see below, shared channel output and input are unrestricted.

Even with the above restriction in place, we still have to make sure that a received channel satisfies Invariant 2. If we were to state a corresponding premise on the receiving rules, invertibility of the rules would be disturbed. To uphold invertibility, we impose a well-formedness condition on session types that ensures for a session of type $A_{\mathsf{L}}@\omega_l \uparrow_{\omega_p}^{\omega_r} \otimes B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]$ that ω<sub>v</sub> < ω<sub>p</sub> and, analogously, for a session of type $A_{\mathsf{L}}@\omega_l \uparrow_{\omega_p}^{\omega_r} \multimap B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]$ that ω<sub>v</sub> < ω<sub>p</sub>. Session types are checked to be well-formed upon process definition. Given type well-formedness, we obtain the following rules for ⊸, noting that the right rule enforces Invariant 4 by requiring an empty Φ-context. The rules for ⊗ are dual.

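$$\begin{array}{c} \Psi;\ \Gamma;\ \Phi;\ \Delta, x_{\mathsf{L}} : B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}] \vdash Q :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi;\ \Delta, x_{\mathsf{L}} : A_{\mathsf{L}}@\omega_l \uparrow_{\omega_p}^{\omega_r} \multimap B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}],\ y_{\mathsf{L}} : A_{\mathsf{L}}[\omega_l \uparrow_{\omega_p}^{\omega_r}] \vdash \mathsf{send}\ x_{\mathsf{L}}\ y_{\mathsf{L}};\ Q :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}{\multimap}\text{L}_1)$$

$$\begin{array}{c} \Psi;\ \Gamma;\ \Phi, x_{\mathsf{L}} : B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash Q :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi, x_{\mathsf{L}} : A_{\mathsf{L}}@\omega_l \uparrow_{\omega_p}^{\omega_r} \multimap B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta, y_{\mathsf{L}} : A_{\mathsf{L}}[\omega_l \uparrow_{\omega_p}^{\omega_r}] \vdash \mathsf{send}\ x_{\mathsf{L}}\ y_{\mathsf{L}};\ Q :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-}{\multimap}\text{L}_2)$$

$$\begin{array}{c} \Psi;\ \Gamma;\ \cdot\,;\ \Delta, y_{\mathsf{L}} : A_{\mathsf{L}}[\omega_l \uparrow_{\omega_p}^{\omega_r}] \vdash P_{y_{\mathsf{L}}} :: (x_{\mathsf{L}} : B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \\ \hline \Psi;\ \Gamma;\ \cdot\,;\ \Delta \vdash y_{\mathsf{L}} \leftarrow \mathsf{recv}\ x_{\mathsf{L}};\ P_{y_{\mathsf{L}}} :: (x_{\mathsf{L}} : A_{\mathsf{L}}@\omega_l \uparrow_{\omega_p}^{\omega_r} \multimap B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]) \end{array} \ (\text{T-}{\multimap}\text{R})$$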

Since there are no invariants imposed on the shared context Γ, the rules for shared channel output and input are identical to those in SILLS. The only differences are that we have two left rules and that the Φ-context of the right rule must be empty to satisfy Invariant 4. The former is merely due to the tracking of possibly aliased sessions in the Φ context.

$$\begin{array}{c} \Psi;\ \Gamma, y_{\mathsf{S}} : A_{\mathsf{S}}[\omega_l \uparrow_{\omega_p}^{\omega_r}];\ \Phi;\ \Delta, x_{\mathsf{L}} : B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}] \vdash Q_{y_{\mathsf{S}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi;\ \Delta, x_{\mathsf{L}} : \exists x{:}A_{\mathsf{S}}@\omega_l \uparrow_{\omega_p}^{\omega_r}.\,B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}] \vdash y_{\mathsf{S}} \leftarrow \mathsf{recv}\ x_{\mathsf{L}};\ Q_{y_{\mathsf{S}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array}$$

$$\begin{array}{c} \Psi;\ \Gamma, y_{\mathsf{S}} : A_{\mathsf{S}}[\omega_l \uparrow_{\omega_p}^{\omega_r}];\ \Phi, x_{\mathsf{L}} : B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash Q_{y_{\mathsf{S}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \\ \hline \Psi;\ \Gamma;\ \Phi, x_{\mathsf{L}} : \exists x{:}A_{\mathsf{S}}@\omega_l \uparrow_{\omega_p}^{\omega_r}.\,B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}];\ \Delta \vdash y_{\mathsf{S}} \leftarrow \mathsf{recv}\ x_{\mathsf{L}};\ Q_{y_{\mathsf{S}}} :: (z_{\mathsf{L}} : C_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array}$$

$$\Psi;\ \Gamma, y_{\mathsf{S}} : A_{\mathsf{S}}[\omega_l \uparrow_{\omega_p}^{\omega_r}];\ \ldots$$

We finally consider the rules for forwarding and spawning. We allow a shared forward between processes that offer the same session at the same worlds. Because forwards have to be *world-invariant*, however, no well-typed program could ever have a linear forward. The process being forwarded to must be in either of the contexts Φ or Δ, and thus satisfies Invariant 2, making it impossible for the world annotations of the forwarder and forwardee to match. We omit linear forwarding and discuss possible future extensions in Sect. 6.

$$\begin{array}{c} \\ \hline \Psi;\ \Gamma, y_{\mathsf{S}} : A_{\mathsf{S}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \vdash \mathsf{fwd}\ x_{\mathsf{S}}\ y_{\mathsf{S}} :: (x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_j \uparrow_{\omega_k}^{\omega_n}]) \end{array} \ (\text{T-Id}_{\mathsf{S}})$$

The rules for spawning depend on the possible modes of the spawning and spawned processes: (T-SpawnLL) specifies how a linear process can spawn another linear process; (T-SpawnSS) specifies how a shared process can spawn another shared process. The rules are checked relative to a process definition found in the signature Σ and to a world substitution mapping γ : |Ψ′| → |Ψ|, such that for each δ ∈ Ψ′ we have Ψ ⊢ γ(δ), where |Ψ| denotes the *field* of Ψ (i.e., the union of its domain and range). As usual, we lift substitution to types γ̂(A<sub>m</sub>), contexts γ̂(Γ), and orders γ̂(Ψ). Both rules ensure that, given the mapping γ, the order Ψ of the spawning process entails the one of the process definition (Ψ ⊢ γ̂(Ψ′)). The linear spawn rule (T-SpawnLL) further enforces Invariant 2 for the spawned child. We note that the spawned child enters the linear context Δ in the spawning process' continuation since no aliases to such a process can exist at this point.

$$\begin{array}{c} \Delta_1 = \overline{y_{\mathsf{L}} : B_{\mathsf{L}}[\omega_m \uparrow_{\omega_u}^{\omega_v}]} \qquad \Phi_1 = \overline{\tilde{y}_{\mathsf{L}} : \tilde{B}_{\mathsf{L}}[\tilde{\omega}_m \uparrow_{\tilde{\omega}_u}^{\tilde{\omega}_v}]} \qquad \Gamma_1 = \overline{z_{\mathsf{S}} : C_{\mathsf{S}}[\omega_l \uparrow_{\omega_p}^{\omega_r}]} \\ (\Psi' \vdash x'_{\mathsf{L}} : A'_{\mathsf{L}}[\delta_j \uparrow_{\delta_k}^{\delta_n}] \leftarrow X_{\mathsf{L}} \leftarrow \Delta', \Phi', \Gamma' = P_{x'_{\mathsf{L}},\,\mathrm{dom}(\Delta'),\,\mathrm{dom}(\Phi'),\,\mathrm{dom}(\Gamma'),\,\Psi''}) \in \Sigma \\ \hat{\gamma}(A'_{\mathsf{L}}[\delta_j \uparrow_{\delta_k}^{\delta_n}]) = A_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \qquad \hat{\gamma}(\Delta') = \Delta_1 \qquad \hat{\gamma}(\Phi') = \Phi_1 \qquad \hat{\gamma}(\Gamma') = \Gamma_1 \\ \Psi \vdash \hat{\gamma}(\Psi') \qquad \Psi^{+} \vdash \omega_t < \omega_k \\ \Psi;\ \Gamma_1, \Gamma_2;\ \Phi_2;\ \Delta_2, x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \vdash Q_{x_{\mathsf{L}}} :: (z''_{\mathsf{L}} : D_{\mathsf{L}}[\omega_i \uparrow_{\omega_q}^{\omega_t}]) \\ \hline \Psi;\ \Gamma_1, \Gamma_2;\ \Phi_1, \Phi_2;\ \Delta_1, \Delta_2 \vdash x_{\mathsf{L}} : A_{\mathsf{L}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \leftarrow X_{\mathsf{L}} \leftarrow \overline{y_{\mathsf{L}}}, \overline{\tilde{y}_{\mathsf{L}}}, \overline{z_{\mathsf{S}}};\ Q_{x_{\mathsf{L}}} :: (z''_{\mathsf{L}} : D_{\mathsf{L}}[\omega_i \uparrow_{\omega_q}^{\omega_t}]) \end{array} \ \text{(T-SpawnLL)}$$

$$\begin{array}{c} \Gamma_1 = \overline{z_{\mathsf{S}} : C_{\mathsf{S}}[\omega_l \uparrow_{\omega_p}^{\omega_r}]} \\ (\Psi' \vdash x'_{\mathsf{S}} : A'_{\mathsf{S}}[\delta_j \uparrow_{\delta_k}^{\delta_n}] \leftarrow X_{\mathsf{S}} \leftarrow \Gamma' = P_{x'_{\mathsf{S}},\,\mathrm{dom}(\Gamma'),\,\Psi''}) \in \Sigma \\ \hat{\gamma}(A'_{\mathsf{S}}[\delta_j \uparrow_{\delta_k}^{\delta_n}]) = A_{\mathsf{S}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \qquad \hat{\gamma}(\Gamma') = \Gamma_1 \qquad \Psi \vdash \hat{\gamma}(\Psi') \\ \Psi;\ \Gamma_1, \Gamma_2, x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \vdash Q_{x_{\mathsf{S}}} :: (z''_{\mathsf{S}} : D_{\mathsf{S}}[\omega_i \uparrow_{\omega_q}^{\omega_t}]) \\ \hline \Psi;\ \Gamma_1, \Gamma_2 \vdash x_{\mathsf{S}} : A_{\mathsf{S}}[\omega_j \uparrow_{\omega_k}^{\omega_n}] \leftarrow X_{\mathsf{S}} \leftarrow \overline{z_{\mathsf{S}}};\ Q_{x_{\mathsf{S}}} :: (z''_{\mathsf{S}} : D_{\mathsf{S}}[\omega_i \uparrow_{\omega_q}^{\omega_t}]) \end{array} \ \text{(T-SpawnSS)}$$

In the companion technical report [4], we provide a variant of rule (T-SpawnLL) for the case of a linear recursive tail call. Without linear forwarding, a linear tail call can no longer be implicitly "de-sugared" into a spawn and a linear forward [2,22,52], but must be accounted for explicitly. In the report, we also provide the rules for checking process definitions. Those rules make sure that the process' world order is acyclic, that the types of the providing session and argument sessions are well-formed, and that the process satisfies Invariants 1 and 2.

#### **3.3 Dining Philosophers in SILLS<sup>+</sup>**

Having introduced our type system, we revisit the dining philosophers from Sect. 1 and show how to program the example in SILLS<sup>+</sup> , ensuring that the program will run without deadlocks. The code is given in Fig. 4. We note the world annotations in the signature of the process definitions. For instance,

$$\mathit{thinking} : \{\delta_0 < \delta_1,\ \delta_1 < \delta_2,\ \delta_2 < \delta_3 \vdash \mathsf{phil}[\delta_0 \uparrow_{\delta_1}^{\delta_2}] \leftarrow \mathsf{sfork}[\delta_1 \uparrow_{\delta_3}^{\delta_3}],\ \mathsf{sfork}[\delta_2 \uparrow_{\delta_3}^{\delta_3}];\ \cdot\,;\ \cdot\}$$

indicates that, given the order δ<sub>0</sub> < δ<sub>1</sub> < δ<sub>2</sub> < δ<sub>3</sub>, process *thinking* provides a session of type $\mathsf{phil}[\delta_0 \uparrow_{\delta_1}^{\delta_2}]$ and uses two shared channel references of type $\mathsf{sfork}[\delta_1 \uparrow_{\delta_3}^{\delta_3}]$ and $\mathsf{sfork}[\delta_2 \uparrow_{\delta_3}^{\delta_3}]$. The two · signify that neither acquired nor linear channel references are given as arguments. The signature indicates that the two shared fork references reside at different worlds, such that the world of the first one is smaller than the one of the second.

Let's briefly convince ourselves that the two acquires in process *thinking* in Fig. 4 are type-correct. For each acquire we have to show that: the world of the resource to be acquired is within the acquiring process' range; the max of the acquiring process is smaller than the min of the acquired resource; and, that the self of the acquired resource is larger than those of all already acquired resources. We can convince ourselves that all those conditions are readily met.

**Fig. 4.** Deadlock-free version of dining philosophers in SILLS<sup>+</sup> .

We note, however, that if we were to swap the two acquires, the program would not type-check.
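The check can be spelled out by instantiating the premises of the acquire rule with the signature of *thinking*. This is a sketch that assumes, in line with the signature above, that the process first acquires the fork residing at δ<sub>1</sub> and then the fork residing at δ<sub>2</sub>, with Ψ = δ<sub>0</sub> < δ<sub>1</sub> < δ<sub>2</sub> < δ<sub>3</sub>:

$$\begin{array}{lll} \text{first acquire } (\Phi = \cdot): & \Psi^{*} \vdash \delta_1 \leq \delta_1 \leq \delta_2 & \Psi^{+} \vdash \delta_2 < \delta_3 \\ \text{second acquire } (\Phi \text{ holds the fork at } \delta_1): & \Psi^{*} \vdash \delta_1 \leq \delta_2 \leq \delta_2 & \Psi^{+} \vdash \delta_2 < \delta_3 \qquad \Psi^{+} \vdash \delta_1 < \delta_2 \end{array}$$

With the acquires swapped, the fork at δ<sub>2</sub> would already be in Φ when the fork at δ<sub>1</sub> is acquired, so the third premise would demand δ<sub>2</sub> < δ<sub>1</sub>. Together with δ<sub>1</sub> < δ<sub>2</sub> this contradicts the irreflexivity of Ψ<sup>+</sup>.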

Let us once more set the table for three philosophers and three forks. We execute this code in a process with world annotations $[\delta_a \uparrow_{\delta_b}^{\delta_b}]$ such that δ<sub>a</sub> < δ<sub>b</sub>. We first create new worlds and define their order:

w<sub>1</sub> ← new world; w<sub>2</sub> ← new world; w<sub>3</sub> ← new world; w<sub>4</sub> ← new world; δ<sub>a</sub> < w<sub>1</sub>; δ<sub>a</sub> < w<sub>2</sub>; δ<sub>b</sub> < w<sub>1</sub>; w<sub>1</sub> < w<sub>2</sub>; w<sub>1</sub> < w<sub>3</sub>; w<sub>1</sub> < w<sub>4</sub>; w<sub>2</sub> < w<sub>3</sub>; w<sub>2</sub> < w<sub>4</sub>; w<sub>3</sub> < w<sub>4</sub>;

We then spawn the forks, each residing at a different world, such that the max world of a fork is higher than the self of the highest fork, ensuring Invariant 2 for the philosopher processes that we spawn afterwards:

```
f1 : sfork[w1 ↑^{w4}_{w4}] ← fork proc ;
f2 : sfork[w2 ↑^{w4}_{w4}] ← fork proc ;
f3 : sfork[w3 ↑^{w4}_{w4}] ← fork proc ;
```
When we spawn the philosophers, we ensure that P<sub>0</sub> is going to pick up fork F<sub>1</sub> and then F<sub>2</sub>, P<sub>1</sub> is going to pick up F<sub>2</sub> and then F<sub>3</sub>, and P<sub>2</sub> is going to pick up F<sub>1</sub> and then F<sub>3</sub>.

$$\begin{array}{l} p_0 : \mathsf{phil}[\delta_a \uparrow_{w_1}^{w_2}] \leftarrow \mathit{thinking} \leftarrow \cdot\,;\ \cdot\,;\ f_1, f_2\,; \\ p_1 : \mathsf{phil}[\delta_a \uparrow_{w_2}^{w_3}] \leftarrow \mathit{thinking} \leftarrow \cdot\,;\ \cdot\,;\ f_2, f_3\,; \\ p_2 : \mathsf{phil}[\delta_a \uparrow_{w_1}^{w_3}] \leftarrow \mathit{thinking} \leftarrow \cdot\,;\ \cdot\,;\ f_1, f_3\,; \end{array}$$

We note that the deadlocking spawn

$$p_2 : \mathsf{phil}[\delta_a \uparrow_{w_1}^{w_3}] \leftarrow \mathit{thinking} \leftarrow \cdot\,;\ \cdot\,;\ f_3, f_1\,;$$

is type-incorrect since we would substitute both w<sup>1</sup> and w<sup>3</sup> for δ<sup>1</sup> and w<sup>3</sup> and w<sup>1</sup> for δ2, which violates the ordering constraints put in place by typing.
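To make the ordering argument concrete, here is a small Python sketch that applies the three acquire conditions stated for Fig. 4 to the concrete worlds of this example. It is a simplification under stated assumptions: the paper rejects the bad spawn through substitution for world variables during typing, whereas the hypothetical `check_acquires` below simply replays the acquires in order. Annotations are written as `(self, min, max)` tuples and `da` stands for δ<sub>a</sub>.

```python
from itertools import product


def closure(pairs):
    """Transitive closure of a set of (lower, higher) pairs."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(result), repeat=2):
            if b == c and (a, d) not in result:
                result.add((a, d))
                changed = True
    return result


PSI_PLUS = closure({
    ("da", "db"), ("da", "w1"), ("da", "w2"), ("db", "w1"),
    ("w1", "w2"), ("w1", "w3"), ("w1", "w4"),
    ("w2", "w3"), ("w2", "w4"), ("w3", "w4"),
})


def lt(a, b):
    return (a, b) in PSI_PLUS


def le(a, b):
    return a == b or lt(a, b)


# (self, min, max) for each fork, as spawned in the text.
FORKS = {"f1": ("w1", "w4", "w4"), "f2": ("w2", "w4", "w4"), "f3": ("w3", "w4", "w4")}


def check_acquires(proc, fork_order):
    """Replay acquires in `fork_order` for a process annotated (self, min, max)."""
    _, lo, hi = proc
    held = []
    for name in fork_order:
        f_self, f_min, _ = FORKS[name]
        in_range = le(lo, f_self) and le(f_self, hi)           # condition (i)
        below_resource = lt(hi, f_min)                         # condition (ii)
        increasing = all(lt(FORKS[h][0], f_self) for h in held)  # condition (iii)
        if not (in_range and below_resource and increasing):
            return False
        held.append(name)
    return True


P0 = ("da", "w1", "w2")
P1 = ("da", "w2", "w3")
P2 = ("da", "w1", "w3")
```

With these definitions, `check_acquires(P2, ["f1", "f3"])` succeeds, while the deadlocking order `["f3", "f1"]` fails condition (iii) because w<sub>3</sub> < w<sub>1</sub> is not derivable.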

#### **3.4 Dynamics**

We now give the *dynamics* of SILLS<sup>+</sup> . Our current system is based on a *synchronous* dynamics. While this choice is more conservative, it allows us to narrow the complexity of the problem at hand.

As in SILLS, we use *multiset rewriting rules* [12] to capture the dynamics of SILLS<sup>+</sup> (see Sect. 2). Multiset rewriting rules represent computation in terms of local state transitions between configurations of processes, only mentioning the parts of a configuration they rewrite. We use the predicates proc(*a*<sub>m</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>am</sub>) and unavail(*a*<sub>S</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>) to define the states of a configuration (see Sect. 5.1). The former denotes a process executing term P that provides along channel a<sub>m</sub> at mode m with worlds w<sub>a1</sub>, w<sub>a2</sub>, and w<sub>a3</sub> for self, min, and max, respectively. The latter acts as a placeholder for a shared process providing along channel a<sub>S</sub> with worlds w<sub>a1</sub>, w<sub>a2</sub>, and w<sub>a3</sub> for self, min, and max, respectively, that is currently unavailable. We note that since worlds are also run-time artifacts, they must occur as part of the state-defining predicates.

Fig. 5 lists selected rules of the dynamics. Since the rules remain largely the same as those of SILLS, apart from the world annotations that are "threaded through" unchanged, we only discuss the rules that actually differ from the SILL<sup>S</sup> rules. The interested reader can find the remaining rules in the companion technical report [4].

```
(D-SpawnLL)
proc(aL, wa1 wa3 wa2, xL : AL[wb1 wb3 wb2] ← XL ← cL, c̃L, dS ; QxL),
!def(Ψ′ ⊢ x′L : A′L[δj δn δk] ← XL ← Δ′, Φ′, Γ′ = Px′L,dom(Δ′),dom(Φ′),dom(Γ′),Ψ′)
  −→ proc(bL, wb1 wb3 wb2, [bL/x′L, cL/dom(Δ′), c̃L/dom(Φ′), dS/dom(Γ′)] γ̂(Px′L,dom(Δ′),dom(Φ′),dom(Γ′),Ψ′)),
     proc(aL, wa1 wa3 wa2, [bL/xL] QxL),
     unavail(bS, wb1 wb3 wb2)                                  (b fresh)

(D-New)
proc(a, wa1 wa3 wa2, w ← new world; Qw)
  −→ proc(a, wa1 wa3 wa2, Qw)                                  (w fresh)

(D-Ord)
proc(a, wa1 wa3 wa2, w < w′; Q)
  −→ proc(a, wa1 wa3 wa2, Q)
```

**Fig. 5.** Selected multiset rewriting rules of SILLS<sup>+</sup> .

Noteworthy are the rules D-New and D-Ord for creating and relating worlds, respectively. Rule D-New creates a fresh world, which will be globally available in the configuration. Rule D-Ord, on the other hand, updates the configuration's order with the pair w < w′. Lastly, rule D-SpawnLL substitutes actual worlds for the world variables in the body of the spawned process P, using the substitution mapping γ defined earlier. It relies on the existence of a corresponding definition predicate for each process definition contained in the signature Σ.
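The effect of D-New and D-Ord on the run-time state can be sketched in a few lines of Python. This is an illustration only: the class `Configuration` and its methods are hypothetical names, the process terms themselves are omitted, and the existence check in `relate` is an added assumption rather than a premise of the rule. Whether the resulting order remains a strict partial order is ensured by typing, not by the dynamics.

```python
import itertools


class Configuration:
    """World-related run-time state: allocated worlds and the declared order."""

    def __init__(self):
        self.worlds = set()
        self.order = set()                 # pairs (w, w2) meaning w < w2
        self._fresh = itertools.count(1)   # source of unused world names

    def new_world(self):
        """D-New: allocate a fresh world that is globally available afterwards."""
        w = f"w{next(self._fresh)}"
        self.worlds.add(w)
        return w

    def relate(self, lower, higher):
        """D-Ord: extend the configuration's order with the pair lower < higher."""
        # Assumption of this sketch: both worlds were allocated beforehand.
        if lower not in self.worlds or higher not in self.worlds:
            raise ValueError("both worlds must already exist in the configuration")
        self.order.add((lower, higher))
```

A run of `w ← new world; w′ ← new world; w < w′` then corresponds to two calls of `new_world` followed by one call of `relate`.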

# **4 Extended Example: An Imperative Shared Queue**

We now develop a typical imperative-style implementation of a queue that uses a list data structure internally to store the queue's elements and has shared references to the front and the back of the list for concurrent dequeueing and enqueueing, respectively. The session types for the queue and the list are<sup>2</sup>

queue *A*<sub>S</sub> = ↑<sup>S</sup><sub>L</sub> &{enq : Πx:*A*<sub>S</sub>. ↓<sup>S</sup><sub>L</sub> queue *A*<sub>S</sub>, deq : ⊕{none : ↓<sup>S</sup><sub>L</sub> queue *A*<sub>S</sub>, some : ∃x:*A*<sub>S</sub>. ↓<sup>S</sup><sub>L</sub> queue *A*<sub>S</sub>}}

list *A*<sub>S</sub> = ↑<sup>S</sup><sub>L</sub> &{ins : Πx:*A*<sub>S</sub>. ∃y:list *A*<sub>S</sub>. ↓<sup>S</sup><sub>L</sub> list *A*<sub>S</sub>, del : ⊕{none : ↓<sup>S</sup><sub>L</sub> list *A*<sub>S</sub>, some : ∃x:*A*<sub>S</sub>. ↓<sup>S</sup><sub>L</sub> list *A*<sub>S</sub>}}

The list is implemented in terms of processes *empty* and *elem*, denoting the empty list and a cons cell, respectively. We show the more interesting case of a cons cell (Fig. 6). The queue is defined by processes *head* (Fig. 7) and *queue proc* (Fig. 8), the latter being the queue's interface to its clients.

**Fig. 6.** Imperative queue – *elem* process.

We can now define a client (Fig. 8) for the queue, assuming existence of a corresponding shared session type item and a process *item proc* offering a session of type item[δ<sub>3</sub> δ<sub>4</sub> δ<sub>4</sub>]. The client instantiates the queue at world δ<sub>b</sub>, allowing it to acquire resources at world w<sub>1</sub>, which is exactly the world at which process *queue proc* instantiates the list. Given that the client itself resides at world δ<sub>a</sub>, which is smaller than the queue's world δ<sub>b</sub>, the client is allowed to acquire the queue, which in turn will acquire the list to satisfy any requests by the client.

The example showcases a paradigmatic use of several collaborators, where collaborators can hold resources while they "talk down" in the tree. In particular, as illustrated in Fig. 9, the clients C<sub>1</sub>, C<sub>2</sub>, and C<sub>3</sub> compete for resources at world δ<sub>b</sub>, i.e., the queue Q. On the other hand, a client C<sub>i</sub> collaborates with the queue Q, the list elements L<sub>i</sub>, and the items I<sub>i</sub>, since they do not overlap in the set of resources they may acquire: a client acquires resources at δ<sub>b</sub>, a queue resources at w<sub>1</sub>, a list resources at w<sub>2</sub>, and an item resources at w<sub>4</sub>, and we have δ<sub>a</sub> < δ<sub>b</sub> < w<sub>1</sub> < w<sub>2</sub> < w<sub>3</sub> < w<sub>4</sub>. We note in particular that the setup prevents a list element from acquiring its successor, forcing linear access through the queue.

**Fig. 8.** Imperative queue – *queue proc* process and *client* process.

<sup>2</sup> We adopt polymorphism for the example without formal treatment since it is orthogonal and has been studied for session types in [23,46].

# **5 Semantics**

In this section, we discuss the meta-theoretical properties of SILLS<sup>+</sup> , focusing on deadlock-freedom. The companion technical report [4] provides further details.

**Fig. 9.** Run-time process graph for imperative queue (see Fig. 3 for legend).

#### **5.1 Configuration Typing and Preservation**

Given the hierarchy between mode S and L and the fact that shared processes cannot depend on linear processes, we divide a configuration into a *shared* part Λ and a linear part Θ. We use the typing judgment Ψ; Γ Λ; Θ :: Γ;Φ, Δ to type configurations. The judgment expresses that a well-formed configuration Λ; Θ provides the shared channels in Γ and the linear channels in Φ and Δ. A configuration is type-checked relative to all shared channel references and a global order Ψ. While type-checking is compositional insofar as each process definition can be type-checked separately, solely relying on the process' local Ψ (and Γ), at run-time, the entire order that a configuration relies upon is considered. We give the configuration typing rules in Fig. 10.

Our progress theorem crucially depends on the guarantee that Invariants 1 and 2 from Sect. 3 hold for every linear process in a configuration's tree. This is expressed by the premises Inv1(proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>)) and Inv2(proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>)) in rule (T-Θ2), based on the two definitions below, which restate Invariants 1 and 2 for an entire configuration. We note that Invariant 2 is based on the set of all transitive children (i.e., *descendants*) of a process. We formally define the notion of a descendant inductively over a well-typed linear configuration. The interested reader can find the definition in the companion technical report [4].

**Invariant 1 (**min(parent) ≤ self(acquired child) ≤ max(parent)**).** *If* Ψ; Γ Θ :: Φ, Δ *and for any* proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>) ∈ Θ *such that* Ψ; Γ; Φ<sub>1</sub>; Δ<sub>1</sub> ⊢ *P*<sub>aL</sub> :: (a<sub>L</sub> : A<sub>L</sub>[w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>])*,* Inv1(proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>)) *holds if and only if for every acquired resource* b<sub>L</sub> : B<sub>L</sub>[w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>] ∈ Φ<sub>1</sub> *it holds that* Ψ<sup>∗</sup> ⊢ w<sub>a2</sub> ≤ w<sub>b1</sub> ≤ w<sub>a3</sub>*. Moreover, if* *P*<sub>aL</sub> = x<sub>L</sub> ← acquire c<sub>S</sub> ; Q<sub>xL</sub>*, for a* (c<sub>S</sub> : ↑<sup>S</sup><sub>L</sub> C<sub>L</sub>[w<sub>c1</sub> w<sub>c3</sub> w<sub>c2</sub>]) ∈ Γ*, then, for every acquired resource* b<sub>L</sub> : B<sub>L</sub>[w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>] ∈ Φ<sub>1</sub>*, it holds that* Ψ<sup>+</sup> ⊢ w<sub>b1</sub> < w<sub>c1</sub> *and that* Ψ<sup>∗</sup> ⊢ w<sub>a2</sub> ≤ w<sub>c1</sub> ≤ w<sub>a3</sub>*.*

**Fig. 10.** Configuration typing

**Invariant 2 (max(parent)** < **minima(descendants)).** *If* Ψ; Γ Θ :: Φ, Δ *and for any* proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>) ∈ Θ *and that process' descendants* (Ψ; Γ Θ :: Φ, Δ) a<sub>L</sub> = (Φ′, Δ′)*,* Inv2(proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>)) *holds iff for every descendant* b<sub>L</sub> : B<sub>L</sub>[w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>] ∈ (Φ′, Δ′) *it holds that* Ψ<sup>+</sup> ⊢ w<sub>a3</sub> < w<sub>b2</sub>*.*
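The two invariants can be read as simple checks over a process tree. The Python sketch below illustrates this under loudly stated assumptions: a total ranking `RANK` stands in for the order Ψ (the paper only requires a strict partial order), the class `Proc` and the functions `inv1` and `inv2` are hypothetical names, and the second part of Invariant 1 (the side condition for a pending acquire) is omitted.

```python
from dataclasses import dataclass, field

# Assumption: a total ranking replaces the strict partial order on worlds.
RANK = {"da": 0, "db": 1, "w1": 2, "w2": 3, "w3": 4, "w4": 5}


def lt(a, b):
    return RANK[a] < RANK[b]


def le(a, b):
    return RANK[a] <= RANK[b]


@dataclass
class Proc:
    name: str
    self_w: str
    min_w: str
    max_w: str
    acquired: list = field(default_factory=list)  # children obtained by acquire
    spawned: list = field(default_factory=list)   # other linear children


def descendants(p):
    """All transitive children of p, acquired or spawned."""
    for child in p.acquired + p.spawned:
        yield child
        yield from descendants(child)


def inv1(p):
    """min(parent) <= self(acquired child) <= max(parent)."""
    return all(le(p.min_w, c.self_w) and le(c.self_w, p.max_w) for c in p.acquired)


def inv2(p):
    """max(parent) < min(descendant) for every descendant."""
    return all(lt(p.max_w, d.min_w) for d in descendants(p))
```

For the dining philosophers, a philosopher with min w<sub>1</sub> and max w<sub>2</sub> holding forks at w<sub>1</sub> and w<sub>2</sub> (each with min and max w<sub>4</sub>) passes both checks, whereas raising the philosopher's max to w<sub>4</sub> breaks `inv2`.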

Our preservation theorem states that Invariants 1 and 2 are preserved for every linear process in the configuration along transitions. Moreover, the theorem expresses that the types of the providing linear channels Φ and Δ are maintained along transitions and that new shared channels and worlds may be allocated. The proof relies, in particular, on session types being strictly equi-synchronizing, on a process' type well-formedness and assurance that the process' min world is less than or equal to its max world.

**Theorem 5.1 (Preservation).** *If* Ψ; Γ Λ; Θ :: Γ; Φ, Δ *and* Λ; Θ −→ Λ′; Θ′*, then* Ψ′; Γ′ Λ′; Θ′ :: Γ′; Φ, Δ*, for some* Λ′*,* Θ′*,* Ψ′*, and* Γ′*.*

#### **5.2 Progress**

In our development so far we have distilled the two scenarios of interdependencies between processes that can lead to deadlocks: *cyclic acquisitions* and *interdependent acquisitions and synchronizations*. This has led to the development of a type system that ingrains the notions of *competitors* and *collaborators*, such that the former compete for a set of resources whereas the latter do not overlap in the set of resources they acquire. Our type system then ties these notions to a configuration's linear process tree such that collaborators stand in a parent-descendant relationship to each other and competitors in a sibling/cousin relationship. In this section, we prove that this orchestration is sufficient to rule out any of the aforementioned interdependencies.

To this end we introduce the notions of *red* and *green arrows* that allow us to reason about process interdependencies in a configuration's tree. A red arrow points from a linear proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *Q*) to a linear proc(*b*<sub>L</sub>, w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>, *P*), if the former is attempting to acquire a resource held by the latter and, consequently, is waiting for the latter to release that resource. A green arrow points from a linear proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *Q*) to a linear proc(*b*<sub>L</sub>, w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>, *P*), if the former is waiting to synchronize with the latter. We define these arrows formally as follows:

**Definition 5.2 (Acquire Dependency — "Red Arrow").** *Given a well-formed and well-typed configuration* Ψ; Γ Λ; Θ :: Γ;Φ, Δ*, there exists a waiting-due-to-acquire relation* A(Θ) *among linear processes in* Θ *at run-time such that*

proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *x*<sub>L</sub> ← acquire *c*<sub>S</sub>; *Q*<sub>xL</sub>) &lt;<sub>A</sub> proc(*b*<sub>L</sub>, w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>, *P*⟨*c*<sub>L</sub>⟩)

*where* P⟨c<sub>L</sub>⟩ *denotes a process term with an occurrence of channel* c<sub>L</sub>*.*

**Definition 5.3 (Synchronization Dependency — "Green Arrow").** *Given a well-formed and well-typed configuration* Ψ; Γ Λ; Θ :: Γ;Φ, Δ*, there exists a waiting-due-to-synchronization relation* S(Θ) *among linear processes in* Θ *at run-time such that*

$$\begin{split} & \mathsf{proc}(a\_{\mathsf{L}},\, \mathsf{w}\_{a\_1} \uparrow\_{\mathsf{w}\_{a\_2}}^{\mathsf{w}\_{a\_3}},\, \langle b\_{\mathsf{L}}\rangle; Q) <\_{S} \mathsf{proc}(b\_{\mathsf{L}},\, \mathsf{w}\_{b\_1} \uparrow\_{\mathsf{w}\_{b\_2}}^{\mathsf{w}\_{b\_3}},\, \langle \neg b\_{\mathsf{L}}\rangle; P) \\ & \mathsf{proc}(b\_{\mathsf{L}},\, \mathsf{w}\_{b\_1} \uparrow\_{\mathsf{w}\_{b\_2}}^{\mathsf{w}\_{b\_3}},\, \langle b\_{\mathsf{L}}\rangle; P) <\_{S} \mathsf{proc}(a\_{\mathsf{L}},\, \mathsf{w}\_{a\_1} \uparrow\_{\mathsf{w}\_{a\_2}}^{\mathsf{w}\_{a\_3}},\, \langle \neg b\_{\mathsf{L}}\rangle; Q\langle b\_{\mathsf{L}}\rangle) \end{split}$$

*where* P⟨b<sub>L</sub>⟩ *denotes a process term with an occurrence of channel* b<sub>L</sub>*,* ⟨a⟩; P *a process term that currently executes an action along channel* a*, and* ⟨¬a⟩; P *a process term whose currently executing action does not involve the channel* a*.*

It may be helpful to consult Fig. 3 at this point and note the semantic difference between the violet arrows in that figure and the red arrows discussed here. Whereas violet arrows point from the acquiring process to the resource being acquired, red arrows point from the acquiring process to the process that is holding the resource. Thus, violet arrows can go out of the tree, while red arrows stay within. Given the definitions of red and green arrows, we can define the relation W(Θ) on the configuration's tree, which contains all process pairs that are in some way waiting for each other:

**Definition 5.4 (Waiting Dependency).** *Given a well-formed and well-typed configuration* Ψ; Γ Λ; Θ :: Γ;Φ, Δ*, there exists a waiting relation* W(Θ) *among processes in* Θ *at run-time such that* proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*) &lt;<sub>W</sub> proc(*b*<sub>L</sub>, w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>, *Q*)*,*

- *if* proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*) &lt;<sub>A</sub> proc(*b*<sub>L</sub>, w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>, *Q*)*, or*
- *if* proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*) &lt;<sub>S</sub> proc(*b*<sub>L</sub>, w<sub>b1</sub> w<sub>b3</sub> w<sub>b2</sub>, *Q*)*.*

Having defined the relation W(Θ), we can now state the key lemma underlying our progress theorem, indicating that W(Θ) is acyclic in a well-formed and well-typed configuration.

**Lemma 5.5 (Acyclicity of** W(Θ)**).** *If* Ψ; Γ Λ; Θ :: Γ; Φ, Δ*, then* W(Θ) *is acyclic.*
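To illustrate what acyclicity of W(Θ) rules out, the following Python sketch treats red and green arrows as directed edges between process names and looks for a cycle with a depth-first search. The function `has_cycle` and the edge sets are hypothetical and serve only as an illustration; the lemma itself is proved by induction on configuration typing, not by a run-time check.

```python
def has_cycle(edges):
    """Return True if the directed graph given by (source, target) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph[node]:
            # A grey successor closes a cycle along the current path.
            if color[succ] == GREY or (color[succ] == WHITE and visit(succ)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Classic deadlock: every philosopher holds one fork and waits for a neighbour.
RED_DEADLOCK = {("P0", "P1"), ("P1", "P2"), ("P2", "P0")}

# Ordered acquisition: P2 competes with P0 for F1 first, so no cycle can close.
RED_ORDERED = {("P2", "P0"), ("P0", "P1")}

# A mixed cycle: a parent waits to synchronize with a child (green arrow)
# while the child waits for a resource the parent holds (red arrow).
GREEN = {("Pa", "Pb")}
RED_MIXED = {("Pb", "Pa")}
```

The relation W(Θ) is the union of both kinds of arrows, so a check on a configuration would be applied to `red | green`.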

We focus on explaining the main idea of the proof here. The proof proceeds by induction on Ψ; Γ Θ :: Φ, Δ, assuming for the non-empty case Ψ; Γ Θ, proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>) :: (Φ, Δ, a<sub>L</sub> : A<sub>L</sub>[w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>]) that W(Θ) is acyclic, by the inductive hypothesis. We then know that there cannot exist any paths of green and red arrows in Θ that form a cycle, and we have to show that there is no way of introducing such a cyclic path by adding node proc(*a*<sub>L</sub>, w<sub>a1</sub> w<sub>a3</sub> w<sub>a2</sub>, *P*<sub>aL</sub>) to the configuration Θ. In particular, the proof considers all possible new arrows that may be introduced by adding the node and that are necessary for creating a cycle, showing that such arrows cannot come about in a well-typed configuration.

We illustrate the reasoning for the two selected cases shown in Fig. 11. Case **(a)** represents a case in which process P<sub>aL</sub> is waiting to synchronize with its child P<sub>bL</sub> while holding a resource that a descendant of P<sub>bL</sub> or P<sub>bL</sub> itself wants to acquire. However, this scenario cannot come about in a well-typed configuration because P<sub>aL</sub> and P<sub>bL</sub> are collaborators and thus cannot overlap in the resources they acquire. Case **(b)** represents a case in which process P<sub>aL</sub> is waiting to synchronize with its child P<sub>bL</sub> while another child, process P<sub>cL</sub>, is waiting to synchronize with P<sub>aL</sub>. Given acyclicity of W(Θ), a necessary condition for a cycle to form is that there already must exist a red arrow **C** in the configuration that connects the subtrees in which the siblings P<sub>bL</sub> and P<sub>cL</sub> reside. However, this scenario cannot come about in a well-typed configuration because P<sub>bL</sub> and P<sub>cL</sub> are competitors, forcing P<sub>cL</sub> or any of its descendants to release a resource before synchronizing with P<sub>aL</sub>. These arguments are made precise in various lemmas in [4].

**Fig. 11.** Two prototypical cases in proof of acyclicity of W(Θ).

Given acyclicity of W(Θ), we can state and prove the following strong progress theorem. The theorem relies on the notion of a *poised* process, a process currently executing an action along its offering channel, and distinguishes a configuration only consisting of the top-level, linear "main" process from one that consists of several linear processes. We use |Θ| to denote the cardinality of Θ:

**Theorem 5.6 (Progress).** *If* Ψ; Γ Λ; Θ :: (Γ; c<sup>L</sup> : **1**[wc1 wc3 wc2 ])*, then either*

	- *if* |Θ| = 1*, then either* Λ; Θ −→ Λ′; Θ′*, for some* Λ′ *and* Θ′*, or* Θ *is poised, or*
	- *if* |Θ| > 1*, then* Λ; Θ −→ Λ′; Θ′*, for some* Λ′ *and* Θ′*.*

The theorem indicates that, as long as there exist at least two linear processes in the configuration, the configuration can always step. If the configuration only consists of the main process, then this process will become poised (i.e., ready to close), once all sub-computations are finished. The proof of the theorem relies on the acyclicity of W(Θ) and the fact that all sessions must be strictly equi-synchronizing.

# **6 Additional Discussion**

**Linear Forwarding.** Our current formalization does not include linear forwarding because a forward changes the process tree and thus endangers the invariants imposed on it. This means that certain programs from the purely linear fragment may not type-check in our system. However, the correspondingly η-expanded versions of these programs should be expressible and type-checkable in SILLS<sup>+</sup> . As part of future work, we want to explore the addition of the linear forward

$$\frac{\Psi^{+} \vdash \omega\_{n} < \omega\_{u}}{\Psi; \; \Gamma; \; \cdot\,; \; y\_{\mathsf{L}} : A\_{\mathsf{L}}[\omega\_{m} \uparrow\_{\omega\_{u}}^{\omega\_{v}}] \vdash \mathsf{fwd} \; x\_{\mathsf{L}} \; y\_{\mathsf{L}} :: \left(x\_{\mathsf{L}} : A\_{\mathsf{L}}[\omega\_{j} \uparrow\_{\omega\_{k}}^{\omega\_{n}}]\right)} \; (\text{T-ID}\_{\mathsf{L}})$$

which allows forwarding to processes that are known to not yet be aliased and whose world annotations meet the premise Ψ<sup>+</sup> ⊢ ω<sub>n</sub> < ω<sub>u</sub>. Restricting to processes in Δ should uphold Invariant 1, while the premise of the rule should uphold Invariant 2. However, this change will affect the inner working of the proofs, the use of inversion in particular, which might have far-reaching consequences that need to be carefully explored.

**Unbounded Process Networks and World Polymorphism.** The typing discipline presented in the previous sections, while rich enough to account for a wide range of interesting programs, cannot type programs that spawn a statically undetermined number of shared sessions that are then to be used. For instance, while we can easily type a configuration of any given number of dining philosophers (Sect. 3.3), we cannot type a recursive process in which the number of philosophers (and forks) is potentially unbounded (as done in [21,38]), due to the way worlds are created and propagated across processes.

The general issue lies in implementing a statically unbounded network of processes that interact with each other. These interactions require the processes to be spawned at different worlds which must be generated dynamically as needed. To interact with such a statically unknown number of processes uniformly, their offering channels must be stored in a list-like structure for later use. However, in our system, recursive types have to be invariant with respect to worlds. For instance, in a recursive type such as T = A<sub>L</sub>@ω<sub>l</sub> ω<sub>r</sub> ω<sub>p</sub> ⊗ T, the worlds ω<sub>l</sub>, ω<sub>p</sub>, ω<sub>r</sub> are fixed in the unfoldings of T. Thus, we cannot type a world-heterogeneous list and cannot form such process networks.

Given that the issues preventing us from typing such unbounded networks lie in problems of world invariance, the natural solution is to explore some form of *world polymorphism*, where types can be parameterized by worlds which are instantiated at a later stage. Such techniques have been studied in the context of hybrid logical processes in [7] by considering session types of the form ∀δ.A and ∃δ.A, sessions that are parametric in the world variable δ, that is instantiated by a concrete reachable world at runtime. While their development cannot be mapped directly to our setting, it is a promising avenue of future work.

# **7 Related Work**

**Behavioral Type Analysis of Deadlocks.** The addition of channel usage information to types in a concurrent, message-passing setting was pioneered by Kobayashi and Igarashi [30,34], who applied the idea to deadlock prevention in the π-calculus and later to more general properties [31,32], giving rise to a generic system that can be instantiated to produce a variety of concrete typing disciplines for the π-calculus (e.g., race detection, deadlock detection, etc.).

This line of work types π-calculus processes with a simplified form of *process* (akin to CCS [42] terms without name restriction) that characterizes the input/output behavior of processes. These types are augmented with abstract data that pertain to the relative ordering of channel actions, with the type system ensuring that the transitive closure of such orderings forms a strict partial order, ensuring deadlock-freedom (i.e., communication succeeds unless a process diverges). Building on this, Kobayashi et al. proposed type systems that ensure a stronger property dubbed lock-freedom [35] (i.e., communication always succeeds), and variants that are amenable to type inference [36,39]. Kobayashi [37] extended this latter system to more accurately account for recursive processes while preserving the existence of a type inference algorithm.

Our system draws significant inspiration from this line of work, insofar as we also equip types with abstract ordering data on certain communication actions, which is then statically enforced to form a strict partial order. We note that our SILLS<sup>+</sup> language differs sufficiently from the pure π-calculus in terms of its constructs and semantics to make the formulation of a direct comparison or an immediate application of their work unclear (e.g., [37] uses replication to encode recursive processes). Moreover, we integrate this style of order-based reasoning with both linear and shared session typing, which interact in non-trivial ways (especially in the presence of recursive types and recursive process definitions).

In terms of typability, enforcing session fidelity can be a double-edged sword: some examples of the works above can be transposed to SILLS<sup>+</sup> with mostly cosmetic changes and without making use of shared sessions (e.g., a parallel implementation of factorial that recurses via replication but always answers on a private channel); others are incompatible with linear sessions and require the use of shared sessions via the acquire-release discipline, which entails a more indirect but still arguably faithful modelling of the original π-calculus behavior; some examples, however, cannot be easily adapted to the shared session discipline (e.g., ∗c?(x, y).x?(z).y?(z) | ∗c?(x, y).y?(z).x?(z) is typable in [37], where x?(z) denotes input on x and ∗c?(x, y) denotes replicated input) and their transcription, while possible, would be too far removed from the original term to be deemed a faithful representation. Recursive processes are known to produce patterns that can be challenging to analyze using such order-based techniques. The work of [21,38] specializes Kobayashi's system to account for potentially unbounded process networks with non-trivial forms of sharing. Such systems are not typable in our work (see Sect. 6 for additional discussion on this topic).

The work of Padovani [44] builds on techniques inspired by [35,37] to develop a typing system for deadlock (and lock) freedom for the linear π-calculus where (linear) channels must be used exactly once. By enforcing this form of linearity, the resulting system uses only one piece of ordering data per channel usage and can easily integrate a form of channel polymorphism that accounts for intricate cyclic interleavings of recursive processes. The combination of manifest sharing and linear session typing does not seem possible without the use of additional ordering data, and the lack of single-use linear channels makes the robust channel polymorphism of [44] not feasible in our setting.

Dardha and Gay [15] recently integrated a system of Kobayashi-style orderings in a logical session π-calculus based on classical linear logic, extended with the ability to form *cyclic dependencies* of actions on *linear* session channels (Atkey et al. [1] study similar cycles but do not consider deadlock-freedom), without the need for new process constructs or an acquire-release discipline. Their work considers only a restricted form of replication common in linear logic-based works, including neither recursive types nor recursive process definitions. This reduces the complexity of their system, at the cost of expressiveness. We also note that the cycles enabled by their system are produced by processes sharing multiple *linear* names. Since linearity is still enforced, they cannot represent the more general form of cycles that exploit shared channels, as we do.

A comparative study of session typing and Kobayashi-style systems in terms of sharing was developed by Dardha and Pérez [16], showing that such order-based techniques can account for sharing in ways that are out of reach of both classical session typing and pure logic-based session typing. Our system (and that of [15]) aims to combine the heightened power of Kobayashi-style systems with the benefits of session typing, which seems to be better suited as a typing discipline for a high-level programming language [18].

**Progress and Session Typing.** To address limitations of classical binary session types, Honda et al. [27] introduced *multiparty* session types, where sessions are described by so-called global types that capture the interactions between an arbitrary number of session participants. Under some well-formedness constraints, global types can be used to ensure that a collection of processes correctly implements the global behavior in a deadlock-free way. However, these global type-based approaches do not ensure deadlock freedom in the presence of higher-order channel passing or interleaved multiparty sessions. Coppo et al. [13] and Bettini et al. [6] develop systems that track usage orders among interleaved multiparty sessions, ruling out cyclic dependencies that can lead to deadlocks. The resulting system is quite intricate, since it combines the full multiparty session theory with the order tracking mechanism, interacts negatively with recursion (essentially disallowing interleaving with recursion) and, by tracking order at the multiparty session-level, ends up rejecting various benign configurations that can be accounted for by our more fine-grained analysis. We also highlight the analyses of Vieira and Vasconcelos [54] and Padovani et al. [45] that are more powerful than the approaches above, at the cost of a more complex analysis based on conversation types [10] (themselves a partial-order based technique).

**Static Analysis of Concurrent Programs.** Lange et al. [40,41] develop a deadlock detection framework applied to the Go programming language. Their work distills CCS processes from programs which are then checked for deadlocks by a form of symbolic execution [40] and *model-checked* against modal μ-calculus formulae [41] which encode deadlock-freedom of the abstracted process (among other properties of interest). Their abstraction introduces some distance between the original program and the analysed process and so the analysis is sound only for certain restricted program fragments, excluding any combination of recursion and process spawning. Our direct approach does not suffer from this limitation.

de'Liguoro and Padovani [17] develop a typing discipline for deadlock-freedom in a setting where processes exchange messages via unordered mailboxes. Their calculus subsumes the actor model and their analysis combines both so-called mailbox types and specialized dependency graphs to track potential cycles between mailboxes in actor-based systems. The unordered nature of actor-based communication introduces significant differences with respect to our work, which crucially exploits the ordering of exchanged messages.

# **8 Concluding Remarks**

In this paper we have developed the concept of manifest deadlock-freedom in the context of the language SILLS<sup>+</sup> , a shared session-typed language, showcasing both the programming methodology and the expressiveness of our framework with a series of examples. Deadlock-freedom of well-typed programs is established by a novel abstraction of so-called green and red arrows to reason about the interdependencies between processes in terms of linear and shared channel references.

In future work, we plan to address some of the limitations of the interactions of deadlock-free shared sessions with recursion, by considering promising notions of world polymorphism and world communication. We also plan to study the problem of world inference and the inclusion of a linear forwarding construct.




# **A Categorical Model of an i/o-typed π-calculus**

Ken Sakayori and Takeshi Tsukada

The University of Tokyo, Tokyo, Japan sakayori@kb.is.s.u-tokyo.ac.jp

**Abstract.** This paper introduces a new categorical structure that is a model of a variant of the **i**/**o**-typed π-calculus, in the same way that a cartesian closed category is a model of the λ-calculus. To the best of our knowledge, no categorical model has been given for the **i**/**o**-typed π-calculus, in contrast to session-typed calculi, for which a corresponding logic and categorical structure have been given. The categorical structure introduced in this paper has a simple definition, combining two well-known structures, namely closed Freyd category and compact closed category. The former is a model of effectful computation in a general setting, and the latter describes connections via channels, which cause the effect we focus on in this paper. To demonstrate the relevance of the categorical model, we show by a semantic consideration that the π-calculus is equivalent to a core calculus of Concurrent ML.

**Keywords:** π-calculus · Categorical type theory · Compact closed category · Closed Freyd category

# **1 Introduction**

The Curry-Howard-Lambek correspondence reveals the trinity of the simply-typed λ-calculus, propositional intuitionistic logic and cartesian closed category. Via the correspondence, a type of the calculus can be seen as a formula of the logic, and as an object of a category; a term can be seen as a proof and as a morphism (see, e.g., [23]). Since its discovery, a number of variations have been proposed and studied.

In concurrency theory, a correspondence between a process calculus and logic was established by Caires, Pfenning and Toninho [8,9] and later by Wadler [48]. What they found is that session types [18,20] can be seen as formulas of linear logic [14], and processes as proofs. This remarkable result has inspired lots of work (e.g. [3,4,10,25,45,46]).

This correspondence is, however, not completely satisfactory as pointed out in [3,26], as well as by Wadler himself [48]. The session-typed calculi in [9,48] corresponding to linear logic have only well-behaved processes, because the session type systems guarantee deadlock-freedom and race-freedom of well-typed processes. This strong guarantee is often useful for programmers writing processes in the typed calculus, but can be seen as a significant limitation of expressive power. For example, it prevents us from modelling wild concurrent systems or programs that might fall into deadlocks or race conditions.

This paper describes an approach to a Curry-Howard-Lambek correspondence for concurrency in the presence of deadlocks and race conditions, from the viewpoint of categorical type theory.

**What Is the Categorical Model of the** *π***-calculus?** We focus on the π-calculus [30,31] in this paper. This is not only because the π-calculus is widely used and powerful, but also because of a classical result by Sangiorgi [39,42], which is the starting point of our development.

Sangiorgi, in the early 90s, gave translations between the conventional, first-order π-calculus and its higher-order variant [39,42]. This translation allows us to regard the π-calculus as a higher-order programming language.

Let us review the observation by Sangiorgi, using a core of the asynchronous π-calculus: $P ::= \mathbf{0} \mid (P \mid Q) \mid \bar{a}\langle x\rangle \mid a(x).P$.<sup>1</sup> The idea is to decompose the input-prefixing $a(x).P$ into $a$ and $(x).P$. Let us write $a[(x).P]$ for $a(x).P$ to emphasise the decomposition. Then a reduction can also be decomposed as

$$\bar{a}\langle x \rangle \mid a[(y).P] \mid Q \;\longrightarrow\; [(y).P]\,\langle x \rangle \mid Q \;\longrightarrow\; P\{x/y\} \mid Q,$$

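where the first step is the communication and the second step is the β-reduction (i.e. (λy.P) x −→ P{x/y} in the λ-calculus notation). Hence we regard an output as an application of a function to its argument, and an input-prefixing as an abstraction placed at a name by the operator a[−].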


Now, ignoring the mysterious operator a[−], what we had are the core operations of functional programming languages (i.e. abstraction and application). This functional programming language is effectful; in fact, communication via channels is a side effect.

This observation leads us to base our categorical model for the π-calculus on a model for effectful functional programs. Among several models, we choose *closed Freyd category* [37] for modelling the functional part.

Then what is the categorical counterpart of a[−]? As this operation seems responsible for communication, this question can be rephrased as: what is the categorical structure for communication? An observation by Abramsky et al. [2] answered this question. They pointed out the importance of *compact closed category* [21] in concurrency theory, which nicely describes CCS-like processes interconnected via ports.

By combining the two structures described above, this paper introduces a categorical structure, which we call *compact closed Freyd category*, as a categorical model of the π-calculus.<sup>2</sup> Despite its simplicity, compact closed Freyd category captures the strong expressive power of the π-calculus. The compact closed structure allows us to connect ports in an arbitrary way, in return for the possibility of deadlocks; the Freyd structure allows us to duplicate objects, and duplication of input channels introduces the possibility of race conditions.

<sup>1</sup> This calculus slightly differs from the calculus we shall introduce in Sect. 2, but the differences are not important here.

<sup>2</sup> Here is the reason why we do not use a monad for modelling the effect: it is unclear to us how to integrate a monad with the compact closed structure. In contrast, a Freyd category has a (pre)monoidal category as its component; we can simply require that it is compact closed.

**Reconstructing Calculi.** This paper introduces two calculi that are sound and complete with respect to the compact closed Freyd category model. One is a variant of the π-calculus, named π<sup>F</sup>; the design of π<sup>F</sup> is based on the observations described above. The other is a higher-order programming language λ<sub>*ch*</sub> defined as an instance of the computational λ-calculus [33]. Designing λ<sub>*ch*</sub> is not so difficult because we can make use of the correspondence between the computational λ-calculus and closed Freyd categories (see Sect. 4). The λ<sub>*ch*</sub>-calculus has operations for creating a channel and for sending a value via the channel and, therefore, can be seen as a core calculus of *Concurrent ML* (or *CML*) [38].

Since the higher-order calculus λ<sub>*ch*</sub> and π<sup>F</sup> correspond to the same categorical model, we can obtain translations between these calculi by simple semantic computations. These translations are "correct by definition" and, interestingly, coincide with those between higher-order and first-order π-calculus [39,42].

**On** *β*- **vs.** *βη***-theories.** The categorical analysis of this paper reveals that many conventional behavioural equivalences for the π-calculus are problematic from the viewpoint of categorical type theory. The problem is that they induce only *semicategories*, which may not have identities for some objects. This is reminiscent of the β-theory of the λ-calculus, whose categorical model is given by semi-categorical notions [16].

Adding a single rule (which we call the η*-rule*) resolves the problem. Our categorical type theory deals only with equivalences that admit the η-rule, and the simplicity of the theory of this paper essentially relies on the η-rule.

Interestingly, the η-rule seems to explain some phenomena in the literature. For example, Sangiorgi observed that a syntactic constraint called *locality* [28,49] is essential for his translation [39,42]. The correctness of the translation can be proved without using the η-rule when one restricts the calculus to its local fragment; we expect that Sangiorgi's observation can be related to this phenomenon.

**Contributions.** This paper introduces a new variant of the **i**/**o**-typed π-calculus, which we call π<sup>F</sup>. A remarkable feature of π<sup>F</sup> is that it has a categorical counterpart, called compact closed Freyd category. The correspondence is fairly firm; the categorical semantics is sound and complete, and the term model is the classifying category. The relevance of the model is demonstrated by a semantic reconstruction of Sangiorgi's translation [39,42]. These results open a new frontier in the Curry-Howard-Lambek correspondence for concurrency; session types are not the only basis for a Curry-Howard-Lambek correspondence for π-calculi.

*Organisation of this Paper.* Section 2 introduces the calculus π<sup>F</sup> and discusses equivalences on processes. Section 3 gives the categorical semantics of π<sup>F</sup> and shows soundness and completeness. A connection to a higher-order programming language with channels is studied in Sect. 4. In Sect. 5, we (1) discuss how our work relates to linear logic and (2) present some ideas for how to extend the application range of our model. We discuss related work in Sect. 6 and conclude in Sect. 7. Omitted proofs, as well as detailed definitions, are available in the full version.

# **2 A Polyadic, Asynchronous π-calculus with i/o-types**

This section introduces a variant of π-calculus, named π<sup>F</sup> . It is based on a fairly standard calculus, namely polyadic and asynchronous π-calculus with **i**/**o**-types, but the details are carefully designed so that π<sup>F</sup> has a categorical model.

#### **2.1 The π<sup>F</sup>-calculus**

This subsection defines the calculus π<sup>F</sup>, which is based on an asynchronous variant of the polyadic π-calculus with **i**/**o**-types in [35]. The aim of this subsection is to explain how it differs from the conventional π-calculus. Although π<sup>F</sup> has some uncommon features, each of them has been studied in the literature; see Related Work (Sect. 6) for related ideas and calculi.

**Types.** The set of *types*, ranged over by S and T, is given by

$$S, T ::= \mathbf{ch}^{o}[T_1, \dots, T_n] \mid \mathbf{ch}^{i}[T_1, \dots, T_n] \qquad (n \ge 0).$$

The type $\mathbf{ch}^{o}[T_1, \dots, T_n]$ is for output channels sending $n$ arguments of types $T_1, \dots, T_n$. The type $\mathbf{ch}^{i}[T_1, \dots, T_n]$ is for input channels. The *dual* $T^{\perp}$ of a type $T$ is defined by $\mathbf{ch}^{o}[\vec{T}]^{\perp} \stackrel{\mathrm{def}}{=} \mathbf{ch}^{i}[\vec{T}]$ and $\mathbf{ch}^{i}[\vec{T}]^{\perp} \stackrel{\mathrm{def}}{=} \mathbf{ch}^{o}[\vec{T}]$. For a sequence $\vec{T} \stackrel{\mathrm{def}}{=} T_1, \dots, T_n$ of types, we write $\vec{T}^{\perp}$ for $T_1^{\perp}, \dots, T_n^{\perp}$.

An important difference from [35] is that no channel allows both input and output operations. We will refer to this feature of π<sup>F</sup> as **i**/**o***-separation*.
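As an illustration only, the type grammar and the dual operation can be transcribed into a few lines of Python. The class name `Ch`, its field names and the helpers `dual` and `dual_seq` are our own choices for this sketch and are not notation of the calculus.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ch:
    """A channel type ch^o[T1..Tn] (polarity "o") or ch^i[T1..Tn] (polarity "i")."""
    polarity: str
    args: Tuple["Ch", ...] = ()


def dual(t: Ch) -> Ch:
    """T^perp: flip the outermost polarity; the argument types are left unchanged."""
    return Ch("i" if t.polarity == "o" else "o", t.args)


def dual_seq(ts: Tuple[Ch, ...]) -> Tuple[Ch, ...]:
    """Pointwise dual of a sequence of types."""
    return tuple(dual(t) for t in ts)


unit_out = Ch("o")                  # ch^o[]
carrier = Ch("i", (unit_out,))      # ch^i[ch^o[]]
```

Because of **i**/**o**-separation there is no third polarity to model, and `dual` is an involution: applying it twice returns the original type.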

**Processes.** Let N be a denumerable set of *names*, ranged over by x, y and z. Each name is either input-only or output-only, because of **i**/**o**-separation.

The set of *processes*, ranged over by P, Q and R, is defined by

$$P, Q, R ::= \mathbf{0} \mid (P \mid Q) \mid (\nu_{\mathbf{ch}^{o}[\vec{T}]}\, xy)P \mid x\langle \vec{y}\rangle \mid {!x(\vec{y}).P}.$$

The notion of *free names*, as well as *bound names*, is defined as usual. The set of free names (resp. bound names) of P is written as **fn**(P) (resp. **bn**(P)). We allow tacit renaming of bound names, and identify α-equivalent processes.

The meaning of the constructs should be clear, except for $(\nu_{T}\, xy)P$, which is less common. The process **0** is the inaction; $P \mid Q$ is a parallel composition; $x\langle\vec{y}\rangle$ is an output; and $!x(\vec{y}).P$ is a replicated input. The restriction $(\nu_{T}\, xy)P$ hides the names $x$ and $y$ of type $T$ and $T^{\perp}$ and, at the same time, establishes a connection between $x$ and $y$. Communication takes place only over bound names explicitly connected by $\nu$. This is in contrast to the conventional π-calculus, in which input-output correspondence is *a priori* (i.e. $\bar{a}$ is the output to $a$).

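$$\frac{}{\Gamma \vdash \mathbf{0} : \diamond} \qquad \frac{\Gamma \vdash P : \diamond \qquad \Gamma \vdash Q : \diamond}{\Gamma \vdash P \mid Q : \diamond} \qquad \frac{\Gamma, x : \mathbf{ch}^{o}[\vec{T}], y : \mathbf{ch}^{i}[\vec{T}] \vdash P : \diamond}{\Gamma \vdash (\nu_{\mathbf{ch}^{o}[\vec{T}]}\, xy)P : \diamond}$$

$$\frac{(x : \mathbf{ch}^{i}[\vec{T}]) \in \Gamma \qquad \Gamma, \vec{y} : \vec{T} \vdash P : \diamond}{\Gamma \vdash {!x(\vec{y}).P} : \diamond} \qquad \frac{(x : \mathbf{ch}^{o}[\vec{T}]) \in \Gamma \qquad (\vec{y} : \vec{T}) \subseteq \Gamma}{\Gamma \vdash x\langle \vec{y}\rangle : \diamond}$$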

**Fig. 1.** Typing rules for processes

The π<sup>F</sup>-calculus does not have non-replicated input $x(\vec{y}).P$.

**Typing Rules.** A *type environment* Γ is a finite sequence of type bindings of the form $x : T$. We assume the names in Γ are pairwise distinct. If $\vec{x} = x_1, \dots, x_n$ and $\vec{T} = T_1, \dots, T_n$, we write $\vec{x} : \vec{T}$ for $x_1 : T_1, \dots, x_n : T_n$. We write $(\vec{x} : \vec{T}) \subseteq \Gamma$ to mean $x_i : T_i \in \Gamma$ for every $i$.

A *type judgement* is of the form $\Gamma \vdash P : \diamond$, meaning that $P$ is a well-typed process under Γ. The typing rules are listed in Fig. 1.
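The five typing rules are syntax-directed, so they can be read as a recursive checker. The following Python sketch is one possible transcription under our own assumptions: the constructor names (`Nil`, `Par`, `Nu`, `Out`, `RepIn`) are hypothetical, type environments are dictionaries rather than sequences, and bound names are required to be fresh instead of being renamed up to α-equivalence.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Ch:
    polarity: str                      # "o" (output) or "i" (input)
    args: Tuple["Ch", ...] = ()


@dataclass(frozen=True)
class Nil:                             # the inaction 0
    pass


@dataclass(frozen=True)
class Par:                             # P | Q
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Nu:                              # (nu_{ch^o[args]} x y) P: x output end, y input end
    args: Tuple[Ch, ...]
    x: str
    y: str
    body: "Proc"


@dataclass(frozen=True)
class Out:                             # x<y1..yn>
    subj: str
    objs: Tuple[str, ...]


@dataclass(frozen=True)
class RepIn:                           # !x(y1..yn).P
    subj: str
    params: Tuple[str, ...]
    body: "Proc"


Proc = Union[Nil, Par, Nu, Out, RepIn]
Env = Dict[str, Ch]


def well_typed(env: Env, p: Proc) -> bool:
    """Decide Gamma |- P : <> by following the rules of Fig. 1."""
    if isinstance(p, Nil):
        return True
    if isinstance(p, Par):
        return well_typed(env, p.left) and well_typed(env, p.right)
    if isinstance(p, Nu):
        # Assumption of this sketch: bound names are fresh and distinct.
        if p.x == p.y or p.x in env or p.y in env:
            return False
        ext = {**env, p.x: Ch("o", p.args), p.y: Ch("i", p.args)}
        return well_typed(ext, p.body)
    if isinstance(p, Out):
        t = env.get(p.subj)
        return (t is not None and t.polarity == "o"
                and len(t.args) == len(p.objs)
                and all(env.get(y) == s for y, s in zip(p.objs, t.args)))
    if isinstance(p, RepIn):
        t = env.get(p.subj)
        if t is None or t.polarity != "i" or len(t.args) != len(p.params):
            return False
        if len(set(p.params)) != len(p.params) or any(y in env for y in p.params):
            return False
        return well_typed({**env, **dict(zip(p.params, t.args))}, p.body)
    return False


# !a().b_bar<> is typable when a is an input name and b_bar an output name.
forwarder = RepIn("a", (), Out("b_bar", ()))
```

For instance, `well_typed({"a": Ch("i"), "b_bar": Ch("o")}, forwarder)` holds, while giving `a` an output type makes the check fail, reflecting that each name is either input-only or output-only.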

*Notation 1.* We define $(\nu_{\mathbf{ch}^{i}[\vec{T}]}\, xy)P$ as $(\nu_{\mathbf{ch}^{o}[\vec{T}]}\, yx)P$; then $(\nu_{T}\, xy)P$ is defined for every $T$. We abbreviate $(\nu_{T_1} x_1 y_1) \dots (\nu_{T_n} x_n y_n)P$ as $(\nu_{\vec{T}}\, \vec{x}\vec{y})P$. We often omit type annotations and write $(\nu xy)$ for $(\nu_{T}\, xy)$ and $(\nu \vec{x}\vec{y})$ for $(\nu_{\vec{T}}\, \vec{x}\vec{y})$. We use $a$ and $b$ for names of input channel types and $\bar{a}$ and $\bar{b}$ for output. Note that $a$ and $\bar{a}$ are connected only if they are bound by the same occurrence of $\nu$.

**Operational Semantics.** *Structural congruence*, written ≡, is the smallest congruence relation on processes that satisfies the following rules:

$$\begin{aligned} P \mid \mathbf{0} &\equiv P & P \mid Q \equiv Q \mid P & (P \mid Q) \mid R \equiv P \mid (Q \mid R) \\ (\nu xy)(P \mid Q) &\equiv ((\nu xy)P) \mid Q & (\nu wx)(\nu yz)P \equiv (\nu yz)(\nu wx)P \end{aligned}$$

where $x, y \notin \mathbf{fn}(Q)$ in the fourth rule and $w, x, y, z$ are distinct in the fifth rule.

The *reduction relation* on processes, written −→, is defined by the base rule

$$(\nu \vec{w} \vec{z})(\nu \bar{a} a)(!a(\vec{x}).P \mid \bar{a} \langle \vec{y} \rangle \mid Q) \longrightarrow (\nu \vec{w} \vec{z})(\nu \bar{a} a)(!a(\vec{x}).P \mid P \{ \vec{y}/\vec{x} \} \mid Q)$$

(where $P\{\vec{y}/\vec{x}\}$ is the capture-avoiding substitution) and the structural rule which concludes $P \longrightarrow Q$ from $\exists P' Q'.\ P \equiv P' \longrightarrow Q' \equiv Q$. Note that, unlike conventional π-calculi, communication only occurs over bound names connected by $\nu$. We write $\longrightarrow^{*}$ for the reflexive and transitive closure of $\longrightarrow$.

It should be clear that deadlocks and racy communications can be expressed in π<sup>F</sup>. An example of a race is $(\nu \bar{a}a)(\bar{a}\langle y\rangle \mid {!a(x).P} \mid {!a(x).Q})$, where two input actions are trying to consume the output regarded as a resource. A similar process $(\nu \bar{a}a)({!a(x).P} \mid \bar{a}\langle y\rangle \mid \bar{a}\langle z\rangle)$ does not have a race since the receiver $!a(x).P$ is replicated. In general, race conditions on output actions do not occur in π<sup>F</sup>.
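The difference between the two examples can be observed by enumerating reducts. The sketch below is a deliberately simplified model rather than a full implementation of the reduction relation: all restrictions are assumed to have been pulled to the top level (so a process is a set `links` of connected (output, input) pairs plus a list of threads), bodies of replicated inputs contain no nested restriction, and bound names are assumed distinct from all other names so that naive substitution is capture-free. The tuple encodings and function names are ours.

```python
def subst(thread, sigma):
    """Apply the name substitution sigma to a thread.

    Threads are ("out", subject, objects) or ("rep", subject, params, body),
    where body is a tuple of threads composed in parallel.
    """
    if thread[0] == "out":
        _, subj, objs = thread
        return ("out", sigma.get(subj, subj), tuple(sigma.get(y, y) for y in objs))
    _, subj, params, body = thread
    inner = {k: v for k, v in sigma.items() if k not in params}
    return ("rep", sigma.get(subj, subj), params, tuple(subst(b, inner) for b in body))


def steps(links, threads):
    """All one-step reducts: an output meets a replicated input on a linked name."""
    reducts = []
    for i, t in enumerate(threads):
        if t[0] != "out":
            continue
        for r in threads:
            if r[0] == "rep" and (t[1], r[1]) in links:
                sigma = dict(zip(r[2], t[2]))
                spawned = [subst(b, sigma) for b in r[3]]
                # The output is consumed; the replicated input stays.
                reducts.append(threads[:i] + threads[i + 1:] + spawned)
    return reducts


def normal_forms(links, threads):
    """Terminal states reachable from threads (assumes reduction terminates)."""
    nexts = steps(links, threads)
    if not nexts:
        return {tuple(sorted(threads, key=repr))}
    result = set()
    for n in nexts:
        result |= normal_forms(links, n)
    return result


links = {("a_bar", "a")}
to_p = ("rep", "a", ("x",), (("out", "p_bar", ("x",)),))   # !a(x).p_bar<x>
to_q = ("rep", "a", ("x",), (("out", "q_bar", ("x",)),))   # !a(x).q_bar<x>

racy = [("out", "a_bar", ("y",)), to_p, to_q]
not_racy = [to_p, ("out", "a_bar", ("y",)), ("out", "a_bar", ("z",))]
```

In this model `racy` has two distinct terminal states, one per receiver that wins the message, whereas `not_racy` has a single terminal state regardless of the order in which the two outputs are served.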

#### **2.2 Equivalences on Processes**

To establish a Curry-Howard-Lambek correspondence is to find a nice algebraic or categorical structure of terms. For example, the original Curry-Howard-Lambek correspondence reveals the cartesian closed structure of λ-terms.

Such a nice structure would become visible only when appropriate notions of composition and of equivalence could be identified, such as substitution and βη-equivalence for the λ-calculus.

As for process calculi, the so-called "parallel composition + hiding" paradigm [17] has been used to compose processes. Given typed processes

$$
\vec{x}:\vec{T},\ \vec{y}:\vec{S}\vdash P:\diamond \quad \text{and} \quad \vec{w}:\vec{S}^{\perp},\ \vec{u}:\vec{U}\vdash Q:\diamond,\,
$$

their composite via $(\vec{y}, \vec{w})$ is defined as

$$
\vec{x} : \vec{T}, \ \vec{u} : \vec{U} \vdash (\nu_{\vec{S}}\, \vec{y} \vec{w}) (P \mid Q) : \diamond .
$$

This kind of composition appears quite often in logical studies of π-calculi [1,5,19]. It also plays a central role in the *interaction category paradigm* proposed by Abramsky, Gay and Nagarajan [2].
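As a small worked instance (our own example, with all names chosen for illustration), take two processes that each relay a signal at the nullary channel type:

$$a : \mathbf{ch}^{i}[],\ \bar{b} : \mathbf{ch}^{o}[] \vdash {!a().\bar{b}\langle\rangle} : \diamond \qquad \text{and} \qquad c : \mathbf{ch}^{i}[],\ \bar{d} : \mathbf{ch}^{o}[] \vdash {!c().\bar{d}\langle\rangle} : \diamond .$$

Here $\bar{b}$ has type $S = \mathbf{ch}^{o}[]$ and $c$ has type $S^{\perp} = \mathbf{ch}^{i}[]$, so the two processes can be composed via $(\bar{b}, c)$, which gives

$$a : \mathbf{ch}^{i}[],\ \bar{d} : \mathbf{ch}^{o}[] \vdash (\nu_{\mathbf{ch}^{o}[]}\, \bar{b} c)({!a().\bar{b}\langle\rangle} \mid {!c().\bar{d}\langle\rangle}) : \diamond .$$

The interface of the composite consists of the names that were not connected, and the restriction both hides $\bar{b}$ and $c$ and links them, so that a signal received on $a$ can eventually be emitted on $\bar{d}$.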

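So it remains to determine an equivalence on π-calculus processes, appropriate for our purpose. This subsection approaches the problem from two directions: first by examining existing behavioural equivalences, and then by a category-driven approach that designs the equivalence from the intended categorical structure.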


Let us clarify the notion of equivalence discussed below. An *equation-in-context* is a judgement of the form $\Gamma \vdash P = Q$, where $\Gamma \vdash P : \diamond$ and $\Gamma \vdash Q : \diamond$. An *equivalence* $\mathcal{E}$ is a set of equations-in-context that is reflexive, transitive and symmetric (e.g. $(\Gamma \vdash P = P) \in \mathcal{E}$ for every $\Gamma \vdash P : \diamond$).

**Behavioural Equivalences.** As mentioned above, we are interested in the structure of π<sup>F</sup>-processes modulo existing behavioural equivalences. Among the various behavioural equivalences, we start by studying *barbed congruence* [32], which is one of the most widely used equivalences.

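We define (asynchronous and weak) barbed congruence for π<sup>F</sup>. For each name $\bar{a}$, we write $P{\downarrow_{\bar{a}}}$ if $P \equiv (\nu \vec{x}\vec{y})(\bar{a}\langle\vec{z}\rangle \mid Q)$ and $\bar{a}$ is free, and $P{\Downarrow_{\bar{a}}}$ if $\exists Q.\ P \longrightarrow^{*} Q{\downarrow_{\bar{a}}}$. A $(\Gamma/\Delta)$*-context* is a context $C$ such that $\Gamma \vdash C[P] : \diamond$ for every $\Delta \vdash P : \diamond$.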

**Definition 1.** *A* barbed bisimulation *is a symmetric relation* $\mathcal{R}$ *on processes such that, whenever* $P \mathrel{\mathcal{R}} Q$*, (1)* $P{\downarrow_{\bar{a}}}$ *implies* $Q{\Downarrow_{\bar{a}}}$ *and (2)* $P \longrightarrow P'$ *implies* $\exists Q'.\ (Q \longrightarrow^{*} Q') \wedge (P' \mathrel{\mathcal{R}} Q')$*.* Barbed bisimilarity $\stackrel{\bullet}{\approx}$ *is the largest barbed bisimulation. Typed processes* $\Delta \vdash P : \diamond$ *and* $\Delta \vdash Q : \diamond$ *are* barbed congruent at $\Delta$*, written* $\Delta \vdash P \cong^{c} Q$*, if* $C[P] \stackrel{\bullet}{\approx} C[Q]$ *for every* $(\Gamma/\Delta)$*-context* $C$*.*

Let us consider a category-like structure C in which an object is a type and a morphism is an equivalence class of π<sup>F</sup>-processes modulo barbed congruence. More precisely, a morphism from $T$ to $S$ is a process $x : T, y : S^{\perp} \vdash P : \diamond$ modulo barbed congruence (and renaming of the free names $x$ and $y$). Then the composition (i.e. "parallel composition + hiding") is well-defined on equivalence classes, because barbed congruence is a congruence. This is a fairly natural setting.

We have a strikingly negative result.

**Theorem 1.** C *is not a category.*

*Proof.* In every category, if $f : A \longrightarrow A$ is a left-identity on $A$ (i.e. $f \circ g = g$ for every $g : A \longrightarrow A$), then $f$ is the identity on $A$. The process $a : \mathbf{ch}^{o}[], \bar{b} : \mathbf{ch}^{i}[] \vdash {!a().\bar{b}\langle\rangle} : \diamond$ seen as a morphism $(\mathbf{ch}^{o}[]) \longrightarrow (\mathbf{ch}^{o}[])$ is a left-identity but not the identity. The former means that $c : \mathbf{ch}^{o}[], \bar{b} : \mathbf{ch}^{i}[] \vdash (\nu \bar{a}a)({!a().\bar{b}\langle\rangle} \mid P) \cong^{c} P\{\bar{b}/\bar{a}\}$ for every $c : \mathbf{ch}^{o}[], \bar{a} : \mathbf{ch}^{i}[] \vdash P : \diamond$, which is a consequence of the *replicator theorems* [35]. To prove the latter, observe that $(\nu \bar{b}b)({!a().\bar{b}\langle\rangle} \mid \mathbf{0})$ and $\mathbf{0}$ are not barbed congruent. Indeed the context $C \stackrel{\mathrm{def}}{=} (\nu \bar{a}a)(\bar{a}\langle\rangle \mid {!a().\bar{o}\langle\rangle} \mid [\,])$ distinguishes the processes, where $\bar{o}$ is the observable.

Note that a race condition is essential for the proof, specifically, for the part proving that the process $!a().\bar{b}\langle\rangle$ is not the identity. A race condition occurs in $C[(\nu \bar{b}b)({!a().\bar{b}\langle\rangle} \mid \mathbf{0})]$, where $\bar{a}$ in $C$ has two receivers.
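The effect of the distinguishing context can be replayed mechanically. The following Python sketch uses a simplified model of our own: only nullary channels, all restrictions pulled to the top level as a set `links` of (output, input) pairs, and hypothetical function names. It explores the reducts of the context filled with the forwarder composite and with **0**.

```python
def steps(links, threads):
    """One-step reducts; threads are ("out", name) or ("rep", name, body)."""
    reducts = []
    for i, t in enumerate(threads):
        if t[0] != "out":
            continue
        for r in threads:
            if r[0] == "rep" and (t[1], r[1]) in links:
                reducts.append(threads[:i] + threads[i + 1:] + list(r[2]))
    return reducts


def barbs(links, threads):
    """Free output names that are immediately observable."""
    bound_outputs = {o for (o, _) in links}
    return {t[1] for t in threads if t[0] == "out" and t[1] not in bound_outputs}


def may(links, threads, name):
    """Weak barb: some reduction sequence exhibits an output on name."""
    if name in barbs(links, threads):
        return True
    return any(may(links, n, name) for n in steps(links, threads))


def can_lose(links, threads, name):
    """True if some reachable state can no longer exhibit the weak barb."""
    if not may(links, threads, name):
        return True
    return any(can_lose(links, n, name) for n in steps(links, threads))


observer = ("rep", "a", (("out", "o_bar"),))       # !a().o_bar<>
forwarder = ("rep", "a", (("out", "b_bar"),))      # !a().b_bar<>

# C[(nu b_bar b)(!a().b_bar<> | 0)] and C[0]
with_forwarder = ({("a_bar", "a"), ("b_bar", "b")}, [("out", "a_bar"), observer, forwarder])
with_nil = ({("a_bar", "a")}, [("out", "a_bar"), observer])
```

In this model both instances may exhibit the barb on `o_bar`, but only the instance containing the forwarder can reach a state from which the barb is lost: the forwarder wins the race for the single message and re-emits it on the hidden name `b_bar`, which has no receiver. This is the behaviour that a barbed bisimulation cannot match, while for this particular context and observable the may-predicate does not tell the two apart.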

The process $!a().\bar{b}\langle\rangle$ is called a *forwarder*, and forwarders will play a central role in this paper. Its general form is $a \to \bar{b} \stackrel{\mathrm{def}}{=} {!a(\vec{x}).\bar{b}\langle\vec{x}\rangle}$. When $x : T$ and $y : T^{\perp}$, we write $x \rightleftharpoons y$ to mean $x \to y$ if $T = \mathbf{ch}^{i}[\vec{S}]$ and otherwise $y \to x$.

*Remark 1.* The argument in the proof of Theorem 1 is widely applicable to **i**/**o**-typed calculi, not specific to π<sup>F</sup>. In particular, **i**/**o**-separation (i.e. absence of $\mathbf{ch}^{i/o}[\vec{T}]$) is not the cause, but the existence of $\mathbf{ch}^{o}[\vec{T}]$ or $\mathbf{ch}^{i}[\vec{T}]$ is.

*Remark 2.* Session-typed calculi in Caires, Pfenning and Toninho [8,9], which correspond to linear logic, do not seem to suffer from this problem. In our understanding, this is because of race-freedom of their calculi. 

To obtain a category, we should think of a coarser equivalence that identifies $(\nu \bar{b}b)({!a().\bar{b}\langle\rangle} \mid \mathbf{0})$ with $\mathbf{0}$. Such an equivalence should be very coarse; even *must-testing equivalence* [11] fails to equate them. As far as we have checked, only *may-testing equivalence* [11], defined below, satisfies the requirement.

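**Definition 2.** *Typed processes* $\Delta \vdash P : \diamond$ *and* $\Delta \vdash Q : \diamond$ *are* may-testing equivalent at $\Delta$*, written* $\Delta \vdash P =_{\mathit{may}} Q$*, if* $C[P]{\Downarrow_{\bar{a}}} \Leftrightarrow C[Q]{\Downarrow_{\bar{a}}}$ *for every* $(\Gamma/\Delta)$*-context* $C$ *and name* $\bar{a}$*.*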

As we shall see, π<sup>F</sup>-processes modulo may-testing equivalence behave well. May-testing equivalence is, however, often too coarse.

**Category-Driven Approach.** In this approach, we first guess an appropriate categorical structure sufficient for interpreting π<sup>F</sup>, based on the intuitions discussed in the Introduction (see also Sect. 3.1), and then design an equivalence so that it is sound and complete with respect to the categorical semantics.

Figure 2 defines the equivalence, described as a set of rules. A π<sup>F</sup> *-theory* is an equivalence that behaves well from the categorical perspective.

$$\frac{a \notin \mathbf{fn}(P, C) \qquad \bar{a} \notin \mathbf{bn}(C)}{\Gamma \vdash (\nu \bar{a} a)({!a(\vec{x}).P} \mid C[\bar{a}\langle \vec{y} \rangle]) = (\nu \bar{a} a)({!a(\vec{x}).P} \mid C[P\{\vec{y}/\vec{x}\}])} \ \text{(E-Beta)}$$

$$\frac{a, \bar{a} \notin \mathbf{fn}(P)}{\Gamma \vdash (\nu \bar{a} a)\,{!a(\vec{y}).P} = \mathbf{0}} \ \text{(E-GC)} \qquad \frac{\bar{a}, a \notin \mathbf{fn}(\bar{c}\langle \vec{x} \rangle)}{\Gamma \vdash \bar{c}\langle \vec{x} \rangle = (\nu \bar{a} a)(a \to \bar{b} \mid \bar{c}\langle \vec{x}\{\bar{a}/\bar{b}\} \rangle)} \ \text{(E-FOut)}$$

$$\frac{\bar{b} \notin \mathbf{fn}(P)}{\Gamma \vdash (\nu \bar{b} b)(a \to \bar{b} \mid P) = P\{a/b\}} \ \text{(E-Eta)} \qquad \frac{P \equiv Q}{\Gamma \vdash P = Q} \ \text{(E-SCong)} \qquad \frac{\Delta \vdash P = Q \qquad C \text{ is a } (\Gamma/\Delta)\text{-context}}{\Gamma \vdash C[P] = C[Q]} \ \text{(E-Ctx)}$$

**Fig. 2.** Inference rules of equations-in-context. Each rule has the implicit assumption that both sides of the equation are well-typed processes.

**Definition 3.** *An equivalence* $\mathcal{E}$ *is a* π<sup>F</sup>-theory *if it is closed under the rules in Fig. 2. Any set* $\mathit{Ax}$ *of equations-in-context has the minimum theory* $\mathit{Th}(\mathit{Ax})$ *that contains* $\mathit{Ax}$*. We write* $\mathit{Ax} \rhd \Gamma \vdash P = Q$ *if* $(\Gamma \vdash P = Q) \in \mathit{Th}(\mathit{Ax})$*.*

Let us examine each rule in Fig. 2.

The rule (E-Beta) should be compared with the reduction relation. When $C = ([\,] \mid Q)$, (E-Beta) claims

$$(\nu \bar{a}a)({!a(\vec{x}).P} \mid \bar{a} \langle \vec{y} \rangle \mid Q) = (\nu \bar{a}a)({!a(\vec{x}).P} \mid P\{\vec{y}/\vec{x}\} \mid Q),$$

provided that $a \notin \mathbf{fn}(P, Q)$, which is indeed an instance of the reduction.

A significant difference from reduction is the side condition. It is essential in the presence of race conditions. Without the side condition, every π<sup>F</sup>-theory would be forced to contain the symmetric and transitive closure of the reduction relation; thus it would identify $P \mid (\nu \bar{a}a)({!a().P} \mid {!a().Q})$ with $Q \mid (\nu \bar{a}a)({!a().P} \mid {!a().Q})$ for all processes $P$ and $Q$ (where $\bar{a}, a$ are fresh), because

$$\begin{array}{ccc} (\nu \bar{a}a)(\bar{a}\langle\rangle \mid {!a().P} \mid {!a().Q}) & \longrightarrow & P \mid (\nu \bar{a}a)({!a().P} \mid {!a().Q}) \\ (\nu \bar{a}a)(\bar{a}\langle\rangle \mid {!a().P} \mid {!a().Q}) & \longrightarrow & Q \mid (\nu \bar{a}a)({!a().P} \mid {!a().Q}). \end{array}$$

The side condition prevents π<sup>F</sup> -theories from collapsing.

Another, relatively minor, difference is that application of (E-Beta) is not limited to the contexts of the form [ ] | Q. This kind of extension can be found in, for example, work by Honda and Laurent [19] studying π-calculus from a logical perspective.

The rule (E-GC) runs "garbage-collection". Because no one can send a message to the hidden name a, the process !a(x).P will never be invoked and thus is safely discarded. This rule is sound with respect to many behavioural equivalences, including barbed congruence. Rules of this kind often appear in the literature studying logical aspects of concurrent calculi (as in Honda and Laurent [19] and Wadler [48]). There is, however, a subtle difference in the side condition: (E-GC) requires that a and ¯a do not appear at all in P.

The rule (E-FOut) can be seen as the η-rule of abstractions, as in the λ-calculus and in the higher-order π-calculus [39]. In the latter, an output name $\bar{b}$ can be identified with an abstraction $(\vec{y}).\bar{b}\langle\vec{y}\rangle$. Then we have, for example,

$$(\nu \bar{a}a)(a \longleftrightarrow \bar{b} \mid \bar{c} \langle \bar{a} \rangle) \, = \, (\nu \bar{a}a)(a \longleftrightarrow \bar{b} \mid \bar{c} \langle (\vec{y}).\bar{a} \langle \vec{y} \rangle \rangle) \, = \, \bar{c} \langle (\vec{y}).\bar{b} \langle \vec{y} \rangle \rangle \, = \, \bar{c} \langle \bar{b} \rangle$$

where we use (E-Beta) and (E-GC) in the second step. An important usage of (E-FOut) is to replace an output of free names with that of bound names. This kind of operation has been studied in [7,28] as a part of translations from the π-calculus to its local/internal fragments.<sup>3</sup>

The rule (E-Eta) requires that forwarders be left-identities, directly describing the requirement discussed above.<sup>4</sup>

The rules (E-SCong) and (E-Ctx) are easy to understand. The former requires that structurally congruent processes should be identified; the latter says that a π<sup>F</sup> -theory is a congruence.

These rules can be justified from the operational viewpoint as well. Well-known results on the **i**/**o**-typed π-calculus (see, e.g., [35,43]) show the following propositions.

**Proposition 1.** *Barbed congruence is closed under all rules but* (E-Eta)*.* 

**Proposition 2.** *May-testing equivalence is a* π<sup>F</sup> *-theory.*

In particular, the latter means that may-testing equivalence is in the scope of the categorical framework of this paper; see Theorem 5.

# **3 Categorical Semantics**

This section introduces the class of *compact closed Freyd categories* and discusses the interpretation of the π<sup>F</sup> -calculus in the categories. We show that the categorical semantics is sound and complete with respect to the equational theory given in Sect. 2.2, and that the syntax of the π<sup>F</sup> -calculus induces a model.

This section, by its nature, is slightly theoretical compared with other sections. Section 3.1 explains the ideas of this section without heavily using categorical notions; the subsequent subsections require familiarity with categorical type theory.

#### **3.1 Overview**

As mentioned in Sect. 1, the categorical model of π<sup>F</sup> is *compact closed Freyd category*, which has both closed Freyd and compact closed structures. Here we informally discuss what a compact closed Freyd category is and how to interpret π<sup>F</sup> by using syntactic representation.

<sup>3</sup> Free outputs can be eliminated from π<sup>F</sup>-processes by using the rules (E-FOut) and (E-Eta), i.e. external mobility can be encoded by internal mobility [7,40]. If the calculus is local [28,49], then we do not need (E-Eta) to eliminate free outputs.

<sup>4</sup> A forwarder behaves as a right-identity with respect to every π<sup>F</sup>-theory. This is a consequence of rules (E-Beta), (E-GC) and (E-FOut).

A *closed Freyd category* is a model of higher-order programs with side effects. It has, among others, the structures to interpret the function type A ⇒ B and its constructor and destructor, namely, abstraction λx.t and application t u. It also has a mechanism for unrestricted duplication of variables; in terms of logic, contraction is admissible.

A *compact closed category* can be seen as MLL [14] with the left rule:

$$\frac{\Gamma, A^{*}, A \vdash I}{\Gamma \vdash I} \qquad \left[ \frac{\Gamma \vdash A^{*} \qquad \Delta \vdash A}{\Gamma, \Delta \vdash I} \right].$$

(The right rule is the companion, which itself is derivable in MLL.)

A *compact closed Freyd category* has all the constructs. It has the structures corresponding to the following type constructors:

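$$\begin{array}{ll} \text{(closed Freyd)} & I, \quad A \otimes B, \quad A \Rightarrow B \\ \text{(compact closed)} & I, \quad A \otimes B, \quad A^{*}. \end{array}$$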

Note that the pair type A ⊗ B (as well as the unit I) coming from the closed Freyd structure is identified with that from the compact closed structure. The inference rules for a compact closed Freyd category are those for functional languages together with the above rules of the compact closed structure.

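Interpreting π<sup>F</sup> in a compact closed Freyd category is to interpret it by using these constructs. As mentioned in Sect. 1, following Sangiorgi [39], we regard an output as an application of a function and an input-prefixing as an abstraction followed by allocation.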


We interpret the output action by using the function application. Hence the type $\mathbf{ch}^{o}[T]$ is regarded as a function type $T \Rightarrow I$ (where the unit type $I$ is the type for processes, i.e. $\diamond$); then the typing rule for output actions becomes

$$\frac{\Gamma, \bar{a} \colon (T \Rightarrow I), x \colon T \vdash \bar{a} \colon T \Rightarrow I \qquad \Gamma, \bar{a} \colon (T \Rightarrow I), x \colon T \vdash x \colon T}{\Gamma, \bar{a} \colon (T \Rightarrow I), x \colon T \vdash \bar{a} \langle x \rangle \colon I}$$

The type $\mathbf{ch}^{i}[T]$ is understood as $(T \Rightarrow I)^{*}$; the input-prefixing rule becomes

$$\frac{\Gamma, a \colon (T \Rightarrow I)^{*} \vdash a \colon (T \Rightarrow I)^{*} \qquad \dfrac{\Gamma, a \colon (T \Rightarrow I)^{*}, x \colon T \vdash P : I}{\Gamma, a \colon (T \Rightarrow I)^{*} \vdash (x).P \colon T \Rightarrow I}}{\Gamma, a \colon (T \Rightarrow I)^{*} \vdash {!a(x).P} : I}$$

This derivation directly expresses the intuition that an input-prefixing is abstraction followed by allocation; here allocation is interpreted by using the compact closed structure, i.e. connection of ports. The name restriction also has a natural derivation:

$$\frac{\Gamma, a \colon (T \Rightarrow I)^{*}, \bar{a} \colon (T \Rightarrow I) \vdash P : I}{\Gamma \vdash (\nu \bar{a} a) P : I}$$
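The reading of channel types described in this overview can be summarised as a small translation function. The Python sketch below prints the categorical type as a string; it is only an illustration. The overview treats the monadic case, so interpreting a polyadic payload as a tensor product of the interpreted argument types, and an empty payload as the unit $I$, is an assumption of this sketch, as are the names `Ch` and `interpret` and the ASCII renderings `=>`, `(x)` and `*`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ch:
    polarity: str                      # "o" (output) or "i" (input)
    args: Tuple["Ch", ...] = ()


def interpret(t: Ch) -> str:
    """ch^o[T] becomes (T => I); ch^i[T] becomes (T => I)*."""
    # Assumption: payloads are combined with the tensor, the empty payload is I.
    payload = " (x) ".join(interpret(s) for s in t.args) if t.args else "I"
    arrow = f"({payload} => I)"
    return arrow if t.polarity == "o" else arrow + "*"


signal_out = Ch("o")                       # ch^o[]
callback_in = Ch("i", (signal_out,))       # ch^i[ch^o[]]
```

With these definitions `interpret(signal_out)` yields `(I => I)` and `interpret(callback_in)` yields `((I => I) => I)*`, so an input name is typed by the dual of the function type carried by its connected output name.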

#### **3.2 Compact Closed Freyd Category**

Let us formalise the ideas given in Sect. 3.1. Hereafter in this section, we assume basic knowledge of category theory and of categorical type theory.

We recall the definitions of compact closed category and closed Freyd category. For simplicity, the structures below are strict and chosen; a functor is required to preserve the chosen structures on the nose.

**Definition 4 (Compact closed category** [21]**).** *Let* $(\mathcal{C}, \otimes, I)$ *be a symmetric strict monoidal category. The* dual *of an object* $A$ *in* $\mathcal{C}$ *is an object* $A^{*}$ *equipped with* unit $\eta_A : I \longrightarrow A \otimes A^{*}$ *and* counit $\epsilon_A : A^{*} \otimes A \longrightarrow I$ *that satisfy the "triangle identities"* $(\eta_A \otimes \mathrm{id}_A); (\mathrm{id}_A \otimes \epsilon_A) = \mathrm{id}_A$ *and* $(\mathrm{id}_{A^{*}} \otimes \eta_A); (\epsilon_A \otimes \mathrm{id}_{A^{*}}) = \mathrm{id}_{A^{*}}$*. The category* $\mathcal{C}$ *is* compact closed *if each object is equipped with a chosen dual.*
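A standard instance in which the triangle identities can be checked by hand is the category of sets and relations, where every object is its own dual. The Python sketch below verifies both identities for a small finite set. It is an illustration under our own conventions: relations are sets of pairs, the tensor is the cartesian product, and, since Python tuples are not strictly associative, the unitors and associators that a strict monoidal category leaves implicit are written out explicitly.

```python
from functools import reduce
from itertools import product

STAR = "*"  # the single element of the monoidal unit I


def compose(r, s):
    """Diagrammatic composition r ; s of relations."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def tensor(r, s):
    """Tensor of relations: act componentwise on pairs."""
    return {((x1, x2), (y1, y2)) for (x1, y1) in r for (x2, y2) in s}


def identity(A):
    return {(a, a) for a in A}


def unit(A):
    """eta_A : I -> A (x) A*, relating * to every diagonal pair."""
    return {(STAR, (a, a)) for a in A}


def counit(A):
    """eps_A : A* (x) A -> I, relating every diagonal pair to *."""
    return {((a, a), STAR) for a in A}


def first_triangle(A):
    """(eta_A (x) id_A) ; (id_A (x) eps_A) = id_A, with coherence maps explicit."""
    A = set(A)
    to_IA = {(a, (STAR, a)) for a in A}
    assoc = {(((b, c), a), (b, (c, a))) for b, c, a in product(A, repeat=3)}
    from_AI = {((b, STAR), b) for b in A}
    chain = [to_IA, tensor(unit(A), identity(A)), assoc,
             tensor(identity(A), counit(A)), from_AI]
    return reduce(compose, chain) == identity(A)


def second_triangle(A):
    """(id_{A*} (x) eta_A) ; (eps_A (x) id_{A*}) = id_{A*}."""
    A = set(A)
    to_AI = {(a, (a, STAR)) for a in A}
    assoc_inv = {((a, (b, c)), ((a, b), c)) for a, b, c in product(A, repeat=3)}
    from_IA = {((STAR, c), c) for c in A}
    chain = [to_AI, tensor(identity(A), unit(A)), assoc_inv,
             tensor(counit(A), identity(A)), from_IA]
    return reduce(compose, chain) == identity(A)
```

Calling `first_triangle({0, 1, 2})` and `second_triangle({0, 1, 2})` both return `True`; intuitively, the unit creates a linked pair of ports and the counit consumes one, which is the "connection of ports" reading used for name restriction.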

**Definition 5 (Closed Freyd category** [37]**).** *A* Freyd category *is given by (1) a category with chosen finite products* (C, ⊗, I)*, called the* value category*, (2) a symmetric strict monoidal category* (K, ⊗, I, **symm**)*, called the* producer category*, and (3) an identity-on-object strict symmetric monoidal functor* J : C → K*. A Freyd category is a* closed Freyd category *if the functor* J(−) ⊗ A : C → K *has a (chosen) right adjoint* A ⇒ − : K → C *for every object* A*. We write* Λ<sub>A,B,C</sub> *for the natural bijection* K(J(A) ⊗ B, C) −→ C(A, B ⇒ C) *and* **eval**<sub>A,B</sub> *for* Λ<sup>−1</sup>(id<sub>A⇒B</sub>) : (A ⇒ B) ⊗ A −→ B *in* K*.*

*Remark 3.* The above definition is a restriction of the original one [37], in which K is a *premonoidal* [36] category. This change reflects concurrency of the calculus. In fact, it validates the following law, expressed by the syntax of the computational λ-calculus [33],

$$\text{let } x = M \text{ in } \text{let } y = N \text{ in } L \quad = \quad \text{let } y = N \text{ in } \text{let } x = M \text{ in } L.$$

Reading the left-hand form, one evaluates M first; reading the right-hand form, one evaluates N first. This law therefore allows us to evaluate M and N in arbitrary order, or concurrently.

We now introduce the categorical structure corresponding to the π<sup>F</sup> -calculus.

**Definition 6 (Compact closed Freyd category).** *A* compact closed Freyd category *is a Freyd category* J : C −→ K *such that (1)* K *is compact closed, and (2)* J *has the (chosen) right adjoint* I ⇒ −: K→C*.* 

We shall often write J for a compact closed Freyd category J : C <sup>⊥</sup> K.

A compact closed Freyd category is a closed Freyd category:

$$\mathcal{K}(J(A)\otimes B,C)\cong\mathcal{K}(J(A),B^\*\otimes C)\cong\mathcal{C}(A,I\Rightarrow(B^\*\otimes C)).$$
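The first isomorphism above can be made explicit. The following is only a sketch under the conventions of Definitions 4 and 5 (diagrammatic composition ";", unit η<sub>B</sub> : I → B ⊗ B<sup>∗</sup>, counit ε<sub>B</sub> : B<sup>∗</sup> ⊗ B → I); the hat and check notation is introduced here for illustration and is not taken from the original text. For f : J(A) ⊗ B → C and g : J(A) → B<sup>∗</sup> ⊗ C, one candidate pair of transposes is

$$\begin{aligned} \hat{f} &= (\mathrm{id}\_{J(A)} \otimes \eta\_{B});\ (f \otimes \mathrm{id}\_{B^{\*}});\ \mathbf{symm}\_{C,B^{\*}} &&: J(A) \longrightarrow B^{\*} \otimes C,\\ \check{g} &= (g \otimes \mathrm{id}\_{B});\ (\mathrm{id}\_{B^{\*}} \otimes \mathbf{symm}\_{C,B});\ (\epsilon\_{B} \otimes \mathrm{id}\_{C}) &&: J(A) \otimes B \longrightarrow C. \end{aligned}$$

That these two assignments are mutually inverse follows from the triangle identities together with naturality of **symm**. The second isomorphism is the adjunction between J and I ⇒ (−) required by Definition 6(2), so one may take B ⇒ C to be I ⇒ (B<sup>∗</sup> ⊗ C).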

$$\begin{aligned} [\![\mathbf{ch}^{i}[T\_1,\ldots,T\_n]]\!] &\stackrel{\text{def}}{=} \big(([\![T\_1]\!]\otimes\cdots\otimes[\![T\_n]\!])\Rightarrow I\big)^{\*}\\ [\![\mathbf{ch}^{o}[T\_1,\ldots,T\_n]]\!] &\stackrel{\text{def}}{=} ([\![T\_1]\!]\otimes\cdots\otimes[\![T\_n]\!])\Rightarrow I\\ [\![\Gamma\vdash\mathbf{0}:\diamond]\!] &\stackrel{\text{def}}{=} J(!\_{\Gamma})\\ [\![\Gamma\vdash\,!a(\vec{x}).P:\diamond]\!] &\stackrel{\text{def}}{=} J(\langle\pi\_{a}^{\Gamma},\ \Lambda\_{[\![\Gamma]\!],[\![\vec{T}]\!],I}([\![\Gamma,\vec{x}:\vec{T}\vdash P:\diamond]\!])\rangle);\ \epsilon\_{[\![\mathbf{ch}^{o}[\vec{T}]]\!]}\\ [\![\Gamma\vdash\bar{a}\langle\vec{x}\rangle:\diamond]\!] &\stackrel{\text{def}}{=} J(\langle\pi\_{\bar{a}}^{\Gamma},\pi\_{x\_1}^{\Gamma},\ldots,\pi\_{x\_n}^{\Gamma}\rangle);\ \mathbf{eval}\_{[\![\vec{T}]\!],I}\\ [\![\Gamma\vdash P\mid Q:\diamond]\!] &\stackrel{\text{def}}{=} J(\Delta\_{\Gamma});\ ([\![\Gamma\vdash P:\diamond]\!]\otimes[\![\Gamma\vdash Q:\diamond]\!])\\ [\![\Gamma\vdash(\nu xy)P:\diamond]\!] &\stackrel{\text{def}}{=} (\mathrm{id}\_{[\![\Gamma]\!]}\otimes\eta\_{[\![T]\!]});\ [\![\Gamma,x:T,y:T^{\bot}\vdash P:\diamond]\!] \end{aligned}$$

**Fig. 3.** Interpretation of types and processes. Here !<sub>Γ</sub>, Δ<sub>Γ</sub> and π<sup>Γ</sup><sub>y</sub> are maps in C induced by the cartesian structure: !<sub>Γ</sub> : ⟦Γ⟧ −→ I is the terminal map, Δ<sub>Γ</sub> : ⟦Γ⟧ −→ ⟦Γ⟧ ⊗ ⟦Γ⟧ is the diagonal map and, when Γ = (y<sub>1</sub> : T<sub>1</sub>,...,y<sub>n</sub> : T<sub>n</sub>) and x = y<sub>j</sub>, the morphism π<sup>Γ</sup><sub>x</sub> : ⟦Γ⟧ −→ ⟦T<sub>j</sub>⟧ is the j-th projection. The interpretation of a type environment x<sub>1</sub> : T<sub>1</sub>,...,x<sub>n</sub> : T<sub>n</sub> is ⟦T<sub>1</sub>⟧ ⊗···⊗ ⟦T<sub>n</sub>⟧.

*Example 1.* The most basic example of a compact closed Freyd category is (the strict monoidal version of) J : **Sets** <sup>⊥</sup> **Rel**: P. Here J is the identity-on-object functor that maps a function to its graph and P is the "power set functor" that maps a relation R ⊆ A × B to the function P(R) ≝ {(S<sub>A</sub>, S<sub>B</sub>) | S<sub>B</sub> = {b | a ∈ S<sub>A</sub>, a R b}}. Another example is obtained by replacing sets with posets, functions with monotone functions and relations with downward closed relations.
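To make Example 1 concrete, the following Python sketch models finite relations as sets of pairs and checks one triangle identity in **Rel**. It is an illustration under our own assumptions, not part of the original development: the function names (`graph`, `image`, `eta`, `eps`, `triangle`) are hypothetical, the dual of a set is taken to be the set itself, and the monoidal structure is the non-strict one on nested pairs, so the coherence maps are written out explicitly.

```python
from itertools import product

UNIT = ()  # the only element of the tensor unit I = {()}


def graph(f, dom):
    """J on morphisms: the graph of a function f restricted to a finite set dom."""
    return {(a, f(a)) for a in dom}


def compose(r, s):
    """Diagrammatic composition r ; s of relations given as sets of pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def tensor(r, s):
    """Monoidal product of relations, using pairs for A (x) B."""
    return {((a, c), (b, d)) for (a, b) in r for (c, d) in s}


def identity(xs):
    """Identity relation on the finite set xs."""
    return {(x, x) for x in xs}


def image(r, subset):
    """P on morphisms: the function on subsets induced by the relation r."""
    return {b for (a, b) in r if a in subset}


def eta(xs):
    """Unit I -> A (x) A*: relates () to every diagonal pair (a, a)."""
    return {(UNIT, (a, a)) for a in xs}


def eps(xs):
    """Counit A* (x) A -> I: relates every diagonal pair (a, a) to ()."""
    return {((a, a), UNIT) for a in xs}


def triangle(xs):
    """(eta (x) id) ; (id (x) eps), with the coherence maps written out."""
    elems = list(xs)
    unit_inv = {(a, (UNIT, a)) for a in elems}   # A -> I (x) A
    assoc = {(((x, y), z), (x, (y, z)))          # (A (x) A*) (x) A -> A (x) (A* (x) A)
             for x, y, z in product(elems, repeat=3)}
    unit_r = {((a, UNIT), a) for a in elems}     # A (x) I -> A
    m = compose(unit_inv, tensor(eta(elems), identity(elems)))
    m = compose(m, assoc)
    m = compose(m, tensor(identity(elems), eps(elems)))
    return compose(m, unit_r)
```

For a small set such as `{0, 1, 2}`, `triangle` should return the identity relation, and `graph` should turn function composition into relational composition, which is the functoriality of J.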

*Example 2.* A more sophisticated example is taken from Laird's game-semantic model of π-calculus [22]. Precisely speaking, the model in [22] itself is not compact closed Freyd, but its variant (with non-negative arenas) is. This model is important since it is fully abstract w.r.t. may-testing equivalence [22, Theorem 1]; hence our framework has a model that captures the may-testing equivalence. 

#### **3.3 Interpretation**

Given a compact closed Freyd category J : C <sup>⊥</sup> K, this section defines the interpretation ⟦−⟧<sub>J</sub>. It maps types and type environments to objects as usual, and a well-typed process Γ ⊢ P : ⋄ to a morphism ⟦P⟧ : ⟦Γ⟧ −→ I in K (recall that the tensor unit I is the interpretation of the type for processes).

Figure 3 defines the interpretation of types and processes. It simply formalises the ideas presented in Sect. 3.1: for example, the interpretation of !a(x).P is the abstraction Λ (from the closed Freyd structure) followed by the allocation ε (from the compact closed structure). There are some points worth noting.


*Example 3.* Let us consider y : T ⊢ (*ν*āa)(ā⟨y⟩ | !a(x).P) : ⋄, where ā, a, y ∉ **fn**(P) and a : **ch**<sup>i</sup>[T]. By (E-Beta) and (E-GC), this process is equal to P{y/x}. It is natural to expect that the interpretations of the two processes coincide; indeed they do. As the following calculation indicates, our semantics factorises the reduction into two steps: (1) the "transmission" of the closure λx.P by the triangle identity of the compact closed structure, and (2) the β-reduction modelled by **eval** of the closed Freyd structure:

$$\begin{aligned} &[\![y:T \vdash (\nu \bar{a}a)(\bar{a}\langle y\rangle \mid {!a(x).P}) : \diamond]\!]\\ &= (\mathrm{id}\_{T} \otimes \eta\_{\mathbf{ch}^{o}[T]});\ [\![y:T, \bar{a}:\mathbf{ch}^{o}[T], a:\mathbf{ch}^{i}[T] \vdash \bar{a}\langle y\rangle \mid {!a(x).P} : \diamond]\!]\\ &= (\mathrm{id} \otimes \eta);\ ([\![y:T, \bar{a}:\mathbf{ch}^{o}[T] \vdash \bar{a}\langle y\rangle : \diamond]\!] \otimes [\![a:\mathbf{ch}^{i}[T] \vdash {!a(x).P} : \diamond]\!])\\ &= (\mathrm{id} \otimes \eta);\ ((\mathbf{symm}\_{T,\mathbf{ch}^{o}[T]};\ \mathbf{eval}\_{T,I}) \otimes (\mathrm{id}\_{\mathbf{ch}^{o}[T]^{\*}} \otimes J(\Lambda([\![x:T \vdash P:\diamond]\!]))));\ \epsilon\_{T \Rightarrow I}\\ &= (\mathrm{id}\_{T} \otimes J(\Lambda([\![x:T\vdash P:\diamond]\!])));\ \mathbf{symm}\_{T,\mathbf{ch}^{o}[T]};\ \mathbf{eval}\_{T,I} \qquad \text{(by the triangle identity)}\\ &= (J(\Lambda([\![x:T\vdash P:\diamond]\!])) \otimes \mathrm{id}\_{T});\ \mathbf{eval}\_{T,I}\\ &= [\![x:T\vdash P:\diamond]\!]\\ &= [\![y:T\vdash P\{y/x\}:\diamond]\!]. \end{aligned}$$

(Here we implicitly use derived rules for weakening and exchange.) 

*Example 4.* The interpretation of a forwarder a : **ch**<sup>i</sup>[T], b̄ : **ch**<sup>o</sup>[T] ⊢ a ↪ b̄ : ⋄ is the counit ε<sub>**ch**<sup>o</sup>[T]</sub> : **ch**<sup>o</sup>[T]<sup>∗</sup> ⊗ **ch**<sup>o</sup>[T] −→ I in K, which is the one-sided form of the identity. Recall that a forwarder is the identity in every π<sub>F</sub>-theory.

The semantics is sound and complete: a judgement *Ax* ▷ Γ ⊢ P = Q is provable if and only if Γ ⊢ P = Q is valid in all models J of *Ax*.

Here we define the related notions and prove soundness; completeness is the topic of the next subsection.

**Definition 7.** *An equational judgement* Γ ⊢ P = Q *is* valid in J *if* ⟦Γ ⊢ P : ⋄⟧<sub>J</sub> = ⟦Γ ⊢ Q : ⋄⟧<sub>J</sub>*. Given a set Ax of non-logical axioms,* J *is a* model of *Ax, written* J |= *Ax, if it validates all judgements in Ax. We write Ax* ⊩ Γ ⊢ P = Q *if* Γ ⊢ P = Q *is valid in every* J *such that* J |= *Ax.*

**Theorem 2 (Soundness).** *If Ax* ▷ Γ ⊢ P = Q*, then Ax* ⊩ Γ ⊢ P = Q*.*

#### **3.4 Term Model**

A *term model* is a category whose objects are type environments and whose morphisms are terms (i.e. processes in this setting). This section constructs the term model, by which we show completeness. The construction basically follows the standard arguments in categorical type theory; we focus on the features unique to our model and only sketch the common parts.

Given a set *Ax* of axioms, we define the term model J*Ax* : C*Ax* <sup>⊥</sup> K*Ax* , which we also write as *Cl*(*Ax* ).

The definition of the producer category K*Ax* follows the standard recipe. As usual, its objects are finite lists of types. The monoidal product T ⊗ S is the concatenation of the lists and the dual T<sup>∗</sup> is T<sup>⊥</sup>. Given objects T and S, a morphism from T to S is a process x : T, y : S<sup>⊥</sup> ⊢ P : ⋄ (modulo renaming of the variables x and y). If *Ax* ▷ x : T, y : S<sup>⊥</sup> ⊢ P = Q is provable, then P and Q are regarded as the same morphism. Composition of morphisms is defined as "parallel composition plus hiding": for morphisms P : T −→ S and Q : S −→ U, i.e. processes such that x : T, y : S<sup>⊥</sup> ⊢ P : ⋄ and z : S, w : U<sup>⊥</sup> ⊢ Q : ⋄, their composite is x : T, w : U<sup>⊥</sup> ⊢ (*ν*yz)(P | Q) : ⋄. The monoidal product P ⊗ Q of morphisms is the parallel composition P | Q. The identity, as well as the symmetry of the monoidal product and the unit and counit of the compact closed structure, is a parallel composition of forwarders: for example, the identity on S is x : S, y : S<sup>⊥</sup> ⊢ x<sub>1</sub> ↔ y<sub>1</sub> | ··· | x<sub>n</sub> ↔ y<sub>n</sub> : ⋄, where n is the length of S. The facts that most structural morphisms are forwarders and that forwarders compose are the keys to showing that K*Ax* is a compact closed category.
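The "parallel composition plus hiding" recipe can be sketched on a small process syntax tree. The Python below is a minimal illustration under our own assumptions: the class names (`Zero`, `Fwd`, `Par`, `Nu`, `Morphism`) are hypothetical, interface names are assumed to be pairwise distinct (no renaming is performed), types are omitted, and processes are compared syntactically rather than modulo the equational theory.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class Zero:
    """The inert process 0."""


@dataclass(frozen=True)
class Fwd:
    """A forwarder connecting the names src and dst."""
    src: str
    dst: str


@dataclass(frozen=True)
class Par:
    """Parallel composition P | Q."""
    left: object
    right: object


@dataclass(frozen=True)
class Nu:
    """(nu x y) P, binding a connected pair of names."""
    x: str
    y: str
    body: object


@dataclass(frozen=True)
class Morphism:
    """A process together with its source names x and target names y."""
    src: Tuple[str, ...]
    tgt: Tuple[str, ...]
    proc: object


def compose(p: Morphism, q: Morphism) -> Morphism:
    """Put p and q in parallel and hide the names on the shared interface."""
    if len(p.tgt) != len(q.src):
        raise ValueError("interfaces do not match")
    body = Par(p.proc, q.proc)
    for y, z in reversed(list(zip(p.tgt, q.src))):
        body = Nu(y, z, body)
    return Morphism(p.src, q.tgt, body)


def identity(xs, ys) -> Morphism:
    """The identity as a parallel composition of forwarders x_i <-> y_i."""
    fwds = [Fwd(x, y) for x, y in zip(xs, ys)]
    proc = reduce(Par, fwds) if fwds else Zero()
    return Morphism(tuple(xs), tuple(ys), proc)
```

Composing two one-wire identities yields a process of the shape (*ν*yz)(x ↔ y | z ↔ w); in the term model this is equal to a single forwarder only after quotienting by the equational theory, which this sketch does not implement.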

We next define C*Ax*; the definition of its morphisms has a subtle point. The objects of C*Ax* are by definition the same as those of K*Ax*, i.e. lists of types. The definition of morphisms relies on the notion of *values*. Values are defined by the grammar V ::= x | (x).P, where P is a process and (x).P is called an *abstraction*. Typing rules for values are as follows:

$$\frac{x:T\in\Gamma}{\Gamma\vdash x:T} \qquad \frac{\Gamma,\vec{x}:\vec{T}\vdash P : \diamond}{\Gamma\vdash (\vec{x}).P:\mathbf{ch}^{o}[\vec{T}]}.$$

(To understand the right rule, recall that ⟦**ch**<sup>o</sup>[T]⟧ = ⟦T⟧ ⇒ I.) A morphism from T to S = (S<sub>1</sub>,...,S<sub>n</sub>) is an n-tuple (V<sub>1</sub>,...,V<sub>n</sub>) of values such that x : T ⊢ V<sub>i</sub> : S<sub>i</sub> for each i (modulo renaming of x). Composition is intuitively defined by "substitution followed by β-reduction", whose definition is omitted here.<sup>5</sup>

The functor J*Ax* places the values at the channels. For example, let T = (**ch**<sup>i</sup>[U<sub>1</sub>], **ch**<sup>o</sup>[U<sub>2</sub>]) and consider the morphism in C*Ax* given by

$$a: \mathbf{ch}^i[U\_1], \bar{b}: \mathbf{ch}^o[U\_2] \vdash (a, \bar{b}, (\vec{x}).P): (\mathbf{ch}^i[U\_1], \mathbf{ch}^o[U\_2], \mathbf{ch}^o[\vec{S}])$$

where S is the type for x. The image of this morphism by the functor J*Ax* is

$$a: \mathbf{ch}^i[U\_1], \bar{b}: \mathbf{ch}^o[U\_2], \bar{c}: \mathbf{ch}^o[U\_1], d: \mathbf{ch}^i[U\_2], e: \mathbf{ch}^i[\vec{S}] \vdash a \hookrightarrow \bar{c} \mid d \hookrightarrow \bar{b} \mid {!e(\vec{x}).P}: \diamond.$$

This example contains all the three ways to place a value to a given channel.

**Theorem 3.** *Cl*(*Ax* ) *is a compact closed Freyd category for every Ax .* 

In the model *Cl*(*Ax*), the interpretation of a process Γ ⊢ P : ⋄ is the equivalence class that P belongs to. This fact leads to completeness.

<sup>5</sup> Here is a subtle technical issue that we shall not address in this paper; see the long version for the formal definition. We think, however, that this paragraph conveys a precise intuition.

**Theorem 4 (Completeness).** *If Ax* ⊩ Γ ⊢ P = Q*, then Ax* ▷ Γ ⊢ P = Q*.*

**Theorem 5.** *There exists a compact closed Freyd category* J *that is fully abstract w.r.t. may-testing equivalence, i.e.* Γ ⊢ P =<sub>*may*</sub> Q *iff* ⟦P⟧<sub>J</sub> = ⟦Q⟧<sub>J</sub>*.*

*Proof.* Let J be the term model *Cl*(=*may* ) and use Proposition 2.

#### **3.5 Theory/Model Correspondence**

It is natural to expect that *Cl*(*Ax*) is the *classifying category*, as in standard categorical type theory. This means that giving a model of *Ax* in J is equivalent to giving a structure-preserving functor *Cl*(*Ax*) −→ J. This subsection clarifies and studies this claim.

The set Mod(*Ax* , J) of models of *Ax* in J is defined as follows. If J |= *Ax* , then Mod(*Ax* , J) is a singleton set<sup>6</sup>; otherwise Mod(*Ax* , J) is the empty set.

We then define the notion of structure-preserving functors.

**Definition 8.** *A* strict compact closed Freyd functor *from* J : C <sup>⊥</sup> K : I ⇒ (−) *to* J′ : C′ <sup>⊥</sup> K′ : I ⇒′ (−) *is a pair of functors* (Φ, Ψ) *such that*


The collection of (small) compact closed Freyd categories and strict compact closed Freyd functors forms a 1-category, which we write as *CCFC*.

Now the question is whether Mod(*Ax*, J) ≅ *CCFC*(*Cl*(*Ax*), J) holds in **Set**.

Unfortunately this does not hold. More precisely, the left-to-right inclusion does not hold in general. This means that the term model satisfies some additional axioms reflecting some aspects of the π<sup>F</sup> -calculus.

The additional axioms reflect the definition of the dual T<sup>∗</sup> in the term model; we have T<sup>∗</sup> ≝ T<sup>⊥</sup> by definition, and thus T<sup>∗∗</sup> = T and (T ⊗ S)<sup>∗</sup> = T<sup>∗</sup> ⊗ S<sup>∗</sup>. It might be surprising that these equations are harmful, because isomorphisms A<sup>∗∗</sup> ≅ A and (A ⊗ B)<sup>∗</sup> ≅ A<sup>∗</sup> ⊗ B<sup>∗</sup> exist in every compact closed category. The point is that the equations also require C to have isomorphisms A<sup>∗∗</sup> ≅ A and (A ⊗ B)<sup>∗</sup> ≅ A<sup>∗</sup> ⊗ B<sup>∗</sup> (witnessed by the respective identities).

We formally define the additional axioms, which we call **(I)** and **(D)**:

**(I)** The canonical isomorphism A∗∗ −→ A in K is the identity.

**(D)** The canonical isomorphism (A ⊗ B)<sup>∗</sup> −→ A<sup>∗</sup> ⊗ B<sup>∗</sup> in K is the identity.

**Theorem 6.** Mod(*Ax* , J) ∼= *CCFC*(*Cl*(*Ax* ), J) *if J satisfies (I) and (D).* 

 

<sup>6</sup> Because we consider only the empty signature, the set of valuations is singleton.

$$\begin{array}{ll} \begin{aligned} \sigma &::= \tau \to \tau' \qquad \xi ::= \sigma \qquad \tau ::= (\xi\_1, \dots, \xi\_n)\\ V &::= x \mid \lambda \langle \vec{x} \rangle.M\\ M &::= \langle \vec{V} \rangle \mid V\,\langle \vec{V} \rangle \mid \mathbf{let}\ \langle \vec{x} \rangle = M\ \mathbf{in}\ M' \end{aligned} & \qquad \begin{aligned} \xi &::= \cdots \mid \sigma^{\*}\\ V &::= \cdots \mid \mathbf{channel}\_{\sigma} \mid \mathbf{send}\_{\sigma} \end{aligned}\\[1ex] \text{(a) } \lambda\_c & \qquad \text{(b) } \lambda\_{ch} \text{ (difference from } \lambda\_c\text{)} \end{array}$$

**Fig. 4.** Syntax of types and terms of the λc- and λ*ch* -calculi. The syntax of λ<sup>c</sup> is adapted to the setting of this paper.

### **4 A Concurrent** *λ***-calculus and (de)compilation**

In order to demonstrate the relevance of our semantic framework, this section gives a semantic reconstruction of fully-abstract compilation and decompilation from a higher-order calculus to the (first-order) π-calculus, such as [39,42]. We first design an instance of the computational λ-calculus [33], named λ*ch*, that is sound and complete with respect to compact closed Freyd categories. It is obtained by a straightforward extension of the coincidence between the computational λ-calculus and closed Freyd categories (Sect. 4.1). There are translations between π<sub>F</sub> and λ*ch* since both are sound and complete with respect to compact closed Freyd categories. Section 4.2 actually calculates the translations, and compares them with those in [39,42].

#### **4.1 The** *λch* **-calculus**

The λ*ch* -calculus is a computational λ-calculus with additional constructors dealing with channels. This section introduces and explains the calculus.

The situation is nicely expressed by the following intuitive equation:

$$\frac{\lambda\_{ch}}{\lambda\_c} \approx \frac{(\text{compact closed Freyd category} + \mathbf{I} + \mathbf{D})}{(\text{closed Freyd category})}.$$

The base calculus λ<sub>c</sub> is the *computational* λ*-calculus*, which corresponds to closed Freyd categories [33,37]. It is a call-by-value higher-order programming language, given in Fig. 4(a). Our calculus λ*ch* is obtained by adding type and term constructors originating from the compact closed structure, which λ<sub>c</sub> does not have.

**Syntax.** As for types, λ*ch* has a new constructor coming from the dual object A∗. Normalising occurrences of the dual A<sup>∗</sup> using the axioms **(I)** A∗∗ = A and **(D)** (A ⊗ B)<sup>∗</sup> = A<sup>∗</sup> ⊗ B∗, we obtain the following grammar of types:

$$
\sigma ::= \tau \to \tau' \qquad \xi ::= \sigma \mid \sigma^\* \qquad \tau ::= (\xi\_1, \dots, \xi\_n).
$$

where n ≥ 0 and (ξ1,...,ξn) is an alternative notation for ξ1⊗···⊗ξn. Compared with λc, the only new type is the dual type σ<sup>∗</sup> of a function type σ.

As for terms, λ*ch* has constructors corresponding to the unit and counit

$$
\eta\_A: I \longrightarrow A \otimes A^\* \qquad \epsilon\_A: A^\* \otimes A \longrightarrow I \qquad \qquad \text{(for each object } A\text{)}$$

of the compact closed structure. We simply add these morphisms as constants:

$$\frac{}{\Gamma \vdash \mathbf{channel}\_{\sigma} : () \to (\sigma, \sigma^{\*})} \qquad \text{and} \qquad \frac{}{\Gamma \vdash \mathbf{send}\_{\sigma} : (\sigma^{\*}, \sigma) \to ()}.$$

We shall often omit the subscript σ.

In summary, we obtain the syntax of λ*ch* shown in Fig. 4. Interestingly, λ*ch* can be seen as the very core of Concurrent ML [38], a practical higher-order concurrent language, although λ*ch* is developed from purely semantic considerations.

**Semantics.** Let us first discuss the intuitive meanings of the new constructors. The type σ<sup>∗</sup> is for *output channels*; **channel** creates and returns a pair of an input channel and an output channel that are connected; and **send** ⟨α, V⟩ sends the value V via the output channel α. The following points are worth noting.


The first two points reflect the asynchrony of π<sup>F</sup> , and the last point reflects the absence of non-replicated input (cf. Sect. 4.2).

Based on this intuition, we develop the operational, axiomatic and categorical semantics of λ*ch* . We shall use the following abbreviations:

$$(\nu xy)M \stackrel{\text{def}}{=} \mathbf{let}\,\langle x,y\rangle = \mathbf{channel}\,\langle\rangle \,\text{in}\,M \qquad M \parallel N \stackrel{\text{def}}{=} \mathbf{let}\,\langle\rangle = M \,\mathbf{in}\,N.$$

*Operational Semantics.* Assume an infinite set X of *channels*, ranged over by α and β. For each channel α, we write α for the input name and ᾱ for the output name, both of which are values. A *configuration* is a tuple (M, α, μ) of a term M, a sequence α of generated channels and a sequence μ of performed send operations, i.e. μ = (**send** ⟨β̄<sub>1</sub>, V<sub>1</sub>⟩,..., **send** ⟨β̄<sub>k</sub>, V<sub>k</sub>⟩). The *reduction relation* is defined by the following rules for channels

$$\begin{aligned} (E[\mathsf{channel}\,\langle\rangle],\ \vec{\alpha},\ \mu) &\longrightarrow (E[\langle\beta, \bar{\beta}\rangle],\ \vec{\alpha}\cdot\beta,\ \mu) \\ (E[\mathsf{send}\,\langle\bar{\beta}, V\rangle],\ \vec{\alpha},\ \mu) &\longrightarrow (E[\langle\rangle],\ \vec{\alpha},\ \mu\cdot\mathsf{send}\,\langle\bar{\beta}, V\rangle) \\ (E[\beta\,V],\ \vec{\alpha},\ \mu) &\longrightarrow (E[W\,V],\ \vec{\alpha},\ \mu) \qquad (\mathsf{send}\,\langle\bar{\beta}, W\rangle \in \mu) \end{aligned}$$

in addition to the standard rules for λ-abstractions and let-expressions, which change only M. Here the set of *evaluation contexts* is given by the grammar:

$$E ::= \left\lbrack \right\rbrack \mid \mathbf{let} \left\langle \vec{x} \right\rangle = E \text{ in } M \mid \mathbf{let} \left\langle \vec{x} \right\rangle = M \text{ in } E.$$

Note that M and N in **let** ⟨x⟩ = M **in** N are evaluated in parallel (cf. Remark 3). This justifies the notation M ∥ N, an abbreviation for **let** ⟨⟩ = M **in** N.
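The three channel rules can be mimicked by a small store-based runtime. The Python below is a sketch under stated assumptions rather than the calculus itself: the class `Config` and its methods are hypothetical names, values are modelled as Python callables, evaluation contexts and the term component M are left implicit, and the third rule is read as "applying an input name β to V may continue with W V for any W previously sent on β̄", which is our reading of its side condition.

```python
import itertools


class Config:
    """The (alphas, mu) components of a configuration (M, alphas, mu)."""

    def __init__(self):
        self.alphas = []                  # generated channels, in creation order
        self.mu = []                      # performed sends as (channel, value) pairs
        self._fresh = itertools.count()   # supply of fresh channel identifiers

    def channel(self):
        """First rule: create a fresh channel and return (input name, output name)."""
        beta = next(self._fresh)
        self.alphas.append(beta)
        return ("in", beta), ("out", beta)

    def send(self, out_name, value):
        """Second rule: record the send in mu and return the unit value."""
        kind, beta = out_name
        if kind != "out":
            raise ValueError("send expects an output name")
        self.mu.append((beta, value))
        return ()

    def apply(self, in_name, arg):
        """Third rule: all results W(arg) for values W already sent on the channel.

        An empty list means that no matching send has been performed yet,
        so the application cannot take this step.
        """
        kind, beta = in_name
        if kind != "in":
            raise ValueError("apply expects an input name")
        return [w(arg) for (b, w) in self.mu if b == beta]
```

Note that `send` never blocks and never removes anything from `mu`, so a value sent once can answer arbitrarily many later applications.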

*Axiomatic Semantics.* The inference rules of the equational logic for λ*ch* are those for λ<sub>c</sub> together with the rule of concurrent evaluation

$$\text{let } \langle \vec{x} \rangle = M \text{ in } \text{let } \langle \vec{y} \rangle = N \text{ in } L \quad = \quad \text{let } \langle \vec{y} \rangle = N \text{ in } \text{let } \langle \vec{x} \rangle = M \text{ in } L;$$

the β- and η-rules for channels

$$\begin{array}{rcl} (\nu x \bar{x})(\mathsf{send}\,\langle \bar{x}, V \rangle \parallel M) &=& (\nu x \bar{x})(\mathsf{send}\,\langle \bar{x}, V \rangle \parallel M \{V/x\})\\ (\nu y \bar{y})(\mathsf{send}\,\langle \bar{z}, y \rangle \parallel N) &=& N \{\bar{z}/\bar{y}\} \end{array}$$

where x̄ ∉ **Fv**(V) ∪ **Fv**(M), y ∉ **Fv**(N) and z̄ ≠ ȳ; and a GC rule.

*Categorical Semantics.* One can interpret λ*ch*-terms in a compact closed Freyd category with **(I)** and **(D)**. The interpretation of the λ<sub>c</sub>-calculus part is standard [24,37]; the constant **channel**<sub>σ</sub> (resp. **send**<sub>σ</sub>) is interpreted as the "closure" whose body is η<sub>σ</sub> (resp. ε<sub>σ</sub>), as expected.

$$[\![\Gamma \vdash \mathbf{channel}\_{\sigma} : () \to (\sigma, \sigma^{\*})]\!] \stackrel{\text{def}}{=} J(!\_{\Gamma};\ \Lambda\_{I, I, \sigma \otimes \sigma^{\*}}(\eta\_{\sigma}))$$

$$[\![\Gamma \vdash \mathbf{send}\_{\sigma} : (\sigma^{\*}, \sigma) \to ()]\!] \stackrel{\text{def}}{=} J(!\_{\Gamma};\ \Lambda\_{I, \sigma^{\*} \otimes \sigma, I}(\epsilon\_{\sigma})).$$

The categorical semantics is sound and complete with respect to the equational theory of the λ*ch*-calculus. The proofs are basically straightforward, but there is a subtle issue in the definition of the term model: there are different possible definitions of the right adjoint I ⇒ (−), which are of course equivalent but do not coincide on the nose. Our choice here is I ⇒ ξ ≝ (ξ<sup>⊥</sup>) → ().

#### **4.2 Translations Between** *λch* **and** *πF*

The higher-order calculus λ*ch* is equivalent to π<sup>F</sup> . This is because both calculi correspond to the same class of categories, namely, the class of compact closed Freyd categories with **(I)** and **(D)**, i.e.,

$$(\lambda\_{ch}) \approx \text{ (compact closed Freyd category } + \mathbf{I} + \mathbf{D}) \approx (\pi\_F).$$

This subsection studies translations derived from this semantic correspondence.

The translations are defined by the interpretations in the term models. For example, the translation ⦇−⦈ from λ*ch* to π<sub>F</sub> is induced by the interpretation of λ*ch*-terms in the term model *Cl*(∅). The interpretation ⟦M⟧<sub>*Cl*(∅)</sub> of a λ*ch*-term M is an equivalence class of π<sub>F</sub>-processes, since a morphism in *Cl*(∅) is an equivalence class of π<sub>F</sub>-processes. The translation ⦇M⦈ is defined by choosing a representative of the equivalence class. The other direction [(−)] is obtained by the interpretation of π<sub>F</sub> in the term model of λ*ch*.

Figures 5 and 6 are concrete definitions of the translations for a natural choice of representatives. Let us discuss the translations in more details.

$$\begin{gathered} [(\mathbf{ch}^{o}[\vec{T}])] \stackrel{\text{def}}{=} [(\vec{T})] \to () \qquad [(\mathbf{ch}^{i}[\vec{T}])] \stackrel{\text{def}}{=} ([(\vec{T})] \to ())^{\*} \qquad [((T\_1, \ldots, T\_n))] \stackrel{\text{def}}{=} ([(T\_1)], \ldots, [(T\_n)])\\ [(\mathbf{0})] \stackrel{\text{def}}{=} \langle\rangle \qquad [(P \mid Q)] \stackrel{\text{def}}{=} [(P)] \parallel [(Q)] \qquad [((\nu xy)P)] \stackrel{\text{def}}{=} (\nu xy)[(P)]\\ [(\bar{a}\langle\vec{x}\rangle)] \stackrel{\text{def}}{=} \bar{a}\,\langle\vec{x}\rangle \qquad [({!a(\vec{x}).P})] \stackrel{\text{def}}{=} \mathbf{send}\,\langle a, \lambda\langle\vec{x}\rangle.[(P)]\rangle \end{gathered}$$

**Fig. 5.** Translation from π<sub>F</sub> to λ*ch*

$$\begin{gathered} (\!|\tau\_1 \to \tau\_2|\!) \stackrel{\text{def}}{=} \mathbf{ch}^{o}[(\!|\tau\_1|\!), (\!|\tau\_2|\!)^{\perp}] \qquad (\!|\sigma^{\*}|\!) \stackrel{\text{def}}{=} (\!|\sigma|\!)^{\perp} \qquad (\!|(\xi\_1, \ldots, \xi\_n)|\!) \stackrel{\text{def}}{=} ((\!|\xi\_1|\!), \ldots, (\!|\xi\_n|\!))\\ (\!|x|\!)\_{p} \stackrel{\text{def}}{=} p \hookrightarrow x \qquad (\!|\lambda\langle\vec{x}\rangle.M|\!)\_{p} \stackrel{\text{def}}{=} {!p(\vec{x},\vec{q})}.(\!|M|\!)\_{\vec{q}} \qquad (\!|\langle\vec{V}\rangle|\!)\_{\vec{p}} \stackrel{\text{def}}{=} (\!|V\_1|\!)\_{p\_1} \mid \cdots \mid (\!|V\_n|\!)\_{p\_n}\\ (\!|V\,\langle\vec{W}\rangle|\!)\_{\vec{p}} \stackrel{\text{def}}{=} (\nu a\bar{a})(\nu\vec{r}\vec{s})((\!|V|\!)\_{a} \mid (\!|\langle\vec{W}\rangle|\!)\_{\vec{s}} \mid \bar{a}\langle\vec{r},\vec{p}\rangle)\\ (\!|\mathbf{let}\ \langle\vec{x}\rangle = M\ \mathbf{in}\ N|\!)\_{\vec{p}} \stackrel{\text{def}}{=} (\nu\vec{x}\vec{q})((\!|M|\!)\_{\vec{q}} \mid (\!|N|\!)\_{\vec{p}})\\ (\!|\mathbf{channel}|\!)\_{p} \stackrel{\text{def}}{=} {!p(x,y)}.\,x \hookrightarrow y \qquad (\!|\mathbf{send}|\!)\_{p} \stackrel{\text{def}}{=} {!p(x,y)}.\,x \hookrightarrow y \end{gathered}$$

**Fig. 6.** Translation from λ*ch* to π<sub>F</sub>

The translation from π<sub>F</sub> to λ*ch* (Fig. 5) is easy to understand. It directly expresses the higher-order view of the first-order π-calculus. For example, an output action is mapped to an application, and an input-prefixing !a(x).P to a send operation of the value λx.P via the channel a.
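The clauses of Fig. 5 on processes are structural, so they can be sketched as a recursive function over a syntax tree. The Python below is an illustration under our own assumptions: the class names and the ASCII rendering of λ*ch*-terms (`<>` for the empty tuple, `||` for ∥, `fun <x>. M` for an abstraction) are hypothetical choices made for this example, names are plain strings, and types are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Zero:
    """The inert process 0."""


@dataclass(frozen=True)
class Par:
    """Parallel composition P | Q."""
    left: object
    right: object


@dataclass(frozen=True)
class Nu:
    """(nu x y) P."""
    x: str
    y: str
    body: object


@dataclass(frozen=True)
class Out:
    """Output action on chan with argument names args."""
    chan: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class In:
    """Replicated input !chan(params).body."""
    chan: str
    params: Tuple[str, ...]
    body: object


def to_lambda_ch(p) -> str:
    """Render the lambda_ch term assigned to the process p, clause by clause."""
    if isinstance(p, Zero):
        return "<>"
    if isinstance(p, Par):
        return f"({to_lambda_ch(p.left)} || {to_lambda_ch(p.right)})"
    if isinstance(p, Nu):
        return f"(nu {p.x} {p.y}) {to_lambda_ch(p.body)}"
    if isinstance(p, Out):
        # an output action becomes an application
        return f"{p.chan} <{', '.join(p.args)}>"
    if isinstance(p, In):
        # an input prefix becomes a send of the abstraction
        body = to_lambda_ch(p.body)
        return f"send <{p.chan}, fun <{', '.join(p.params)}>. {body}>"
    raise TypeError(f"not a process: {p!r}")
```

For instance, the process of Example 3 with P = **0**, written here with the hypothetical name `a_bar` for ā, is rendered as an application placed in parallel with a send.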

An interesting (and perhaps confusing) phenomenon is that an input channel in π<sup>F</sup> is mapped to an output channel in λ*ch* . This can be explained as follows. In the name-passing viewpoint, the reduction

$$(\nu xy)(!y(\vec{z}).P \mid x\langle \vec{u} \rangle) \quad \longrightarrow \quad (\nu xy)(!y(\vec{z}).P \mid P\{\vec{u}/\vec{z}\})$$

sends u to the process !y(z).P, and thus x is output and y is input. In the process-passing viewpoint, the abstraction (z).P is sent to the location of x, and thus y is the output and x is the input.

Next, we explain the translation from λ*ch* to π<sup>F</sup> (Fig. 6).

Let us first examine the translation of types. The most non-trivial part is the translation of a function type τ<sub>1</sub> → τ<sub>2</sub>. A key to understanding the translation is the isomorphism τ<sub>1</sub> → τ<sub>2</sub> ≅ (τ<sub>1</sub> ⊗ τ<sub>2</sub><sup>⊥</sup>) → (). The latter form of function type corresponds to an output channel type in π<sub>F</sub>. Hence a function is understood as a process additionally taking channels to which the return values are passed.
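The type translation can be sketched as a short recursive function. The Python below is illustrative only and rests on our own assumptions: the class names are hypothetical, a tuple type is represented by a Python tuple of its components, and the dual on π<sub>F</sub> types is taken to swap input and output channel types at the top level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fun:
    """Function type tau1 -> tau2; dom and cod are tuples of component types."""
    dom: tuple
    cod: tuple


@dataclass(frozen=True)
class Dual:
    """Dual type sigma^* of a function type sigma."""
    fun: Fun


@dataclass(frozen=True)
class ChO:
    """Output channel type ch^o[T1, ..., Tn]."""
    args: tuple


@dataclass(frozen=True)
class ChI:
    """Input channel type ch^i[T1, ..., Tn]."""
    args: tuple


def dual(t):
    """The dual of a channel type: swap input and output at the top level."""
    return ChI(t.args) if isinstance(t, ChO) else ChO(t.args)


def translate(xi):
    """Translate a component type (a function type or its dual) to a channel type."""
    if isinstance(xi, Dual):
        return dual(translate(xi.fun))
    dom = tuple(translate(x) for x in xi.dom)
    cod = tuple(dual(translate(x)) for x in xi.cod)
    # tau1 -> tau2 becomes an output channel carrying the arguments
    # together with the duals of the result types
    return ChO(dom + cod)
```

Under these assumptions the types of **channel** and **send** at σ = () → () are translated to the same channel type, which matches the observation that the two constants receive the same interpretation.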

The translation ⦇M⦈<sub>p</sub> of a λ*ch*-term Γ ⊢ M : (ξ<sub>1</sub>,...,ξ<sub>n</sub>) takes extra parameters p = p<sub>1</sub>,...,p<sub>n</sub> at which the values should be placed. This is a consequence of the definition in the π<sub>F</sub>-term model that a morphism T −→ S is a process x : T, y : S<sup>⊥</sup> ⊢ P : ⋄. Here p corresponds to y, Γ to x : T and ξ to S.

Now it is not so difficult to understand the interpretations of constructs in the λ<sub>c</sub>-calculus. For example, the translation ⦇λx.M⦈<sub>p</sub> of an abstraction is the abstraction (x, q).⦇M⦈<sub>q</sub> placed at p, which takes additional channels q to which the results of the evaluation of M should be sent.

It might be surprising that the interpretations of **channel** and **send** coincide. This is because of the one-sided formulation of π<sub>F</sub>. In the two-sided formulation, the unit η and counit ε of the compact closed structure, corresponding to **channel** and **send**, can be written as logical inference rules

$$\frac{\Gamma, A, A^\perp \vdash \Delta}{\Gamma \vdash \Delta} \qquad \text{and} \qquad \frac{\Gamma \vdash A^\perp, A, \Delta}{\Gamma \vdash \Delta},$$

which are different. In the one-sided formulation, however, they become

$$\frac{\Gamma, A, A^\perp, \Delta^\perp \vdash}{\Gamma, \Delta^\perp \vdash}.$$

Hence η and ε (or **channel** and **send**) cannot be distinguished in π<sub>F</sub>.

$$\begin{gathered} (\!|\mathbf{0}|\!) \stackrel{\text{def}}{=} \mathbf{0} \qquad (\!|P \mid Q|\!) \stackrel{\text{def}}{=} (\!|P|\!) \mid (\!|Q|\!) \qquad (\!|(\nu xy)P|\!) \stackrel{\text{def}}{=} (\nu xy)(\!|P|\!) \qquad (\!|{!x\,v}|\!) \stackrel{\text{def}}{=} (\!|v|\!)\_{x}\\ (\!|v\langle w\_1, \dots, w\_n\rangle|\!) \stackrel{\text{def}}{=} (\nu\bar{a}a)(\nu\bar{b}\_1 b\_1)\cdots(\nu\bar{b}\_n b\_n)((\!|v|\!)\_{a} \mid (\!|w\_1|\!)\_{b\_1} \mid \cdots \mid (\!|w\_n|\!)\_{b\_n} \mid \bar{a}\langle\bar{b}\_1, \dots, \bar{b}\_n\rangle)\\ (\!|x|\!)\_{a} \stackrel{\text{def}}{=} (a \leftrightarrow x) \qquad (\!|(\vec{x}).P|\!)\_{a} \stackrel{\text{def}}{=} {!a(\vec{x})}.(\!|P|\!) \end{gathered}$$

**Fig. 7.** Translation from AHOπ to π<sub>F</sub>

The translation ⦇−⦈ must be the inverse of [(−)] because both term models are the initial compact closed Freyd category with **(I)** and **(D)**. That means that ∅ ▷ Γ ⊢ P = ⦇[(P)]⦈ and ∅ ▷ Γ ⊢ M = [(⦇M⦈)] are provable for every P and M. This result is independent of the choice of representatives.

#### **4.3 Relation to Other Calculi and Translations**

A number of higher-order concurrent calculi, as well as their translations to the first-order π-calculus, have been proposed and studied (e.g. [29,39,40,42,45,47]). The calculus λ*ch* and the translations have a lot of ideas in common with those calculi and translations; see Sect. 6.

This subsection mainly discusses the relationship to the translations by Sangiorgi [42] (see also [43]) between *asynchronous higher-order* π*-calculus* (*AHO*π for short) and *asynchronous local* π*-calculus* (*L*π for short). Here we focus on this work because it is closest to ours. We shall see that our semantic or categorical development provides us with a semantic reconstruction of Sangiorgi's translations, as well as an extension.

A variant of AHOπ can be seen as a fragment of λ*ch*. The syntax of processes of AHOπ and their representation by λ*ch*-terms are given as follows:

$$\begin{array}{lllllllll} v, w ::= & x & \mid\ (\vec{x}).P & \qquad P, Q ::= & \mathbf{0} & \mid\ (P \mid Q) & \mid\ (\nu xy)P & \mid\ {!x\,v} & \mid\ v\langle\vec{w}\rangle\\ & x & \phantom{\mid}\ \lambda\langle\vec{x}\rangle.P & & \langle\rangle & \phantom{\mid}\ P \parallel Q & \phantom{\mid}\ (\nu xy)P & \phantom{\mid}\ \mathbf{send}\,\langle x, v\rangle & \phantom{\mid}\ v\,\langle\vec{w}\rangle \end{array}$$

(It slightly differs from the original syntax, as *ν* binds a pair of names.)

This fragment is nicely described as the limitation on types:

$$
\sigma ::= (\vec{\sigma}) \to () \qquad \xi ::= \sigma \mid \sigma^\* \qquad \tau ::= ().
$$

Recall that σ is a type for abstractions, ξ is a type for variables, and τ is a type for terms. This limitation means that (1) an abstraction cannot take a channel as an argument, and (2) a term M must be of the unit type, i.e. a process.

Once AHOπ is regarded as a fragment of λ*ch*, the translation from AHOπ to π<sup>F</sup> is obtained by restricting − to AHOπ. The resulting translation is in Fig. 7. As mentioned, the translation is the same as that of Sangiorgi [42] except for minor differences due to the slight change of the syntax.

Sangiorgi also gave a translation in the opposite direction, from Lπ to AHOπ in the same paper. The calculus Lπ is a fragment of the π-calculus in which only output channels can be passed. The **i**/**o**-separation of π<sup>F</sup> allows us to characterise the local version of π<sup>F</sup> by a limitation on types. In the local variant, the output channel type is restricted to T ::= **ch**<sup>o</sup> [T], expressing that only output channels can be passed via an output channel. Then the definition of type environment should be changed accordingly: Γ ::= · | x : T | x : T <sup>⊥</sup> (since the syntactic class represented by T is not closed under the dual (−)<sup>⊥</sup> in the local setting).

Interestingly, the limitation on types in AHOπ coincides with that in Lπ when one identifies **ch**<sup>o</sup>[T] with (T) → () (as we have done in many places). In other words, the syntactic restrictions of AHOπ and Lπ are the same semantic conditions described in different syntax. As a consequence, the image of Lπ by [(−)] is indeed in AHOπ.

*Remark 4.* There is, however, a notable difference from Sangiorgi's work [42]. Sangiorgi proved that the translation is fully abstract with respect to barbed congruence; in contrast, we only show that M = N iff M = N. In particular, the η-rule is indispensable for our argument. The presence of the η-rules significantly simplifies the argument, at the cost of operational justification (recall that the η-rule is not sound with respect to barbed congruence).

It is natural to ask how one can reconstruct the full-abstraction result with respect to barbed congruence. An interesting observation is that, if M and N are AHOπ processes, then M = N iff M = N, where means provability without using η-rules. We expect that this semantic observation explains why locality is essential as noted in [42]; we leave the details for future work. 

# **5 Discussions**

**Connection to Logics.** We have so far studied a connection between compact closed Freyd category and π-calculus. Here we briefly discuss the missing piece of the Curry-Howard-Lambek correspondence, namely logic.

The model of this paper is closely related to linear logic. Actually, every compact closed Freyd category is a model of linear logic (more precisely, MELL), as an instance of linear-non-linear model [6] (see, e.g., [27] for categorical models of linear logic). The interpretation of formulas is shown in Table 1. It differs from the translations by Abramsky [1] and Bellin and Scott [5] and from the Curry-Howard correspondence for session types by Caires and Pfenning [8], but resembles the connection between a variant of local π-calculus and a polarised linear logic by Honda and Laurent [19]; a detailed analysis of the translation is left for future work.

**Table 1.** The categorical and π<sup>F</sup>-calculus interpretations of MELL formulas

The logic corresponding to compact closed Freyd category should be a proper extension of linear logic, since compact closed Freyd categories form a proper subclass of linear-non-linear models. For example, the following rules are invalid in linear logic but admissible in compact closed Freyd categories:

$$\frac{\vdash \Gamma \qquad \vdash \Delta}{\vdash \Gamma, \Delta} \qquad\quad \frac{\vdash \Gamma, A, B \qquad \vdash \Delta, A^\perp, B^\perp}{\vdash \Gamma, \Delta} \qquad\quad \frac{\vdash \Gamma, A, A^\perp}{\vdash \Gamma}$$

These rules, especially the second rule called *multicut*, were often studied in concurrency theory; see Abramsky et al. [2] for their relevance to concurrency.

Do the above rules fill the gap between linear logic and compact closed Freyd categories? Recent work by Hasegawa [15] suggests that MELL with the above rules is still weaker than compact closed Freyd categories. First observe that the above rules can be interpreted in any linear-non-linear model of which the monoidal category is compact closed. Hasegawa showed that a linear-non-linear model whose monoidal category is compact closed induces a closed Freyd category of which the monoidal category is *traced* (and vice versa), but the induced Freyd category is not necessarily compact closed. Hence the logic corresponding to compact closed Freyd categories has further axioms or rules in addition to the above ones. A reasonable candidate for the additional axiom is ! ≅ ?; interestingly, Atkey et al. [3] reached a similar rule from a different perspective. Further investigation is left for future work.

**Non-empty Signature.** The categorical type theory for the λ-calculus considers a family parameterised by *signatures*, consisting of atomic types and constants. It covers, for example, the λ-calculus with natural number type and arithmetic constants (such as addition and multiplication), as well as a calculus with integer reference type and read and update functions.

Although this paper only considers the calculus with the empty signature, which has no additional types or constants, extending our theory to handle non-empty signatures is, in a sense, not difficult. The easiest way is to apply the established theory of the computational λ-calculus [33,37]. As we have seen in Sect. 4, the π<sup>F</sup>-calculus can be seen as a computational λ-calculus λ*ch* having constants for manipulating channels; hence the π<sup>F</sup>-calculus with additional constants is λ*ch* with the additional constants, which is still in the family of computational λ-calculi.

The π<sup>F</sup> -calculus with non-empty signature has several applications. We shall briefly discuss some of them.

An important example of π<sup>F</sup> with non-empty signature is the calculus with non-replicated input, which we regard as a calculus with additional "process constants" but without any additional type. A key observation is that every non-replicated input process a(x).P can be expressed as

$$a(\vec{x}).P \cong^c (\nu \bar{b}b)(a(\vec{x}).\bar{b}\langle \vec{x} \rangle \mid b(\vec{x}).P) \qquad (\cong^c \text{ is weak barbed congruence})$$

and thus it suffices to deal with non-replicated input processes in a special form, namely a : **ch**<sup>i</sup>[T], b̄ : **ch**<sup>o</sup>[T] ⊢ a(x).b̄⟨x⟩. Adding these processes as constants and the computational rules of a(x).b̄⟨x⟩ as equational axioms results in a calculus with non-replicated inputs. The categorical model is a compact closed Freyd category with distinguished morphisms (A ⇒ I) −→ (A ⇒ I) for each object A which satisfy certain axioms.

This technique is applicable to synchronous output as well. Because

$$
\bar{a} \langle \vec{x} \rangle.P \cong^c (\nu \bar{b} b)(\bar{a} \langle \vec{x} \rangle.\bar{b} \langle \rangle \mid !b ().P),
$$

it suffices to consider constants representing ā : **ch**<sup>o</sup>[T], x : T, b̄ : **ch**<sup>o</sup>[] ⊢ ā⟨x⟩.b̄⟨⟩.

# **6 Related Work**

**Logical Studies of** *π***-calculi.** There is a considerable number of studies on connections between process calculi and linear logic. Here we divide these studies into two classes. These classes are substantially different; for example, one regards the formula A ⊗ B as a type for processes with two "ports" of type A and B, whereas the other regards it as the session type !A.B. Our work is more closely related to the former than the latter, but some interesting coincidences with the latter kind of studies can also be found.

The former class of research dates back to the work by Abramsky [1] and Bellin and Scott [5], where they discovered that π-calculus processes can encode proof-nets of classical linear logic. Later, Abramsky et al. [2] introduced the *interaction categories* to give a semantic description of a CCS-like process calculus. In their work, they observed that the compact closed structure is important to capture the strong expressive power of process calculi.

A tighter connection between π-calculus and proof-nets was recently presented by Honda and Laurent [19]. They showed that an **i**/**o**-typed π-calculus corresponds to *polarised proof-nets*, and introduced the notion of *extended reduction* for the π-calculus to simulate cut-elimination. The π-calculus used in this work is very similar to π<sup>F</sup> in terms of syntax and reduction. Their calculus is asynchronous, does not allow non-replicated inputs, and requires **i**/**o**-separation. Furthermore, the extended reduction is almost the same as the rules (E-Beta) and (E-GC) except for the side conditions. A significant difference compared to our work is that their calculus is *local* [28,49], reflecting the fact that the corresponding logic is polarised.

Our work is inspired by these studies. The idea of **i**/**o**-separation can already be found in the work by Bellin and Scott and the use of compact closed category is motivated by the study of interaction category. It is worth mentioning here that the design of π<sup>F</sup> is also influenced by the calculus introduced by Laird [22], although it is not a logical study but categorical (see below).

The latter approach started with the Curry-Howard correspondences between session-typed π-calculi and linear logic established by Caires, Pfenning and Toninho [8,9] and subsequently by Wadler [48]. These correspondences are exact in the sense that every process has a corresponding proof, and vice versa. As a consequence, processes of the calculi inherit good properties of linear logic proofs such as termination and confluence of cut-elimination. In terms of process calculi, processes of these calculi do not fall into deadlocks or race conditions. This can be seen as a serious restriction of expressive power [3,26,48].

Several extensions to increase the expressiveness of these calculi have been proposed and studied. Interestingly, ideas behind some of these extensions are related to our work, in particular to Sect. 5 discussing the multicut rule [2] and the axiom ! ≅ ?. Atkey et al. [3] studied *CP* [48] with the multicut rule and ! ≅ ? and discussed how these extensions increase the expressiveness of the calculus, at the cost of losing some good properties of CP. Dardha and Gay [10] studied another extension of CP with multicut, keeping the calculus deadlock-free by an elaborated type system.

Balzer and Pfenning [4] proposed a session-typed calculus with shared (mutable) resources, inspired by linear-non-linear adjunction [6].

**Categorical Semantics of** *π***-calculi.** The idea of using a closed Freyd category to model the π-calculus is strongly inspired by Laird [22]. He introduced the *distributive-closed Freyd category* to describe abstract properties of a game-semantic model of the asynchronous π-calculus and showed that distributive-closed Freyd categories with some additional structures suffice to interpret the asynchronous π-calculus. The additional structures are specific to his game model and not completely axiomatised.<sup>7</sup> Our notion of compact closed Freyd category might be seen as a reformulation of his idea, obtained by filtering out some structures difficult to axiomatise and by strengthening some others to make axioms simpler. A significant difference is that our categorical model does not deal with non-replicated inputs, which we think is essential for a simple axiomatisation.

Another approach to categorical semantics of the π-calculus has been the presheaf-based approach [12,44]. These studies gave particular categories that nicely handle the nominal aspects of the π-calculus; these studies, however, do not aim for a correspondence between a categorical structure and the π-calculus.

**Higher-Order Calculi with Channels.** Besides the λ*ch*-calculus, there are a number of functional languages augmented by communication channels, from theoretical ones [13,25,46,48] to practical languages [34,38].

On the practical side, Concurrent ML (CML) [38], among others, is a well-developed higher-order concurrent language. CML has primitives to create channels and threads, and primitives to send and accept values through channels.

<sup>7</sup> A list of properties in [22] does not seem to be complete. We could not prove some claims in the paper only from these properties, but with ones specific to his model.

Since our λ*ch* -calculus can create (non-linear) channels and send values via channels, the λ*ch* -calculus can be seen as a core calculus of CML despite its origin in categorical semantics. The major difference between CML and the λ*ch* -calculus is that communications in CML are synchronous whereas communications in the λ*ch* -calculus are asynchronous.

On the theoretical side, session-typed functional languages have been actively studied [13,25,46,48]. Notably, some of these languages [25,46,48] are built upon the Curry-Howard foundation between linear logic and session-typed processes. It might be interesting to investigate whether we can relate these languages and the λ*ch* -calculus through the lens of Curry-Howard-Lambek correspondence.

**Higher-Order vs. First-Order** *π***-calculus.** A number of translations from higher-order languages to the π-calculus have been developed [39,40,42,45,47] since Milner [29] presented the encodings of the λ-calculus into the π-calculus. The basic idea shared by these studies is to transform λx.M to a process !a(x, p).P that receives the argument x together with a name p where the rest of the computation will be transmitted. In our framework, this idea is described as the isomorphism A ⇒ B ≅ A ⊗ B<sup>∗</sup> ⇒ I.
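The shared idea of these translations, a replicated input that receives an argument together with a reply name, can be illustrated informally. The following is only a toy sketch in Python, not any of the cited encodings: the class `Net` and the helpers `encode_function` and `apply_encoded` are hypothetical names introduced here, and message delivery is modelled by a direct handler call.

```python
from collections import defaultdict
from itertools import count


class Net:
    """Toy asynchronous network: channels are names, messages are tuples."""

    def __init__(self):
        self.servers = {}                 # channel -> handler, models !a(x, p).P
        self.mailbox = defaultdict(list)  # messages sent on channels without a server
        self._ids = count()

    def new_channel(self):
        """Create a fresh channel name (plays the role of name restriction)."""
        return f"c{next(self._ids)}"

    def serve(self, chan, handler):
        """Install a replicated input on chan."""
        self.servers[chan] = handler

    def send(self, chan, *payload):
        """Asynchronous output: run the server if there is one, else store the message."""
        handler = self.servers.get(chan)
        if handler is None:
            self.mailbox[chan].append(payload)
        else:
            handler(*payload)


def encode_function(net, f):
    """Represent f as a server that receives (x, p) on a and sends f(x) on p."""
    a = net.new_channel()
    net.serve(a, lambda x, p: net.send(p, f(x)))
    return a


def apply_encoded(net, a, arg):
    """Apply an encoded function: send the argument with a fresh reply channel."""
    p = net.new_channel()
    net.send(a, arg, p)
    (result,) = net.mailbox[p].pop(0)
    return result


net = Net()
succ = encode_function(net, lambda n: n + 1)
print(apply_encoded(net, succ, 41))  # prints 42
```

Because the server is replicated, `succ` can be applied any number of times; each application uses its own fresh reply channel.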

Among others, the translation from AHOπ to Lπ [42] is the closest to our translation from the λ*ch* -calculus to the π<sup>F</sup> -calculus. Sangiorgi [41] observed that Milner's translation can be established via the translation of AHOπ by applying the CPS transformation to the λ-calculus. This observation also applies to our translation. That is, we can obtain Milner's translation by combining CPS transformation and the compilation of the λ*ch* -calculus.

# **7 Conclusion and Future Work**

We have introduced an **i**/**o**-typed π-calculus (π<sup>F</sup> -calculus) as well as the categorical counterpart of π<sup>F</sup> -calculus (compact closed Freyd category) and showed the categorical type theory correspondence between them. The correspondence was established by regarding the π-calculus as a higher-order programming language, introducing the **i**/**o**-separation, and introducing the η-rule, a rule that explains the mismatch between behavioural equivalences and categorical models.

As an application of our semantic framework we introduced a higher-order calculus λ*ch* -calculus "equivalent" to the π<sup>F</sup> -calculus. We have demonstrated that translations between λ*ch* -calculus and π<sup>F</sup> -calculus can be derived by a simple semantic argument, and showed that the translation from λ*ch* to π<sup>F</sup> is a generalisation of the translation from AHOπ to Lπ given by Sangiorgi [42].

There are three main directions for future work. First, further investigation of the η-rule is indispensable. We plan to construct a categorical model of the π<sup>F</sup>-calculus with an additional constant that captures barbed congruence. Revealing the relationship between locality and the η-rule is another important problem. Second, the operational properties of the λ*ch*-calculus and their relation to the equational theory need further investigation. Third, finding the logical counterpart of compact closed Freyd categories to establish a proper Curry-Howard-Lambek correspondence is an interesting direction for future work.

**Acknowledgement.** We would like to thank Naoki Kobayashi, Masahito Hasegawa and James Laird for discussions, and anonymous referees for valuable comments. This work was supported by JSPS KAKENHI Grant Number 15H05706 and 16K16004.

# **References**



# **A Process Algebra for Link Layer Protocols**

Rob van Glabbeek<sup>1,2</sup>, Peter Höfner<sup>1,2</sup>, and Michael Markl<sup>1,3</sup>

<sup>1</sup> Data61, CSIRO, Sydney, Australia

rvg@cs.stanford.edu

<sup>2</sup> Computer Science and Engineering, University of New South Wales, Sydney, Australia

<sup>3</sup> Institut für Informatik, Universität Augsburg, Augsburg, Germany

**Abstract.** We propose a process algebra for link layer protocols, featuring a unique mechanism for modelling frame collisions. We also formalise suitable liveness properties for link layer protocols specified in this framework. To show applicability we model and analyse two versions of the Carrier-Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol. Our analysis confirms the hidden station problem for the version without virtual carrier sensing. However, we show that the version with virtual carrier sensing not only overcomes this problem, but also the exposed station problem with probability 1. Yet the protocol cannot guarantee packet delivery, not even with probability 1.

# **1 Introduction**

The (data) link layer is the second layer of the ISO/OSI model of computer networking [18]. Among other things, it is responsible for the transfer of data between adjacent nodes in Wide Area Networks (WANs) and Local Area Networks (LANs).

Examples of link layer protocols are Ethernet for LANs [16], the Point-to-Point Protocol [24] and the High-Level Data Link Control protocol (e.g. [14]). Part of this layer are also multiple access protocols such as the Carrier-Sense Multiple Access with Collision Detection (CSMA/CD) protocol for re-transmission in Ethernet bus networks and hub networks, or the Carrier-Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol [17,19] in wireless networks.

One of the unique characteristics of the link layer is that when devices attempt to use a medium simultaneously, *collisions of messages* occur. So, any modelling language and formal analysis of layer-2 protocols has to support such collisions. Moreover, some protocols are of probabilistic nature: CSMA/CA for example chooses time slots probabilistically with discrete uniform distribution.

As we are not aware of any formal framework with primitives for modelling data collisions, this paper introduces a process algebra for modelling and analysing link layer protocols. In Sect. 2 we present an algebra featuring a unique mechanism for modelling collisions, 'hard-wired' in the semantics. It is the non-probabilistic fragment of the Algebra for Link Layer protocols (ALL), which we introduce in Sect. 3. In Sect. 4 we formulate *packet delivery*, a liveness property that ideally ought to hold for link layer protocols, either outright, or with a high probability. In Sect. 5 we use this framework to formally model and analyse the CSMA/CA protocol.

Our analysis confirms the hidden station problem for the version of CSMA/CA without virtual carrier sensing (Sect. 5.2). However, we also show that the version with virtual carrier sensing overcomes not only this problem, but also the exposed station problem with probability 1. Yet the protocol cannot guarantee packet delivery, not even with probability 1.

# **2 A Non-probabilistic Subalgebra**

In this section we propose a timed process algebra that can model the collision of link layer messages, called *frames*. <sup>1</sup> It can be used for link layer protocols that do not feature probabilistic choice, and is inspired by the (Timed) Algebra for Wireless Networks ((T-)AWN) [2,12,13], a process algebra suitable for modelling and analysing protocols on layers 3 (network) and 4 (transport) of the OSI model.

The process algebra models a (wired or wireless) network as an encapsulated parallel composition of network nodes. Due to the nature of the protocols under consideration, on each node exactly one sequential process is running. The algebra features a discrete model of time, where each sequential process maintains a local variable now holding its local clock value, an integer. We employ only one clock for each sequential process. All sequential processes in a network synchronise in taking time steps, and at each time step all local clocks advance by one unit. Since this means that all clocks are in sync and do not run at different speeds, it is clear that we do not consider the problem of clock shift. For the rest, the variable now behaves like any other variable maintained by a process: its value can be read when evaluating guards, thereby making progress time-dependent, and any value can be assigned to it, thereby resetting the local clock. Network nodes communicate with their direct neighbours, that is, those nodes that are in transmission range. The algebra provides a mobility option that allows nodes to move in or out of transmission range. The encapsulation of the entire network inhibits communications between network nodes and the outside world, with the exception of the receipt and delivery of data packets from or to clients (the higher OSI layers).

#### **2.1 A Language for Sequential Processes**

The internal state of a process is determined, in part, by the values of certain data variables that are maintained by that process. To this end, we assume a data structure with several types, variables ranging over these types, operators and predicates. Predicate logic yields terms (or *data expressions*) and formulas to denote data values and statements about them. Our data structure always contains the types TIME, DATA, MSG, CHUNK, ID and *P*(ID) of discrete *time values*, which we take to be integers, *network layer data*, *messages*, *chunks* of messages that take one time unit to transmit, *node identifiers* and *sets of node identifiers*. We further assume that there are variables now of type TIME and rfr of type CHUNK. In addition, we assume a set of *process names*. Each process name X comes with a *defining equation*

<sup>1</sup> As it is the non-probabilistic fragment of a forthcoming algebra we do not name it.

$$X(\mathbf{var}\_1, \dots, \mathbf{var}\_n) \stackrel{def}{=} P\,,$$

in which n ∈ ℕ, var<sub>i</sub> are variables and P is a *sequential process expression* defined by the grammar below. It may contain the variables var<sub>i</sub> as well as X. However, all occurrences of data variables in P have to be *bound*. <sup>2</sup> The choice of the underlying data structure and the process names with their defining equations can be tailored to any particular application of our language.

The *sequential process expressions* are given by the following grammar:

$$\begin{aligned} P &::= X(\exp\_1, \dots, \exp\_n) \quad | \quad [\varphi]P \quad | \quad [\![\mathsf{var} := \exp]\!]P \quad | \quad \alpha.P \quad | \quad P + P \\ \alpha &::= \mathsf{transmit}(\mathsf{ms}) \quad | \quad \mathsf{newpkt}(\mathsf{data}, \mathsf{dest}) \quad | \quad \mathsf{deliver}(\mathsf{data}) \end{aligned}$$

Here X is a process name, *exp*<sub>i</sub> a data expression of the same type as var<sub>i</sub>, ϕ a data formula, var := *exp* an assignment of a data expression *exp* to a variable var of the same type, *ms* a data expression of type MSG, and data, dest data variables of types DATA, ID respectively.
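As an informal illustration, the grammar can be mirrored by a small abstract syntax tree. The sketch below is not part of the algebra: data expressions and formulas are kept as plain strings, and the expression names `frame` and `payload` in the example term are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Transmit:          # transmit(ms)
    ms: str


@dataclass(frozen=True)
class NewPkt:            # newpkt(data, dest)
    data: str
    dest: str


@dataclass(frozen=True)
class Deliver:           # deliver(data)
    data: str


@dataclass(frozen=True)
class Call:              # X(exp_1, ..., exp_n)
    name: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Guard:             # [phi]P
    formula: str
    body: "Proc"


@dataclass(frozen=True)
class Assign:            # [[var := exp]]P
    var: str
    exp: str
    body: "Proc"


@dataclass(frozen=True)
class Prefix:            # alpha.P
    action: "Action"
    body: "Proc"


@dataclass(frozen=True)
class Choice:            # P + Q
    left: "Proc"
    right: "Proc"


Action = Union[Transmit, NewPkt, Deliver]
Proc = Union[Call, Guard, Assign, Prefix, Choice]


def show_action(a):
    """Render an action in the concrete syntax of the grammar."""
    if isinstance(a, Transmit):
        return f"transmit({a.ms})"
    if isinstance(a, NewPkt):
        return f"newpkt({a.data}, {a.dest})"
    if isinstance(a, Deliver):
        return f"deliver({a.data})"
    raise TypeError(a)


def show(p):
    """Render a process expression in the concrete syntax of the grammar."""
    if isinstance(p, Call):
        return f"{p.name}({', '.join(p.args)})"
    if isinstance(p, Guard):
        return f"[{p.formula}]{show(p.body)}"
    if isinstance(p, Assign):
        return f"[[{p.var} := {p.exp}]]{show(p.body)}"
    if isinstance(p, Prefix):
        return f"{show_action(p.action)}.{show(p.body)}"
    if isinstance(p, Choice):
        return f"({show(p.left)} + {show(p.right)})"
    raise TypeError(p)


# Hypothetical example: accept a packet and transmit it, or deliver a received message.
example = Choice(
    Prefix(NewPkt("data", "dest"), Prefix(Transmit("frame(data, dest)"), Call("X"))),
    Guard("new(msg)", Assign("data", "payload(msg)", Prefix(Deliver("data"), Call("X")))),
)
print(show(example))
```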

Given a valuation of the data variables by concrete data values, the sequential process **[**ϕ**]**P acts as P if ϕ evaluates to true, and deadlocks if ϕ evaluates to false. In case ϕ contains free variables that are not yet interpreted as data values, values are assigned to these variables in any way that satisfies ϕ, if possible. The process **[[**var := *exp***]]**P acts as P, but under an updated valuation of the data variable var. The process P + Q may act either as P or as Q, depending on which of the two processes is able to act at all. In a context where both are able to act, it is not specified how the choice is made. The process α.P first performs the action α and subsequently acts as P. The above behaviour is identical to AWN, and many other standard process algebras. The action **transmit**(*ms*) transmits (the data value bound to the expression) *ms* to all other network nodes within transmission range. The action **newpkt**(data, dest) models the injection by the network layer of a data packet data to be transmitted to a destination dest. Technically, data and dest are variables that will be bound to the obtained values upon receipt of a **newpkt**. Data is delivered to the network layer by **deliver**(data). In contrast to AWN, we do not have a primitive for receiving messages from neighbouring nodes, because our processes are *always* listening to neighbouring nodes, in parallel with anything else they do.

<sup>2</sup> An occurrence of a data variable in P is *bound* if it is one of the variables var<sub>i</sub>, one of the two special variables now or rfr, a variable var occurring in a subexpression **[[**var := *exp***]]**Q, an occurrence in a subexpression **[**ϕ**]**Q of a variable occurring free in ϕ, or a variable data or dest occurring in a subexpression **newpkt**(data, dest).Q. Here Q is an arbitrary sequential process expression.

As in AWN, the internal state of a sequential process described by an expression P is determined by P, together with a *valuation* ξ associating values ξ(var) to variables var maintained by this process. Valuations naturally extend to ξ-*closed* expressions, i.e. those in which all variables are either bound or in the domain of ξ. We denote the valuation that assigns the value v to the variable var, and agrees with ξ on all other variables, by ξ[var := v]. The valuation ξ|<sub>S</sub> agrees with ξ on all variables var ∈ S and is undefined otherwise. Moreover we use ξ[var ++] as an abbreviation for ξ[var := ξ(var) + 1], for suitable types.
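The three operations on valuations can be sketched with Python dictionaries standing in for partial functions from variable names to values. The function names `update`, `restrict` and `increment` are illustrative choices, not notation from the paper.

```python
def update(xi, var, value):
    """xi[var := v]: assign value to var and agree with xi elsewhere."""
    zeta = dict(xi)          # copy, so the original valuation is left untouched
    zeta[var] = value
    return zeta


def restrict(xi, names):
    """xi|S: agree with xi on the variables in S, undefined otherwise."""
    return {var: val for var, val in xi.items() if var in names}


def increment(xi, var):
    """xi[var++]: abbreviation for xi[var := xi(var) + 1]."""
    return update(xi, var, xi[var] + 1)


xi = {"now": 3, "cntr": 0, "dest": "n2"}
print(update(xi, "dest", "n5"))
print(restrict(xi, {"now", "cntr"}))
print(increment(xi, "now"))
```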

To capture the durational nature of transmitting a message between network nodes, we model a message as a sequence of *chunks*, each of which takes one time unit to transmit. The function dur : MSG → TIME<sub>&gt;0</sub> calculates the number of time steps needed for sending a message, i.e. it calculates the number of chunks. We employ the internal data type CHUNK := {m:c | m ∈ MSG, 1 ≤ c ≤ dur(m)} ∪ {conflict, idle}. The chunk m:c indicates the c-th fragment of a message m. Data conflicts (junk transmitted via the medium) are modelled by the special chunk conflict, and the absence of an incoming chunk is modelled by idle.
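A minimal sketch of this data type follows. The concrete definition of `dur` (one time unit per four characters of a string message) is purely an assumption made for the example; the paper leaves dur abstract. Chunks m:c are represented as pairs `(m, c)`.

```python
import math

CONFLICT = "conflict"   # special chunk: junk on the medium
IDLE = "idle"           # special chunk: nothing incoming


def dur(msg):
    """Hypothetical duration: one time unit per 4 characters, at least 1."""
    return max(1, math.ceil(len(msg) / 4))


def chunks(msg):
    """All chunks m:c with 1 <= c <= dur(m), in transmission order."""
    return [(msg, c) for c in range(1, dur(msg) + 1)]


print(dur("hello!"))
print(chunks("hello!"))
```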

Our process algebra maintains a variable rfr of type CHUNK, storing the fragment of the current message received so far.

As a value of this variable, m:c indicates that the first c chunks of message m have been received in order; conflict indicates that the last incoming chunk was not the expected (next) part of a message in progress, and idle indicates that the channel was idle during the last time step. The table on the right, with ∗ a wild card, shows how the value of rfr evolves upon receiving a new chunk *ch*.
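The update of rfr can be sketched as a function of its current value and the incoming chunk. The case analysis below is a reading of the prose description only, in particular the assumption that a chunk m:1 always starts a fresh message; it should not be taken as a verbatim copy of the table. Chunks m:c are represented as pairs `(m, c)`.

```python
CONFLICT = "conflict"
IDLE = "idle"


def update_rfr(rfr, ch):
    """Return the new value of rfr after receiving chunk ch (assumed reading)."""
    if ch == IDLE:
        return IDLE                 # the medium was idle during this time step
    if ch == CONFLICT:
        return CONFLICT             # junk was received
    msg, c = ch
    if c == 1:
        return ch                   # assumed: the first chunk starts a new message
    if rfr == (msg, c - 1):
        return ch                   # the expected next part of the message in progress
    return CONFLICT                 # an unexpected chunk


rfr = IDLE
for incoming in [("m", 1), ("m", 2), ("m", 3)]:
    rfr = update_rfr(rfr, incoming)
print(rfr)
```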


Specifications may refer to the data type CHUNK only through the Boolean functions new (having a single argument *msg* of type MSG) and idle, defined by new(*msg*) := (rfr = *msg*:dur(*msg*)) and idle := (rfr = idle). A guard [new(*msg*)] evaluates to true iff a new message *msg* has just been received; [idle] evaluates to true iff in the last time slice the medium was idle.
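The two predicates can be sketched directly from their definitions. In this illustration rfr and dur are passed explicitly as arguments, and chunks m:c are represented as pairs `(m, c)`; both are choices made for the example.

```python
IDLE = "idle"


def new(rfr, msg, dur):
    """new(msg): the last chunk of msg has just been received in order."""
    return rfr == (msg, dur(msg))


def idle(rfr):
    """idle: the medium was idle in the last time slice."""
    return rfr == IDLE


two_chunks = lambda msg: 2        # hypothetical duration function for the example
print(new(("m", 2), "m", two_chunks))
print(new(("m", 1), "m", two_chunks))
print(idle(IDLE))
```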

The structural operational semantics of Table 1 describes how one internal state can evolve into another by performing an *action*. The set Act of actions consists of **transmit**(m:c, *ch*), **wait**(*ch*), **newpkt**(*d*, *dest*), **deliver**(*d*), and internal actions τ, for each choice of m ∈ MSG, c ∈ {1,..., dur(m)}, *ch* ∈ CHUNK, *d* ∈ DATA and *dest* ∈ ID, where the first two actions are time consuming. On every time-consuming action, each process receives a chunk *ch* and updates the variable rfr accordingly; moreover, the variable now is incremented on all process expressions in a (complete) network synchronously.

Besides the special variables now and rfr, the formal semantics employs an internal variable cntr ∈ ℕ that enumerates the chunks of split messages and is used to identify which chunk needs to be sent next. The variables now, rfr and cntr are not meant to be changed by ALL specifications, e.g. by using assignments. We call them read-only and collect them in the set RO = {now, rfr, cntr}.

**Table 1.** Structural operational semantics for sequential process expressions

Let us have a closer look at the rules of Table 1.

The first two rules describe the sending of a message *ms*. Remember that dur(*ms*) calculates the time needed to send *ms*. The counter cntr keeps track of the time passed already. The action **transmit**(m:c, *ch*) occurs when the node transmits the fragment m:c; simultaneously, it receives the fragment *ch*. <sup>3</sup> The counter cntr is 0 before a message is sent, and is incremented before the transmission of each chunk. So, each chunk sent has the form ξ(*ms*):ξ(cntr)+1. To ease readability we abbreviate ξ(cntr)+1 by c+. In case the (already incremented) counter c+ is strictly smaller than the number of chunks needed to send ξ(*ms*), another **transmit**-action is needed (Rule 1); if the last fragment has been sent (c+ = dur(ξ(*ms*))) the process can continue to act as P (Rule 2).
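The interplay of cntr and dur in the first two rules can be sketched as a step function. This is an illustration of the prose above rather than a transcription of the rules: the update of rfr is omitted, and resetting cntr to 0 after the last chunk is an assumption based on the remark that cntr is 0 before a message is sent. The parameter `receive` is a hypothetical stand-in for the medium.

```python
def transmit_step(xi, ms, dur, receive):
    """One time-consuming transmit step; returns (action, new valuation, finished)."""
    c_plus = xi["cntr"] + 1                 # the (already incremented) counter c+
    sent = (ms, c_plus)                     # the chunk ms:c+
    action = ("transmit", sent, receive(sent))
    zeta = dict(xi)
    zeta["now"] = xi["now"] + 1             # every time-consuming action advances the clock
    if c_plus < dur(ms):
        zeta["cntr"] = c_plus               # as in Rule 1: more chunks remain
        return action, zeta, False
    zeta["cntr"] = 0                        # as in Rule 2: last chunk sent (reset assumed)
    return action, zeta, True


xi = {"now": 0, "cntr": 0}
trace = []
finished = False
while not finished:
    # The node hears its own chunk here, i.e. no other node is transmitting.
    action, xi, finished = transmit_step(xi, "m", lambda msg: 3, lambda chunk: chunk)
    trace.append(action)
print(trace)
print(xi)
```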

The actions **newpkt**(*d*, *dest*) and **deliver**(*d*) are instantaneous and model the submission of data *d* from the network layer, destined for *dest*, and the delivery of data *d* to the network layer, respectively. The process **newpkt**(*d*, *dest*).P also has the possibility to wait, namely if no network layer instruction arrives.

Rule 6 defines a rule for assignment in a straightforward fashion; only the valuation of the variable var is updated.

In Rules 7 and 8, which define recursion, ξ|<sub>RO</sub>[var<sub>i</sub> := ξ(*exp*<sub>i</sub>)]<sup>n</sup><sub>i=1</sub> is the valuation that *only* assigns the values ξ(*exp*<sub>i</sub>) to the variables var<sub>i</sub>, for i = 1,...,n, and maintains the values of the variables now, rfr and cntr. These rules state that a defined process X has the same transitions as the body P of its defining equation. In case of a **wait**-transition, the sequential process does not progress, and accordingly the recursion is not yet unfolded.

Most transition rules so far feature statements of the form ξ(*exp*) where *exp* is a data expression. The application of the rule depends on ξ(*exp*) being defined. Rule 9 covers all cases where the above rules cannot be applied since at least one data expression in an action α is not defined. A state ξ,P is *unvalued*, denoted by ξ(P)↑, if P has the form **transmit**(*ms*).P, **deliver**(data).P, **[[**var := *exp***]]**P or X(*exp*<sub>1</sub>,..., *exp*<sub>n</sub>) with either ξ(*ms*) or ξ(data) or ξ(*exp*) or some ξ(*exp*<sub>i</sub>) undefined. From such a state the process can merely wait.

A process P + Q can wait *only* if both P and Q can do the same; if either P or Q can achieve 'proper' progress, the choice process P + Q always chooses progress over waiting. A simple induction shows that if ξ,P −**wait**(*ch*)→ ζ,P′ and ξ,Q −**wait**(*ch*)→ ζ′,Q′ then P′ = P, Q′ = Q and ζ′ = ζ.

The first rule of (12), describing the semantics of guards **[**ϕ**]**, is taken from AWN. Here ξ −ϕ→ ζ says that ζ is an extension of ξ, i.e. a valuation that agrees with ξ on all variables on which ξ is defined, and evaluates other variables occurring free in ϕ, such that the formula ϕ holds under ζ. All variables not free in ϕ and not evaluated by ξ are also not evaluated by ζ. Its negation ξ −ϕ↛ says that no such extension exists, and thus that ϕ is false in the current state, no matter how we interpret the variables whose values are still undefined. If that is the case, the process **[**ϕ**]**p will idle by performing the action **wait**(*ch*).

<sup>3</sup> Normally, a node is in its own transmission range. In that case the received chunk *ch* will be either the chunk m:c it is transmitting itself, or conflict in case some other node within transmission range is transmitting as well.

#### **2.2 A Language for Node Expressions**

We model network nodes in the context of a (wireless) network by *node expressions* of the form

$$id \colon (\xi, P) \colon R\,.$$

Here *id* ∈ ID is the *address* of the node, P is a sequential process expression with a valuation ξ, and R ∈ 𝒫(ID) is the *range* of the node, defined as the set of nodes within transmission range of *id*. Unlike AWN, the process algebra does not offer a parallel operator for combining sequential processes; such an operator is not needed due to the nature of link layer protocols.

In the semantics of this layer it is crucial to handle frame collisions. The idea is that all chunks sent are recorded, together with the respective recipient. In case a node receives more than one chunk at a time, a conflict is raised, as it is impossible to send two or more messages via the same medium at the same time.

The formal semantics for node expressions, presented in Table 2, uses transition labels **traffic**(𝒯, ℛ), *id* : **deliver**(*d*), *id* : **newpkt**(d, *id*′), **connect**(*id*, *id*′), **disconnect**(*id*, *id*′) and τ, with partial functions 𝒯, ℛ : ID ⇀ CHUNK, *id*, *id*′ ∈ ID, and d ∈ DATA.


**Table 2.** Structural operational semantics for node expressions

**Table 3.** Structural operational semantics for network expressions

All time-consuming actions on process level (**transmit**(m:c,*ch*) and **wait**(*ch*)) are transformed into an action **traffic**(𝒯, ℛ) on node level: the first argument 𝒯 maps *dest* to m:c if and only if the chunk m:c is transmitted to *dest*. The second argument ℛ maps *id* to m:c if and only if the chunk m:c is received on process level at node *id*. For the sos-rules of Table 2 we use the set-theoretic presentation of partial functions. The two rules for **wait** set 𝒯 := ∅, as no chunks are transmitted; the rules for **transmit** allow a transmitted chunk m:c to travel to all nodes within transmission range: 𝒯 := {(r, m:c) | r ∈ R}. In case no chunk is received during the transmission or waiting (*ch* = idle) we set ℛ = ∅; otherwise ℛ = {(*id*, *ch*)}, indicating that chunk *ch* is received by node *id*.
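The construction of the two arguments of **traffic** can be sketched in Python as follows. Partial functions are represented as dictionaries; the function names and the string `"idle"` standing for the idle chunk are assumptions of this sketch, not notation from Table 2.

```python
IDLE = "idle"  # assumed encoding of "no chunk received"

def traffic_for_wait(node_id, ch):
    """Components (T, R) of the traffic label for a wait(ch) step."""
    T = {}                                   # nothing is transmitted
    R = {} if ch == IDLE else {node_id: ch}  # the node contributes only its own reception
    return T, R

def traffic_for_transmit(node_id, node_range, chunk, ch):
    """Components (T, R) of the traffic label for a transmit(chunk, ch) step."""
    T = {r: chunk for r in node_range}       # the chunk travels to every node in range
    R = {} if ch == IDLE else {node_id: ch}
    return T, R
```

A node that is in its own range and transmits undisturbed thus appears both in the domain of 𝒯 (as a recipient) and in the domain of ℛ (having received its own chunk).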

The actions *id* : **newpkt**(d, *dest*) and *id* : **deliver**(*d*) as well as the internal actions τ are simply inherited by node expressions from the processes that run on these nodes.

The remaining rules of Table 2 model the mobility aspect of wireless networks; the rules are taken straight from AWN [12,13]. We allow actions **connect**(*id*, *id*′) and **disconnect**(*id*, *id*′) for *id*, *id*′ ∈ ID modelling a change in network topology. These actions can be thought of as occurring nondeterministically, or as actions instigated by the environment of the modelled network protocol. In this formalisation node *id*′ is in the range of node *id*, meaning that *id*′ can receive messages sent by *id*, if and only if *id* is in the range of *id*′. To break this symmetry, one just skips the last four rules of Table 2 and replaces the synchronisation rules for **connect** and **disconnect** in Table 3 by interleaving rules (like the ones for **deliver**, **newpkt** and τ) [12]. For some applications a wired or non-mobile network needs to be considered. In such cases the last six rules of Table 2 are dropped.

Whether a node *id*:P:R receives its own transmissions depends on whether *id* ∈ R. Only if *id* ∈ R will our process algebra disallow the transmission from and to a single node *id* at the same time, yielding a conflict.

#### **2.3 A Language for Networks**

A *partial network* is modelled by a *parallel composition* ∥ of node expressions, one for every node in the network. A *complete network* is a partial network within an *encapsulation operator* [ ], which limits the communication between network nodes and the outside world to the receipt and delivery of data packets to and from the network layer.

The syntax of networks is described by the following grammar:

$$N ::= [M\_T^T] \qquad M\_{S\_1 \cup S\_2}^T ::= M\_{S\_1}^T \parallel M\_{S\_2}^T \qquad M\_{\{id\}}^T ::= id \colon (\xi, P) \colon R \ ,$$

with {*id*} ∪ R ⊆ T ⊆ ID. Here M<sup>T</sup><sub>S</sub> models a partial network describing the behaviour of all nodes *id* ∈ S. The set T contains the identifiers of all nodes that are part of the complete network. This grammar guarantees that node identifiers of node expressions (the first component of *id*:P:R) are unique.

The operational semantics of network expressions is given in Table 3. Internal actions τ as well as the actions *id* : **deliver**(*d*) and *id* : **newpkt**(d, *id*′) are interleaved in the parallel composition of nodes that makes up a network, and then lifted to encapsulated networks (Line 1 of Table 3).

Actions **traffic** and (**dis**)**connect** are synchronised. The rule for synchronising the action **traffic** (Line 3), the only action that consumes time on the network layer, uses the union ⊎ of partial functions. It is formally defined as

$$(\mathcal{R}\_1 \uplus \mathcal{R}\_2)(id) := \begin{cases} \underline{\mathtt{conflict}} & \text{if } id \in \mathsf{dom}(\mathcal{R}\_1) \cap \mathsf{dom}(\mathcal{R}\_2) \\ \mathcal{R}\_1(id) & \text{if } id \in \mathsf{dom}(\mathcal{R}\_1) - \mathsf{dom}(\mathcal{R}\_2) \\ \mathcal{R}\_2(id) & \text{if } id \in \mathsf{dom}(\mathcal{R}\_2) - \mathsf{dom}(\mathcal{R}\_1) \end{cases}$$

The synchronisation of the sets ℛ<sub>i</sub> and 𝒯<sub>i</sub> has the following intuition: if a node identifier *id* ∈ ID is in both dom(𝒯<sub>1</sub>) and dom(𝒯<sub>2</sub>) then there exist two nodes that transmit to node *id* at the same time, and therefore a frame collision occurs. In our algebra this is modelled by the special chunk conflict. The sos rules of Tables 2 and 3 guarantee that there cannot be collisions within the set of received chunks ℛ. The reason is that each node merely contributes to ℛ a chunk for itself; it can be the chunk conflict though. Therefore we could have written ℛ<sub>1</sub> ∪ ℛ<sub>2</sub> instead of ℛ<sub>1</sub> ⊎ ℛ<sub>2</sub> in the sixth rule of Table 3.
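The union of partial functions with collision detection can be written down directly. The following Python sketch represents partial functions as dictionaries; the name `uplus` and the string `"conflict"` for the special chunk are choices made for this illustration.

```python
CONFLICT = "conflict"  # assumed encoding of the special chunk

def uplus(f, g):
    """Union of two partial functions; ids in both domains map to CONFLICT."""
    merged = {}
    for key in f.keys() | g.keys():
        if key in f and key in g:
            merged[key] = CONFLICT   # two simultaneous transmissions to `key`
        elif key in f:
            merged[key] = f[key]
        else:
            merged[key] = g[key]
    return merged
```

Note that an identifier occurring in both domains is mapped to the conflict chunk even if both functions agree on its value, exactly as in the case distinction of the definition.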

The last rule propagates a **traffic**(𝒯, ℛ)-action of a partial network M to a complete network [M]. By then 𝒯 consists of all chunks (after collision detection) that are being transmitted by any member in the network, and ℛ consists of all chunks that are received. The condition ℛ = 𝒯 determines the content of the messages in ℛ. The **traffic**(𝒯, ℛ)-actions become internal at this level, as they cannot be steered by the outside world; all that is left is a time-step **tick**.

#### **2.4 Results on the Process Algebra**

As for the process algebra T-AWN [2], but with a slightly simplified proof, one can show that our processes have no *time deadlocks*:

**Theorem 2.1.** *A complete network* N *in our process algebra always admits a transition, independently of the outside environment, i.e.* ∀N, ∃a *such that* N −a→ *and* a ∉ {**connect**(*id*, *id*′), **disconnect**(*id*, *id*′), *id* : **newpkt**(*d*, *dest*)}*. More precisely, either* N −**tick**→*, or* N −*id* : **deliver**(*d*)→*, or* N −τ→*.*

The following results (statements and proofs) are very similar to the results about the process algebra AWN, as presented in [13]. A rich body of foundational meta theory of process algebra allows the transfer of the results to our setting, without too much overhead work.

Identical to AWN and its timed version T-AWN, our process algebra admits a translation into one without data structures (although we cannot describe the target algebra without using data structures). The idea is to replace any variable by all possible values it can take. The target algebra differs from the original only on the level of sequential processes; the subsequent layers are unchanged. The construction closely follows the one given in the appendix of [2]. The inductive definition contains the rules

$$\mathcal{T}\_\xi(\mathbf{deliver}(data).P) = \mathbf{deliver}(\xi(data)).\mathcal{T}\_\xi(P) \quad \text{and}$$

$$\mathcal{T}\_\xi([\![\mathtt{var} := exp]\!]P) = \tau.\mathcal{T}\_{\xi[\mathtt{var} := \xi(exp)]}(P).$$

Most other rules require extra operators that keep track of the passage of time and the evolution of other internal variables. The resulting process algebra has a structural operational semantics in the (infinitary) *de Simone* format, generating the same transition system—up to strong bisimilarity, ↔ —as the original. It follows that ↔, and many other semantic equivalences, are congruences on our language [23].

**Theorem 2.2.** *Strong bisimilarity is a congruence for all operators of our language.*

This is a deep result that usually takes many pages to establish (e.g. [25]). Here we get it directly from the existing theory on structural operational semantics, as a result of carefully designing our language within the disciplined framework described by de Simone [23].

**Theorem 2.3.** *The operator* ∥ *is associative and commutative, up to* ↔*.*

*Proof.* The operational rules for this operator fit a format presented in [6], guaranteeing associativity up to ↔. The ASSOC-de Simone format of [6] applies to all transition system specifications (TSSs) in de Simone format, and allows 7 different types of rules (named 1–7) for the operators in question. Our TSS is in de Simone format; the four rules for ∥ of Table 3 are of types 1, 2 and 7, respectively. To be precise, it has rules 1<sub>a</sub> and 2<sub>a</sub> for a ∈ {τ, *id* : **deliver**(*d*), *id* : **newpkt**(d, *dest*)}, rules 7<sub>(a,b)</sub> for

$$(a,b) \in \{ (\mathbf{traffic}(\mathcal{T}\_1, \mathcal{R}\_1), \mathbf{traffic}(\mathcal{T}\_2, \mathcal{R}\_2)) \mid \mathcal{R}\_1, \mathcal{R}\_2, \mathcal{T}\_1, \mathcal{T}\_2 \in \mathtt{ID} \rightharpoonup \mathtt{CHUNK} \}$$

and rules 7<sub>(c,c)</sub> for c ∈ {**connect**(*id*, *id*′), **disconnect**(*id*, *id*′) | *id*, *id*′ ∈ ID}. Moreover, the partial *communication function* γ : Act × Act ⇀ Act is given by γ(**traffic**(𝒯<sub>1</sub>, ℛ<sub>1</sub>), **traffic**(𝒯<sub>2</sub>, ℛ<sub>2</sub>)) = **traffic**(𝒯<sub>1</sub> ⊎ 𝒯<sub>2</sub>, ℛ<sub>1</sub> ⊎ ℛ<sub>2</sub>) and γ(c, c) = c. The main result of [6] is that an operator is guaranteed to be associative, provided that γ is associative and six conditions are fulfilled. In the absence of rules of types 3, 4, 5 and 6, five of these conditions are trivially fulfilled, and the remaining one reduces to

$$7\_{(a,b)} \Rightarrow (1\_a \Leftrightarrow 2\_b) \land (2\_a \Leftrightarrow 2\_{\gamma(a,b)}) \land (1\_b \Leftrightarrow 1\_{\gamma(a,b)})\,.$$

Here 1<sub>a</sub> says that rule 1<sub>a</sub> is present, etc. This condition is trivially met for ∥, as there exists neither a rule of the form 1<sub>**traffic**(𝒯,ℛ)</sub> nor one of the form 2<sub>**traffic**(𝒯,ℛ)</sub>, nor 1<sub>c</sub> or 2<sub>c</sub> with c as above. As on **traffic** actions γ is basically the union of partial functions (⊎), where a collision in domains is indicated by an error conflict, it is straightforward to prove associativity of γ.

Commutativity of ∥ follows by symmetry of the sos rules.

# **3 An Algebra for Link Layer Protocols**

We now introduce ALL, the *Algebra for Link Layer protocols*. It is obtained from the process algebra presented in the previous section by the addition of a probabilistic choice operator ⨁<sup>n</sup><sub>i=0</sub>. As a consequence, the semantics of the algebra is no longer a labelled transition system, but a *probabilistic labelled transition system* (pLTS) [8]. This is a triple (S, Act, →), where (i) S is a set of states, (ii) Act is a set of actions, and (iii) → ⊆ S × Act × 𝒟(S), with 𝒟(S) the set of (discrete) probability distributions over S.


As with LTSs, we usually write s −α→ Δ instead of (s, α, Δ) ∈ →. The *point distribution* δ<sub>s</sub>, for s ∈ S, is the distribution with δ<sub>s</sub>(s) = 1. We simply write s −α→ t for s −α→ δ<sub>t</sub>. An LTS may be viewed as a degenerate pLTS, in which only point distributions occur. For a uniform distribution over s<sub>0</sub>,...,s<sub>n</sub> ∈ S we write 𝒰<sup>n</sup><sub>i=0</sub> s<sub>i</sub>. The pLTS associated to ALL takes S to be the disjoint union of the pairs ξ,P, with P a sequential process expression, and the network expressions. Act is the collection of transition labels, and → consists of the transitions derivable from the structural operational semantics of the language.

Rules (1)–(6), (9), (11) and (12) of Table 1 are adopted in ALL unchanged, whereas in Rules (7), (8) and (10) the state ζ,P′ (or ζ,Q′) is replaced by an arbitrary distribution Δ. Add to those the following rule for the probabilistic choice operator:

$$\xi, \bigoplus\_{i=0}^{n} P \stackrel{\tau}{\longrightarrow} \mathcal{U}\_{i=0}^{\xi(n)}\ \xi[\mathtt{i} := i], P$$

Here the data variable i may occur in P. The rules of Tables 2 and 3 are adopted in ALL unchanged, except that P′, M′ and N′ are now replaced by arbitrary distributions over sequential processes and network expressions, respectively. Here we adopt the convention that a unary or binary operation on states lifts to distributions in the standard manner. For example, if Δ is a distribution over sequential processes, *id* ∈ ID and R ⊆ ID, then *id*:Δ:R describes the distribution over node expressions that only has probability mass on nodes with address *id* and range R, and for which the probability of *id*:P:R is Δ(P). Likewise, if Δ and Θ are distributions over network expressions, then Δ ∥ Θ is the distribution over network expressions of the form M ∥ N, where (Δ ∥ Θ)(M ∥ N) = Δ(M) · Θ(N).
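The lifting of operations to distributions, and the uniform distribution produced by the probabilistic choice rule, can be sketched in Python. Distributions are dictionaries from states to exact probabilities; all function names, the tuple encodings of states, and the fixed variable name `"i"` are assumptions of this sketch.

```python
from fractions import Fraction

def uniform(states):
    """Uniform distribution over a non-empty list of distinct states."""
    p = Fraction(1, len(states))
    return {s: p for s in states}

def probabilistic_choice(valuation, upper, process):
    """Target of the tau-step of the choice rule: uniform over (valuation[i := k], process)."""
    states = [
        (tuple(sorted({**valuation, "i": k}.items())), process)
        for k in range(upper + 1)            # k = 0, ..., upper
    ]
    return uniform(states)

def lift_node(delta, node_id, node_range):
    """Lift a distribution over processes to one over node expressions id:P:R."""
    return {(node_id, proc, frozenset(node_range)): p for proc, p in delta.items()}

def parallel(delta, theta):
    """Product distribution with (delta || theta)(M || N) = delta(M) * theta(N)."""
    return {(m, n): p * q for m, p in delta.items() for n, q in theta.items()}
```

Using `Fraction` keeps the probabilities exact, so the masses of a lifted distribution still sum to precisely 1.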

# **4 Formalising Liveness Properties of Link Layer Protocols**

Link layer protocols communicate with the network layer through the actions *id* : **newpkt**(d, *dest*) and *id* : **deliver**(*d*). The typical liveness property expected of a link layer protocol is that if the network layer at node *id* injects a data packet d for delivery at destination *dest* then this packet is delivered eventually. In terms of our process algebra, this says that every execution of the action *id* : **newpkt**(d, *dest*) ought to be followed by the action *dest*: **deliver**(*d*). This property can be formalised in Linear-time Temporal Logic [22] as

$$\mathbf{G}\left(id \colon \mathbf{newpkt}(d, dest) \Rightarrow \mathbf{F}(dest \colon \mathbf{deliver}(d))\right) \tag{1}$$

for any *id*, *dest* ∈ ID and d ∈ DATA. This formula has the shape **G**(φ<sub>pre</sub> ⇒ **F**φ<sub>post</sub>), and is called an *eventuality property* in [22]. It says that whenever we reach a state in which the precondition φ<sub>pre</sub> is satisfied, this state will surely be followed by a state where the postcondition φ<sub>post</sub> holds. In [7,13] it is explained how action occurrences can be seen or encoded as state-based conditions. Here we will not define how to interpret general LTL formulas in pLTSs, but below we do this for eventuality properties with specific choices of φ<sub>pre</sub> and φ<sub>post</sub>.

Formula (1) is too strong and does not hold in general: in case the nodes *id* and *dest* are not within transmission range of each other, the delivery of messages from *id* to *dest* is doomed to fail. We need to postulate two side conditions to make this liveness property plausible. Firstly, when the request to deliver the message comes in, *id* needs to be connected to *dest*. We introduce the predicate **cntd**(*id*, *dest*) to express this, and hence take φ<sub>pre</sub> to be **cntd**(*id*, *dest*) ∧ *id* : **newpkt**(d, *dest*). Secondly, we assume that the link between *id* and *dest* does not break until the message is delivered. As remarked in [13], such a side condition can be formalised by taking φ<sub>post</sub> to be *dest* : **deliver**(*d*) ∨ **disconnect**(*id*, *dest*). Thus the liveness property we are after is

$$\begin{array}{c} \mathbf{G}\big(\mathbf{cntd}(id, dest) \land id \colon \mathbf{newpkt}(d, dest) \Rightarrow \\ \mathbf{F}(dest \colon \mathbf{deliver}(d) \lor \mathbf{disconnect}(id, dest) \lor \mathbf{disconnect}(dest, id))\big) \end{array} \tag{2}$$

We now define the validity of eventuality properties **G**(φ<sub>pre</sub> ⇒ **F**φ<sub>post</sub>). Here φ<sub>pre</sub> and φ<sub>post</sub> denote sets of transitions and actions, respectively, and hold if one of the transitions or actions in the set occurs. In (2), φ<sub>pre</sub> denotes the transitions with label *id* : **newpkt**(d, *dest*) that occur when the side condition **cntd**(*id*, *dest*) is met, whereas φ<sub>post</sub> = {*dest* : **deliver**(*d*), **disconnect**(*id*, *dest*), **disconnect**(*dest*, *id*)} is a set of actions.

A *path* in a pLTS (S, Act, →) is an alternating sequence s<sub>0</sub>, α<sub>1</sub>, s<sub>1</sub>, α<sub>2</sub>,... of states and actions, starting with a state and either being infinite or ending with a state, such that there is a transition s<sub>i</sub> −α<sub>i+1</sub>→ Δ<sub>i+1</sub> with Δ<sub>i+1</sub>(s<sub>i+1</sub>) > 0 for each i. The path is *rooted* if it starts with a state marked as 'initial', and *complete* if either it is infinite, or there is no transition starting from its last state. A state or transition is *reachable* if it occurs in a rooted path.

In a pLTS with an initial state, an eventuality formula **G**(φ<sub>pre</sub> ⇒ **F**φ<sub>post</sub>), with φ<sub>pre</sub> and φ<sub>post</sub> denoting sets of transitions and actions, *holds outright* if all complete paths starting with a reachable transition from φ<sub>pre</sub> contain a transition with a label from φ<sub>post</sub>.
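To make the definition concrete, the following Python sketch checks the eventuality condition on a single finite, complete path. It is a simplification: holding outright quantifies over *all* complete paths, including infinite ones, which this sketch does not enumerate. The encoding of a transition as a pair `(label, links)`, where `links` is the set of connected pairs at the moment the transition is taken, is an assumption made here in order to express the side condition **cntd**.

```python
def holds_on_path(path, is_pre, post_labels):
    """Check G(pre => F post) on one finite, complete path.

    `path` is a list of transitions (label, links); `is_pre(t)` decides
    whether transition t belongs to phi_pre; `post_labels` is phi_post.
    """
    for i, t in enumerate(path):
        if is_pre(t):
            # the suffix starting with t must contain a label from phi_post
            if not any(label in post_labels for label, _ in path[i:]):
                return False
    return True

# Hypothetical instance of property (2) for nodes "a" and "b" and data "d".
NEWPKT = ("newpkt", "a", "d", "b")
POST = {("deliver", "b", "d"), ("disconnect", "a", "b"), ("disconnect", "b", "a")}
LINKED = frozenset({("a", "b"), ("b", "a")})

def pre(t):
    label, links = t
    return label == NEWPKT and ("a", "b") in links  # cntd(a, b) holds
```

A path on which the packet is injected while the nodes are disconnected satisfies the property vacuously, since no transition of that path lies in φ<sub>pre</sub>.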

Definitions 3 and 5 in [9] define the set of probabilities that a pLTS with an initial state will ever execute the action ω. One obtains a set of probabilities rather than a single probability due to the possibility of nondeterministic choice. This definition generalises to *sets* of actions φ<sub>post</sub> (seen as disjunctions) by first renaming all actions in such a set into ω. It also generalises trivially to pLTSs with an *initial transition*. For t a transition in a pLTS, let *Prob*(t, φ<sub>post</sub>) be the infimum of the set of probabilities that the pLTS in which t is taken to be the initial transition will ever execute φ<sub>post</sub>. Now in a pLTS with an initial state, an eventuality formula **G**(φ<sub>pre</sub> ⇒ **F**φ<sub>post</sub>) *holds with probability at least* p if for all reachable transitions t in φ<sub>pre</sub> we have *Prob*(t, φ<sub>post</sub>) ≥ p.

Possible correctness criteria for link layer protocols are that the liveness property (2) either holds outright, holds with probability 1, or at least holds with probability p for a sufficiently high value of p.

Sometimes we are content to establish that (2) holds under the additional assumptions that the network is stable until our packet is delivered, meaning that no links between any nodes are broken or established, and/or that the network layer refrains from injecting more packets. This is modelled by taking

$$\phi\_{post} = \{dest \colon \mathbf{deliver}(d),\ \mathbf{disconnect}(\ast, \ast),\ \mathbf{connect}(\ast, \ast),\ \mathbf{newpkt}(\ast, \ast)\}. \tag{3}$$

We will refer to this version of (2) as the *weak packet delivery* property. *Packet delivery* is the strengthening without **newpkt**(∗, ∗) in (3), i.e. not assuming that the network layer refrains from injecting more packets.

# **5 Modelling and Analysing the CSMA/CA Protocol**

In this section we model two versions of the CSMA/CA protocol, using the process algebra ALL. Moreover, we briefly discuss some results we obtained while analysing these protocols.

The *Carrier-Sense Multiple Access* (CSMA) protocol is a media access control (MAC) protocol in which a node verifies the absence of other traffic before transmitting on a shared transmission medium. If a carrier is sensed, the node waits for the transmission in progress to end before initiating its own transmission. Using CSMA, multiple nodes may, in turn, send and receive on the same medium. Transmissions by one node are generally received by all other nodes connected to the medium.

The CSMA protocol with Collision Avoidance (CSMA/CA) [17,19]<sup>4</sup> improves the performance of CSMA. If the transmission medium is sensed busy before transmission then the transmission is deferred for a *random* time interval. This interval reduces the likelihood that two or more nodes waiting to transmit will simultaneously begin transmission upon termination of the detected transmission. CSMA/CA is used, for example, in Wi-Fi.

<sup>4</sup> The primary medium access control (MAC) technique of IEEE 802.11 [19] is called *distributed coordination function* (DCF), which is a CSMA/CA protocol.

It is well known that CSMA/CA suffers from the *hidden station problem* (see Sect. 5.2). To overcome this problem, CSMA/CA is often supplemented by the request-to-send/clear-to-send (RTS/CTS) handshaking [19]. This mechanism is known as the IEEE 802.11 RTS/CTS exchange, or *virtual carrier sensing*. While this extension reduces the number of collisions, wireless 802.11 implementations do not typically implement RTS/CTS for all transmissions because the transmission overhead is too great for small data transfers.

We use the process algebra ALL to model CSMA/CA both without and with virtual carrier sensing.

#### **5.1 A Formal Model for CSMA/CA**

Our formal specification of CSMA/CA consists of four short processes written in ALL. It is precise and free of ambiguities—one of the many advantages formal methods provide, in contrast to specifications written in English prose.

The syntax of ALL is intended to look like pseudo code, and it is our belief that the specification can easily be read and understood by software engineers, who may or may not have experience with process algebra.

As the underlying data structure of our model is straightforward, we do not present it explicitly, but introduce it while describing the different processes.

The basic process CSMA, depicted in Process 1, is the protocol's entry point.

**Process 1.** The Basic Routine

```
CSMA(id) ≝
 1.   newpkt(data,dest). INIT(id,0,dataframe(data,id,dest))
 2. + [new(dataframe(data,src,id))] deliver(data) .
 3.   (
 4.     [[timeout := now + sifs]] [now ≥ timeout]
 5.     transmit(ackframe(src)) . CSMA(id)
 6.   )
```
This process maintains a single data variable id in which it stores its own identity. It waits until either it receives a request from the network layer to transmit a packet data to destination dest, or it receives from another node in the network a CSMA message (data frame) destined for itself.

In case of a newly injected data packet (Line 1), the process INIT is called; this process (described below) initiates the sending of the message via the medium. When passing the message on to INIT we use a function dataframe : DATA×ID× ID → MSG that generates a message in a format used by the protocol: next to the header fields (from which we abstract) it contains the injected data as well as the designated receiver dest and the sender id—the current node.

In case of an incoming dataframe destined for this node (the third argument carrying the destination is id) (Line 2)—any other incoming message is ignored by this process—the data is handed over to the network layer (**deliver**(data)) followed by the transmission of an acknowledgement back to the sender of the message (src). CSMA/CA requires a short period of idling medium before sending the acknowledgement: in [19] this interval is called *short interframe space* (sifs). The process waits until the time of the interframe spacing has passed, and then transmits the acknowledgement. The acknowledgement sent is not always received by src, e.g. due to data collision; therefore src could send the same message again (see Process 4) and id could deliver the same data to the network layer again.


The process INIT (Process 2) initiates the sending of a message via the medium. Next to the variable id, which is maintained by all processes, it maintains the variables tries and dframe: tries stores the number of attempts already made to send message dframe. When the process is called the first time for a message dframe (Line 1 of Process 1) the value of tries is 0.

The constant max_retransmit specifies the maximum number of attempts the protocol is allowed to make to transmit the same message. If the limit is not yet reached (Line 1) the message dframe is sent. As mentioned above, CSMA/CA defers messages for a *random* time interval to avoid collision. The node must start transmission within the contention window cw, a.k.a. backoff time. cw is calculated in Line 2; it increases exponentially.<sup>5</sup> After cw is determined, the process CCA is called, which performs the actual **transmit**-action. In case the maximum number of retransmits is reached (Line 4), the process notifies the network layer and restarts the protocol, awaiting new instructions from the application layer, or a new incoming message.
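The two branches of INIT can be sketched in Python. Since the body of Process 2 is not reproduced above, several details are assumptions of this sketch rather than the protocol model itself: the window is taken to be `cwmin * 2**tries` (one common form of exponential growth), the backoff is drawn uniformly from that window in the spirit of the probabilistic choice operator, and the guard is taken to be `tries < max_retransmit`.

```python
import random

def contention_window(tries, cwmin=16):
    """Assumed window size: doubles with every attempt already made."""
    if cwmin <= 0 or tries < 0:
        raise ValueError("need cwmin > 0 and tries >= 0")
    return cwmin * 2 ** tries

def init(tries, max_retransmit, cwmin=16, rng=random):
    """Return ('CCA', backoff) while retries remain, else ('fail', None)."""
    if tries < max_retransmit:               # limit not yet reached
        cw = contention_window(tries, cwmin)
        return "CCA", rng.randrange(cw)      # uniform choice among cw values
    return "fail", None                      # notify the upper layer, restart
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing traces.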

Process 3 takes care of the actual transmission of dframe. However, the protocol follows an intricate procedure to decide when to send this message.

First, the process senses the medium and awaits the point in time when it is idle (Line 6). In case, before this happens, it receives from another node in the network a CSMA message destined for itself (Line 1), this message is handled just as in Process 1, except that after acknowledging this message the protocol returns to Process 3.

<sup>5</sup> A typical value for cwmin is 16; it must satisfy cwmin > 0.

**Process 3.** Clear Channel Assessment With Physical Carrier Sense

```
CCA(id,b,tries,dframe) ≝
 1.   [new(dataframe(data,src,id))] deliver(data) .
 2.   (
 3.     [[timeout := now + sifs]] [now ≥ timeout]
 4.     transmit(ackframe(src)) . CCA(id,b,tries,dframe)
 5.   )
 6. + [idle]
 7.   [[timeout := now + difs]]            /* start wait for duration difs */
 8.   (
 9.       [¬idle] CCA(id,b,tries,dframe)
10.     + [idle ∧ now ≥ timeout]
11.       [[timeout := now + b]]
12.       (
13.           [¬idle]                      /* busy during backoff time */
14.           [[b := timeout − now]] CCA(id,b,tries,dframe)
15.         + [idle ∧ now ≥ timeout]       /* idle for backoff time */
16.           transmit(dframe) .
17.           ACKRECV(id,tries,now+max_ack_wait,dframe)
18.       )
19.   )
```
To guarantee a gap between messages sent via the medium, CSMA/CA (as well as other protocols) specifies the *distributed (coordination function) interframe space* (difs ∈ TIME), which is usually small,<sup>6</sup> but larger than sifs, so that acknowledgements get priority over new data frames. When the medium becomes busy during the interframe space, another node has started transmitting and the process goes back to listening to the medium (Line 9). In case nothing happens on the medium and the end of the interframe space is reached (Line 10), the process determines the actual time to start transmitting the message, taking the backoff time b into account (Line 11). If the medium is idle for the entire backoff period (Line 15), the message is transmitted (Line 16), and the process calls the process ACKRECV that will await an acknowledgement from the recipient of dframe (Line 17); the third argument specifies the maximum time the process should wait for such an acknowledgement. (As mentioned before, an acknowledgement may never arrive.) If another node transmits on the medium during the backoff period, the protocol restarts the routine (Lines 13 and 14) with an adjusted backoff value b: the process has already started waiting and should not be punished when the waiting is restarted. This update guarantees fairness of the protocol.
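The timing logic of Lines 6 to 16 of Process 3 can be explored with a discretised Python sketch. It departs from the formal model in several labelled ways: time advances in integer ticks, the state of the medium is supplied by a hypothetical function `idle_at`, a `horizon` bounds the search, and the handling of incoming data frames (Lines 1 to 5) is omitted.

```python
def cca_transmit_time(idle_at, difs, backoff, start=0, horizon=10_000):
    """Return the tick at which transmit(dframe) starts, or None.

    `idle_at(t)` reports whether the medium is idle at integer time t.
    """
    now, b = start, backoff
    while now < horizon:
        if not idle_at(now):                 # Line 6: wait for an idle medium
            now += 1
            continue
        timeout = now + difs                 # Line 7
        while idle_at(now) and now < timeout:
            now += 1
        if not idle_at(now):                 # Line 9: busy, listen again
            continue
        timeout = now + b                    # Line 11
        while idle_at(now) and now < timeout:
            now += 1
        if not idle_at(now):                 # Lines 13-14: keep remaining backoff
            b = timeout - now
            continue
        return now                           # Lines 15-16: transmit
    return None
```

On a permanently idle medium the sketch transmits after `difs + backoff` ticks. If the medium turns busy during the backoff period, only the remaining part of the backoff has to be waited after the next interframe space.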

The process awaiting an acknowledgement (Process 4) is straightforward. It waits until either it receives a CSMA message destined for itself (Line 1), or it receives an acknowledgement (Line 6), or it has waited for this acknowledgement as long as it is going to (Line 8).

<sup>6</sup> Recommended values for the constant difs are given in [19].

In the first case, the message is handled just as in Process 1, except that after acknowledging this message the protocol returns to Process 4. In the second case the network layer is informed that the sending of dframe was successful and the process loops back to Process 1 (Line 7). Line 8 describes the situation where no acknowledgement message arrives and the process times out. Here CSMA/CA retries to send the message; the counter tries is incremented.

**Process 4.** Receiving an ACK

```
ACKRECV(id,tries,acktimeout,dframe) ≝
 1.   [new(dataframe(data,src,id))] deliver(data) .
 2.   (
 3.     [[timeout := now + sifs]] [now ≥ timeout]
 4.     transmit(ackframe(src)) . ACKRECV(id,tries,acktimeout,dframe)
 5.   )
 6. + [new(ackframe(id))]                  /* acknowledgement received */
 7.   deliver(success) . CSMA(id)
 8. + [now ≥ acktimeout] INIT(id,tries+1,dframe)
```
#### **5.2 The Hidden Station Problem**

As mentioned in the introduction to this section, CSMA/CA suffers from the hidden station problem. This refers to the situation where two nodes A and C are not within transmission range of each other, while a node B is in range of both. In this situation C may be transmitting to B, but A is not able to sense this, and thus may start a transmission to B at roughly the same time, leading to data collisions at B.
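The situation can be replayed on a three-node topology with a short Python sketch. The function `received`, the node names and the chunk names are hypothetical; for simplicity no node is placed in its own range, which differs from the usual setting of the model.

```python
CONFLICT = "conflict"  # assumed encoding of the special chunk

def received(ranges, transmitting):
    """Map each node to the chunk it receives in the current time unit.

    `ranges[n]` is the set of nodes within transmission range of n and
    `transmitting[n]` is the chunk node n is sending. Nodes that receive
    nothing are absent from the result, i.e. they sense an idle medium.
    """
    result = {}
    for sender, chunk in transmitting.items():
        for r in ranges[sender]:
            # a second chunk arriving at the same node is a collision
            result[r] = CONFLICT if r in result else chunk
    return result

# A and C are hidden from each other; B is in range of both.
RANGES = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
```

While only C transmits, A is absent from the result and therefore senses an idle medium; once A starts transmitting as well, B receives the conflict chunk.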

While CSMA/CA is not able to avoid such collisions as a whole—it is always possible that two (or more) nodes hidden from each other happen to (randomly) choose the same backoff time to send messages—it is the exponential growth of the backoff slots that makes the problem less pressing in the long run, as the following theorem shows.

**Theorem 5.1.** *If* max_retransmit = ∞ *then weak packet delivery holds with probability 1.*

*Proof sketch.* Since the number of messages that nodes transmit is bounded, and all nodes select random times to start transmitting out of an increasingly longer time span, with probability 1 each message will eventually go through. □

In practice, max retransmit is set to a value that is not high enough to approximate the idea behind the above proof. In fact, the transmission time of a single message may be larger than the maximal backoff period allowed. For this reason the hidden station problem does occur when running the CSMA/CA protocol, as studies have shown [5]. Nevertheless, the above analysis still shows that link layer protocols can be formally analysed by process algebra in general, and ALL in particular.
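The effect of a growing backoff window, and of capping it, can be illustrated numerically. The following Python sketch is a simplification under stated assumptions and is not part of the ALL model: two mutually hidden nodes each pick a start slot uniformly from a window that doubles with every attempt, frames occupy a fixed number of slots, and an attempt fails exactly when the two frames overlap. The function names and the slot-based timing are hypothetical.

```python
from fractions import Fraction


def overlap_probability(window: int, frame_slots: int) -> Fraction:
    """Probability that two frames overlap when both senders pick a start
    slot uniformly from range(window) and each frame lasts frame_slots slots."""
    overlapping = sum(
        1
        for a in range(window)
        for b in range(window)
        if abs(a - b) < frame_slots
    )
    return Fraction(overlapping, window * window)


def all_attempts_collide(first_window, frame_slots, attempts, max_window=None):
    """Probability that every one of `attempts` consecutive tries collides.
    The window doubles after each try and is optionally capped at max_window."""
    probability = Fraction(1)
    window = first_window
    for _ in range(attempts):
        probability *= overlap_probability(window, frame_slots)
        window *= 2
        if max_window is not None:
            window = min(window, max_window)
    return probability
```

Under these assumptions, an uncapped window drives the probability that all attempts collide towards 0 as the number of attempts grows, which is the intuition behind the proof sketch. If the window is capped at or below the frame length, every attempt overlaps and the product stays at 1, matching the observation that a single transmission may last longer than the maximal backoff period.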

**Fig. 1.** RTS/CTS exchange

#### **5.3 A Formal Model for CSMA/CA with Virtual Carrier Sensing**

To overcome the hidden station problem, a request-to-send/clear-to-send (RTS/CTS) handshaking mechanism [19] is available. This mechanism is also known as *virtual carrier sensing*. The exchange of RTS/CTS messages happens just before the actual data is sent, see Fig. 1. The mechanism serves two purposes: (a) As the RTS and CTS messages are very short (they only contain two node identifiers as well as a natural number indicating the time it will take to send the actual data, plus overhead), the likelihood of a collision is reduced. (b) While the handshaking does not help with solving the hidden station problem for the RTS message itself, it avoids the problem for the sending of data. The reason is that a hidden node that could interfere with the sending of data will receive the CTS message from the designated recipient of the data, and will then remain silent until the data has been sent.

As for the CSMA/CA protocol we have modelled this extension in ALL, based on the model of CSMA/CA we presented earlier.

Our extended model uses two functions to generate rts and cts messages, respectively. The signature of both is ID × ID × TIME → MSG. The first argument carries the sender (source) of the message, the second the intended destination, and the third argument a duration (time period) of silence that is requested/granted. For example, before the message rts(src,dest,d) is transmitted, the time period d is calculated by

**[[**d := sifs+dur cts+sifs+dur(dataframe(data,id,dest))+sifs+dur ack**]]**.

The calculation is straightforward as it follows the protocol logic and determines the amount of time needed until the acknowledgement would be received (see Fig. 2). After the rts message has been received the medium should be idle for the interframe space sifs; then a cts message is sent back, which takes time dur cts; then another interframe space is needed, followed by the actual transmission of the message (the sending will take dur(dataframe(data,id,dest)) time units); after the message is received (hopefully) another interframe space is required before the acknowledgement is sent back, which takes time dur ack.
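The computation of the requested silence period can be written out as a minimal Python sketch. The function names and the numeric timing values are illustrative assumptions; only the two formulas come from the text, namely d for the rts and d − dur cts − sifs for the cts sent in reply.

```python
def rts_duration(sifs: int, dur_cts: int, dur_data: int, dur_ack: int) -> int:
    """Silence period announced in an rts: three interframe spaces plus the
    transmission times of the cts, the data frame and the acknowledgement."""
    return sifs + dur_cts + sifs + dur_data + sifs + dur_ack


def cts_duration(d: int, sifs: int, dur_cts: int) -> int:
    """Silence period announced in the answering cts: what remains of d once
    the first interframe space and the cts itself have elapsed."""
    return d - dur_cts - sifs


# Hypothetical timing values (arbitrary time units), for illustration only.
d = rts_duration(sifs=10, dur_cts=44, dur_data=300, dur_ack=44)
remaining = cts_duration(d, sifs=10, dur_cts=44)
```

With these example values, `remaining` equals sifs + dur_data + sifs + dur_ack, which is the time still needed after the cts has been sent.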

Process 2 remains essentially unchanged; it is merely equipped with the destination dest of the message that needs to be transmitted, and an additional timed variable nav ∈ TIME. These variables are not used in this process, but required later on. Variable nav holds the point in time until which the process should not transmit any rts or cts message. This period of silence is necessary as the node has learnt that until time nav another node will transmit message(s).<sup>7</sup>

**Fig. 2.** The use of virtual channel sensing using CSMA/CA [3]

Process 5 is the modified version of Process 1. As in Process 1, it awaits an instruction from the network layer, or an incoming CSMA message destined for itself. Lines 1–3 are identical to Process 1. Lines 4–11 handle the two new message types. In case an rts message rts(src,dest,d) is received that is intended for another recipient (dest ≠ id), the node concludes that another node wants to use the medium for d time units; the process updates the variable nav if needed, indicating the period the node should remain silent, by taking the maximum of the current value of nav and now+d, the point in time until which the sender src of the rts message requires the medium. The same behaviour occurs if a cts message is received that is not intended for the node itself (Line 4). If the incoming message is an rts message intended for the node itself (Line 6), by default the node answers with a clear-to-send message back to the sender (Line 9). However, when the receiver of the rts has knowledge about other nodes requiring the medium (now ≤ nav), a clear-to-send cannot be granted, and the request is dropped (Line 6). Similar to the sending of an acknowledgement (Line 2), the process waits for the short interframe space (sifs) before sending the cts (Line 6). Line 8 handles the case where the medium becomes busy (¬idle) during this period; here, too, a clear-to-send cannot be granted, and the request is dropped.<sup>8</sup> Only when the medium stays idle during the entire interframe space can the node id inform the source of the rts message that the medium is clear to send; the cts is transmitted in Line 9. The time a receiver of this message has to be silent is adjusted by deducting the time elapsed before this happens. In Line 10 the process resets nav to remind itself not to issue any rts message until the present exchange has been completed.<sup>9</sup>

<sup>7</sup> After a successful RTS/CTS exchange, communicating nodes proceed with transmitting the data and an acknowledgement regardless of the value of nav.

<sup>8</sup> The condition now > timeout−sifs prevents the process from dropping the request in the very first time slice that CSMA is running. Here the medium counts as busy, but only because we have just received an rts message.

<sup>9</sup> A case new(cts(src,dest,d)) ∧ dest = id is not required as a cts message is only expected in case an rts was sent, and hence handled in process RTSREACT.

**Process 5.** The Basic Routine (RTS/CTS)

```
CSMA(id,nav) ≝
 1. newpkt(data,dest). INIT(id,dest,0,dataframe(data,id,dest),nav)
 2. + [new(dataframe(data,src,id))] deliver(data) . [[timeout := now + sifs]]
 3. [now ≥ timeout] transmit(ackframe(src)) . CSMA(id,nav)
 4. + [(new(rts(src,dest,d)) ∨ new(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
 5. [[nav := now+d]] CSMA(id, nav)
 6. + [new(rts(src,id,d)) ∧ now > nav] [[timeout := now + sifs]]
 7. (
 8. [¬idle ∧ now > timeout−sifs] CSMA(id, nav)
 9. + [idle ∧ now ≥ timeout] transmit(cts(id,src,d−dur cts−sifs)) .
10. [[nav := now+d−dur cts−sifs]] CSMA(id, nav)
11. )
```
**Process 6.** Clear Channel Assessment With Virtual Carrier Sense

```
CCA(id,dest,b,tries,dframe,nav) ≝
 1. [new(dataframe(data,src,id))] deliver(data) . [[timeout := now + sifs]]
 2. [now ≥ timeout] transmit(ackframe(src)) . CCA(id,dest,b,tries,dframe,nav)
 3. + [(new(rts(src,dest,d)) ∨ new(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
 4. [[nav := now+d]] CCA(id,dest,b,tries,dframe,nav)
 5. + [new(rts(src,id,d)) ∧ now > nav] [[timeout := now + sifs]]
 6. (
 7. [¬idle ∧ now > timeout−sifs] CCA(id,dest,b,tries,dframe,nav)
 8. + [idle ∧ now ≥ timeout] transmit(cts(id,src,d−dur cts−sifs)) .
 9. [[nav := now+d−dur cts−sifs]] CCA(id,dest,b,tries,dframe,nav)
10. )
11. + [idle ∧ now > nav]
12. [[timeout:=now+difs]]
13. (
14. [¬idle] CCA(id,dest,b,tries,dframe,nav)
15. + [idle ∧ now ≥ timeout]
16. [[timeout := now + b]]
17. (
18. [¬idle] /* busy during backoff time */
19. [[b := timeout − now]] CCA(id,dest,b,tries,dframe,nav)
20. + [idle ∧ now ≥ timeout] /* idle for backoff time */
21. [[d := sifs + dur cts + sifs + dur(dframe) + sifs + dur ack]]
22. transmit(rts(id,dest,d)) .
23. CTSRECV(id,dest,tries,now + max cts wait,dframe,nav)
24. )
25. )
```
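The waiting phases in Lines 11–22 of Process 6 (silence until nav, then difs, then the backoff b, with b frozen when the medium turns busy) can be approximated by a discrete-time simulation. The sketch below is a loose illustration under strong assumptions that ALL itself does not make: time advances in unit steps, the medium is given as a precomputed list of idle flags, and incoming frames are ignored. The function name `rts_send_time` is hypothetical.

```python
def rts_send_time(idle, nav, difs, backoff):
    """Return the time step at which the rts would be transmitted, or None
    if the trace ends first. idle[t] says whether the medium is idle at t."""
    now, end = 0, len(idle)
    while now < end:
        # Line 11: wait until the medium is idle and the silence period is over.
        if not (idle[now] and now > nav):
            now += 1
            continue
        # Lines 12-15: the medium must stay idle for a whole difs.
        timeout = now + difs
        while now < end and idle[now] and now < timeout:
            now += 1
        if now >= end:
            return None
        if not idle[now]:
            continue  # Line 14: busy during difs, start over
        # Lines 16-20: count down the backoff while the medium stays idle.
        timeout = now + backoff
        while now < end and idle[now] and now < timeout:
            now += 1
        if now >= end:
            return None
        if not idle[now]:
            backoff = timeout - now  # Line 19: remember the remaining backoff
            continue
        return now  # Line 22: transmit the rts
    return None
```

For example, on a trace that is always idle with nav = 0, difs = 2 and backoff = 3, the guard of Line 11 first holds at time 1, so the rts goes out at time 6. A busy slot during the backoff phase postpones the transmission but preserves the part of the backoff already served.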
Process 6 is the modified version of Process 3. The goal of this process is to send an rts message (Line 22). Before it can start its work, it waits until the medium is idle, and any time it is required to be silent has elapsed (Line 11). Until this happens incoming data frames, rts or cts messages are treated just as in Process 5: Lines 1–10 copy Lines 2–11 of Process 5, except that afterwards the process returns to itself. Then Lines 12–20 are copied from Lines 7–15 of Process 3. Line 21 calculates the time other nodes ought to keep silent when receiving the rts message, and Line 23 passes control to the process CTSRECV, which awaits a cts response to the rts message transmitted in Line 22. The fourth argument of CTSRECV specifies the maximum time that process should wait for such a response; a good value for max cts wait is sifs + dur cts.

Process CTSRECV listens until time ctstimeout for a cts message with source dest and destination id. In case the expected cts message arrives in time (Line 1), the node waits for a time sifs (Line 2) and then transmits the data frame and proceeds to await an acknowledgement (Line 3). The fourth argument of ACKRECV specifies the maximum time the process should wait for such an acknowledgement; a good value for max ack wait is sifs+dur ack. If the cts message does not arrive in time (Line 6), the process returns to INIT to send another rts message, while incrementing the counter tries (Line 7). While waiting for the cts message, any incoming rts or cts message destined for another node is treated exactly as in Process 5 (Lines 4–5). Incoming data frames cannot arrive when this process is running, and incoming rts messages to id are ignored.

**Process 7.** Receiving a CTS

```
CTSRECV(id,dest,tries,ctstimeout,dframe,nav) ≝
 1. [new(cts(dest,id,d))]
 2. [[timeout := now + sifs]] [now ≥ timeout]
 3. transmit(dframe) . ACKRECV(id,dest,tries,now + max ack wait,dframe,nav)
 4. + [(new(rts(src,dest,d)) ∨ new(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
 5. [[nav := now+d]] CTSRECV(id,dest,tries,ctstimeout,dframe,nav)
 6. + [now ≥ ctstimeout]
 7. INIT(id,dest,tries+1,dframe,nav)
```
**Process 8.** Receiving an ACK

```
ACKRECV(id,dest,tries,acktimeout,dframe,nav) ≝
 1. [new(ackframe(id))]
 2. deliver(success) . CSMA(id,nav)
 3. + [(new(rts(src,dest,d)) ∨ new(cts(src,dest,d))) ∧ dest ≠ id ∧ nav < now+d]
 4. [[nav := now+d]] ACKRECV(id,dest,tries,acktimeout,dframe,nav)
 5. + [now ≥ acktimeout] /* nothing received */
 6. INIT(id,dest,tries+1,dframe,nav)
```
Process 8 handles the receipt of an acknowledgement in response to a successful data transmission. If an acknowledgement arrives, it must be from the node to which id has transmitted a data frame. In that case (Line 1), the network layer is informed that the sending of dframe was successful and the process loops back to Process 5 (Line 2). Line 5 describes the situation where no acknowledgement message arrives and the process times out. Also here CSMA/CA retries to send the message; the counter tries is incremented. Lines 3–4 describe the usual handling of incoming rts or cts messages destined for another node.

#### **5.4 The Exposed Station Problem**

Another source of collisions in CSMA/CA is the well-known *exposed station problem*. This refers to a linear topology A − B − C − D, where an unending stream of messages between C and D interferes with attempts by A to get a message across to B. In the default CSMA/CA protocol as formalised in Sect. 5.1, transmissions from A to B may perpetually collide at B with transmissions from C destined for D. CSMA/CA with virtual carrier sensing mitigates this problem, for a cts sent by B in response to an rts sent by A will tell C to keep silent for the required duration. In fact, we can show that in the above topology, if max retransmit=∞ then packet delivery holds with probability 1. A nonprobabilistic guarantee cannot be given since nodes A and C could behave in the same way, meaning if one node is sending out a message the other does the same at the very same moment, and if one is silent the other remains silent as well. In this scenario all messages to be sent are doomed.

Based on our formalisation, we can prove that once the RTS/CTS handshake has been successfully concluded, meaning that all nodes within range of the intended recipient have received the cts, then packet delivery holds outright. So the only problem left is to achieve a successful RTS/CTS handshake. Since rts and cts messages are rather short, even for modest values of max retransmit it becomes likely that such messages do not collide.

In spite of this, CSMA/CA with (or without) virtual channel sensing cannot achieve packet delivery with probability 1 for general topologies. Assume the following network topology

Here it may happen that one of the C<sub>i</sub> is always busy transmitting a large message to D<sub>i</sub>; any given C<sub>i</sub> is occasionally silent (not sending any message), but then one of the others is transmitting. As C<sub>i</sub> is disconnected from C<sub>j</sub>, for j ≠ i, coordination between the nodes is impossible. As a consequence, the medium at A will always be busy, so that A cannot send an rts message from B.

#### **6 Related Work**

The CSMA protocol in its different variants has been analysed with different formalisms in the past.

Multiple analyses were performed for the CSMA/CD protocol (CSMA with collision detection), a predecessor of CSMA/CA that has a constant backoff, i.e. the backoff time is not increased exponentially, see [10,11,20,21,26]. In all these approaches frame collisions have to be modelled explicitly, as part of the protocol description. In contrast, our approach handles collisions in the semantics; thereby achieving a clear separation between protocol specifications and link layer behaviour.

Duflot et al. [10,11] use probabilistic timed automata (PTAs) to model the protocol, and use probabilistic model checking (PRISM) and approximate model checking (APMC) for their analysis. The model explained in [26] is based on PTAs as well, but uses the model checker Uppaal as verification tool. These approaches, although formal, have very little in common with our approach. On the one hand it is not easy to change the model from CSMA/CD to CSMA/CA, as the latter requires unbounded data structures (or the like) to model the exponential backoff. On the other hand, as usual, model checking suffers from state space explosion and only small networks (usually fewer than ten nodes) can be analysed. This is sufficient and convenient when it comes to finding counterexamples, but these approaches cannot provide guarantees for arbitrary network topologies, as ours does.

Jensen et al. [20] use models of CSMA/CD to compare the tools SPIN and Uppaal. Their models are much more abstract than ours. It is proven that no collisions will ever occur, without stating the exact conditions under which this statement holds.

To the best of our knowledge, Parrow [21] is the only one who used process algebra (CCS) to model and analyse CSMA. His untimed model of CSMA/CD is extremely abstract and the analysis performed is limited to two nodes only, avoiding scenarios such as the hidden station problem.

There are far fewer formal analysis techniques available when it comes to CSMA/CA (with and without virtual medium sensing). Traditional approaches to the analysis of network protocols are simulation and test-bed experiments. This is also the case for CSMA/CA (e.g. [4]). While these are important and valid methods for protocol evaluation, in particular for quantitative performance evaluation, they have limitations with regard to the evaluation of basic protocol correctness properties.

Following the spirit of the above-mentioned research of model checking CSMA, Fruth [15] analyses CSMA/CA using PTAs and PRISM. He considers properties such as the minimum probability of two nodes successfully completing their transmissions, and maximum expected number of collisions until two nodes have successfully completed their transmissions. As before, this analysis technique does not scale; in [15] the experiments are limited to two contending nodes only.

Beyond model checking, simulation and test-bed experiments, we are only aware of two other formal approaches. In [1] Markov chains are used to derive an accurate, analytical model to compute the throughput of CSMA/CA. Calculating throughput is an orthogonal task to our vision of proving (functional) correctness.

An approach aiming at proving the correctness of CSMA/CA with virtual carrier sensing (RTS/CTS), and hence related to ours, is presented in [3]. Based on stochastic bigraphs with sharing it uses rewrite rules to analyse quantitative properties. Although it is an approach that is capable of analysing arbitrary topologies, to apply the rewrite rules a particular topology needs to be modelled by a directed acyclic graph structure, which is part of the bigraph.

# **7 Conclusion**

In this paper we have proposed a novel process algebra, called ALL, that can be used to model, verify and analyse link layer protocols. Since we aimed at a process algebra featuring aspects of the link layer such as frame collisions, as well as arbitrary data structures (to model a rich class of protocols), we could not use any of the existing algebras. The design of ALL is layered. The first layer allows modelling protocols in some sort of pseudo code, which hopefully makes our approach accessible for network and software researchers/engineers. The other layers are mainly for giving a formal semantics to the language. The layer of partial network expressions, the third layer, provides a unique and sophisticated mechanism for modelling the collision of frames. As it is hard-wired in the semantics there is no need to model collisions manually when modelling a protocol, as it was done before [21]. Next to primitives needed for modelling link layer protocols (e.g. **transmit**) and standard operators of process algebra (e.g. nondeterministic choice), ALL provides an operator for probabilistic choice.

This operator is needed to model aspects of link layer protocols such as the exponential backoff for the Carrier-Sense Multiple Access with Collision Avoidance protocol, the case study we have chosen to demonstrate the applicability of ALL. We have modelled and analysed two versions of CSMA/CA, without and with virtual carrier sensing. Our analysis has confirmed the hidden station problem for the version without virtual carrier sensing. However, we have also shown that the version with virtual carrier sensing overcomes not only this problem, but also the exposed station problem with probability 1. Yet the protocol cannot guarantee packet delivery, not even with probability 1.

To perform this analysis we had to formalise suitable liveness properties for link layer protocols specified in our framework.

**Acknowledgement.** We thank Tran Ngoc Ma for her involvement in this project in a very early phase. We would also like to thank the German Academic Exchange Service (DAAD) that funded an internship of the third author at Data61, CSIRO.

### **References**



# Program Analysis and Automated Verification

# **Data Races and Static Analysis for Interrupt-Driven Kernels**

Nikita Chopra, Rekha Pai, and Deepak D'Souza

Indian Institute of Science, Bangalore, India {nikita,rekhapai,deepakd}@iisc.ac.in

**Abstract.** We consider a class of interrupt-driven programs that model the kernel API libraries of some popular real-time embedded operating systems and the synchronization mechanisms they use. We define a natural notion of data races and a happens-before ordering for such programs. The key insight is the notion of *disjoint blocks* to define the synchronizes-with relation. This notion also suggests an efficient and effective lockset-based analysis for race detection. It also enables us to define efficient "sync-CFG" based static analyses for such programs, which exploit data race freedom. We use this theory to carry out static analysis on the FreeRTOS kernel library to detect races and to infer simple relational invariants on key kernel variables and data-structures.

**Keywords:** Static analysis · Interrupt-driven programs · Data races

# **1 Introduction**

Embedded software is widespread and increasingly employed in safety-critical applications in medical, automobile, and aerospace domains. These programs are typically multi-threaded applications, running on uni-processor systems, that are compiled along with a kernel library that provides priority-based scheduling, and other task management and communication functionality. The applications themselves are similar to classical multi-threaded programs (using lock, semaphore, or queue based synchronization) although they are distinguished by their priority-based execution semantics. The kernel on the other hand typically makes use of non-standard low-level synchronization mechanisms (like disabling/enabling interrupts, suspending the scheduler, and flag-based synchronization) to ensure thread-safe access to its data-structures. In the literature such software (both applications and kernels) is referred to as *interrupt-driven* programs. Our interest in this paper is in the subclass of interrupt-driven programs corresponding to kernel libraries.

Efficient static analysis of concurrent programs is a challenging problem. One could carry out a precise analysis by considering the *product* of the control flow graphs (CFGs) of the threads, however this is prohibitively expensive due to the exponential number of program points in the product graph. A promising direction is to focus on the subclass of *race-free* programs. This is an important class of programs, as most developers aim to write race-free code, and one could try to exploit this property to give an efficient way of analyzing programs that fall in this class. In recent years there have been many techniques [7,11,12,18,21] that exploit the race-freedom property to perform sound and efficient static analysis. In particular [11,21] create an appealing structure called a "sync-CFG" which is the *union* of the control flow graphs of the threads augmented with possible "synchronization" edges, and essentially perform sequential analysis on this graph to obtain sound facts about the concurrent program. However these techniques are all for classical lock-based concurrent programs. A natural question asks if we can analyze interrupt-driven programs in a similar way.

There are several challenges in doing this. Firstly one needs to define *what* constitutes a data race in a generalized setting that includes these programs. Secondly, how does one define the happens-before order, and in particular the *synchronizes-with* relation that many of the race-free analysis techniques rely on, given the ad-hoc synchronization mechanisms used in these programs.

A natural route that suggests itself is to translate a given interrupt-driven program into one that uses classical locks, and faithfully captures the interleaved executions of the original program. One could then use existing techniques for lock-based concurrency to analyze these programs. However, this route is fraught with many challenges. To begin with, it is not clear how one would handle flag-based synchronization which is one of the main synchronization mechanisms used in these programs. Even if one could handle this, such a translation *may not* preserve data races, in that the original program might have had a race but the translated program does not. Finally, some of the synchronizes-with edges in the translated program are clearly unnecessary, leading to imprecise data-flow facts in the analyses.

In this paper, we show that it is possible to take a more organic route and address these challenges in a principled way that could apply to other nonstandard classes of concurrent systems as well. Firstly, we propose a general definition of a data race that is not based on a happens-before order, but on the operational semantics of the class of programs under consideration. The definition essentially says that two statements s and t can race, if two notional "blocks" around them can *overlap* in time during an execution. We believe that this definition accurately captures what it is that a programmer tries to avoid while dealing with shared variables whose values matter. Secondly we propose a way of defining the *synchronizes-with* relation, based on the notion of *disjoint blocks*. These are statically identifiable pairs of path segments in the CFGs of different threads that are guaranteed to never overlap (in time) during an execution of the program, much like blocks of code that lie between an acquire and release of the same lock. This relation now suggests a natural sync-CFG structure on which we can perform analyses like value-set (including interval, null-dereference, and points-to analysis), and region-based relational invariant analysis, in a sound and efficient manner. We also use the notion of disjoint blocks to define an efficient and precise lock-set-based analysis for detecting races in interrupt-driven programs.

We implement some of these analyses on the FreeRTOS kernel library [3] which is one of the most widely used open-source real-time kernels for embedded systems, comprising about 3,500 lines of C code. Our race-detection analysis reports a total of 64 races in kernel methods, of which 18 turn out to be true positives. We also carry out a region-based relational analysis using an implementation based on CIL [22]/Apron [15], to prove several relational invariants on the kernel variables and abstracted data-structures.

# **2 Overview**

We give an overview of our contributions via an illustrative example modelled on a portion of the FreeRTOS kernel library. Figure 1 shows an interrupt-driven program that contains a main thread that first initializes the kernel variables. The variables represent components of a message queue, like msgw (the number of messages waiting in the queue), len (max length of the queue), wtosend (the number of tasks waiting to send to the queue), wtorec (the number of tasks waiting to receive from the queue), and RxLock (a counter which also acts as a synchronization flag that mediates access to the waiting queues). The main thread then creates (or spawns) two threads: *qsend* which models the kernel API method for sending a message to the queue, and *qrec ISR* which models a method for receiving a message, and which is meant to be called from an interrupt-service routine. The basic semantics of this program is that the ISR thread can interrupt *qsend* at any time (provided interrupts are not disabled), but always runs to completion itself. The threads use disableint/enableint to disable and enable interrupts, suspendsch/resumesch to suspend/resume the scheduler (thereby preventing preemption by another non-ISR thread), and finally flag-based synchronization (using the RxLock variable), as different means to ensure mutual exclusion.

Our first contribution is a general notion of data races which is applicable to such programs. We say that two conflicting statements s and t in two different threads are involved in a data race if assuming s and t were enclosed in a notional "block" of skip statements, there is an execution in which the two blocks "overlap" in time. The given program can be seen to be free of races. However if we were to remove the disableint statement of line 10, then the statements accessing msgw in lines 12 and 42 would be racy, since soon after the access of msgw in *qsend* at line 12, there could be preemption by *qrec ISR* which goes on to execute line 42.

Next we illustrate the notion of "disjoint blocks" which is the key to defining synchronizes-with edges, which we need in our sync-CFG analysis as well as to define an appropriate happens-before relation. Disjoint blocks are also used in our race-detection algorithm. A pair of blocks of code (for example any of the like-shaded blocks of code in the figure) are *disjoint* if they can never overlap during an execution. For example, the block comprising lines 11–14 in *qsend* and the whole of *qrec ISR*, form a pair of disjoint blocks.

**Fig. 1.** An interrupt-driven program modelled on the FreeRTOS kernel library. Similarly shaded blocks denote disjoint blocks. Some of the sync-with edges are shown in dashed lines. Some edges like 22 → 41 and 49 → 20 have been omitted for clarity.

Next we give an analysis for checking race-freedom, by adapting the standard lockset analysis [24] for classical concurrent programs. We associate a unique lock with each pair of disjoint blocks, and add notional acquires and releases of this lock at the beginning and end (respectively) of these blocks. We now do the standard lockset analysis on this version of the program, and declare two accesses to be non-racy if they hold sets of locks with a non-empty intersection.
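The lockset check described above can be sketched in Python. This is a deliberately simplified illustration: blocks are approximated by line ranges instead of CFG path segments, the lock held is just the index of the disjoint-block pair, and the names `Block`, `held_locks` and `may_race` as well as the extent 41–49 assumed for *qrec ISR* are hypothetical.

```python
from typing import NamedTuple


class Block(NamedTuple):
    thread: str
    first: int  # first line of the block
    last: int   # last line of the block


def held_locks(thread, line, pairs):
    """Notional locks held at a line: one lock per disjoint-block pair that
    has a block in this thread covering the line."""
    return frozenset(
        lock
        for lock, pair in enumerate(pairs)
        for block in pair
        if block.thread == thread and block.first <= line <= block.last
    )


def may_race(access_a, access_b, pairs):
    """Two accesses, given as (thread, line), are reported as potentially
    racy when their locksets have an empty intersection."""
    locks_a = held_locks(access_a[0], access_a[1], pairs)
    locks_b = held_locks(access_b[0], access_b[1], pairs)
    return not (locks_a & locks_b)


# Lines 11-14 of qsend paired with the whole of qrec_ISR (extent assumed).
pairs = [(Block("qsend", 11, 14), Block("qrec_ISR", 41, 49))]
```

With this pair in place, the accesses to msgw at lines 12 and 42 share a notional lock and are not reported. Without any disjoint pair covering them, their locksets are empty and the check reports a potential race.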

Finally, we show how to do data-flow analysis for such programs in a sound and efficient way. The basic idea is to construct a "sync-CFG" for the program by unioning the control-flow graphs of the threads, and adding *sync* edges that capture the synchronizes-with edges (going from the end of a block to the beginning of its paired block), for example line 14 to line 41 and line 49 to line 11. The sync-edges are shown by dashed arrows in the figure. We now do a standard "value-set" analysis (for example interval analysis) on this graph, keeping track of a set of values each variable can take. The resulting facts about a variable are guaranteed to be sound at points where the variable is accessed (or even "owned" in the sense that a notional read of the variable at that point is non-racy). For example an interval analysis on this program would give us that 0 < msgw at line 14. Finally, we could do a region-based value-set analysis, by identifying regions of variables that are accessed as a unit – for example msgw and len could be in one region, while wtosend and wtorec could be in another. The figure shows some facts inferred by a polyhedral analysis based on these regions, for the given program.
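The construction of the sync-CFG can be illustrated with a short Python sketch. It assumes, hypothetically, that each thread's CFG is given as a collection of (source line, target line) edges and that each disjoint block is identified by its first and last line; the function name `build_sync_cfg` and the sample edges are illustrative, not taken from the paper's implementation.

```python
def build_sync_cfg(thread_edges, disjoint_pairs):
    """Union the per-thread CFG edges and add a sync edge from the end of
    each block to the beginning of its paired block, in both directions.
    Returns (all edges, sync edges only)."""
    edges = set()
    for thread_edge_list in thread_edges.values():
        edges |= set(thread_edge_list)
    sync = set()
    for (first_a, last_a), (first_b, last_b) in disjoint_pairs:
        sync.add((last_a, first_b))
        sync.add((last_b, first_a))
    return edges | sync, sync


# Hypothetical fragment of the example: one disjoint pair, a few CFG edges.
thread_edges = {
    "qsend": [(11, 12), (12, 13), (13, 14)],
    "qrec_ISR": [(41, 42)],
}
all_edges, sync_edges = build_sync_cfg(thread_edges, [((11, 14), (41, 49))])
```

For the pair of blocks 11–14 and 41–49 this yields the sync edges 14 → 41 and 49 → 11 mentioned in the text. A sequential data-flow analysis would then be run over `all_edges` as if it were a single control-flow graph.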

# **3 Interrupt-Driven Programs**

The programs we consider have a finite number of (static) threads, with a designated "main" thread in which execution begins. The threads access a set of shared global variables, some of which are used as "synchronization flags", using a standard set of commands like assignment statements of the form x := e, conditional statements (if-then-else), loop statements (while), etc. In addition, the threads can use commands like disableint, enableint (to disable and enable interrupts, respectively), suspendsch, resumesch (to suspend and resume the scheduler, respectively), while the main thread can also create a thread (enable it for execution). Table 1 shows the set of basic statements *cmd*<sub>V,T</sub> over a set of variables V and a set of threads T.

We allow standard integer and Boolean expressions over a set of variables V. For an integer expression e over V, and an environment φ for V, we denote by ⟦e⟧<sub>φ</sub> the integer value that e evaluates to in φ. Similarly for a Boolean expression b, we denote the Boolean value (*true* or *false*) that b evaluates to in φ by ⟦b⟧<sub>φ</sub>. For a set of environments Φ for a set of variables V, we define the set of integer values that e can evaluate to in an environment in Φ, by ⟦e⟧<sub>Φ</sub> = {⟦e⟧<sub>φ</sub> | φ ∈ Φ}. Similarly, for a Boolean expression b, we define the set of environments in Φ that satisfy b to be ⟦b⟧<sub>Φ</sub> = {φ ∈ Φ | ⟦b⟧<sub>φ</sub> = *true*}.
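The two set-level definitions (the values an integer expression can take over a set of environments, and the environments that satisfy a Boolean expression) can be read as the following Python sketch. Representing environments as dictionaries and expressions as functions over them is an assumption made for illustration, as are the names `values_of` and `satisfying`.

```python
def values_of(expr, envs):
    """Set of integer values that expr evaluates to over the given environments."""
    return {expr(env) for env in envs}


def satisfying(cond, envs):
    """Those environments in which the Boolean expression cond evaluates to true."""
    return [env for env in envs if cond(env)]


# Two sample environments over the variables msgw and len.
envs = [{"msgw": 0, "len": 5}, {"msgw": 3, "len": 5}]
free_slots = values_of(lambda env: env["len"] - env["msgw"], envs)
non_empty = satisfying(lambda env: env["msgw"] > 0, envs)
```

Here `free_slots` collects the value of len − msgw in each environment, and `non_empty` keeps only the environment in which msgw is positive.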

Each thread is of one of two *types*: "task" threads that are like standard threads, and "ISR" threads that represent threads that run as interrupt service routines. The *main* thread is a task thread, which is the only task thread enabled initially. The *main* thread can enable other threads (both task and ISR) for execution using the create command. Task threads can be preempted by other task threads (whenever interrupts are not disabled, and the scheduler is not suspended) or by ISR threads (whenever interrupts are not disabled). On the other hand ISR threads cannot be preempted and are assumed to run to completion.

Only task threads are allowed to use disableint, enableint, suspendsch and resumesch commands. Similarly, if flag-based synchronization is used, only task threads can modify the flag variable, while an ISR can only check whether the flag is set or not, and perform some actions accordingly.

Formally we represent an interrupt-driven program P as a tuple (V,T) where V is a finite set of integer variables, and T is a finite set of named threads. Each thread t ∈ T has a *type* which is one of *task* or *ISR*, and an associated control-flow graph of the form G<sub>t</sub> = (L<sub>t</sub>, *s*<sub>t</sub>, *inst*<sub>t</sub>) where L<sub>t</sub> is a finite set of *locations* of thread t, *s*<sub>t</sub> ∈ L<sub>t</sub> is the *start* location of thread t, and *inst*<sub>t</sub> ⊆ L<sub>t</sub> × *cmd*<sub>V,T</sub> × L<sub>t</sub> is a finite set of *instructions* of thread t.

Some definitions related to threads will be useful going forward. We denote by L<sub>P</sub> = ⨄<sub>t∈T</sub> L<sub>t</sub> the disjoint union of the thread locations. Whenever P is clear from the context we will drop the subscript P from L<sub>P</sub> and its decorations. For a location *l* ∈ L we denote by *tid*(*l*) the thread t which contains location *l*. We denote the set of instructions of P by *inst*<sub>P</sub> = ⨄<sub>t∈T</sub> *inst*<sub>t</sub>. For an instruction ι ∈ *inst*<sub>t</sub>, we will also write *tid*(ι) to mean the thread t. For an instruction ι = ⟨*l*, c, *l*′⟩, we call *l* the *source* location, and *l*′ the *target* location of ι.

**Table 1.** Basic statements *cmd*<sub>V,T</sub> over variables *V* and threads *T*

We denote the set of commands appearing in program P by *cmd*(P). We will consider an assignment x := e as a *write-access* to x, and as a *read-access* to every variable that appears in the expression e. Similarly, assume(b) is considered to be a read-access of every variable that occurs in expression b. We say two accesses are *conflicting* accesses if they are read/write accesses to the same variable, and at least one of them is a write. We assume that the control-flow graph of each thread comes from a well-structured program. Finally, we assume that the *main* thread begins by initializing the variables to constant values. Figure 2 shows an example program and the control-flow-graphs of its threads.
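The notions of read-access, write-access and conflicting accesses can be made concrete with a short sketch. The representation below (an `Access` record and helper functions that take the set of variables occurring in an expression) is hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

class Access(NamedTuple):
    var: str
    is_write: bool

def accesses_of_assignment(target, expr_vars):
    """x := e writes x and reads every variable occurring in e."""
    return [Access(target, True)] + [Access(v, False) for v in sorted(expr_vars)]

def accesses_of_assume(expr_vars):
    """assume(b) reads every variable occurring in b."""
    return [Access(v, False) for v in sorted(expr_vars)]

def conflicting(a, b):
    """Same variable, and at least one of the two accesses is a write."""
    return a.var == b.var and (a.is_write or b.is_write)

incr_x = accesses_of_assignment("x", {"x"})   # x := x + 1
read_x = accesses_of_assignment("t", {"x"})   # t := x
check_y = accesses_of_assume({"y"})           # assume(y > 0)

x_conflict = any(conflicting(a, b) for a in incr_x for b in read_x)
y_conflict = any(conflicting(a, b) for a in read_x for b in check_y)
print(x_conflict, y_conflict)
```

The increment of x conflicts with the read of x in `t := x`, whereas `t := x` and `assume(y > 0)` touch different variables and do not conflict.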

We define the operational semantics of an interrupt-driven program using a labeled transition system (LTS). Let P = (V,T) be a program. We define an LTS T<sub>P</sub> = (Q, Σ, s, ⇒) corresponding to P, where:


(a) Example program

(b) Control-flow-graph representation

**Fig. 2.** An example program and its CFG representation.

set to *main* (this is a dummy value as it is used only when the scheduler is suspended), interrupts are enabled, and the scheduler is not suspended.

– For an instruction ι = ⟨*l*, c, *l*′⟩ in *inst*<sub>P</sub>, with *tid*(ι) = t, we define

$$(pc, \phi, enab, rt, it, id, ss) \Rightarrow\_{\iota} (pc', \phi', enab', rt', it', id', ss')$$

iff the following conditions are satisfied:

	- ∗ If c is the skip command then φ′ = φ, *enab*′ = *enab*, *id*′ = *id*, and *ss*′ = *ss*.
	- ∗ If c is an assignment statement of the form x := e then φ′ = φ[x ↦ ⟦e⟧<sub>φ</sub>], *enab*′ = *enab*, *id*′ = *id*, and *ss*′ = *ss*.
	- ∗ If c is a command of the form assume(b) then ⟦b⟧<sub>φ</sub> = *true*, φ′ = φ, *enab*′ = *enab*, *id*′ = *id*, and *ss*′ = *ss*.
	- ∗ If c is a create(u) command then t = *main*, φ′ = φ, *enab*′ = *enab* ∪ {u}, *id*′ = *id*, and *ss*′ = *ss*.
	- ∗ If c is the disableint command then φ′ = φ, *enab*′ = *enab*, *id*′ = *true*, and *ss*′ = *ss*.
	- ∗ If c is the enableint command then φ′ = φ, *enab*′ = *enab*, *id*′ = *false*, and *ss*′ = *ss*.
	- ∗ If c is the suspendsch command then φ′ = φ[ssflag ↦ 1], *enab*′ = *enab*, *id*′ = *id*, and *ss*′ = *true*.
	- ∗ If c is the resumesch command then φ′ = φ[ssflag ↦ 0], *enab*′ = *enab*, *id*′ = *id*, and *ss*′ = *false*.

• In addition, the transitions set the new running thread *rt*′ and interrupted task *it*′ as follows. If t is an ISR thread, *ss* is true, and ι is the first statement of t then *it*′ = *rt* and *rt*′ = t. If t is an ISR thread, *ss* is true, and ι is the last statement of t then *it*′ = *it* and *rt*′ = *it*. In all other cases, *rt*′ = t and *it*′ = *it*.
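The per-command effects on the environment, the set of enabled threads, the interrupts-disabled flag and the scheduler-suspended flag can be sketched as a single function. This is only a partial sketch under stated assumptions: commands are encoded as tuples, expressions as callables, and the program-counter update and the bookkeeping of the running and interrupted threads are deliberately left out. The name `apply_command` is hypothetical.

```python
def apply_command(cmd, thread, phi, enab, id_, ss):
    """Return the updated (phi, enab, id, ss), or None if the command cannot fire."""
    kind = cmd[0]
    if kind == "skip":
        return phi, enab, id_, ss
    if kind == "assign":                 # ("assign", x, e), e a callable on phi
        _, x, e = cmd
        return {**phi, x: e(phi)}, enab, id_, ss
    if kind == "assume":                 # ("assume", b), fires only if b holds
        return (phi, enab, id_, ss) if cmd[1](phi) else None
    if kind == "create":                 # ("create", u), only main may create
        return (phi, enab | {cmd[1]}, id_, ss) if thread == "main" else None
    if kind == "disableint":
        return phi, enab, True, ss
    if kind == "enableint":
        return phi, enab, False, ss
    if kind == "suspendsch":             # also records the suspension in ssflag
        return {**phi, "ssflag": 1}, enab, id_, True
    if kind == "resumesch":
        return {**phi, "ssflag": 0}, enab, id_, False
    raise ValueError(f"unknown command: {kind}")

start = ({"x": 1, "ssflag": 0}, frozenset({"main"}), False, False)
after_incr = apply_command(("assign", "x", lambda p: p["x"] + 1), "main", *start)
after_susp = apply_command(("suspendsch",), "main", *start)
after_create = apply_command(("create", "t1"), "main", *start)
print(after_incr, after_susp, after_create)
```

Returning `None` models a command whose side condition fails (a false `assume`, or a `create` outside *main*), so no transition exists for it in that state.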

An execution σ of P is a finite sequence of transitions in T<sub>P</sub> from the initial state s: σ = τ<sub>0</sub>, τ<sub>1</sub>, ..., τ<sub>n</sub> (n ≥ 0) from ⇒, such that there exists a sequence of states q<sub>0</sub>, q<sub>1</sub>, ..., q<sub>n+1</sub> from Q, with q<sub>0</sub> = s and τ<sub>i</sub> = (q<sub>i</sub>, ι<sub>i</sub>, q<sub>i+1</sub>) for each 0 ≤ i ≤ n. Wherever convenient we will also represent an execution like σ above as a sequence of the form q<sub>0</sub> ⇒<sub>ι<sub>0</sub></sub> q<sub>1</sub> ⇒<sub>ι<sub>1</sub></sub> ··· ⇒<sub>ι<sub>n</sub></sub> q<sub>n+1</sub>. We say that a state q ∈ Q is *reachable* in program P if there is an execution of P leading to state q.

### **4 Data Races and Happens-Before Ordering**

In this section we propose a definition of a data race which has general applicability, and also define a natural happens-before order for interrupt-driven programs.

#### **4.1 Data Races**

Data races have typically been defined in the literature in terms of a *happens-before* order on program executions. In the classical setting of lock-based synchronization, the happens-before relation is a partial order on the instructions in an execution, namely the reflexive-transitive closure of the union of the *program-order* relation between two instructions in the same thread, and the *synchronizes-with* relation, which relates a release of a lock in a thread to the next acquire of the same lock in another thread. Two instructions in an execution are then defined to be involved in a data race if they are conflicting accesses to a shared variable and are *not* ordered by the happens-before relation.

We feel it is important to have a definition of a data race that is based on the operational semantics of the class of programs we are interested in, and not on a happens-before relation. Such a definition would more tangibly capture what it is that a programmer typically tries to avoid when dealing with shared variables whose consistency she is worried about. Moreover, when coming up with a definition of the happens-before order (the synchronizes-with relation in particular) for non-standard concurrent programs like interrupt-driven programs, it is useful to have a reference notion to relate to. For instance, one could show that a proposed happens-before order is strong enough to ensure the absence of races.

We propose to define a race between two conflicting statements in a program in terms of whether two imaginary blocks enclosing each of these statements can *overlap* in an execution. Let us consider a multi-threaded program P in a class of concurrent programs with a certain operational execution semantics. Consider a block of contiguous instructions in a thread t of P and another block in a thread t′ of P. We say that these two blocks are involved in a *high-level race* in an execution of P if they *overlap* with each other during the execution, in that one block begins *in between* the beginning and ending of the other. We say two conflicting statements s and t in P are involved in a *data race* (or are *racy*), if the following condition is true: Consider the program P′ which is obtained from P by replacing the statement s by the block "skip; s; skip", and similarly for statement t. Then there is an execution of P′ in which the two blocks containing s and t are involved in a high-level race. The definition is illustrated in Fig. 3. We say a program P is *race-free* if no pair of instructions in it is racy.
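The overlap condition in this definition can be checked mechanically on a single recorded execution. The sketch below assumes a hypothetical trace format in which each event records a block name and whether the block begins or ends; it decides overlap for one given trace only, and does not search for an execution that exhibits one.

```python
def blocks_overlap(trace, block_a, block_b):
    """True if one block begins between the beginning and ending of the other.

    trace is a list of (block_name, "begin" | "end") events in execution order.
    """
    open_blocks = set()
    for name, event in trace:
        if name not in (block_a, block_b):
            continue
        if event == "begin":
            if open_blocks - {name}:     # the other block is still open
                return True
            open_blocks.add(name)
        elif event == "end":
            open_blocks.discard(name)
    return False

# Block around s starts, the block around t runs entirely inside it.
interleaved = [("s", "begin"), ("t", "begin"), ("t", "end"), ("s", "end")]
# The two blocks run one after the other.
sequential = [("s", "begin"), ("s", "end"), ("t", "begin"), ("t", "end")]
print(blocks_overlap(interleaved, "s", "t"), blocks_overlap(sequential, "s", "t"))
```

Under the definition above, statements s and t are racy exactly when some execution of the augmented program produces a trace like `interleaved` for their skip-blocks.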

**Fig. 3.** Illustrating the definition of a data race on statements *s* and *t*. A program *P*, its transformation *P*′, and an execution of *P*′ in which the blocks overlap.

The rationale for this definition is that the concerned statements s and t may be compiled down to a sequence of instructions (represented by the blocks with skip's around s and t) depending on the underlying processor and compiler, and if these instructions interleave in an execution, it may lead to undesirable results.

To illustrate the definition, consider the program in Fig. 2a. The accesses to x in line 7 and line 11 can be seen to be racy, since there is an execution of the augmented program P′ in which *t1* performs the skip followed by the increment to x at line 7, followed by a context switch to thread *t2* which goes on to execute lines 9 and 10 and then the read of x in line 11. On the other hand, the version of the program in which line 7 is enclosed in a disableint-enableint block does *not* contain a race.

We note that for classical concurrent programs, it might suffice to define a race as *consecutive* occurrences of conflicting accesses in an execution, as done in [4,17]. However, this definition is not general enough to apply to interrupt-driven programs. By this definition, the statements in lines 7 and 11 of the program in Fig. 2a are *not* racy, as there is *no* execution in which they happen consecutively. This is because the disableint-enableint block containing the access in line 11 is "atomic" in that the statements in the block must happen contiguously in any execution, and hence the instructions corresponding to line 7 and line 11 can never happen immediately one after another.

#### **4.2 Disjoint Blocks and the Happens-Before Relation**

Now that we have a proposed definition of races, we can proceed to give a principled way to define the happens-before relation for our class of interrupt-driven programs. The main question is how one defines the synchronizes-with relation. Our insight here is that the key to defining the synchronizes-with relation lies in identifying what we call *disjoint blocks* for the class of programs. Disjoint blocks are statically identifiable pairs of path segments in the CFGs of different threads, which are guaranteed by the execution semantics of the class of programs never to *overlap* in an execution of the program. Disjoint block structures, for example in the form of blocks enclosed between locks/unlocks of the same lock, are the primary mechanism used by developers to ensure race-freedom. The synchronizes-with relation in an execution can then be defined as relating, for every pair (A, B) of disjoint blocks in the program, the end of block A to the beginning of the succeeding occurrence of block B in the execution. The happens-before order for an execution can now be defined, as before, in terms of the program order and the synchronizes-with order, and is easily seen to be sufficient to ensure non-raciness.

Let us illustrate this hypothesis on classical lock-based programs. The disjoint block pairs for this class of programs are segments of code enclosed between acquires and releases of the *same* lock; or the portion of a thread's code before it spawns a thread t, and the whole of thread t's code; and similarly for joins. The synchronizes-with relation between instructions in an execution essentially goes from a release to the succeeding acquire of the same lock. If two accesses are related by the resulting happens-before order, they clearly cannot be involved in a race.

We now focus on defining a happens-before relation based on disjoint blocks for our class of interrupt-driven programs. We have identified eight pairs of disjoint block patterns for this class of programs, which are depicted in Fig. 4. We use the following types of blocks to define the pairs. A block of type D is a path segment in a task thread that begins with a disableint and ends with an enableint, with no intervening enableint. A block of type S is a path segment in a task thread that begins with a suspendsch and ends with a resumesch, with no intervening resumesch. An I block is an initial and terminating path segment in an ISR thread (i.e. it begins with the first instruction and ends with a terminating instruction). Similarly, for a task thread t, T<sub>t</sub> is an initial and terminating path in t, while M<sub>t</sub> is an initial segment of the main thread that ends with a create(t) command. A block of type C<sub>ssflag</sub> is a path segment in an ISR thread corresponding to the then block of a conditional that checks if ssflag = 0. For a synchronization flag f, C<sub>f</sub> is the path segment in an ISR thread corresponding to the then block of a conditional that checks if f = 0. Finally F<sub>f</sub> is a segment between statements that set f to 1 and back to 0, in a task thread. We also require that an F<sub>f</sub> segment be within the scope of a suspendsch command.

We can now describe the pairs of disjoint blocks depicted in Fig. 4. Case (a) says that two D blocks in different task threads are disjoint. Clearly two such blocks can never overlap in an execution, since once one of the blocks begins execution no context-switch can occur until interrupts are enabled again. Case (b) says that D and I blocks are disjoint. Once again this is because once the D block begins execution no ISR can run until interrupts are enabled again, and once an ISR begins execution it runs to completion without any context-switches. Case (e) says that S blocks in different task threads are disjoint, because once the scheduler is suspended no context-switch to another task thread can occur. Case (f) says that M<sub>t</sub> and T<sub>t</sub> blocks are disjoint, since a thread cannot begin execution before it is created in main. Case (g) says that an S block is disjoint from a C<sub>ssflag</sub> block. This is because once the scheduler is suspended by the suspendsch command, and even if a context-switch to an ISR occurs, the then block of the if statement will not execute. Conversely, if the ISR is running there can be no context-switch to another thread. Finally, case (h) is similar to case (g). We note that the disjoint block pairs are not ordered (the relation is symmetric).

**Fig. 4.** Disjoint blocks in an interrupt-driven program.

We can now define the synchronizes-with relation as follows. Let σ = q<sub>0</sub> ⇒<sub>ι<sub>0</sub></sub> q<sub>1</sub> ⇒<sub>ι<sub>1</sub></sub> ··· ⇒<sub>ι<sub>n</sub></sub> q<sub>n+1</sub> be an execution of P. We say instruction ι<sub>i</sub> *synchronizes-with* an instruction ι<sub>j</sub> of P in σ, if i < j, *tid*(ι<sub>i</sub>) ≠ *tid*(ι<sub>j</sub>), and there exists a pair of disjoint blocks A and B, with ι<sub>i</sub> ending block A and ι<sub>j</sub> beginning block B. As usual we say ι<sub>i</sub> is *program-order* related to ι<sub>j</sub> iff i < j and *tid*(ι<sub>i</sub>) = *tid*(ι<sub>j</sub>). We define the *happens-before* relation on σ as the reflexive-transitive closure of the union of the program-order and synchronizes-with relations for σ.

We can now define a *HB-race* in an execution σ of P as follows: we say that two instructions ι<sup>i</sup> and ι<sup>j</sup> in σ are involved in a *HB-race* if they are conflicting instructions that are *not* ordered by the happens-before relation in σ. We say that two instructions in P are *HB-racy* if there is an execution of P in which they are involved in a HB-race. Finally, we say a program P is *HB-race-free* if no two of its instructions are HB-racy.
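For a single finite execution, the happens-before relation and the HB-race check can be computed directly from these definitions. The sketch below is an illustration under assumptions: the execution is given as a list of thread names (one per instruction), the synchronizes-with pairs are supplied by the caller as index pairs, and the caller is responsible for only querying conflicting instructions. The function names are hypothetical.

```python
def happens_before(threads, sync_with):
    """Reflexive-transitive closure of program order and synchronizes-with.

    threads[i] is the thread of the i-th instruction of the execution;
    sync_with is a set of index pairs (i, j) with i < j.
    """
    n = len(threads)
    hb = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if threads[i] == threads[j] or (i, j) in sync_with:
                hb[i][j] = True
    for k in range(n):                   # Warshall-style transitive closure
        for i in range(n):
            if hb[i][k]:
                for j in range(n):
                    if hb[k][j]:
                        hb[i][j] = True
    return hb

def unordered(hb, i, j):
    """True if instructions i and j are not happens-before ordered either way."""
    return not (hb[i][j] or hb[j][i])

# 0: t1 writes x, 1: t1 enableint, 2: t2 disableint, 3: t2 reads x
trace = ["t1", "t1", "t2", "t2"]
with_sync = happens_before(trace, {(1, 2)})
without_sync = happens_before(trace, set())
print(unordered(with_sync, 0, 3), unordered(without_sync, 0, 3))
```

With the synchronizes-with pair from the end of one D block to the start of the other, the write and the read are ordered; without it they are unordered, and since they conflict they would form an HB-race.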

Once again, it is fairly immediate to see that if two statements of a program are not involved in a HB-race, they cannot be involved in a race. Further, if two statements belong to disjoint blocks, then they are clearly happens-before ordered in every execution. Hence belonging to disjoint blocks is sufficient to ensure that the statements are happens-before ordered, which in turn ensures that the statements cannot be involved in a race.

# **5 Sync-CFG Analysis for Interrupt-Driven Programs**

In this section we describe a way of lifting a sequential value-set analysis in a sound way to an HB-race free interrupt-driven program, similar to how it is done for lock-based concurrent programs in [11]. A value-set analysis keeps track of the set of values each variable can take at each program point. The basic idea is to create a "sync-CFG" for a given interrupt-driven program P, which is essentially the union of the CFGs of each thread of P, along with "may-synchronize-with" edges between statements that may be synchronizes-with related in an execution of P, and then perform the value-set analysis on the resulting graph. Whenever the given program is *HB-race free*, the result of the analysis is guaranteed to be sound, in a sense made clear in Theorem 1.

#### **5.1 Sync-CFG**

We begin by defining the "sync-CFG" for an interrupt-driven program. It is on this structure that we will do the value-set analysis. Let P = (V,T) be an interrupt-driven program, and let G be the disjoint union (over threads t ∈ T) of the CFGs G<sub>t</sub>. We define a set of *may-synchronize-with* edges in G, denoted *MSW*(G), as follows. The edges correspond to the pairs of disjoint blocks depicted in Fig. 4, in that they connect the ending of one block to the beginning of the other block in the pair. Consider two instructions ι = ⟨*l*, c, m⟩ ∈ *inst*<sub>t</sub> and κ = ⟨*l*′, c′, m′⟩ ∈ *inst*<sub>t′</sub>, with t ≠ t′. We add the edge (m, *l*′) to *MSW*(G) iff for some pair of disjoint blocks (A, B), ι ends a block of type A in thread t and κ begins a block of type B in thread t′. For example, corresponding to a (D, D) pair of disjoint blocks, we add the edge (m, *l*′) when c is an enableint command, and c′ is a disableint command.

The sync-CFG induced by P is the control flow graph given by G along with the additional edges in *MSW*(G). Figure 6 shows a program P<sub>2</sub> and its induced sync-CFG.
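As an illustration of the (D, D) case only, the may-synchronize-with edges can be collected with a few lines of code. The instruction encoding (thread, source location, command, target location) and the location names are assumptions of this sketch; the other block pairs of Fig. 4 would need their own rules.

```python
def msw_edges_dd(instructions):
    """May-synchronize-with edges for (D, D) pairs of disjoint blocks.

    instructions is an iterable of (thread, source, command, target) tuples.
    An edge goes from the target of an enableint in one thread to the source
    of a disableint in a different thread.
    """
    ends = [(t, m) for (t, _, c, m) in instructions if c == "enableint"]
    begins = [(t, l) for (t, l, c, _) in instructions if c == "disableint"]
    return {(m, l) for (t, m) in ends for (u, l) in begins if t != u}

# Threads t1 and t2 of program P1 (Fig. 5a); location "n" is the point just
# before statement n, and "t1.exit"/"t2.exit" are hypothetical exit locations.
p1 = [
    ("t1", "4", "x := x + 1", "5"),
    ("t1", "5", "disableint", "6"),
    ("t1", "6", "x := y", "7"),
    ("t1", "7", "enableint", "t1.exit"),
    ("t2", "8", "disableint", "9"),
    ("t2", "9", "t := x", "10"),
    ("t2", "10", "enableint", "t2.exit"),
]
edges = msw_edges_dd(p1)
print(sorted(edges))
```

No edge is produced between an enableint and a disableint of the same thread, since the definition requires the two instructions to belong to different threads.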

#### **5.2 Value Set Analysis**

We first spell out the particular form of abstract interpretation we will be using. It is similar to the standard formulation of [9], except that it is a little more general to accommodate non-standard control-flow graphs like the sync-CFG.

An *abstract interpretation* of a program P = (V,T) is a structure of the form A = (D, ≤, d<sub>0</sub>, F) where


An abstract interpretation A = (D, ≤, d<sub>0</sub>, F) of P induces a "global" transfer function F<sub>A</sub> : D → D, given by F<sub>A</sub>(d) = d<sub>0</sub> ⊔ ⨆<sub>ι ∈ *inst*<sub>P</sub></sub> F<sub>ι</sub>(d). This transfer function can also be seen to be monotonic. By the Knaster-Tarski theorem [28], F<sub>A</sub> has a least fixed point (*LFP*) in D, which we denote by *LFP*(F<sub>A</sub>), and refer to as the resulting value of the analysis.
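The Knaster-Tarski theorem guarantees that the least fixed point exists but does not say how to compute it. One standard way, assumed here rather than taken from the text, is to iterate the global transfer function from the least element until it stabilises, which terminates on lattices of finite height. The toy lattice (finite sets of integers under union) and the two per-instruction functions below are purely illustrative.

```python
def lfp(f, bottom):
    """Iterate a monotone f from bottom until the value no longer changes."""
    d = bottom
    while True:
        nxt = f(d)
        if nxt == d:
            return d
        d = nxt

d0 = frozenset({0})                      # initial abstract value

def f_step(d):                           # toy transfer function: add 2 while below 6
    return frozenset(v + 2 for v in d if v < 6)

def f_keep(d):                           # toy transfer function: keep multiples of 4
    return frozenset(v for v in d if v % 4 == 0)

def global_transfer(d):
    """Join of the initial value with the image of d under every instruction."""
    return d0 | f_step(d) | f_keep(d)

result = lfp(global_transfer, frozenset())
print(sorted(result))
```

The iteration passes through ∅, {0}, {0, 2}, {0, 2, 4} and stops at {0, 2, 4, 6}, which is the least set closed under both toy functions and containing the initial value.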

A *value set* for a set of variables V is a map *vs* : V → 2<sup>ℤ</sup>, associating a set of integer values with each variable in V. A value set *vs* induces a set of environments Φ<sub>*vs*</sub> in a natural way: Φ<sub>*vs*</sub> = {φ | for all x ∈ V, φ(x) ∈ *vs*(x)} (i.e. essentially the Cartesian product of the value sets). Conversely, a set of environments Φ for V induces a value set *valset*(Φ) given by *valset*(Φ)(x) = {v ∈ ℤ | ∃φ ∈ Φ, φ(x) = v}, which is the "projection" of the environments to each variable x ∈ V. Finally, we define a point-wise ordering on value sets as follows: *vs* ≤ *vs*′ iff *vs*(x) ⊆ *vs*′(x) for each variable x in V. We denote the least element in this ordering by *vs*<sub>⊥</sub> = λx.∅.
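The two conversions between value sets and sets of environments, and the point-wise ordering, are easy to state in code. This is a minimal sketch with hypothetical function names, using dictionaries for both environments and value sets.

```python
from itertools import product

def envs_of(vs):
    """All environments induced by a value set (Cartesian product of its sets)."""
    names = sorted(vs)
    value_lists = [sorted(vs[x]) for x in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

def valset(envs, variables):
    """Project a collection of environments onto each variable."""
    return {x: {phi[x] for phi in envs} for x in variables}

def leq(vs1, vs2):
    """Point-wise ordering: vs1(x) is a subset of vs2(x) for every variable x."""
    return all(vs1[x] <= vs2[x] for x in vs1)

envs = [{"x": 0, "y": 0}, {"x": 1, "y": 2}]
vs = valset(envs, ["x", "y"])
induced = envs_of(vs)
bottom = {"x": set(), "y": set()}
print(vs, len(induced), leq(bottom, vs))
```

Projecting and then re-expanding over-approximates: the two original environments are among the four induced ones, but so are combinations such as x = 1, y = 0 that never occurred.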

We can now define the value-set analysis A<sub>*vset*</sub> for an interrupt-driven program P = (V,T) as follows. Let A<sub>*vset*</sub> = (D, ≤, d<sub>0</sub>, F) where


$$d\_0 = \lambda l. \begin{cases} \lambda x.\{0\} \text{ if } l = s\_{main} \\ vs\_\perp \text{ } \text{ otherwise.} \end{cases}$$

• If c is the command x := e then F<sub>ι</sub>(d) = d′ where

$$d'(m) = \begin{cases} vs\_l^d[x \mapsto [e]\_\#] \text{ if } m = l'\\ vs\_\perp & \text{otherwise.} \end{cases}$$

• If c is the command assume(b), then F<sub>ι</sub>(d) = d′ where

$$d'(m) = \begin{cases} valset(\lbrack b \rbrack\_{\Phi}) \text{ if } m = l'\\ vs\_{\perp} \qquad \text{otherwise.} \end{cases}$$

• If c is any other command (skip, disableint, enableint, suspendsch, resumesch, or create) then F<sub>ι</sub>(d) = d′ where

$$d'(m) = \begin{cases} vs\_l^d & \text{if } m = l'\\ vs\_\perp & \text{otherwise.} \end{cases}$$

Figure 6 shows the results of a value-set analysis on the sync-CFG of program P2. The data-flow facts are shown just before a statement, at selected points in the program.

*Soundness.* The value-set analysis is sound in the following sense: if P is an *HB-race free* program, and we have a reachable state of P at a location l in a thread where a variable x is *read*, then the value of x in this state is contained in the value-set for x obtained by the analysis at point l. More formally:

**Theorem 1.** *Let* P = (V,T) *be an HB-race free interrupt-driven program, and let* d<sup>∗</sup> *be the result of the analysis* A<sub>*vset*</sub> *on* P*. Let* l *be a location in a thread* t ∈ T *where a variable* x *is read (i.e.* P *contains an instruction of the form* ⟨l, c, l′⟩ *where* c *is a read access of* x*). Let* φ *be an environment at* l *reachable via some execution of* P*. Then* φ(x) ∈ d<sup>∗</sup>(l)(x)*.*

The proof of this theorem is similar to the one for classical concurrent programs in [11] (see [10] for a more accurate proof). The soundness claim can be extended to locations where a variable is "owned" (which includes locations where it is read). We say a variable x is *owned* by a thread t at location l, if an inserted read of x at this point is non-HB-racy in the resulting program.

*Region-Based Analysis.* One problem with the value-set analysis is that it may not be able to prove *relational* invariants (like x ≤ y) for a program. One way to remedy this is to exploit the fact that concurrent programs often ensure race-free access to a *region* of variables, and to essentially do a region-based value-set analysis, as originally done in [21]. More precisely, let us say we have a partition of the set of variables V of a program P into a set of regions R<sub>1</sub>, ..., R<sub>n</sub>. We classify each read (write) access to a variable x in a region R as a read (write) access to region R. We say that two instructions in an execution of P are involved in a *HB-region-race*, if the two instructions are conflicting accesses to the same region R, and are *not* happens-before ordered in the execution. A program is *HB-region-race free* if none of its executions contain a HB-region-race.

We can now define a region-based version of the value-set analysis for a program P, which we call A<sub>*rvset*</sub>. The value-set for a region R is a set of valuations (or sub-environments) for the variables in R. The transfer functions are defined in an analogous way to the value-set analysis. The analogue of Theorem 1 for regions gives us that for a HB-region-race free program, at any location where a region R is accessed, the region-value-set computed by the analysis at that point will contain every sub-environment of R reachable at that point.
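The gain in precision from tracking regions can be seen on a tiny example. The sketch below, with hypothetical names and made-up data, compares a per-variable abstraction with a region abstraction that keeps joint valuations for the variables of each region.

```python
def region_valset(envs, regions):
    """For each region (a tuple of variable names), the joint valuations seen in envs."""
    return {r: {tuple(phi[x] for x in r) for phi in envs} for r in regions}

# Two environments in which x <= y happens to hold.
envs = [{"x": 0, "y": 0, "z": 7}, {"x": 1, "y": 2, "z": 7}]

per_var = {v: {phi[v] for phi in envs} for v in ("x", "y", "z")}
per_region = region_valset(envs, [("x", "y"), ("z",)])

# x <= y holds for every valuation recorded for region (x, y) ...
holds_in_region = all(x <= y for (x, y) in per_region[("x", "y")])
# ... but not for every combination allowed by the per-variable sets.
holds_per_var = all(x <= y for x in per_var["x"] for y in per_var["y"])
print(holds_in_region, holds_per_var)
```

The per-variable sets admit the spurious combination x = 1, y = 0, so the invariant x ≤ y is lost there, while the region (x, y) retains it.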

# **6 Translation to Classical Lock-Based Programs**

In this section we address the question of why an execution-preserving translation to a classical lock-based program is not a fruitful route to take. In a nutshell, such a translation would not preserve races and would induce a sync-CFG with many unnecessary MSW edges, leading to much more imprecise facts than the analysis on the native sync-CFG described in the previous section. We also describe how our approach can be viewed as a *lightweight* translation of an interrupt-driven program to a classical lock-based one. The translation is "lightweight" in the sense that it does *not* attempt to preserve the execution semantics of the given interrupt-driven program, but instead preserves races and the sync-CFG structure of the original program.

#### **6.1 Execution-Preserving Lock Translation**

One could try to translate a given interrupt-driven program P into a classical lock-based program P<sup>L</sup> in a way that preserves the interleaved execution semantics of P. By this we mean that every execution of P has a corresponding execution in P<sup>L</sup> that follows essentially the same sequence of interleaved instructions from the different threads (modulo of course the synchronization statements which may differ); and vice-versa. For example, to capture the semantics of disableint-enableint, one could introduce an "execution" lock E which is acquired in place of disabling interrupts, and released in place of enabling interrupts. Every instruction in a task thread outside a disableint-enableint block must also acquire and release E immediately before and after the instruction. Note that the latter step is necessary if we want to capture the fact that once a thread disables interrupts it cannot be preempted by any thread. Figure 5a shows an interrupt-driven program P<sub>1</sub>, and Fig. 5b shows its lock translation P<sub>1</sub><sup>L</sup>. There are still issues with the translation related to re-entrancy of locks, and it is not immediately clear how one would handle flag-based synchronization, but let us set these aside for now.
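The disableint/enableint part of this translation can be sketched for a single straight-line task thread. The sketch makes several simplifying assumptions: statements are plain strings, disableint-enableint blocks are not nested, and thread creation, ISRs and flag-based synchronization are ignored. The function name `exec_preserving` is hypothetical.

```python
def exec_preserving(stmts, lock="E"):
    """Translate one task thread's statements using a single execution lock."""
    out, inside = [], False
    for s in stmts:
        if s == "disableint":            # disabling interrupts acquires the lock
            out.append(f"lock({lock})")
            inside = True
        elif s == "enableint":           # enabling interrupts releases it
            out.append(f"unlock({lock})")
            inside = False
        elif inside:                     # already protected by the block's lock
            out.append(s)
        else:                            # outside a block: wrap the statement
            out += [f"lock({lock})", s, f"unlock({lock})"]
    return out

t1 = ["x := x + 1", "disableint", "x := y", "enableint"]
t2 = ["disableint", "t := x", "enableint"]
t1_locked = exec_preserving(t1)
t2_locked = exec_preserving(t2)
print(t1_locked, t2_locked)
```

Applied to threads t1 and t2 of the program in Fig. 5a, this yields the statement sequences of the corresponding threads in Fig. 5b.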

The first problem with this translation is that it does not preserve race information. Consider the program P<sub>1</sub> in Fig. 5a and its translation P<sub>1</sub><sup>L</sup>. The original program clearly has a race on x in statements 4 and 9. However the translation P<sub>1</sub><sup>L</sup> does *not* have a race as the accesses are protected by the lock E. Hence checking for races in P<sup>L</sup> does not substitute for checking in P. A way around this would be to first construct P′ (recall that this is the version of P in which we introduce the skip-blocks around statements we want to check for races), then construct its lock translation (P′)<sup>L</sup>, and check this program for *high-level* races on the introduced skip-blocks. However this is expensive as it involves a 3x blow-up in going from P to P′ and another 3x blow-up in going from P′ to (P′)<sup>L</sup>. Further, checking for high-level races (for example using a lock-set analysis) is more expensive than just checking for races. In contrast, as we show next, our lock-set analysis on the native program P does not incur any of these expenses.

```
main:
1. x := y := t := 0;
2. create(t1);
3. create(t2);

t1:
4. x := x + 1;
5. disableint;
6. x := y;
7. enableint;

t2:
8. disableint;
9. t := x;
10. enableint;
```
(a) Example program *P*<sub>1</sub>

```
main:
1. x := y := t := 0;
2. spawn(t1);
3. spawn(t2);

t1:
4. lock(E);
5. x := x + 1;
6. unlock(E);
7. lock(E);
8. x := y;
9. unlock(E);

t2:
10. lock(E);
11. t := x;
12. unlock(E);
```
(b) Exec-preserving translation *P*<sub>1</sub><sup>L</sup>

```
main:
1. x := y := t := 0;
2. spawn(t1);
3. spawn(t2);

t1:
4. x := x + 1;
5. lock(A);
6. x := y;
7. unlock(A);

t2:
8. lock(A);
9. t := x;
10. unlock(A);
```
(c) Lightweight translation *P*<sub>1</sub><sup>W</sup>

**Fig. 5.** Example program *P*<sub>1</sub>, and its lock and lightweight translations *P*<sub>1</sub><sup>L</sup>, *P*<sub>1</sub><sup>W</sup>.

The second problem with a precise lock translation is that the sync-CFG of the translated program has many unnecessary MSW-edges, leading to imprecision in the ensuing analysis. Consider the program P<sub>2</sub> in Fig. 6, and its lock translation P<sub>2</sub><sup>L</sup> in Fig. 7. P<sub>2</sub> is similar to P<sub>1</sub> except that line 4 is now an increment of y instead of x, and the resulting program is race-free (in fact HB-race-free). Notice that the may-sync-with edges from line 13 to 4, and line 6 to 10 in the sync-CFG of P<sub>2</sub><sup>L</sup> in Fig. 7 are *unnecessary* (they are not present in the native sync-CFG) and lead to imprecise facts in an interval analysis on this graph. Some of the final facts in an interval analysis on these graphs are shown alongside the programs in Figs. 6 and 7. In particular the analysis on P<sub>2</sub><sup>L</sup> is unable to prove the assertion in line 10 of the original program.

#### **6.2 A Lightweight Lock-Translation**

Our disjoint block-based approach of Sect. 5 can be viewed as a *lightweight* lock translation which does not attempt to preserve execution semantics, but preserves disjoint blocks and hence also races and the sync-CFG structure of the original interrupt-driven program.

**Fig. 6.** Program *P*<sub>2</sub> with its Sync-CFG and facts from an interval analysis

**Fig. 7.** Lock translation *P*<sub>2</sub><sup>L</sup> of *P*<sub>2</sub>, with its Sync-CFG and interval analysis facts

Let us first spell out the translation. Let us fix an interrupt-driven program P = (V,T). The idea is simply to introduce a lock corresponding to each pattern of disjoint block pairs listed in Fig. 4, and to insert at the entry and exit to these blocks an acquire and release (respectively) of the corresponding lock. For each of the cases (a) through (h) we introduce locks named A through H, with some exceptions. Firstly, for case (f) regarding the create of a thread t, we simply translate these as a spawn(t) command in a classical lock-based programming language, which has a standard acquire-release semantics. Secondly, for case (h), we need a copy of H for *each* thread t, which we call H<sub>t</sub>. This is because the concerned blocks (say between a set and unset of the flag f) are *not* disjoint across *task* threads, but only with the "then" block of an ISR thread statement that checks if f = 0. The ISR thread now acquires the set of locks {H<sub>t</sub> | t ∈ T} at the beginning of the "then" block of the if statement, and releases them at the end of that block. We call the resulting classical lock-based program P<sup>W</sup>. Figure 5c shows this translation for the program P<sub>1</sub>.
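For a program that only exhibits cases (a) and (f), the lightweight translation amounts to a statement-by-statement rewrite. The sketch below covers just those two cases under the assumption that statements are plain strings; the locks for the remaining cases, including the per-thread locks of case (h), are not modelled. The function name `lightweight` is hypothetical.

```python
def lightweight(stmts):
    """Rewrite D-block delimiters to lock A and thread creation to spawn."""
    out = []
    for s in stmts:
        if s == "disableint":            # entry of a D block: acquire lock A
            out.append("lock(A)")
        elif s == "enableint":           # exit of a D block: release lock A
            out.append("unlock(A)")
        elif s.startswith("create("):    # case (f): create(t) becomes spawn(t)
            out.append("spawn(" + s[len("create("):])
        else:                            # all other statements are left untouched
            out.append(s)
    return out

main = ["x := y := t := 0", "create(t1)", "create(t2)"]
t1 = ["x := x + 1", "disableint", "x := y", "enableint"]
main_w = lightweight(main)
t1_w = lightweight(t1)
print(main_w, t1_w)
```

Unlike an execution-preserving translation, statements outside the blocks are not wrapped in any lock, which is why the result for t1 matches Fig. 5c and leaves the increment of x unprotected.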

Figure 8 shows this translation along with the sync-CFG edges and some of the final facts in an interval analysis for the program P2.

It is not difficult to see that P<sup>W</sup> allows all executions that are possible in P. However it also allows more: for example, the execution of P<sub>1</sub><sup>W</sup> (Fig. 5c) in which thread t1 preempts t2 at line 9 to execute the statement at line 4 is *not* allowed in P<sub>1</sub>. Thus it only *weakly* captures the execution semantics of P. However, every race in P is also a race in P<sup>W</sup>. To see this, suppose we have a race on statements s and t in P. This means there is a high-level race on the two skip blocks around s and t in the augmented program P′. Since an execution exhibiting the high-level race on these blocks would also be present in (P′)<sup>W</sup>, which is identical to (P<sup>W</sup>)′, it follows that the corresponding statements are racy in P<sup>W</sup> as well.

Further, since our translation preserves disjoint blocks by construction, if s and t are in disjoint blocks in P, the corresponding statements will be in disjoint blocks in P <sup>W</sup> ; and vice-versa. It follows that the sync-CFGs induced by P and P <sup>W</sup> are essentially isomorphic (modulo the synchronization statements). As a result, any value-set-based analysis will produce identical results on the two graphs.

Finally, if statements s and t are HB-racy in P, they must also be HB-racy in P <sup>W</sup> . This is because disjoint blocks are preserved and the synchronizes-with relation is inherited from the disjoint blocks. Hence the execution witnessing the HB-race in P would also be present in P <sup>W</sup> , and would also witness a HB-race on the corresponding statements.

We summarize these observations below:

**Proposition 1.** *Let* P *be an interrupt-driven program and* P <sup>W</sup> *the classical lock program obtained using our lightweight lock translation. Then:*


**Fig. 8.** Our lightweight translation *P*<sub>2</sub><sup>W</sup> of *P*<sub>2</sub>, with its Sync-CFG and interval analysis facts

#### **6.3 Lockset Analysis for Race Detection**

For classical lock-based programs, the lockset analysis [24] essentially tracks whether two statements are in disjoint blocks. Here two blocks are disjoint if they hold the same lock for the duration of the block. When two statements are in disjoint blocks, they are necessarily happens-before ordered, and hence this gives us a way to declare pairs of statements to be non-HB-racy.

A lockset analysis computes the set of locks held at each program point as follows: at program entry it is assumed that no locks are held. When a call to acquire(l) is encountered, the lockset at the *out* point of the call is the lockset computed at the *in* point with the lock l added. When a call to release(l) is encountered, the lockset at the *out* point of the call is the lockset computed at the *in* point with the lock l removed. For any other statement, the lockset from the *in* point of the statement is copied to its *out* point. The *join* operation is the simple intersection of the input locksets. Once locksets are computed at each point, a pair of conflicting statements s and t in different threads is declared to *may* HB-race if the locksets held at these points have no lock in common.
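The dataflow rules above can be sketched in a few lines of Python. This is only an illustrative worklist implementation under our own assumptions, not the CIL pass used in the paper: the control-flow graph encoding (`stmts`, `succs`), the statement tuples, and the function names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical statement encoding:
# ("acquire", lock) | ("release", lock) | ("other", label)
Stmt = Tuple[str, str]


def transfer(stmt: Stmt, held: FrozenSet[str]) -> FrozenSet[str]:
    """Lockset at the out point of stmt, given the lockset at its in point."""
    kind, arg = stmt
    if kind == "acquire":
        return held | {arg}
    if kind == "release":
        return held - {arg}
    return held  # any other statement copies the lockset unchanged


def lockset_analysis(
    stmts: Dict[int, Stmt], succs: Dict[int, List[int]], entry: int
) -> Dict[int, Optional[FrozenSet[str]]]:
    """Lockset at the in point of every node (None means not reached)."""
    lock_in: Dict[int, Optional[FrozenSet[str]]] = {n: None for n in stmts}
    lock_in[entry] = frozenset()  # no locks are held at program entry
    worklist = [entry]
    while worklist:
        n = worklist.pop()
        out = transfer(stmts[n], lock_in[n])
        for m in succs.get(n, []):
            # Join is set intersection; None is the neutral "unreached" value.
            new = out if lock_in[m] is None else lock_in[m] & out
            if new != lock_in[m]:
                lock_in[m] = new
                worklist.append(m)
    return lock_in


def may_hb_race(held_s: FrozenSet[str], held_t: FrozenSet[str]) -> bool:
    """Two conflicting accesses may HB-race if they share no lock."""
    return not (held_s & held_t)


# Node 0 branches: only one arm acquires A before the access at node 3,
# so the intersection at node 3 is empty.
stmts = {0: ("other", "branch"), 1: ("acquire", "A"), 2: ("other", "skip"),
         3: ("other", "x = x + 1"), 4: ("release", "A")}
succs = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
locks = lockset_analysis(stmts, succs, entry=0)
```

Because the join is an intersection, a lock counts as held at a point only if it is held along every path reaching that point, which is what makes the "no lock in common" test conservative.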

Using our lock translation above, we can detect races as follows. Given an interrupt-driven program P, we first translate it to the lock-based program P <sup>W</sup> , and do a lockset analysis on P <sup>W</sup> . If any pair of conflicting statements s and t are found to be may-HB-racy in P <sup>W</sup> , we declare them to be may-HB-racy in P. By Proposition 1(2), it follows that this is a sound analysis for interrupt-driven programs.

# **7 Analyzing the FreeRTOS Kernel Library**

We now perform an experimental evaluation of the proposed race detection algorithm and sync-CFG-based relational analysis for interrupt-driven programs. We use the FreeRTOS kernel library [3], on which our interrupt-driven program semantics are based, to perform our evaluation. FreeRTOS is a collection of functions mostly written in C, that an application developer compiles with and invokes in the application code. We view the FreeRTOS kernel library as an interrupt-driven program as follows: we build an interrupt-driven program out of the FreeRTOS kernel as shown in the figure alongside. The main thread is responsible for initializing the kernel data structures and then creating two threads: a *task* thread which branches out calling each task kernel API function, and loops on this; and an *ISR* thread which similarly branches and loops on the ISR kernel API functions. FreeRTOS provides versions of API functions that can be called from interrupt service routines. These functions have "FromISR" appended to their name. While it is sufficient to have one ISR thread, we assume (in the analysis) that there could be any number of task threads running. To achieve this we simply add sync-edges *within* each task kernel function, in addition to the usual sync-edges between task functions. We used FreeRTOS version 10.0.0 for our experiments. We conducted these experiments on an Intel Core i7 machine with 32 GB RAM running Ubuntu 16.04.

#### **7.1 Race Detection**

We consider 49 task and queue API functions that can be called from an application (termed top-level functions) for race detection. The functions operating on semaphores and mutexes were not considered.

We prepared the API functions for analysis, in two steps: (1) inlining and (2) lock insertion, as follows: The function vTaskStartScheduler and the queue initialization code in the function xQueueGenericCreate were treated as part of the main thread, which initializes kernel data structures. All the helper function calls made inside the top-level functions were inlined. After inlining, the functions are modified to acquire and release locks using the strategy explained in Sect. 6.2. We consider each pair of disjoint blocks as taking the same distinct lock. For example, the pair of disjoint blocks protected by disableint-enableint take lock A. That is disableint is replaced with acquire(A) and enableint is replaced with release(A). A total of 9 locks corresponding to disjoint blocks were employed in the modification of the FreeRTOS code. The two steps outlined above are automated. Inlining is achieved using the inline pass in the CIL framework [22]. Lock insertion is accomplished using a script.
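As a rough illustration of the lock-insertion step, the following Python sketch rewrites synchronization calls into acquire/release calls on a named lock. It is not the script used in the experiments: the regular expression, the `LOCK_MAP` table, and the function name `insert_locks` are assumptions made for this example, and only the disableint/enableint pairing with lock A is taken from the text.

```python
import re

# Mapping from synchronization commands to (operation, lock) pairs.
# Only the disableint/enableint -> A pairing is stated in the text; further
# disjoint-block patterns would be added to this (hypothetical) table.
LOCK_MAP = {
    "disableint": ("acquire", "A"),
    "enableint": ("release", "A"),
}

# Matches a call with an empty argument list, e.g. "disableint()".
_CALL = re.compile(
    r"\b(" + "|".join(map(re.escape, LOCK_MAP)) + r")\s*\(\s*\)"
)


def insert_locks(source: str) -> str:
    """Replace each known synchronization call by acquire/release of its lock."""
    def repl(match: "re.Match[str]") -> str:
        op, lock = LOCK_MAP[match.group(1)]
        return f"{op}({lock})"
    return _CALL.sub(repl, source)


print(insert_locks("disableint(); xTickCount++; enableint();"))
```

A textual rewrite like this is only adequate after inlining, when every disjoint block is delimited by explicit calls in the function body; a real tool would more likely work on the parsed program.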

The modified code, which has over 3.5K lines of code, is used for race detection. We tracked 24 variables and check whether the statements accessing them are racy. These variables include fields in the queue data-structure, task control block, and queue registry, as well as variables related to tasks. FreeRTOS maintains lists for the states of the tasks like "ready", "suspended", "waiting to send", etc. The pointers to these lists are also analysed. Access to any portion of a list (like the delayed list) is treated as an access of a corresponding variable of the same name.

Races are detected in this modified FreeRTOS code in three steps - (1) compute locks held, (2) identify whether access of a variable is a read or write, and (3) report potential races. First a lockset analysis, as explained in Sect. 6.3, to compute locks held at each access to variables, is implemented as a pass in CIL. The modified FreeRTOS code is analyzed using this new pass and the lockset at each access to the 24 variables of interest is computed. Then, a writes pass to identify whether accesses to variables are "read" or "write", also implemented in CIL, is run on the modified FreeRTOS code. Finally, a shell script to interpret both the results in the previous steps and report potential races is employed. The script identifies the conflicting access pairs (using the writes pass) and the locks held by the conflicting accesses (using lockset pass).
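The final reporting step can be sketched as follows, assuming the lockset pass and the writes pass have already produced one record per access. The `Access` record, its field names, and the sample function names are hypothetical; the paper's own implementation is a shell script over the output of the two CIL passes. For brevity the sketch only pairs accesses from different threads and does not model several task threads running the same function.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Tuple


class Access(NamedTuple):
    thread: str               # "task" or "isr" in this sketch
    function: str             # enclosing API function (placeholder names below)
    var: str                  # tracked variable being accessed
    is_write: bool            # from the writes pass
    lockset: FrozenSet[str]   # from the lockset pass


def potential_races(accesses: List[Access]) -> List[Tuple[Access, Access]]:
    """Pairs of conflicting accesses whose locksets have no lock in common."""
    races = []
    for a, b in combinations(accesses, 2):
        conflicting = a.var == b.var and (a.is_write or b.is_write)
        if a.thread != b.thread and conflicting and not (a.lockset & b.lockset):
            races.append((a, b))
    return races


accesses = [
    Access("task", "task_fn", "xTickCount", True, frozenset({"A"})),
    Access("isr", "isr_fn", "xTickCount", False, frozenset({"A"})),
    Access("task", "task_fn2", "uxCurrentNumberOfTasks", True, frozenset()),
    Access("isr", "isr_fn2", "uxCurrentNumberOfTasks", False, frozenset({"A"})),
]
for a, b in potential_races(accesses):
    print(f"potential race on {a.var}: {a.function} / {b.function}")
```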

Our analysis reports 64 pairs of conflicting accesses as being potentially racy. On manual inspection we classified 18 of them as real races and the rest as false positives. Table 2 summarizes our findings. The second column in the table lists the variables of interest involved in the race, like various task list pointers, queue registry fields pcQueueName and xHandle, task variable uxCurrentNumberOfTasks, tick count xTickCount, etc. The third column lists the functions in which the conflicting accesses are made and the fourth gives the number of racing pairs. The fifth column assesses the potential races based on our manual inspection of the code. The analysis took 3.91 s.

The false positives were typically due to the fact that we had abstracted data-structures (like the delayed list which is a linked-list) by a synonymous variable. Thus even if the accesses were to different parts of the structure (like the container field of a list item and the next pointer of a different list item) our analysis flagged them as races.

We were in touch with the developers of FreeRTOS regarding the 18 pairs we classified as true positives. The 14 races on the queue registry were deemed to be non-issues as the queue delete function is usually invoked only once the application is about to terminate. The 2 races on uxCurrentNumberOfTasks are known (going by comments in the code) but are considered benign as the variable is of "base type". The remaining couple of races on the delayed task lists appear to be real issues as they have been fixed (independent of our work) in v10.1.1.

#### **7.2 Region-Based Relational Analysis**

Our aim here is to do a region-based interval and polyhedral analysis of a region-race-free subset of the FreeRTOS kernel APIs, and to prove some simple assertions about the kernel variables in each region.

We first identified six regions for this purpose. One region corresponds to variables protected by disabling interrupts (like xTickCount, xNextTaskUnblockTime, etc.), while variables protected by suspend and resume scheduler commands (like uxPendedTicks, xPendingReadyList, etc.) are in another region. Fields of the queue structure like pcHead, pcTail, etc. are in a third region, while the waiting lists for a queue form another region. The queue registry fields like pcQueueName and xHandle are in region 5. The pointer variable pxCurrentTCB, pointing to the current Task Control Block (TCB), is put in the sixth region.

The FreeRTOS code was modified further to reflect access to regions. For this, new variables R<sub>1</sub>, ..., R<sub>6</sub> are declared. Wherever there is a write (or read) access to a variable in region i, an assignment statement that defines (or reads from) variable R<sub>i</sub> is inserted just before the access. This is done using a script which takes the result of the writes pass to find where in the source code an appropriate assignment statement has to be inserted. We selected 15 APIs that did not contain any region races.

Next, we prepared the API functions for the analysis in two steps. They are described below:

*Abstraction of FreeRTOS API Functions.* We abstracted the FreeRTOS source code to prepare it for the relational analysis. In this abstraction, we basically model the various lists (ready list, delayed list) by their lengths and the value at the head of the list (if required). Using this abstraction, we are able to convert list operations to operations on integers.

Similarly, to model insertion into a list, we abstract it by incrementing the variable which represents the length of the list. We abstracted all the API functions in a similar fashion.

*Creation of the Sync-CFG.* The next step is to create a sync-CFG out of the abstracted program. For doing this, we used the abstracted version of the FreeRTOS code (along with acquire-release added as explained in Sect. 7.1).


**Table 2.** Potential races

Next, we used a script to insert non-deterministic gotos from the point of release of a lock to the acquire of the same lock. Since we are using gotos for creation of sync-CFG, we keep all the API functions in main itself and evaluate a non-deterministic "if" condition before entering the code for an API function.
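The edges that these gotos encode can be computed as in the sketch below: every release of a lock is linked to every acquire of the same lock. The tuple encoding of synchronization points and the `within_thread` flag are assumptions of this example; the flag mimics adding sync-edges within task functions to account for several task threads.

```python
from typing import List, Tuple

# Hypothetical encoding of a synchronization point: (thread, node, op, lock)
SyncPoint = Tuple[str, int, str, str]


def msw_edges(points: List[SyncPoint],
              within_thread: bool = False) -> List[Tuple[int, int]]:
    """May-synchronize-with edges (release node, acquire node) on a common lock."""
    releases = [(t, n, l) for (t, n, op, l) in points if op == "release"]
    acquires = [(t, n, l) for (t, n, op, l) in points if op == "acquire"]
    return sorted(
        (rn, an)
        for (rt, rn, rl) in releases
        for (at, an, al) in acquires
        if rl == al and (within_thread or rt != at)
    )


points = [
    ("task", 1, "acquire", "A"), ("task", 3, "release", "A"),
    ("isr", 10, "acquire", "A"), ("isr", 12, "release", "A"),
]
print(msw_edges(points))
```

Each resulting pair corresponds to one non-deterministic goto from the release point to the acquire point in the single-function encoding described above.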

*Results.* For the purpose of analysis we listed out some numerical relations between kernel variables in the same region, which we believed should hold. We identified a total of 15 invariants including 4 invariants which involve relations between kernel variables. We then inserted assertions for these invariants at the key points in our source code like the exit of a block protecting a region.

We have implemented an interval-based value-set analysis and a region-based octagon and polyhedral analysis for C programs using CIL [22] as the front-end and the Apron library (version 0.9.11) [16]. We represent the sync-with edges of the sync-CFG of a program using goto statements from the source (release) to the target (acquire) of the may-synchronizes-with (MSW) edges.

We ran our implementation on the abstracted version of the FreeRTOS kernel library, with the aim of checking how many of the invariants it was able to prove. The abstracted code along with addition of gotos is about 1500 lines of code. We did a preliminary interval analysis on this abstracted sync-CFG and were able to prove 11 out of these 15 invariants. With a widening threshold of 30, the interval analysis takes under 5 min to run. As expected, the interval analysis could not prove the relational invariants.

We then did a region-based polyhedral analysis using the six regions identified above. For the region-based analysis, we used convex polyhedra domain with a widening threshold of 30. It is able to prove all the assertions we believed to be true. The analysis takes about 30 min to complete with the convex polyhedra domain and about 20 min with the octagon domain.

The results obtained by our analysis are shown in Table 3.


**Table 3.** Relational analysis results

# **8 Related Work**

We classify related work based on the main topics touched upon in this paper.

*Data Races.* Adve and Hill [1] introduce the notion of a data race using a happens-before relation, and identify instructions that form release-acquire pairs, for low-level concurrent programs. Boehm and Adve [4] define races in terms of consecutive occurrences in a sequentially consistent execution, as well as using a happens-before order, in the context of the C++ semantics. They show their notions are equivalent as far as race-free programs go. As pointed out earlier, the definition of races as consecutive occurrences is inadequate in our setting. Schwarz *et al.* [26] define a notion of data race for priority-based interrupt-driven programs, where there is a single main task and multiple ISRs. A race occurs when the main thread is accessing a variable at a certain dynamic priority, and an ISR thread with higher priority also accesses the variable. Our definition can be seen to be stronger and more accurately captures racy situations. In particular, if the ISR thread with higher priority does not actually execute the conflicting access, due to say a condition not being enabled, then we would *not* call it a race. The term "high-level" race was coined by Artho *et al.* [2]. Our definition of a high-level race follows that of [20].

*Analysis of Interrupt-Driven Programs.* Regehr and Cooprider [23] describe a source-to-source translation of an interrupt-driven program to a standard multithreaded program, and analyze the translated program for races. Their translation is inadequate for our setting in many ways: in particular, disable-enable of interrupts is translated by acquiring and releasing all ISR-specific locks; however this does not prevent interaction with another task while one task has disabled interrupts. In [8] they also describe an analysis framework for constant-propagation analysis on TinyOS applications. They use a similar idea of adding "control-flow" edges between disable-enable blocks and ISRs. However no soundness argument is given, and other kinds of blocks (suspend/resume, flag-based synchronization) are not handled. The works in [5,6,13] analyze timing properties, interrupt-latency, and stack sizes for interrupt-driven programs, using model-checking, algebraic, and algorithmic approaches. Schwarz *et al.* [25,26] give analyses for race-detection and invariants based on linear-equalities for their aforementioned class of priority-based interrupt-driven programs. Our work differs in several ways: their analysis is directed towards *applications* (we target *libraries* where task priorities do not matter), their analyses are specific (we provide a basis for carrying out a variety of value-set and relational analyses, targeting race-free programs), and they consider priority and flag-based synchronization (but not disable-enable and suspend-resume based synchronization). Sung *et al.* [27] consider interrupt-driven applications in the form of ISRs with different priorities, and perform interval-based static analysis for checking assertions. They do not handle libraries and do not leverage race-freedom. Finally, [20] uses a model-checking approach to find all high-level races in FreeRTOS with a completeness guarantee.

*Analysis of Race-Free Programs.* Chugh *et al.* [7] use race information to do thread-modular null-dereference analysis, by killing facts at a point whenever a notional read of a variable is found to be racy. De *et al.* [11] propose the sync-CFG and value-set analysis for race-free programs, while Mukherjee *et al.* [21] extend the framework to region and relational analyses. Gotsman *et al.* [12] and Miné *et al.* [18,19] define relational shape/value analyses for concurrent programs that exploit race-freedom and lock invariants respectively. All these works are for classical lock-based synchronization while we target interrupt-driven programs.

# **9 Conclusion**

In this paper our aim has been to give efficient static analyses for classes of non-standard concurrent programs like interrupt-driven kernels, that exploit the property of race-freedom. Towards this goal, we have proposed a definition of data races which we feel is applicable to general concurrent programs. We have also proposed a general principle for defining synchronizes-with edges, which is the key ingredient of a happens-before relation, based on the notion of disjoint blocks. We have implemented our theory to perform sound and effective static analysis for race-detection and invariant inference, on the popular real-time kernel FreeRTOS.

We feel this framework should be applicable to other kinds of concurrent systems, like other embedded kernels (for example TI-RTOS [14]) and application programs, and event-driven programs. There are additional challenges in these systems like priority-based preemption and priority inheritance conventions which need to be addressed. Apart from investigating these systems we would like to apply this theory to perform other static analyses like null-dereference, points-to, and shape analysis, for these non-standard classes of concurrent programs.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **An Abstract Domain for Trees with Numeric Relations**

Matthieu Journault<sup>1(✉)</sup>, Antoine Miné<sup>1,2(✉)</sup>, and Abdelraouf Ouadjaout<sup>1(✉)</sup>

<sup>1</sup> Sorbonne Université, CNRS, Laboratoire d'Informatique de Paris 6, LIP6, 75005 Paris, France {matthieu.journault,antoine.mine,abdelraouf.ouadjaout}@lip6.fr

<sup>2</sup> Institut universitaire de France, Paris, France

**Abstract.** We present an abstract domain able to infer invariants on programs manipulating trees. Trees considered in the article are defined over a finite alphabet and can contain unbounded numeric values at their leaves. Our domain can infer the possible shapes of the tree values of each variable and find numeric relations between: the values at the leaves as well as the size and depth of the tree values of different variables. The abstract domain is described as a product of (1) a symbolic domain based on a tree automata representation and (2) a numerical domain lifted, for the occasion, to describe numerical maps with potentially infinite and heterogeneous definition set. In addition to abstract set operations and widening we define concrete and abstract transformers on these environments. We present possible applications, such as the ability to describe memory zones, or track symbolic equalities between program variables. We implemented our domain in a static analysis platform and present preliminary results analyzing a tree-manipulating toy-language.

# **1 Introduction**

The abstract interpretation framework [5] enables the development of sound static analyzers by inferring and proving invariants on reachable states of programs. Invariants in the scope of abstract interpretation are elements of a lattice called an abstract domain. Most domains focus on numeric or pointer variables. By contrast, we propose an abstract domain for variables whose values are tree data-structures. Tree values appear natively in some languages (such as OCaml) and applications (such as the DOM in web programming) or can be encoded through pointer manipulations (as in C). Trees can abstract terms in logic programming. A tree domain can also be useful to collect symbolic expressions appearing in a program.

© The Author(s) 2019

This work is supported by the European Research Council under Consolidator Grant Agreement 681393 – MOPSA.

L. Caires (Ed.): ESOP 2019, LNCS 11423, pp. 724–751, 2019. https://doi.org/10.1007/978-3-030-17184-1\_26

```
typedef struct node
{
  int data;
  struct node *next;
} node;

node *append(node *head, int data)
{
  if (head == NULL) {
    return create(data, NULL);
  } else {
    node *cursor = head;
    while (cursor->next != NULL)
      cursor = cursor->next;
    node *new_node = create(data, NULL);
    cursor->next = new_node;
    return head;
  }
}
```
Program 1: Append to list in C

```
float golden_ratio(int n) {
  int i = 0;
  float r = 1;
  while (i < n) {
    r = 1 + 1 / r;
    i += 1;
  }
  return r;
}
```
Program 2: Golden ratio in C

```
let rec f x n =
  match n with
  | 0 -> []
  | _ -> (x+1)::(x-1)::(f x (n-1))
let () =
  (* Assume x:int and n:int >= 0 *)
  let t = f x n in
  match t with
  | [] -> ()
  | p :: q when p > x -> ()
  | _ -> assert false
```
Program 3: List type in OCaml

*Used Memory Zones.* Program 1 describes an append function defined in the C language; this function adds an integer at the end of a linked list. The infinite set of unbounded terms of the form \*(\*( ...\*(head + 4) ...+ 4) + 4) represents memory zones that are used by the append function. Our analyzer is able to infer and represent such sets of terms. This provides the information that Program 1 does not use any of the data fields of the linked list. Such a function would be called fairly commonly in a real-life project. In a classical top-down static analysis by abstract interpretation, function calls are inlined at each call site. A way to improve scalability is to design modular analyzers able to reuse previous analysis results (as emphasized in [7]). In order to be able to successfully reuse function body analysis, input states must be unified. Moreover the cost of performing the analysis of the body of functions grows with the number of variables that need to be tracked. A common way to deal with both problems is to use framing on the inputs of the functions (as in separation logic [25]). This improves (1) precision: as we know that the framed-out parts of the input are not modified by the function call, (2) body analysis efficiency: as the input state is reduced, and finally (3) modularity: as constraints on the usage of the first analysis are relaxed by the removal of constraints.

*Symbolic Relations.* Program 2 is a C function computing an approximation of the golden ratio (as it is the limit of the sequence r<sub>0</sub> = 1, r<sub>n+1</sub> = 1 + 1/r<sub>n</sub>). As classical numerical domains cannot represent such numerical relations, methods were proposed to track symbolic equality between expressions (see [23]). However such methods cannot handle the unbounded iteration of Program 2. The set of reachable states at the end of Program 2 can be expressed by r = 1+1/(1 + 1/... 1 ...) with depth n. Please note that to infer such results we need to express numerical relations between the size of trees and the numeric variables from the program.
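To make the shape of these symbolic terms concrete, the following Python sketch builds the term obtained after n iterations of the loop of Program 2 and relates its size to n. The tuple encoding of terms and all function names are our own illustrative choices, not the representation used by the analyzer.

```python
from fractions import Fraction


def golden_term(n):
    """Symbolic value of r after n loop iterations, as a nested tuple."""
    term = 1  # r_0 = 1
    for _ in range(n):
        term = ("+", 1, ("/", 1, term))  # r_{k+1} = 1 + 1 / r_k
    return term


def divisions(term):
    """Number of '/' nodes, which equals the number of iterations n."""
    if not isinstance(term, tuple):
        return 0
    op, left, right = term
    return (1 if op == "/" else 0) + divisions(left) + divisions(right)


def render(term):
    """Print a term in infix notation."""
    if not isinstance(term, tuple):
        return str(term)
    op, left, right = term
    if op == "+":
        return f"({render(left)} + {render(right)})"
    return f"{render(left)}/{render(right)}"


def evaluate(term):
    """Exact rational value of a term."""
    if not isinstance(term, tuple):
        return Fraction(term)
    op, left, right = term
    lv, rv = evaluate(left), evaluate(right)
    return lv + rv if op == "+" else lv / rv


print(render(golden_term(2)))
print(evaluate(golden_term(3)))
```

The point of the sketch is the relation `divisions(golden_term(n)) == n`: it links a size measure of the tree to the numeric program variable n, which is the kind of relation the domain must be able to express.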

*Numerical Environment.* Consider now the OCaml Program 3, for which we want to prove that the assert false expression is never reached. This program builds a list of size 2 ∗ n with alternating values x + 1 and x − 1. The assertion states that the head of the list is x + 1. After the definition of t there are two types of reachable states: (1) those that have not gone through the loop, (t ↦ [], x ↦ ℤ, n ↦ 0), and (2) those that have gone through at least one iteration of the loop, (t ↦ [a<sub>1</sub>; a<sub>2</sub>; a<sub>3</sub>; ...], x ↦ α, n > 0, a<sub>1</sub> ↦ α + 1, a<sub>2</sub> ↦ α − 1, a<sub>3</sub> ↦ α + 1), where α ∈ ℤ. Therefore we need to be able to keep numerical relations between the parametric and unbounded number of numeric values appearing in t and numeric variables from the program. Classical numeric domains do not provide out-of-the-box abstractions for sets of partially defined numerical functions, therefore we define such an abstraction. As an example of analysis result, the memory representation obtained by our analysis for t describes the set of trees of the form: Cons(a, Cons(b, Cons(a, ..., Nil) ...)) where a = x + 1 and b = x − 1. Therefore we are able to prove that the assert false expression is never reached.

*Contributions.* The main contributions of the article are threefold: (1) The extension of results on tree automata to the abstract interpretation framework by definition of a widening operator, in order to represent the set of tree shapes that a variable can contain. (2) The definition of a numerical domain built upon classical abstract domains able to represent sets of partial numerical maps with heterogeneous and unbounded definition sets. This is necessary to represent the numeric values at the leaves of a set of trees, as trees are unbounded and can contain a different number of leaves. (3) The definition of a novel abstraction for trees that can contain numerical values at their leaves. This last domain combines the abstractions (1) and (2). Moreover it is relational as it can express relations between numerical values found in trees and in the rest of the program, and relations between trees. Finally all results were implemented in an existing framework and experimented on a toy-language.

*Limitations.* At this point, analyses can only be performed on the toy language presented hereafter, not on real-life code; therefore we do not present any benchmark results, even though examples of analysis results will be put forth. Indeed Programs 1, 2 and 3 were precisely analyzed once encoded into our toy language (see Programs 4 and 5).

*Outline.* We start, in Sect. 2, by presenting the concrete semantics we want to abstract. In Sect. 3 we build a first abstraction which forgets numerical values and focuses on abstracting tree shapes. Section 4 presents a novel numerical abstract domain required for the definition of the abstract domain of Sect. 5, which aims at precisely representing numerical constraints between trees and program variables. In Sect. 6 we provide remarks on the implementation and results of the analyzer. Finally Sect. 7 mentions related works while Sect. 8 concludes.

*Notations.* Classical Galois connections (see [5]) are denoted

$$(A, \subseteq\_A) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (B, \subseteq\_B)$$

When no best abstraction can be defined, we use the *representation* framework (as defined by Bourdoncle in [3], also known as the concretization-only framework); representations are denoted by

$$(A, \subseteq\_A) \xleftarrow{\ \gamma\ } (B, \subseteq\_B)$$

A ⇀ B denotes the set of partial maps from A to B, and λ|<sub>A</sub> x. f(x) ∈ B denotes the map in A → B that associates f(x) to x. Finally, when f ∈ A → C and g ∈ B → C, with A ∩ B = ∅, f ⊎ g is the function defined on A ∪ B that associates f(x) (resp. g(x)) to x whenever x ∈ A (resp. x ∈ B).
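For readers who prefer code to notation, the last two conventions can be mimicked with Python dictionaries standing in for finite partial maps. This is only an analogy under that assumption; the function names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def tabulate(domain: Iterable[K], f: Callable[[K], V]) -> Dict[K, V]:
    """The map defined on `domain` that associates f(x) to x."""
    return {x: f(x) for x in domain}


def disjoint_union(f: Dict[K, V], g: Dict[K, V]) -> Dict[K, V]:
    """The map on dom(f) ∪ dom(g); only defined when the domains are disjoint."""
    overlap = f.keys() & g.keys()
    if overlap:
        raise ValueError(f"domains are not disjoint: {sorted(overlap)}")
    return {**f, **g}


squares = tabulate({1, 2, 3}, lambda x: x * x)
merged = disjoint_union(squares, {4: 16})
```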

### **2 Syntax and Concrete Semantics**

**Definition 1.** *An* alphabet ℱ *is a finite set; a* ranked alphabet *is a pair* ℛ = (ℱ, a) *where* ℱ *is an alphabet and* a ∈ ℱ → ℕ*. For* f ∈ ℱ*, we call* arity *of* f *the value* a(f)*. We assume that* ℤ *and* ℱ *are disjoint and we define the set of* natural terms *over* ℛ *(denoted* T<sub>ℤ</sub>(ℛ)*) to be the smallest set defined by:*

- ℤ ⊆ T<sub>ℤ</sub>(ℛ)
- ∀p ≥ 0, f ∈ ℱ, t<sub>1</sub>, ..., t<sub>p</sub> ∈ T<sub>ℤ</sub>(ℛ), a(f) = p ⇒ f(t<sub>1</sub>, ..., t<sub>p</sub>) ∈ T<sub>ℤ</sub>(ℛ)

*Moreover when* ℛ *contains at least one symbol of arity* 0*, we define* terms *over* ℛ *(denoted* T(ℛ)*) to be the smallest set defined by:*

- ∀p ≥ 0, f ∈ ℱ, t<sub>1</sub>, ..., t<sub>p</sub> ∈ T(ℛ), a(f) = p ⇒ f(t<sub>1</sub>, ..., t<sub>p</sub>) ∈ T(ℛ)

*In the following,* ℱ<sub>n</sub> *denotes the subset of symbols of* ℱ *of arity* n*. Moreover given a term* t ∈ T(ℛ) *we denote* f = **head**(t) ∈ ℱ *and* **sons**(t) *a possibly empty tuple* (t<sub>1</sub>, ..., t<sub>n</sub>) *of elements of* T(ℛ) *such that* t = f(t<sub>1</sub>, ..., t<sub>n</sub>)*.*

*Remark 1.* Numerical leaves are defined to contain integers, however this could be modified to rationals, real numbers or floats. We are parametric in the type of numeric values, as they are delegated to an underlying numerical domain.

*Example 1.* Consider the ranked alphabet ℛ = {\*(1), &(1), +(2), x(0)}, where u(n) means that symbol u has arity n. Then &x ∈ T(ℛ), but \*(&x+4) ∈ T<sub>ℤ</sub>(ℛ), and \*(&x+4) ∉ T(ℛ). Using this alphabet we can model C pointer arithmetic.
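Definition 1 and Example 1 can be checked mechanically with a small Python sketch. The encoding is an assumption of this example: a symbolic term f(t1, ..., tp) is the tuple `(f, t1, ..., tp)`, an integer leaf is a Python `int`, and the names `ARITY`, `is_natural_term` and `is_term` are hypothetical.

```python
# Ranked alphabet of Example 1: symbol -> arity.
ARITY = {"*": 1, "&": 1, "+": 2, "x": 0}


def _well_formed(t, arity, leaf_ok):
    """Shared check: t is a symbol applied to the right number of sub-terms."""
    if isinstance(t, int) and not isinstance(t, bool):
        return leaf_ok  # integer leaves are only allowed in natural terms
    return (
        isinstance(t, tuple)
        and len(t) >= 1
        and t[0] in arity
        and len(t) - 1 == arity[t[0]]
        and all(_well_formed(child, arity, leaf_ok) for child in t[1:])
    )


def is_natural_term(t, arity=ARITY):
    """Membership in the set of natural terms (integer leaves allowed)."""
    return _well_formed(t, arity, leaf_ok=True)


def is_term(t, arity=ARITY):
    """Membership in the set of terms (no integer leaves)."""
    return _well_formed(t, arity, leaf_ok=False)


addr_x = ("&", ("x",))              # &x
deref = ("*", ("+", addr_x, 4))     # *(&x + 4)
print(is_term(addr_x), is_natural_term(deref), is_term(deref))
```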

*Example 2.* U = {+(x, y) | x ≤ y} and V = {+(x, +(z, y)) | x ≤ y ∧ z ≤ y} are two sets of natural terms over ℛ = {+(2)}, which we use as running examples.

$$\begin{array}{lcl} \textit{tree-expr} & \triangleq & \texttt{make\\_symbolic}(\mathcal{F}, \textit{tree-expr}, \dots, \textit{tree-expr}) \\ & \mid & \texttt{make\\_integer}(\textit{expr}) \\ & \mid & \texttt{get\\_son}(\textit{tree-expr}, \textit{expr}) \\ \textit{sym-expr} & \triangleq & \texttt{get\\_sym\\_head}(\textit{tree-expr}) \\ \textit{expr} & \triangleq & \dots \mid \texttt{get\\_num\\_head}(\textit{tree-expr}) \mid \texttt{is\\_symbol}(\textit{tree-expr}) \\ \textit{stmt} & \triangleq & \dots \mid T = \textit{tree-expr} \end{array}$$

#### **Fig. 1.** Syntax extension of the language

$$\begin{aligned} \mathbb{E}[[\texttt{make\\_symbolic}(s \in \mathcal{F}\_m, T\_1, \dots, T\_m)]](E, F) &= \{s(t\_1, \dots, t\_m) \mid \forall i, \ t\_i \in \mathbb{E}[[T\_i]](E, F)\} \\ \mathbb{E}[[\texttt{make\\_integer}(e \in \textit{expr})]](E, F) &= \mathbb{E}[[e]](E, F) \\ \mathbb{E}[[\texttt{is\\_symbol}(T)]](E, F) &= \{\texttt{true} \mid \exists t \in \mathbb{E}[[T]](E, F), \exists f \in \mathcal{R}, \ t = f(\ldots)\} \\ &\quad \cup \{\texttt{false} \mid \exists t \in \mathbb{E}[[T]](E, F), \ t \in \mathbb{Z}\} \\ \mathbb{E}[[\texttt{get\\_son}(T, e)]](E, F) &= \{t \mid \exists i \in \mathbb{E}[[e]](E, F), \ t' \in \mathbb{E}[[T]](E, F), \ f \in \mathcal{F}\_{m > i}, \\ &\quad\ \ t' = f(t\_0, \dots, t\_{m-1}) \land t\_i = t\} \\ \mathbb{E}[[\texttt{get\\_num\\_head}(T)]](E, F) &= \{i \in \mathbb{Z} \mid \exists t \in \mathbb{E}[[T]](E, F), \ t = i\} \\ \mathbb{E}[[\texttt{get\\_sym\\_head}(T)]](E, F) &= \{s \in \mathcal{R} \mid \exists t \in \mathbb{E}[[T]](E, F), \ t = s(\ldots)\} \end{aligned}$$

#### **Fig. 2.** Concrete operations on natural terms


*Syntax of the Language and Concrete Operations.* We assume that a small imperative language is already defined and extend it (in Fig. 1) with statements, tree expressions (*tree-expr*), which are expressions that evaluate to trees, and simple symbol expressions (*sym-expr*), which enable the manipulation of symbols. Among others, we add the ability to build a tree that contains only a numerical leaf, make_integer(e), and the ability to read the i-th son of a tree t, get_son(t, i). Figure 2 defines the concrete operations over the set $\wp(T\_{\mathbb{Z}}(\mathcal{R}))$. It assumes given a set of numerical program variables $\mathcal{V}$, a set of numerical expressions over $\mathcal{V}$ denoted *expr*, a set of statements *stmt*, a notion of numerical environment $E \in \mathscr{E} = \mathcal{V} \to \mathbb{Z}$, a set of tree program variables $\mathcal{T}$, and a notion of tree environment $F \in \mathscr{F} = \mathcal{T} \to \wp(T\_{\mathbb{Z}}(\mathcal{R}))$; $\mathscr{D} = \mathscr{E} \times \mathscr{F}$ is our concrete domain. Finally, we assume that an evaluation function $\mathbb{E}[[e \in \textit{expr}]](E \in \mathcal{V} \to \mathbb{Z}, F \in \mathcal{T} \to \wp(T\_{\mathbb{Z}}(\mathcal{R}))) \in \wp(\mathbb{Z})$ is already partially defined on numerical expressions. Using these operators we are able to define Program 4, which computes the memory zones used by append from Program 1, and Program 5, which simulates the behavior of Program 3.
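The concrete operations of Fig. 2 act on sets of natural terms. The sketch below mirrors them on finite sets, under assumptions that are not in the paper: terms are encoded as Python `int` leaves or tuples `(symbol, son_1, ..., son_n)`, the arguments are already-evaluated finite sets (the environments $E$ and $F$ are left out), and the function names simply follow the operator names.

```python
from itertools import product


def make_symbolic(s, *tree_sets):
    """All terms s(t_1, ..., t_m) with t_i drawn from the i-th set of trees."""
    return {(s, *sons) for sons in product(*tree_sets)}


def is_symbol(trees):
    """True is produced by symbolic roots, False by integer roots."""
    return {not isinstance(t, int) for t in trees}


def get_son(trees, indices):
    """The i-th son (0-based) of every tree whose root has more than i sons."""
    return {t[1 + i]
            for t in trees if not isinstance(t, int)
            for i in indices if 0 <= i < len(t) - 1}


def get_num_head(trees):
    """The integers that occur as whole trees."""
    return {t for t in trees if isinstance(t, int)}


def get_sym_head(trees):
    """The root symbols of the symbolic trees."""
    return {t[0] for t in trees if not isinstance(t, int)}


trees = make_symbolic("+", {1, 2}, {3})   # the two terms +(1, 3) and +(2, 3)
mixed = trees | {7}                       # adds a tree reduced to the integer 7
```

Here `get_son(trees, {0})` yields the set of first sons and `is_symbol(mixed)` contains both truth values, since `mixed` holds symbolic and numerical trees.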

# **3 Natural Term Abstraction by Tree Automata**

In this section we start by defining a value abstraction for tree sets (in Sect. 3.1), which is then lifted to an environment abstraction (in Sect. 3.2).

#### **3.1 Value Abstraction**

As a first abstraction for natural terms, we put aside numerical values and define an abstraction able to describe sets of tree shapes. Tree automata enable the description of sets of terms built upon a finite ranked alphabet. The ranked alphabet of the language we want to analyze is extended with the symbol $\square$ to denote potential positions of numerical values.

**Definition 2 (Finite tree automata).** *A* finite tree automaton *(FTA) over a ranked alphabet* $\mathcal{R}$ *is a tuple* $(Q, \mathcal{R}, Q\_f, \delta)$*, where* $Q$ *is a (finite) set of states,* $Q\_f \subseteq Q$ *is the set of final states, and* $\delta \in \wp(\bigcup\_{n \in \mathbb{N}} \mathcal{F}\_n \times Q^n \times Q)$ *is the set of transitions. We define* $\overline{\delta} : (\bigcup\_{n \in \mathbb{N}} \mathcal{F}\_n \times Q^n) \to \wp(Q)$ *by:* $\overline{\delta}(f, \vec{q}) = \{q' \mid (f, \vec{q}, q') \in \delta\}$*. When* $\delta$ *is such that* $\forall n \in \mathbb{N}, f \in \mathcal{F}\_n, \vec{q} \in Q^n, |\overline{\delta}(f, \vec{q})| = 1$*, we say that the automaton is complete and deterministic (CDFTA). We then abuse notations and denote by* $\overline{\delta}(f, \vec{q})$ *the unique element in the set* $\overline{\delta}(f, \vec{q})$*.*

**Definition 3 (Reachability).** *Given a FTA* $\mathcal{A} = (Q, \mathcal{R}, Q\_f, \delta)$*, we define a reachability function* $\text{REACH}\_{\mathcal{A}} : T(\mathcal{R}) \to \wp(Q)$*:*

$$\text{REACH}\_{\mathcal{A}}(t) = \text{let } t\_1, \dots, t\_n = \mathbf{sons}(t) \text{ in}$$

$$\bigcup\_{(q\_1, \dots, q\_n) \in \text{REACH}\_{\mathcal{A}}(t\_1) \times \dots \times \text{REACH}\_{\mathcal{A}}(t\_n)} \overline{\delta}(\mathbf{head}(t), (q\_1, \dots, q\_n))$$

*If* $\mathbf{sons}(t)$ *is the empty tuple (which is the case when* $t$ *is a constant* $a$*), the union is made over a unique element (the empty tuple), and the definition boils down to* $\overline{\delta}(a, ())$*. If* $\mathbf{sons}(t)$ *is not the empty tuple and* $\text{REACH}\_{\mathcal{A}}(t\_i)$ *is empty for some* $i$*, then* $\text{REACH}\_{\mathcal{A}}(t)$ *is also empty.*

*Example 3.* Consider the ranked alphabet $\mathcal{R} = \{f(2), a(0)\}$ and the automaton $\mathcal{A} = (\{u, v\}, \mathcal{R}, \{v\}, \{a() \to u, f(v, v) \to v, f(u, u) \to u, f(u, u) \to v\})$. Then $\text{REACH}\_{\mathcal{A}}(a) = \{u\}$, $\text{REACH}\_{\mathcal{A}}(f(a, a)) = \{u, v\}$ and $\text{REACH}\_{\mathcal{A}}(f(f(a, a), a)) = \{u, v\}$.

**Definition 4 (Acceptance).** *Given a FTA* $\mathcal{A} = (Q, \mathcal{R}, Q\_f, \delta)$ *and a term* $t$*, we say that* $t$ *is* accepted *by the automaton if* $\text{REACH}\_{\mathcal{A}}(t) \cap Q\_f \neq \emptyset$*.* $\mathcal{L}(\mathcal{A})$ *denotes the set of terms accepted by automaton* $\mathcal{A}$*.*

*Example 4.* With the definitions of Example 3, $\mathcal{L}(\mathcal{A})$ is the set of terms over $\mathcal{R}$ that contain at least one $f$.
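The reachability function and the acceptance test can be replayed on the automaton of Example 3. This is a small sketch, assuming a hypothetical encoding of terms as tuples `(head, son_1, ..., son_n)` and of the transition set as a dictionary from `(symbol, son states)` to the set of target states; the names `DELTA`, `FINAL`, `reach` and `accepts` are illustrative.

```python
from itertools import product

# Transitions of Example 3: (symbol, tuple of son states) -> set of target states.
DELTA = {
    ("a", ()): {"u"},
    ("f", ("v", "v")): {"v"},
    ("f", ("u", "u")): {"u", "v"},
}
FINAL = {"v"}


def reach(t, delta=DELTA):
    """States reachable at the root of the term t."""
    head, sons = t[0], t[1:]
    son_states = [reach(s, delta) for s in sons]
    result = set()
    # For a constant, product() yields the single empty tuple ().
    # If some son has no reachable state, the product is empty and so is the result.
    for qs in product(*son_states):
        result |= delta.get((head, qs), set())
    return result


def accepts(t, delta=DELTA, final=FINAL):
    """A term is accepted when a final state is reachable at its root."""
    return bool(reach(t, delta) & final)


a = ("a",)
faa = ("f", a, a)
```

On these inputs the sketch reproduces the sets given in Example 3, and the constant `a` is rejected while `faa` is accepted, in line with Example 4.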

**Definition 5 (Tree regular languages).** *A set of terms* $T$ *over a ranked alphabet* $\mathcal{R}$ *is called* tree regular *if there exists a FTA* $\mathcal{A}$ *over* $\mathcal{R}$ *such that* $\mathcal{L}(\mathcal{A}) = T$*. The set of such languages is denoted* $\textit{TReg}(\mathcal{R})$*.*

*Remark 2.* As for regular languages, for all $\mathcal{A} \in \text{FTA}$ there exists $\mathcal{A}' \in \text{CDFTA}$ such that $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}')$; moreover $\mathcal{A}'$ is computable (see [4]).


**Proposition 1.** $(\textit{TReg}(\mathcal{R}), \subseteq, \cap, \cup, \cdot^c, \emptyset, T(\mathcal{R}))$ *is a complemented lattice with infinite height; moreover it is not complete.* $\subseteq$*,* $\cap$*,* $\cup$ *and complementation (*$\cdot^c$*) are computable operations on tree automata [4].*

We denote by $\mathcal{R}\_\square$ the ranked alphabet $\mathcal{R}$ after adding the symbol $\square$ of arity 0 (we assume that $\square \notin \mathcal{R}$). Given a natural term $t$, we define $t\_\square$ to be the term obtained by replacing every integer with the symbol $\square$.

**Proposition 2.** $(\wp(T\_{\mathbb{Z}}(\mathcal{R})), \subseteq) \xleftarrow{\gamma} (\textit{TReg}(\mathcal{R}\_\square), \subseteq)$ *where* $\gamma(\mathcal{A}) = \{t \mid t\_\square \in \mathcal{L}(\mathcal{A})\}$ *is a representation. Moreover, with this definition of* $\gamma$*,* $\cup$ *and* $\cap$ *soundly represent the union and the intersection.*

*Remark 3.* We only have a representation and not a Galois connection, as the language T of Example 5 does not have a best tree regular over-approximation.

*Example 6.* Let $\mathcal{R} = \{+(2)\}$ and $\mathcal{A} = (\{0, 1\}, \mathcal{R}\_\square, \{0, 1\}, \{\square() \to 0, +(0, 0) \to 1, +(0, 1) \to 1\})$. Examples of terms recognized by $\mathcal{A}$ are shown in Fig. 3. Natural terms from our running examples $U$ and $V$ (defined in Example 2) are also contained in $\gamma(\mathcal{A})$. Moreover, as we do not provide numerical constraints, $1 + (3 + 4)$, $23$ and $1 + (2 + (3 + 4))$ are also elements of $\gamma(\mathcal{A})$.
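Membership in $\gamma(\mathcal{A})$ for the automaton of Example 6 amounts to erasing the integers and running the automaton on the resulting shape. The sketch below assumes a hypothetical encoding (integers as Python `int`, other nodes as tuples) and uses the string `"□"` as an assumed name for the added arity-0 symbol; `to_shape`, `run` and `in_gamma` are illustrative names.

```python
HOLE = "□"  # assumed name for the arity-0 symbol that stands for any integer


def to_shape(t):
    """Replace every integer leaf of a natural term by the hole symbol."""
    if isinstance(t, int):
        return (HOLE,)
    return (t[0], *map(to_shape, t[1:]))


# Transitions of Example 6 (deterministic): (symbol, son states) -> state.
DELTA = {(HOLE, ()): 0, ("+", (0, 0)): 1, ("+", (0, 1)): 1}
FINAL = {0, 1}


def run(shape):
    """State reached at the root of a shape, or None when no transition applies."""
    states = tuple(run(s) for s in shape[1:])
    return DELTA.get((shape[0], states))


def in_gamma(t):
    """A natural term belongs to the concretization when its shape is accepted."""
    return run(to_shape(t)) in FINAL
```

For instance, `in_gamma(23)` and `in_gamma(("+", 1, ("+", 3, 4)))` hold, whereas a left-nested sum such as `("+", ("+", 1, 2), 3)` is rejected because no transition reads a first son in state 1.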

Due to the infinite height of the lattice, a widening operator is required. In the following, we assume given a constant $w \in \mathbb{N}$, which is used to stabilize increasing chains: the greater the constant, the more precise the widening operator.

**Definition 6.** *Let* $\mathcal{A} = (Q, \mathcal{R}, Q\_f, \delta) \in \textit{FTA}$*, and* $\sim$ *be an equivalence relation on* $Q$ *such that* $p \sim q \land p \in Q\_f \Rightarrow q \in Q\_f$*. We define* $\mathcal{A}/\!\sim\ = (Q/\!\sim, \mathcal{R}, Q\_f/\!\sim, \bigcup\_{(f, q\_1, \dots, q\_n, q) \in \delta} \{(f, q\_1^{\sim}, \dots, q\_n^{\sim}, q^{\sim})\})$ *where* $q^{\sim}$ *is the equivalence class of* $q$ *in* $\sim$*.*

**Proposition 3.** *For every* $\mathcal{A} \in \textit{FTA}$ *and every equivalence relation* $\sim$ *on its states,* $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{L}(\mathcal{A}/\!\sim)$*.*

Therefore, following the idea from [9] and [11], we define a widening operation by quotienting the states of automata by an equivalence relation of finite index. We define by induction a special sequence of equivalence relations on the states of tree automata: $\sim\_1 = \{Q\_f, Q \setminus Q\_f\}$, and $\sim\_{k+1}$ is $\sim\_k$ where we split the equivalence classes that do not satisfy the following condition: $\forall f \in \mathcal{F}\_n, \forall p\_1, \dots, p\_n \in Q, \forall q\_1, \dots, q\_n \in Q, (\bigwedge\_{i=1}^{n} p\_i \sim\_k q\_i) \Rightarrow \delta(f, p\_1, \dots, p\_n) \sim\_k \delta(f, q\_1, \dots, q\_n)$ and $\forall q \in Q\_f, q^{\sim\_k} \subseteq Q\_f$. This sequence of equivalence relations is the Myhill-Nerode sequence (see [4]). Its length is at most the number of states of the automaton (before stabilization). Let $\phi(w) = \max\{i \le |Q| \mid \text{index of } \sim\_i \le w\}$ (given an integer $w$, $\phi$ yields the index of the most precise equivalence relation in the Myhill-Nerode sequence that contains at most $w$ equivalence classes) and $[\mathcal{A}]\_w = \mathcal{A}/\!\sim\_{\phi(w)}$. $[\mathcal{A}]\_w$ is therefore a FTA with at most $w$ states such that $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{L}([\mathcal{A}]\_w)$. As for regular languages, for every CDFTA an equivalent minimal CDFTA (minimal in the number of states, and unique modulo state renaming) can be obtained by quotienting the automaton by $\sim\_{|Q|}$. Therefore we define a widening operator on CDFTAs, which is then lifted to tree regular languages.

**Definition 7 (Widening operator $\nabla$).** $\mathcal{A} \nabla \mathcal{A}' = [\mathcal{A} \cup \mathcal{A}']\_w$*.*

**Proposition 4.** *This widening is sound and stabilizes infinite sequences.*

*Remark 4.* Consider the following two complete and deterministic tree automata: $\mathcal{A} = (\{a, b, h\}, \{+(2)\}, \{a\}, \{\square() \to b, +(b, b) \to a\})$ and $\mathcal{B} = (\{a, b, c, h\}, \{+(2)\}, \{a\}, \{\square() \to b, +(b, b) \to c, +(b, c) \to a\})$ (unmentioned transitions go to $h$). $\mathcal{A}$ (resp. $\mathcal{B}$) recognizes the tree $+(\square, \square)$ (resp. $+(\square, +(\square, \square))$) and over-approximates $U$ (resp. $V$) from our running example. $\mathcal{A} \cup \mathcal{B}$ is recognized by the following complete and deterministic tree automaton: $\mathcal{C} = (\{a, b, c, h\}, \{+(2)\}, \{a, c\}, \{\square() \to b, +(b, b) \to c, +(b, c) \to a\})$. If we want to widen $\mathcal{A}$ and $\mathcal{B}$ with parameter 3, the following equivalence relation is computed: $\{\{h\}, \{b\}, \{a, c\}\}$. Merging equivalent states produces $(\{a, b, h\}, \{+(2)\}, \{a\}, \{\square() \to b, +(b, b) \to a, +(b, a) \to a\})$, which contains a loop and over-approximates the union.
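The quotient of Definition 6 can be illustrated on the automaton for the union in Remark 4. The sketch below is only an illustration under stated assumptions: the sink state $h$ and its transitions are omitted (a missing transition plays the role of the sink), the partition is supplied by hand rather than computed from the Myhill-Nerode sequence, and the names `quotient`, `reach`, `DELTA_C` and the string `"□"` for the added arity-0 symbol are hypothetical.

```python
from itertools import product


def quotient(states, final, delta, classes):
    """Quotient an FTA by a partition of its states.

    delta is a set of transitions (symbol, tuple of son states, target);
    classes is a list of disjoint sets that covers `states`.
    """
    cls = {q: frozenset(c) for c in classes for q in c}
    return (
        {cls[q] for q in states},
        {cls[q] for q in final},
        {(f, tuple(cls[p] for p in ps), cls[q]) for (f, ps, q) in delta},
    )


def reach(t, delta):
    """States reachable at the root of t, with terms encoded as tuples."""
    son_states = [reach(s, delta) for s in t[1:]]
    tuples = set(product(*son_states))
    return {q for (f, ps, q) in delta if f == t[0] and ps in tuples}


# Automaton for the union in Remark 4, without the sink state h.
STATES_C = {"a", "b", "c"}
FINAL_C = {"a", "c"}
DELTA_C = {("□", (), "b"), ("+", ("b", "b"), "c"), ("+", ("b", "c"), "a")}

Q2, F2, D2 = quotient(STATES_C, FINAL_C, DELTA_C, [{"b"}, {"a", "c"}])

hole = ("□",)
t2 = ("+", hole, hole)
t3 = ("+", hole, t2)
t4 = ("+", hole, t3)   # one nesting level deeper than anything accepted before merging
```

Before merging, `t4` reaches no state at all; after merging `a` and `c`, the transition on a second son in the merged class loops, so `t4` (and every deeper right-nested sum) is accepted, which is the over-approximation described in the remark.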

#### **3.2 Environment Abstraction**

Now that we have an abstraction for sets of natural terms, let us show how it is lifted to a notion of abstract natural term environments mapping variables to sets of natural terms. Given a set of natural term variables $\mathcal{T}$, consider $\mathscr{F}^\sharp = (\mathcal{T} \to \textit{TReg}(\mathcal{R}\_\square)) \cup \{\bot\}$ and the set operators defined by the point-wise lifting of the operators on $\textit{TReg}(\mathcal{R}\_\square)$. We also lift the concretization function $\wp(T\_{\mathbb{Z}}(\mathcal{R})) \leftarrow \textit{TReg}(\mathcal{R}\_\square)$ to $\mathscr{F} \leftarrow \mathscr{F}^\sharp$. We assume given an abstract numerical environment $E^\sharp$ and an abstract evaluator $\mathbb{E}^\sharp[[e]]$. The abstract transformers $\mathbb{E}^\sharp[[\texttt{make\\_symbolic}]]$, $\mathbb{E}^\sharp[[\texttt{is\\_symbol}]]$, $\mathbb{E}^\sharp[[\texttt{get\\_son}(e)]]$, $\mathbb{E}^\sharp[[\texttt{get\\_sym\\_head}]]$ and $\mathbb{E}^\sharp[[\texttt{get\\_num\\_head}]]$ are simple tree automata operations. For concision, Fig. 4 only provides the definitions of two of these operators. Please note that these definitions require all states of the automata to be reachable. An example of use of the is_symbol operator can be found in Example 7. Other abstract operators are similar.

**Fig. 3.** Example of accepted trees from Example 6

*[Fig. 4 body: definitions of two abstract operators, among them* $\mathbb{E}^\sharp[[\texttt{get\\_son}(T, e \in \textit{expr})]](E^\sharp, F^\sharp)$*; the formulas are not recoverable from the source.]*

#### **Fig. 4.** Abstract operators

*Example 7.* Consider the tree automaton $\mathcal{A}$ of Example 6 (Fig. 3), with $F^\sharp = (x \mapsto \mathcal{A})$: $\mathbb{E}^\sharp[[\texttt{get\\_sym\\_head}(x)]](E^\sharp, F^\sharp) = \{+\}$ and $\mathbb{E}^\sharp[[\texttt{get\\_num\\_head}(x)]](E^\sharp, F^\sharp) = \top$.

# **4 Numerical Abstractions**

As emphasized in the introductory example, we rely on numerical domains to introduce constraints on the numerical variables found in trees. In a classical numeric abstraction (e.g. intervals [6], octagons [22], polyhedra [8], ...), each abstract element represents a set of maps $\mathcal{V} \to \mathbb{R}$ for a fixed, finite set of variables $\mathcal{V}$. In contrast, our numeric variables are leaves of a possibly infinite set of trees of unbounded size. Hence, before presenting the numerical abstraction for natural terms, we show how to extend an abstract element in a generic way, in two steps. Firstly, we want to be able to represent a set of maps, where each map is defined over a (possibly different) finite subset of an infinite set of variables (this is done in Sect. 4.1). Secondly, we use summarization variables to relax the finiteness constraint, so as to represent sets of heterogeneous maps over infinitely many variables (done in Sect. 4.2).

#### **4.1 Heterogeneous Support**

We define $\mathcal{M} \triangleq \wp(\mathcal{V} \rightharpoonup \mathbb{R})$, the set of sets of partial maps from $\mathcal{V}$ to $\mathbb{R}$. $\mathcal{M}$ is ordered by the inclusion relation $\subseteq$. In the following, $\mathbf{def}(f)$ denotes the definition set of $f$. We assume defined a representation $(\wp(\mathcal{S} \to \mathbb{R}), \subseteq) \xleftarrow{\gamma\_0^{\mathcal{S}}} (\mathcal{N}^{\mathcal{S}}, \sqsubseteq\_0^{\mathcal{S}})$ for every finite set $\mathcal{S} \subseteq \mathcal{V}$ (such as octagons in $|\mathcal{S}|$ dimensions). $\mathcal{N}^{\mathcal{S}}$ comes with the usual abstract set operators $\sqcup\_0^{\mathcal{S}}$ and $\sqcap\_0^{\mathcal{S}}$. Moreover, if $x \in \mathcal{S}$, $y \notin \mathcal{S}$, $\mathcal{S}'$ is another finite set and $N^\sharp \in \mathcal{N}^{\mathcal{S}}$, then $N^\sharp[x \mapsto y] \in \mathcal{N}^{(\mathcal{S} \cup \{y\}) \setminus \{x\}}$ is the abstract element obtained by renaming $x$ into $y$, and $N^\sharp\_{|\mathcal{S}'} \in \mathcal{N}^{\mathcal{S}'}$ is obtained by existentially quantifying the dimensions associated to elements in $\mathcal{S}$ and not in $\mathcal{S}'$ and adding unconstrained dimensions for elements in $\mathcal{S}'$ and not in $\mathcal{S}$. From now on we assume that this last operator is exact (as for intervals, octagons, polyhedra over $\mathbb{R}$). However, results from this section can be extended to numerical domains that are able, given $N^\sharp \in \mathcal{N}^{\mathcal{S}}$ and ${N'}^\sharp \in \mathcal{N}^{\mathcal{S}'}$, to check whether $\gamma\_0^{\mathcal{S}}(N^\sharp) \subseteq \gamma\_0^{\mathcal{S}'}({N'}^\sharp)\_{|\mathcal{S}}$. The precision of the extension defined in this subsection would then depend upon the precision of this test in the underlying domain. Finally, $[[\cdot]]\_0^{\mathcal{S}}$ (resp. $[[\cdot]]\_0^{\sharp,\mathcal{S}}$) refers to the classical concrete (resp. abstract) semantics of operators on sets of numerical maps (resp. abstract elements).

A classical method for the abstraction of heterogeneous maps is to partition the concrete element according to the definition set of the maps it represents. However, partitioning induces an increase in the cost of numerical operations (exponential in the number of variables), which we would like to avoid. Therefore, in order to abstract sets of maps with heterogeneous definition sets, we start by abstracting the potential definition set. We choose a simple lower-bound/upper-bound abstraction ($l$ and $u$ in the following definition). Moreover, we need to abstract the potential mappings given a definition set: this is done using a classical numerical domain. Contrary to partitioning, we use only one numerical abstract element, defined on the upper bound $u$, to represent all environments (instead of one abstract element per definition set). We also add a $\top$ element, used in the case where the upper bound $u$ is infinite.

**Definition 8 (Numerical abstraction).** *Let us define the following set:* $\mathcal{M}^\sharp \triangleq \{\langle N^\sharp, l, u \rangle \mid l, u \in \wp(\mathcal{V}) \land l \text{ and } u \text{ are finite} \land l \subseteq u \land N^\sharp \in \mathcal{N}^u \land N^\sharp \neq \bot\_0^u\} \cup \{\top, \bot\}$*. An element of* $\mathcal{M}^\sharp$ *is therefore either* $\top$*,* $\bot$*, or a triple* $\langle N^\sharp, l, u \rangle$ *where* $l$ *and* $u$ *are finite sets of variables such that* $N^\sharp$ *is defined over* $u$*.*

**Definition 9 (Concretization function).** *Abstract elements from* $\mathcal{M}^\sharp$ *are mapped to* $\mathcal{M}$ *thanks to the following concretization function:* $\gamma(\bot) = \emptyset$*,* $\gamma(\top) = \mathcal{M}$ *and* $\gamma(\langle N^\sharp, l, u \rangle) = \{\rho \in \mathcal{S} \to \mathbb{Z} \mid l \subseteq \mathcal{S} \subseteq u \land \rho \in \gamma\_0^{\mathcal{S}}(N^\sharp\_{|\mathcal{S}})\}$*.*

*Example 8.* As an example, consider $\gamma(\langle \{x = y, x \le 3, z = 0\}, \{x\}, \{x, y, z\} \rangle) = \{(x \mapsto a) \mid a \le 3\} \cup \{(x \mapsto a, y \mapsto a) \mid a \le 3\} \cup \{(x \mapsto a, z \mapsto 0) \mid a \le 3\} \cup \{(x \mapsto a, y \mapsto a, z \mapsto 0) \mid a \le 3\}$. As intended, the resulting set of maps contains maps with different definition sets.
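The concretization of Example 8 can be enumerated once the integers are cut down to a finite window. The sketch below makes that simplifying assumption explicit (the `values` argument stands in for $\mathbb{Z}$), represents the numerical component as a Python predicate over maps defined on $u$, and encodes each resulting map as a frozenset of `(variable, value)` pairs; none of these names come from the paper.

```python
from itertools import combinations, product


def sets_between(l, u):
    """All sets S with l ⊆ S ⊆ u."""
    extra = sorted(u - l)
    for k in range(len(extra) + 1):
        for added in combinations(extra, k):
            yield l | set(added)


def gamma(constraint, l, u, values):
    """Maps on some S with l ⊆ S ⊆ u that extend to a map on u satisfying
    `constraint`. `values` is a finite window standing in for the integers."""
    names = sorted(u)
    full_maps = (dict(zip(names, vs)) for vs in product(values, repeat=len(names)))
    satisfying = [m for m in full_maps if constraint(m)]
    result = set()
    for S in sets_between(l, u):
        for m in satisfying:
            # Restricting a satisfying map to S is the existential projection.
            result.add(frozenset((v, m[v]) for v in S))
    return result


ex8 = gamma(
    lambda m: m["x"] == m["y"] and m["x"] <= 3 and m["z"] == 0,
    {"x"}, {"x", "y", "z"}, range(0, 5),
)
```

With the window `range(0, 5)`, the four admissible values of `x` give four maps for each of the four definition sets, so `ex8` holds sixteen maps with heterogeneous supports.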

**Definition 10 (Order).** *On* $\mathcal{M}^\sharp$ *we define the following comparison operator:* $\langle N^\sharp, l, u \rangle \sqsubseteq \langle {N'}^\sharp, l', u' \rangle \Leftrightarrow l' \subseteq l \subseteq u \subseteq u' \land N^\sharp \sqsubseteq\_0^u {N'}^\sharp\_{|u}$*. This comparison is trivially extended to* $\top$ *(resp.* $\bot$*) as being the biggest (resp. smallest) element in* $\mathcal{M}^\sharp$*. In the following,* $\mathcal{M}^\sharp\_p$ *denotes the subset of* $\mathcal{M}^\sharp$ *where* $u = p$*, extended with* $\top$ *and* $\bot$*.*

**Proposition 5.** $\gamma$ *is monotonic for* $\sqsubseteq$*.*

Figure 5 provides the definition of the concrete and abstract semantics of the classical numerical statements, Assume and Assign (denoted $x \leftarrow e$). We denote by $\mathbf{vars}(e)$ the set of variables appearing in $e$. We recall that $[[\mathsf{Assume}(c)]]\_0^{\mathcal{S}}(E \in \wp(\mathcal{S} \to \mathbb{R})) = \{f \in E \mid \texttt{true} \in \mathbb{E}[[c]](f)\}$ and $[[x \leftarrow e]]\_0^{\mathcal{S}}(E \in \wp(\mathcal{S} \to \mathbb{R})) = \{f[x \mapsto e'] \mid f \in E \land e' \in \mathbb{E}[[e]](f)\}$. In order to ease the lifting of these classical operators, we define $[[\textit{stmt}]]\_0(M \in \mathcal{M}) \triangleq \bigcup\_{\mathcal{S} \text{ finite} \subseteq \mathcal{V}} [[\textit{stmt}]]\_0^{\mathcal{S}}(M \cap (\mathcal{S} \to \mathbb{R}))$ for every statement *stmt*. Moreover, we assume the existence of the abstract operators $[[\mathsf{Assume}(c)]]\_0^{\sharp,u}(N^\sharp)$ and $[[x \leftarrow e]]\_0^{\sharp,u}(N^\sharp)$, which soundly abstract their respective concrete transformers. Note that the concrete semantics of $\mathsf{Assume}(c)$ (resp. $x \leftarrow e$) enforces that maps are defined at least on the variables appearing in $c$ (resp. in $e$ and on $x$). Abstract operators from Fig. 5 are sound with respect to $\gamma$ and their concrete operators.

$$\begin{aligned} [[\mathsf{Assume}(c)]](M) &= [[\mathsf{Assume}(c)]]\_0(\{ f \mid f \in M \land \mathsf{vars}(c) \subseteq \mathsf{def}(f) \}) \\ [[\mathsf{Assume}(c)]]^{\sharp}(\langle N^{\sharp}, l, u \rangle) &= \langle [[\mathsf{Assume}(c)]]^{\sharp, u}\_{0}(N^{\sharp}), l \cup \mathsf{vars}(c), u \rangle \\ [[x \leftarrow e]](M) &= [[x \leftarrow e]]\_{0} (\{ f \mid f \in M \land \mathsf{vars}(e) \cup \{ x \} \subseteq \mathsf{def}(f) \}) \\ [[x \leftarrow e]]^{\sharp}(\langle N^{\sharp}, l, u \rangle) &= \langle [[x \leftarrow e]]^{\sharp, u}\_{0}(N^{\sharp}), l \cup \mathsf{vars}(e) \cup \{ x \}, u \rangle \end{aligned}$$

We now need to define an operator $\sqcup$ that abstracts the classic set operator $\cup$. We cannot directly apply the corresponding abstract operator on the numerical components of the abstractions, as they might have different definition sets. A first naive solution would be to extend their respective definition sets and to perform the abstract operation on the resulting elements: $N^\sharp\_{|u \cup u'} \sqcup\_0^{u \cup u'} {N'}^\sharp\_{|u \cup u'}$. However, consider $M^\sharp = \langle \{x = y\} (= U^\sharp), \{x, y\}, \{x, y\} \rangle$ and $N^\sharp = \langle \{x = z\} (= V^\sharp), \{x, z\}, \{x, z\} \rangle$, where the underlying domain is the octagon domain, whose elements are represented as sets of linear constraints (e.g. $\{x = y\}$). We have $U^\sharp\_{|\{x,y,z\}} = \{x = y\}$ and $V^\sharp\_{|\{x,y,z\}} = \{x = z\}$, hence $U^\sharp\_{|\{x,y,z\}} \sqcup\_0^{\{x,y,z\}} V^\sharp\_{|\{x,y,z\}} = \top$. Consider now the abstract element in $\mathcal{M}^\sharp$: $R^\sharp = \langle \{x = y, x = z\} (= W^\sharp), \{x\}, \{x, y, z\} \rangle$. The concretization of $R^\sharp$ over-approximates the union of the concretizations of $M^\sharp$ and $N^\sharp$, and its numerical component is more precise than $\top$. We note that the numerical constraints appearing in $W^\sharp$ can be found in $U^\sharp$ or $V^\sharp$. Therefore, in order to remove the aforementioned imprecision, we define a refined abstract union operator, denoted $\uplus$, that uses constraints found in the inputs in order to refine its result. This is done using the **strengthening** operator of Algorithm 1, which adds the constraints from $C$ that do not make the projection of $X^\sharp$ to $u$ (resp. $v$) lower than the threshold $U^\sharp$ (resp. $V^\sharp$). We assume that, given an abstract element $U^\sharp$, we can extract a finite set of constraints satisfied by $U^\sharp$, denoted $\mathbf{constraints}(U^\sharp)$ (the more constraints can be extracted, the more precise the result). For example, if the numerical domain is the interval domain, constraints have the form $\pm x \ge a$. If the numerical domain is the octagon domain, the **constraints** operator yields all the linear relations among variables that define the octagon.

#### **Algorithm 1. strengthening** operator

**Input:** $X^\sharp$; $C$: a set of constraints; $U^\sharp \in \mathcal{N}^u$: a soundness threshold on environment $u$; $V^\sharp \in \mathcal{N}^v$: a soundness threshold on environment $v$

**Output:** $Z^\sharp$, an abstract element over-approximating $U^\sharp$ on $u$ and $V^\sharp$ on $v$

1. $Z^\sharp \leftarrow X^\sharp$;
2. **foreach** $c \in C$ **do**
3. &nbsp;&nbsp;&nbsp;&nbsp;$T^\sharp \leftarrow [[\mathsf{Assume}(c)]]\_0^{\sharp, u \cup v}(Z^\sharp)$;
4. &nbsp;&nbsp;&nbsp;&nbsp;**if** $U^\sharp \sqsubseteq\_0^u T^\sharp\_{|u} \land V^\sharp \sqsubseteq\_0^v T^\sharp\_{|v}$ **then**
5. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$Z^\sharp \leftarrow T^\sharp$;
6. **end**
7. **return** $Z^\sharp$;

**Definition 11 ($\uplus$ operator).** *Let* $U^\sharp \in \mathcal{N}^u$*,* $V^\sharp \in \mathcal{N}^v$ *be two numerical environments, let* $X^\sharp \in \mathcal{N}^{u \cup v}$*, let* $C$ *be a sequence of numerical constraints over* $u \cup v$*, and let* $c = u \cap v$*. We define:*

$$\begin{aligned} U^\sharp \uplus V^\sharp &= \text{let } \ X^\sharp = (U^\sharp\_{|c} \sqcup\_0^{c} V^\sharp\_{|c})\_{|u \cup v} \text{ in} \\ &\quad \text{let } C = \mathbf{constraints}(U^\sharp) \cup \mathbf{constraints}(V^\sharp) \text{ in} \\ &\quad \mathbf{strengthening}(X^\sharp, C, U^\sharp, V^\sharp) \end{aligned}$$


*Example 9.* Let us now consider the example introduced earlier: $U^\sharp \uplus V^\sharp = \{x = y, y = z\} \in \mathcal{N}^{\{x,y,z\}}$. Indeed, using the notations of Definition 11 and Algorithm 1: $Z^\sharp \triangleq X^\sharp = \top \in \mathcal{N}^{\{x,y,z\}}$ and $C = \{x = y, x = z\}$. Moreover $[[\mathsf{Assume}(x = y)]]\_0^{\sharp, u \cup v}(\top) = \{x = y\}$ $(\triangleq T^\sharp)$, $U^\sharp \sqsubseteq\_0^{\{x,y\}} \{x = y\} = T^\sharp\_{|\{x,y\}}$ and $V^\sharp \sqsubseteq\_0^{\{x,z\}} \top = T^\sharp\_{|\{x,z\}}$. Therefore constraint $x = y$ is added to $Z^\sharp$. At the next loop iteration: $[[\mathsf{Assume}(x = z)]]\_0^{\sharp, u \cup v}(\{x = y\}) = \{x = y, x = z\}$ $(\triangleq T^\sharp)$, $U^\sharp \sqsubseteq\_0^{\{x,y\}} \{x = y\} = T^\sharp\_{|\{x,y\}}$ and $V^\sharp \sqsubseteq\_0^{\{x,z\}} \{x = z\} = T^\sharp\_{|\{x,z\}}$. Therefore constraint $x = z$ is added to $Z^\sharp$.
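The run of the strengthening loop in Example 9 can be replayed with a toy numerical domain. The sketch below is not the octagon domain: as a stated simplification, an abstract element is represented extensionally as the set of maps it denotes over a small finite window of values (`VALUES`), so that ordering is set inclusion, restriction is projection, and Assume is filtering. All names are hypothetical.

```python
from itertools import product

VALUES = range(3)  # finite window standing in for the integers (assumption)


def top(variables):
    """All maps over `variables`, each encoded as a frozenset of pairs."""
    names = sorted(variables)
    return {frozenset(zip(names, xs)) for xs in product(VALUES, repeat=len(names))}


def restrict(elem, variables):
    """Project every map of `elem` onto `variables`."""
    return {frozenset((k, x) for k, x in m if k in variables) for m in elem}


def assume(pred, elem):
    """Keep the maps that satisfy the constraint `pred`."""
    return {m for m in elem if pred(dict(m))}


def strengthening(X, C, U, u, V, v):
    """Add each constraint of C that keeps U (on u) and V (on v) below the result."""
    Z = X
    for c in C:
        T = assume(c, Z)
        if U <= restrict(T, u) and V <= restrict(T, v):
            Z = T
    return Z


u, v = {"x", "y"}, {"x", "z"}
U = assume(lambda m: m["x"] == m["y"], top(u))
V = assume(lambda m: m["x"] == m["z"], top(v))
C = [lambda m: m["x"] == m["y"], lambda m: m["x"] == m["z"]]
# U and V restricted to the common variable x are both unconstrained, so X is top.
Z = strengthening(top(u | v), C, U, u, V, v)
```

Both constraints pass the threshold test, so `Z` denotes exactly the maps with `x = y = z`. A constraint that is not implied by the inputs, such as `x == 0`, would fail the test against `U` and be discarded.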

**Proposition 6 (Soundness of $\uplus$).** *Let* $U^\sharp \in \mathcal{N}^u$ *and* $V^\sharp \in \mathcal{N}^v$*; then* $\gamma\_0^u(U^\sharp) \subseteq (\gamma\_0^{u \cup v}(U^\sharp \uplus V^\sharp))\_{|u}$ *and* $\gamma\_0^v(V^\sharp) \subseteq (\gamma\_0^{u \cup v}(U^\sharp \uplus V^\sharp))\_{|v}$*.*

**Definition 12 (Union abstract operators).** *We define the following abstract set operator:* $\langle N^\sharp, l, u \rangle \sqcup \langle {N'}^\sharp, l', u' \rangle \triangleq \langle N^\sharp \uplus {N'}^\sharp, l \cap l', u \cup u' \rangle$*. This operator soundly abstracts the union. Moreover, in order to ensure the stabilization of infinitely increasing chains in* $\mathcal{M}^\sharp$*, we define the following widening operator:*

$$\langle N^\sharp, l, u \rangle \nabla \langle {N'}^\sharp, l', u' \rangle = \begin{cases} \langle N^\sharp \nabla\_0^u {N'}^\sharp\_{|u}, l, u \rangle & \text{when } l \subseteq l' \land u' \subseteq u \\ \langle N^\sharp \uplus {N'}^\sharp, l', u \rangle & \text{when } l' \subset l \land u' \subseteq u \\ \top & \text{otherwise} \end{cases}$$

*Remark 6.* This widening operator over-approximates to $\top$ whenever the upper bound on the definition set is growing. This yields a huge loss of information. However, this numerical domain is designed as a tool domain used by a higher-level abstraction, which is in charge of stabilizing the environment before applying the widening, so this case will not occur in practice.

Subsequent tree abstractions require the definition of the following operators:


#### **4.2 Representation of Maps over Potentially Unbounded Sets**

In this subsection we focus on the problem of defining abstract numerical environments over potentially infinite sets of variables. A classical method, which we use here, is variable summarization (see [13]). It is based on the folding of several concrete objects (a potentially infinite number) into an abstract element that summarizes all of them. The folding is encoded in a function $f$ mapping summary variables to the set of concrete variables they abstract. Given an abstract numerical environment $N^\sharp$ and a mapping from summary variables $\mathcal{V}'$ to sets of concrete variables, $f \in \mathcal{V}' \to \wp(\mathcal{V})$ where $f(v\_1) \cap f(v\_2) \neq \emptyset \Rightarrow v\_1 = v\_2$, we define the collapsing of a partial map $\rho \in \mathcal{V} \rightharpoonup \mathbb{Z}$ under a summarizing function $f$:

$$\begin{aligned} \downarrow\_f (\rho) = \{ \rho' \in \mathcal{V}' \rightharpoonup \mathbb{Z} \mid \forall v' \in \mathcal{V}', \; & (f(v') \cap \mathbf{def}(\rho) = \emptyset \land \rho'(v') = \mathbf{undefined}) \\ & \lor (\exists v \in \mathcal{V}, \; v \in f(v') \cap \mathbf{def}(\rho) \land \rho'(v') = \rho(v)) \} \end{aligned}$$

*Example 10.* Consider $\mathcal{V}' = \{x, y, z, t\}$ and $\mathcal{V} = \{a, b, c, d, g, h\}$, the environment $\rho = (a \mapsto 0, b \mapsto 1, c \mapsto 2, d \mapsto 3)$ and finally the summarizing function $f = (x \mapsto \{a\}, y \mapsto \{b, c\}, z \mapsto \{d\}, t \mapsto \{g\})$. Collapsing environment $\rho$ under $f$ yields the set of environments: $(x \mapsto 0, y \mapsto 1, z \mapsto 3)$ and $(x \mapsto 0, y \mapsto 2, z \mapsto 3)$.
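The collapsing of Example 10 can be computed directly from its definition. The sketch below assumes that the partial map is a Python dictionary, that the summarizing function is a dictionary from summary variables to disjoint sets of concrete variables, and that each collapsed map is returned as a frozenset of `(summary variable, value)` pairs; the name `collapse` is illustrative.

```python
from itertools import product


def collapse(rho, f):
    """All collapsed maps of `rho` under the summarizing function `f`.

    rho: dict from concrete variables to integers (a partial map).
    f:   dict from summary variables to disjoint sets of concrete variables.
    """
    choices = []
    for summary, concrete in sorted(f.items()):
        defined = sorted(c for c in concrete if c in rho)
        if defined:
            # The summary variable takes the value of one variable it represents.
            choices.append([(summary, rho[c]) for c in defined])
        # Otherwise the collapsed map is undefined on this summary variable.
    return {frozenset(pick) for pick in product(*choices)}


rho = {"a": 0, "b": 1, "c": 2, "d": 3}
f = {"x": {"a"}, "y": {"b", "c"}, "z": {"d"}, "t": {"g"}}
collapsed = collapse(rho, f)
```

The summary variable `t` stays undefined because `g` is outside the definition set of `rho`, and the two possible choices for `y` produce the two environments listed in the example.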

Given a summarizing function f we can now define an extension of the concretization function γ of the previous subsection in the following manner:

$$\gamma[f](N^\sharp) = \{ \rho \in \mathcal{V} \rightharpoonup \mathbb{Z} \mid \downarrow\_f (\rho) \subseteq \gamma(N^\sharp) \}$$

*Example 11.* Going back to Example 10 and considering the numerical abstract element $N^\sharp = \langle \{x \le y\}, \{x\}, \{x, y\} \rangle$, we have: $\gamma(N^\sharp) = \{(x \mapsto \alpha) \mid \alpha \in \mathbb{Z}\} \cup \{(x \mapsto \alpha, y \mapsto \beta) \mid \alpha \le \beta\}$. We have: $m \in \gamma[f](N^\sharp) \Leftrightarrow \downarrow\_f(m) \subseteq \gamma(N^\sharp) \Rightarrow \{x\} \subseteq \mathbf{def}(\downarrow\_f(m)) \subseteq \{x, y\}$. Therefore, if we assume $m$ defined on $d$, then $f(z) \cap \mathbf{def}(m) \neq \emptyset$, hence there would be an element in $\downarrow\_f(m)$ defined on $z$. Hence $m$ is not defined on $d$, and similarly for $g$. Moreover, $\{x\} \subseteq \mathbf{def}(\downarrow\_f(m))$ implies that $m$ is defined on $a$. Finally, defining $S = \{(a \mapsto \alpha) \mid \alpha \in \mathbb{Z}\} \cup \{(a \mapsto \alpha, b \mapsto \beta) \mid \alpha \le \beta\} \cup \{(a \mapsto \alpha, c \mapsto \beta) \mid \alpha \le \beta\} \cup \{(a \mapsto \alpha, b \mapsto \beta, c \mapsto \gamma) \mid \alpha \le \beta \land \alpha \le \gamma\}$, we have: $\gamma[f](N^\sharp) = S \cup (\bigcup\_{f \in S} \{f \uplus (h \mapsto \delta) \mid \delta \in \mathbb{Z}\})$.

The abstract domains we define in the following sections employ this summarization framework. The manipulation of summarized variables requires the definition of a $\mathbf{fold}(E, x, \mathcal{S})$ (resp. $\mathbf{expand}(E, x, \mathcal{S})$) operator yielding a new environment where $x$ is used as a summary variable for $\mathcal{S}$ (resp. where a summary variable $x$ is desummarized into a set of variables $\mathcal{S}$). Let $\mathcal{S}$ and $\mathcal{S}'$ be two finite sets of elements such that $\mathcal{S} \cap \mathcal{S}' \subseteq \{x\}$; we define $\mathbf{expand}\_0(N^\sharp, x, \mathcal{S}') = \bigsqcap\_{v \in \mathcal{S}'} N^\sharp[x \mapsto v]\_{|(\mathcal{S} \setminus \{x\}) \cup \mathcal{S}'}$ and $\mathbf{fold}\_0(N^\sharp, x, \mathcal{S}') = \bigsqcup\_{v \in \mathcal{S}'} N^\sharp[v \mapsto x]\_{|(\mathcal{S} \setminus \mathcal{S}') \cup \{x\}}$ (which generalize the ones introduced in [13]). These operations are lifted as operators on elements of $\mathcal{M}^\sharp$:

$$\begin{aligned} \mathbf{expand}(\langle N^{\sharp},l,u\rangle,x,\mathcal{S}) & \triangleq \langle \mathbf{expand}_{0}(N^{\sharp},x,\mathcal{S}),\ l \setminus \{x\},\ (u \setminus \{x\}) \cup \mathcal{S} \rangle\\ \mathbf{fold}(\langle N^{\sharp},l,u\rangle,x,\mathcal{S}) & \triangleq \left\langle \mathbf{fold}_{0}(N^{\sharp},x,\mathcal{S}),\ \begin{cases} (l \setminus \mathcal{S}) \cup \{x\} & \text{if } \mathcal{S} \subseteq l\\ l \setminus \mathcal{S} & \text{otherwise} \end{cases},\ (u \setminus \mathcal{S}) \cup \{x\} \right\rangle \end{aligned}$$
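The two operators can be illustrated on a very simple numerical domain. The following is a minimal sketch, assuming a non-relational interval domain (a box mapping each variable to an interval) rather than the relational domains used in the paper; the names `Box`, `fold0`, `expand0` and `leq` are hypothetical and only the numerical part of the triple is modelled.

```python
from typing import Dict, Iterable, Tuple

Interval = Tuple[float, float]   # (low, high), both bounds inclusive
Box = Dict[str, Interval]        # assumed domain: one interval per variable


def fold0(box: Box, x: str, folded: Iterable[str]) -> Box:
    """Summarize the variables in `folded` into the single variable `x`."""
    folded = set(folded)
    lows = [box[v][0] for v in folded]
    highs = [box[v][1] for v in folded]
    result = {v: itv for v, itv in box.items() if v not in folded}
    result[x] = (min(lows), max(highs))  # join of the renamed copies
    return result


def expand0(box: Box, x: str, expanded: Iterable[str]) -> Box:
    """Desummarize `x`: every variable in `expanded` inherits its interval."""
    result = {v: itv for v, itv in box.items() if v != x}
    for v in expanded:
        result[v] = box[x]
    return result


def leq(a: Box, b: Box) -> bool:
    """Inclusion test between two boxes over the same variables."""
    return a.keys() == b.keys() and all(
        b[v][0] <= a[v][0] and a[v][1] <= b[v][1] for v in a
    )


R = {"x": (2, 5), "y": (3, 9), "k": (0, 0)}
folded = fold0(R, "z", ["x", "y"])        # z summarizes x and y
back = expand0(folded, "z", ["x", "y"])   # x and y both get the interval of z
```

In this sketch folding then expanding returns a box that contains the original one but is strictly larger, which is the loss of precision that summarization may introduce.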

#### **5 Natural Term Abstraction by Numerical Constraints**

We are now able to represent sets of maps with heterogeneous supports and to lift their concretization (modulo a summarization function) to sets of maps with infinite and heterogeneous supports. Given a tree shape (in the sense of Sect. 3), we can associate a numeric variable to each numeric leaf, and use a numeric abstract element to represent the possible values of these leaves. We will name the variable of each leaf as the path from the root to the leaf, i.e., V is a set of words in {0, ..., n − 1}<sup>∗</sup> where n is the maximum arity of the considered ranked alphabet. In order to avoid confusion such paths will be denoted ⟨0, 1, 1⟩ for the word (0, 1, 1). A summarized variable then represents a set of such paths. We will abstract such sets as regular expressions. Using the summarization extended to heterogeneous supports presented in the previous section, it will be possible to represent, using a single numeric abstract element, a set of constraints over the numeric leaves of an infinite set of unbounded trees of arbitrary shape.

#### **5.1 Hole Positions and Numerical Constraints**

The presentation of our computable abstraction of numerical values in trees is broken down into two consecutive abstractions. The first one is not computable, as natural terms are abstracted as partial environments from tree paths to numerical values. This abstraction loses most of the tree shapes but focuses on their numerical environment. A second abstraction will show how partial environments over paths are abstracted into numerical abstract elements defined over a regular expression environment.

In the following, when ℛ is a ranked alphabet of maximum arity n, we call *words* sequences of integers: w = (w<sub>0</sub>, ..., w<sub>p−1</sub>) ∈ {0, ..., n−1}<sup>p</sup> is called a word of length p (denoted |w|), w<sub>i</sub> denotes the i-th integer of the sequence, w̄ = (w<sub>1</sub>, ..., w<sub>p−1</sub>) is the tail of word w, and W(ℛ) = {0, ..., n−1}<sup>∗</sup> is the set of all words over {0, ..., n−1} of arbitrary size.

**Definition 13 (Position in a term).** *Given a natural term* t *and a word* w *we inductively define the subterm of* t *at position* w *(denoted* t<sub>|w</sub>*) to be:*

$$t_{|w} = \begin{cases} (t_{w_0})_{|\overline{w}} & \text{when } |w| > 0 \land t = f(t_0, \dots, t_{p-1}) \text{ with } w_0 < p\\ t & \text{when } |w| = 0\\ \textbf{undefined} & \text{otherwise} \end{cases}$$

*Moreover we denote by* **numeric**(t) = {w ∈ ℕ<sup>∗</sup> | t<sub>|w</sub> ∈ ℤ}*.*
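Definition 13 translates directly into a recursive function. The sketch below assumes a hypothetical encoding that is not part of the paper: a term is either a Python `int` (a numerical leaf) or a pair `(symbol, children)`, a word is a tuple of child indices, and `None` stands for **undefined**.

```python
def subterm_at(t, w):
    """Subterm of t at position w, or None when it is undefined."""
    if len(w) == 0:
        return t
    if isinstance(t, tuple):
        _symbol, children = t
        if w[0] < len(children):
            # Descend into child w_0 and continue with the tail of w.
            return subterm_at(children[w[0]], w[1:])
    return None


def numeric(t, prefix=()):
    """Set of positions of t that hold an integer."""
    if isinstance(t, int):
        return {prefix}
    _symbol, children = t
    paths = set()
    for i, child in enumerate(children):
        paths |= numeric(child, prefix + (i,))
    return paths


# The term +(3, +(1, 7)) in the assumed encoding.
t = ("+", (3, ("+", (1, 7))))
```

With this encoding, `numeric(t)` contains exactly the three paths (0), (1, 0) and (1, 1), and asking for a position below an integer leaf yields `None`.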

**Definition 14 (Positioning lattice with exact numerical constraints).** *We define* 𝒞(ℛ) ≜ ℘(W(ℛ) ⇀ ℤ)*; an element of* 𝒞(ℛ) *is therefore a set of partial maps that are acceptable bindings of positions to integers.*

**Proposition 7 (Galois connection with natural terms).** *When* t *is a natural term,* t<sub>ℤ</sub> *is the partial map* λ<sub>|**numeric**(t)</sub> w. t<sub>|w</sub>*. We have the following Galois connection:* (℘(T<sub>ℤ</sub>(ℛ)), ⊆) ⇄ (𝒞(ℛ), ⊆)*, with abstraction* α<sub>𝒞(ℛ)</sub> *and concretization* γ<sub>𝒞(ℛ)</sub> *given by:*

$$\gamma_{\mathcal{C}(\mathcal{R})}(\varGamma) = \{ t \in T_{\mathbb{Z}}(\mathcal{R}) \mid t_{\mathbb{Z}} \in \varGamma \} \quad \alpha_{\mathcal{C}(\mathcal{R})}(\mathcal{T}) = \{ t_{\mathbb{Z}} \mid t \in \mathcal{T} \}$$

*Example 12.* Consider our running example (introduced in Example 2), V = {+(x, +(z, y)) | x ≤ y ∧ z ≤ y}; we have α<sub>𝒞(ℛ)</sub>(V) = {(⟨0⟩ ↦ α, ⟨1, 0⟩ ↦ γ, ⟨1, 1⟩ ↦ β) | α ≤ β ∧ γ ≤ β}, the concretization of which is exactly V.
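The abstraction of Example 12 can be replayed on concrete terms. This is a minimal sketch under the same hypothetical encoding as before (a term is an `int` or a pair `(symbol, children)`); `numeric_map`, `in_alpha_V` and `in_gamma` are illustrative names, and the membership test hard-codes the set of bindings computed in the example.

```python
def numeric_map(t, prefix=()):
    """The partial map t_Z: paths of integer leaves to the integer found there."""
    if isinstance(t, int):
        return {prefix: t}
    _symbol, children = t
    result = {}
    for i, child in enumerate(children):
        result.update(numeric_map(child, prefix + (i,)))
    return result


def in_alpha_V(binding):
    """Membership in the abstraction of V computed in Example 12."""
    if set(binding) != {(0,), (1, 0), (1, 1)}:
        return False
    return (binding[(0,)] <= binding[(1, 1)]
            and binding[(1, 0)] <= binding[(1, 1)])


def in_gamma(t):
    """A term is in the concretization when its numeric map is an allowed binding."""
    return in_alpha_V(numeric_map(t))


t_ok = ("+", (1, ("+", (2, 5))))    # x = 1, z = 2, y = 5
t_bad = ("+", (9, ("+", (2, 5))))   # violates x <= y
t_flat = ("+", (1, 2))              # integers at other positions
```

Only the positions and values of integers are inspected, which is why the abstraction keeps no other information about the shape of a term.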

*Example 13.* Consider however the ranked alphabet {f(2), g(2), a(0)}, and the tree a. Its abstraction contains only the empty map, the concretization of which is the set of all terms that do not contain any numerical value, for example f(g(a, a), a), g(a, a), ... . This emphasizes that we lose information on:


Now that we have abstracted away the shape of the terms, we are left with numerical environments with potentially infinite dimensions (that are words over the alphabet {0, ..., n−1}) and different definition sets. Therefore, following the idea of Sect. 4, we want to define a summarization for sets of words over the alphabet {0, ..., n−1}. A summarization of such a language can be expressed as a partition into sub-languages. The set of regular languages over the alphabet {0, ..., n−1} is a subset of the set of languages over this alphabet that is closed under common set operations. Hence given a set {r<sub>1</sub>, ..., r<sub>m</sub>} of regular expressions (with respective recognized languages {L<sub>1</sub>, ..., L<sub>m</sub>}), we summarize all words in L<sub>i</sub> inside a common variable r<sub>i</sub>, and therefore ↑{r<sub>1</sub>, ..., r<sub>m</sub>} denotes the summarization function λr<sub>i</sub>.L<sub>i</sub>. In the following, Reg<sub>n</sub> denotes the set of regular expressions over the alphabet A<sub>n</sub> = {0, ..., n−1}. As for tree regular expressions, (Reg<sub>n</sub>, ⊆, ∩, ∪, ·<sup>c</sup>, ∅, A<sub>n</sub><sup>∗</sup>) is a (non complete) complemented lattice of infinite height, upon which we can define a widening operator (see [10]) in a similar manner as for tree regular expressions (this widening is also parameterized by an integer constant). We recall moreover that operators ⊆, ∩, ∪ and complementation (·<sup>c</sup>) are computable, and that every finite set of words is regular. Moreover Reg<sub>n</sub> is a representation of the languages over A<sub>n</sub>, with concretization γ<sub>Reg<sub>n</sub></sub> = Id. Finally, in order to disambiguate regular expressions from integers, we typeset them within ⟦·⟧ in a bold font, as in ⟦**0 + 0.1**⟧.

*Example 14.* Using notations from Sect. 4.2, summary variables are regular expressions from Reg<sub>n</sub> and the variables they summarize are words from W(ℛ). Consider our running example (introduced in Example 2): natural terms from V = {+(x, +(z, y)) | x ≤ y ∧ z ≤ y} contain three paths to numerical values: ⟨0⟩, ⟨1, 0⟩ and ⟨1, 1⟩. Numerical constraints on ⟨0⟩ and ⟨1, 0⟩ are similar, therefore the two paths are summarized into one regular expression ⟦**0 + 1.0**⟧, while ⟨1, 1⟩ is left alone in its regular expression ⟦**1.1**⟧. The two constraints x ≤ y ∧ z ≤ y can now be expressed as one: ⟦**0 + 1.0**⟧ ≤ ⟦**1.1**⟧.
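The summarized constraint of Example 14 can be checked against a concrete binding of paths to integers. The sketch below assumes paths are encoded as strings of digits (which only works for arity at most 10) and uses Python's `re` module for the two regular expressions; the names `PARTITIONS`, `partition_of` and `satisfies` are hypothetical.

```python
import re

# The two summary variables of Example 14, written in Python regex syntax.
PARTITIONS = {
    "0+1.0": re.compile(r"0|10"),
    "1.1": re.compile(r"11"),
}


def partition_of(path):
    """Name of the summary variable whose language contains `path`, if any."""
    for name, pattern in PARTITIONS.items():
        if pattern.fullmatch(path):
            return name
    return None


def satisfies(binding):
    """Check the single summarized constraint (0+1.0) <= (1.1).

    Every path summarized by 0+1.0 must hold a value no larger than
    every path summarized by 1.1.
    """
    small = [v for p, v in binding.items() if partition_of(p) == "0+1.0"]
    large = [v for p, v in binding.items() if partition_of(p) == "1.1"]
    return all(a <= b for a in small for b in large)


good = {"0": 1, "10": 2, "11": 5}   # x = 1, z = 2, y = 5
bad = {"0": 1, "10": 7, "11": 5}    # z > y
```

One constraint on the summary variable stands for one constraint per summarized path, here x ≤ y and z ≤ y.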

In Example 14, we saw that tree paths with similar numerical constraints can be summarized in one regular expression. However, for precision purposes, we do not want to summarize all tree paths into one regular expression. Hence, we will keep several disjoint regular expressions, which we call a subpartitioning.

**Definition 15 (Subpartitioning).** *Given a regular expression* s*, a subpartitioning of* s *is a set* {s<sub>1</sub>, ..., s<sub>n</sub>} *of regular expressions such that* ∀i ≠ j, s<sub>i</sub> ∩ s<sub>j</sub> = ∅ *and* ⋃<sub>i=1..n</sub> s<sub>i</sub> ⊆ s*. We note* P(s) *the set of all subpartitionings of* s*. Moreover if* S = {s<sub>1</sub>, ..., s<sub>n</sub>} *is a set of regular expressions,* [S]<sub>∅</sub> = S ∖ {∅}*.*

*Remark 7.* Contrary to a partitioning of s, we do not require that the set of partitions covers s. Indeed when a set of tree paths is unconstrained we can just remove it from the partitioning, therefore no dimension in the numerical abstract environment will be allocated for this path.
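The two conditions of Definition 15 are easy to check when languages are finite. The following sketch assumes languages are represented as finite sets of path strings instead of regular expressions, which is a simplification; `is_subpartitioning` and `drop_empty` are hypothetical names, the latter mirroring the [S]<sub>∅</sub> notation.

```python
from itertools import combinations


def is_subpartitioning(parts, support):
    """Parts must be pairwise disjoint and included in the support.

    Unlike a partitioning, the parts need not cover the whole support.
    """
    parts = list(parts)
    pairwise_disjoint = all(not (a & b) for a, b in combinations(parts, 2))
    covered = set().union(*parts)
    return pairwise_disjoint and covered <= set(support)


def drop_empty(parts):
    """Remove empty languages from a set of languages."""
    return {p for p in parts if p}


support = {"0", "10", "11"}
full = [frozenset({"0", "10"}), frozenset({"11"})]
partial = [frozenset({"11"})]                          # paths 0 and 10 unconstrained
overlapping = [frozenset({"0", "10"}), frozenset({"10", "11"})]
outside = [frozenset({"0"}), frozenset({"111"})]
```

The `partial` case is the situation described in Remark 7: unconstrained paths are simply left out, so no numerical dimension is allocated for them.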

**Fig. 6.** Unification operator

**Definition 16 (Positioning lattice with numerical abstraction).** *Given a ranked alphabet* ℛ*, where the maximum arity of symbols is* n*, we define* 𝒞♯(ℛ) = {⟨s, 𝔭, R♯⟩ | s ∈ Reg<sub>n</sub>, 𝔭 ∈ P(s), R♯ ∈ M<sub>𝔭</sub>}*. Therefore elements of* 𝒞♯(ℛ) *are triples containing:*


*Remark 8.* In the following, numerical abstract elements described in the form {c}, where c is a set of constraints, refer to ⟨c, **vars**(c), **vars**(c)⟩ ∈ M.

#### **Algorithm 2. unify_join** operator

**Input:** ⟨s, {p<sub>1</sub>, ..., p<sub>n</sub>}, R♯⟩, ⟨s′, {p′<sub>1</sub>, ..., p′<sub>m</sub>}, R′♯⟩ two abstract elements

**Output:** two unified abstract elements

1. (c<sub>i,j</sub>)<sub>i≤n, j≤m</sub> ← p<sub>i</sub> ∩ p′<sub>j</sub>;
2. (p<sup>o</sup><sub>i</sub>)<sub>i≤n</sub> ← p<sub>i</sub> ∩ s′<sup>c</sup>;
3. (p′<sup>o</sup><sub>j</sub>)<sub>j≤m</sub> ← p′<sub>j</sub> ∩ s<sup>c</sup>;
4. (q<sub>i</sub>)<sub>i≤n</sub> ← p<sub>i</sub> ∩ s′ ∩ (⋃<sub>j≤m</sub> c<sub>i,j</sub>)<sup>c</sup>;
5. (q′<sub>j</sub>)<sub>j≤m</sub> ← p′<sub>j</sub> ∩ s ∩ (⋃<sub>i≤n</sub> c<sub>i,j</sub>)<sup>c</sup>;
6. R♯<sub>1</sub> ← R♯;
7. R♯<sub>2</sub> ← R′♯;
8. **for** i = 1 **to** n **do**
9. &emsp;R♯<sub>1</sub> ← **expand**(R♯<sub>1</sub>, p<sub>i</sub>, [{c<sub>i,j</sub>}<sub>j≤m</sub> ∪ {p<sup>o</sup><sub>i</sub>} ∪ {q<sub>i</sub>}]<sub>∅</sub>);
10. **for** j = 1 **to** m **do**
11. &emsp;R♯<sub>2</sub> ← **expand**(R♯<sub>2</sub>, p′<sub>j</sub>, [{c<sub>i,j</sub>}<sub>i≤n</sub> ∪ {p′<sup>o</sup><sub>j</sub>} ∪ {q′<sub>j</sub>}]<sub>∅</sub>);
12. **return** ⟨s, ⋃<sub>i≤n, j≤m</sub> [{q<sub>i</sub>, p<sup>o</sup><sub>i</sub>, c<sub>i,j</sub>}]<sub>∅</sub>, R♯<sub>1</sub>⟩, ⟨s′, ⋃<sub>i≤n, j≤m</sub> [{q′<sub>j</sub>, p′<sup>o</sup><sub>j</sub>, c<sub>i,j</sub>}]<sub>∅</sub>, R♯<sub>2</sub>⟩;

*Unification.* The previous definition shows that two elements U♯ = ⟨s, 𝔭, R♯⟩ and V♯ = ⟨s′, 𝔭′, R′♯⟩ can have different subpartitionings (𝔭 and 𝔭′). However the partitions in 𝔭 and in 𝔭′ might overlap, thus giving constraints to similar tree paths. Therefore in order to define the classical operators ⊑, ⊔ and ∇, we need to unify the two abstract elements (U♯ and V♯) so that given a tree path and the partition in which it is contained in U♯, it is contained in the same partition in V♯. This will enable us to rely on abstract operators on the numerical domain. In order to perform unification, we rely on the **expand** and **fold** operators. Indeed consider our running example, U♯ = ⟨⟦**0 + 1**⟧, {⟦**0**⟧, ⟦**1**⟧}, {⟦**0**⟧ ≤ ⟦**1**⟧}⟩ and V♯ = ⟨⟦**0 + 1.(0 + 1)**⟧, {⟦**0 + 1.0**⟧, ⟦**1.1**⟧}, {⟦**0 + 1.0**⟧ ≤ ⟦**1.1**⟧}⟩. We see that constraints on tree path ⟨0⟩ are given in U♯ by partition ⟦**0**⟧ and in V♯ by partition ⟦**0 + 1.0**⟧. However we can split the partition ⟦**0 + 1.0**⟧ into two partitions, ⟦**0**⟧ and ⟦**1.0**⟧, and expand variable ⟦**0 + 1.0**⟧ into the two variables ⟦**0**⟧ and ⟦**1.0**⟧ in the numeric component: **expand**({⟦**0 + 1.0**⟧ ≤ ⟦**1.1**⟧}, ⟦**0 + 1.0**⟧, {⟦**0**⟧, ⟦**1.0**⟧}) = {⟦**0**⟧ ≤ ⟦**1.1**⟧, ⟦**1.0**⟧ ≤ ⟦**1.1**⟧}. Once U♯ and V♯ are unified we can rely on the numerical join to soundly abstract the union. Note that splitting partitions is more precise than merging them. Indeed, consider the example where in U♯ we have ⟦**0**⟧ ≥ 0 and ⟦**1**⟧ ≤ 0, and in V♯ we have ⟦**0 + 1**⟧ = 0. Splitting the partition in V♯ yields ⟦**0**⟧ = 0, ⟦**1**⟧ = 0; after joining we get ⟦**0**⟧ ≥ 0, ⟦**1**⟧ ≤ 0. Whereas merging partitions in U♯ yields ⟦**0 + 1**⟧ unconstrained; after joining we also get that ⟦**0 + 1**⟧ is unconstrained. However unifying by splitting or merging partitions in both abstract elements might result in an over-approximation of the initial elements. This does not pose a threat to the soundness of the join operator, but it does for the inclusion test. Unifying by splitting partitions induces an increase in the number of partitions, which we want to avoid when trying to stabilize abstract elements in the widening. Hence, we define three unification operators:


Operators **unify_subset** and **unify_widen** are very similar to **unify_join**.
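The partition-splitting part of Algorithm 2 can be sketched on finite languages. This is a simplified illustration only: languages are assumed to be finite sets of path strings, the numerical **expand** step is left out, and `split_partitions` is a hypothetical helper name.

```python
def split_partitions(parts, other_parts, other_support):
    """Split each partition p of one element with respect to the other element.

    Returns a dict mapping p to its non-empty pieces:
      - common pieces: p intersected with each partition of the other element,
      - the outer piece: p minus the other support,
      - the inner piece: p inside the other support but in none of its partitions.
    """
    other_union = frozenset().union(*other_parts)
    pieces = {}
    for p in parts:
        common = [p & q for q in other_parts]
        outer = p - other_support
        inner = (p & other_support) - other_union
        pieces[p] = [x for x in common + [outer, inner] if x]
    return pieces


# Running example: U has support {0, 1}, V has support {0, 10, 11}.
s_u = frozenset({"0", "1"})
parts_u = [frozenset({"0"}), frozenset({"1"})]
s_v = frozenset({"0", "10", "11"})
parts_v = [frozenset({"0", "10"}), frozenset({"11"})]

split_v = split_partitions(parts_v, parts_u, s_u)
split_u = split_partitions(parts_u, parts_v, s_v)
```

On the running example the partition {0, 10} of V is split into {0} (common with U) and {10} (outer), as described in the unification paragraph; the partitions of U are left unchanged. In the full algorithm each split is mirrored by an **expand** on the numerical component.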

**Definition 17 (Comparison ⊑<sub>𝒞♯(ℛ)</sub>).** *Using* **unify_subset** *we define a relation on* 𝒞♯(ℛ)*:* ⊑<sub>𝒞♯(ℛ)</sub> = {(U♯, V♯) | (s, 𝔭, N♯, s′, 𝔭′, N′♯) = **unify_subset**(U♯, V♯) ⇒ s ⊆ s′ ∧ ∀b ∈ 𝔭′, (b ⊆ s<sup>c</sup> ∨ ∃!a ∈ 𝔭, b ∩ s = a) ∧ N♯ ⊑ N′♯[φ]} *where* φ *is the renaming from* 𝔭′ *into* 𝔭 *that renames* b *to* a *when such an* a *exists.*

*Example 15.* Going back to our running example: U♯ = ⟨⟦**0 + 1**⟧, {⟦**0**⟧, ⟦**1**⟧}, {⟦**0**⟧ ≤ ⟦**1**⟧} (= A♯)⟩ and V♯ = ⟨⟦**0 + 1.(0 + 1)**⟧, {⟦**0 + 1.0**⟧, ⟦**1.1**⟧}, {⟦**0 + 1.0**⟧ ≤ ⟦**1.1**⟧}⟩. We have s ⊈ s′, hence U♯ ⋢ V♯. However if we now consider W♯ = ⟨⟦**(ε + 1).(0 + 1)**⟧, {⟦**(ε + 1).0**⟧, ⟦**(ε + 1).1**⟧}, {⟦**(ε + 1).0**⟧ ≤ ⟦**(ε + 1).1**⟧} (= B♯)⟩, then W♯ is already unified with U♯, we have s ⊆ s′ and φ : (⟦**(ε + 1).0**⟧ ↦ ⟦**0**⟧, ⟦**(ε + 1).1**⟧ ↦ ⟦**1**⟧). Moreover A♯ ⊑ B♯[φ] = {⟦**0**⟧ ≤ ⟦**1**⟧}. Hence U♯ ⊑ W♯.

**Proposition 8.** *We have:* (𝒞(ℛ), ⊆) ⟵ (𝒞♯(ℛ), ⊑<sub>𝒞♯(ℛ)</sub>) *with concretization* γ<sub>1</sub>*, where:* γ<sub>1</sub>(⟨s, 𝔭, R♯⟩) = {f | **def**(f) ⊆ γ<sub>Reg<sub>n</sub></sub>(s) ∧ f ∈ γ[↑𝔭](R♯)}*. By composition we get:* (℘(T<sub>ℤ</sub>(ℛ)), ⊆) ⟵ (𝒞♯(ℛ), ⊑<sub>𝒞♯(ℛ)</sub>) *with concretization* γ<sub>2</sub> = γ<sub>𝒞(ℛ)</sub> ∘ γ<sub>1</sub>*.*

*Example 16.* Going back to our running example: V♯ = ⟨⟦**0 + 1.(0 + 1)**⟧, {⟦**0 + 1.0**⟧, ⟦**1.1**⟧}, {⟦**0 + 1.0**⟧ ≤ ⟦**1.1**⟧}⟩. We have: ↑𝔭 = (⟦**0 + 1.0**⟧ ↦ {⟨0⟩, ⟨1, 0⟩}, ⟦**1.1**⟧ ↦ {⟨1, 1⟩}). Hence, γ<sub>1</sub>(V♯) = {(⟨0⟩ ↦ α, ⟨1, 1⟩ ↦ β) | α ≤ β} ∪ {(⟨1, 0⟩ ↦ α, ⟨1, 1⟩ ↦ β) | α ≤ β} ∪ {(⟨0⟩ ↦ α, ⟨1, 0⟩ ↦ γ, ⟨1, 1⟩ ↦ β) | α ≤ β ∧ γ ≤ β}. The product with tree automata refines this result so that only the last set is left.

We now define the join operator ⊔<sub>𝒞♯(ℛ)</sub>, which relies on the **unify_join** operator of Algorithm 2. Once elements are unified we can distinguish three kinds of partitions: (1) Partitions found in both abstract elements (illustrated in Fig. 6). (2) Partitions found in only one of the two, which do not overlap the support of the other abstract element (denoted u<sup>o</sup>); these are outer-partitions. Information on such partitions can be soundly kept when joining two abstract elements (e.g. partition a in Fig. 6). (3) Partitions found in only one of the two, which overlap the support of the other abstract element; these are inner-partitions. Information on such partitions cannot be soundly kept when joining two abstract elements (e.g. partition b in Fig. 6). Therefore in the following definition of the join operator, we compute (once elements are unified) the common partitions and both outer-partitions and merge them to form the resulting subpartitioning.

**Definition 18 (Union abstract operator).** *Given* U♯, V♯ ∈ 𝒞♯(ℛ)*, if* (s, 𝔭, R♯, s′, 𝔭′, R′♯) = **unify_join**(U♯, V♯)*, let* 𝔠 *be* 𝔭 ∩ 𝔭′*, let* u<sup>o</sup> *(*U♯ *outer-partition) be* {e ∈ 𝔭 | e ⊆ s′<sup>c</sup>}*, let* v<sup>o</sup> *(*V♯ *outer-partition) be* {e ∈ 𝔭′ | e ⊆ s<sup>c</sup>}*, we then define:*

$$U^\sharp \sqcup_{\mathcal{C}^\sharp(\mathcal{R})} V^\sharp = \langle s \cup s', \mathfrak{c} \cup u^o \cup v^o, R^\sharp_{|\mathfrak{c}\cup u^o} \sqcup R'^\sharp_{|\mathfrak{c}\cup v^o} \rangle$$

**Proposition 9.** *We have:* γ<sub>1</sub>(U♯) ∪ γ<sub>1</sub>(V♯) ⊆ γ<sub>1</sub>(U♯ ⊔<sub>𝒞♯(ℛ)</sub> V♯)*.*
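The choice of partitions kept by the join can be sketched on already unified elements. The code below is an illustration under simplifying assumptions: languages are finite sets of path strings, the numerical join is omitted, and `join_partitions` is a hypothetical name.

```python
def join_partitions(parts_u, support_u, parts_v, support_v):
    """Support and subpartitioning of the join of two unified elements."""
    common = set(parts_u) & set(parts_v)
    # Outer partitions do not overlap the support of the other element.
    outer_u = {e for e in parts_u if not (e & support_v)}
    outer_v = {e for e in parts_v if not (e & support_u)}
    # Inner partitions (in one element only, overlapping the other support)
    # are dropped: their constraints cannot be soundly kept.
    return support_u | support_v, common | outer_u | outer_v


c, d = frozenset({"0"}), frozenset({"1"})
b, e = frozenset({"11"}), frozenset({"10"})
s_u = frozenset({"0", "1"})
s_v = frozenset({"0", "10", "11"})

support, parts = join_partitions({c, d}, s_u, {c, b, e}, s_v)

# A partition of one element that overlaps the other support without being
# one of its partitions is an inner partition and disappears from the result.
_, parts_inner = join_partitions({c}, s_u, {d}, frozenset({"1"}))
```

The first call uses the partitions of the unified elements for x + x and x + (x + x) (supports {0, 1} and {0, 10, 11}) and keeps all four partitions; the second call drops the inner partition {1}.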

*Example 17.* Consider the two following abstract elements (this is the particular case of our running example where all numerical values are equal): V♯ = ⟨⟦**0 + 1.(0 + 1)**⟧ (= s), {⟦**0 + 1.0**⟧ (= a), ⟦**1.1**⟧ (= b)}, {a = b}⟩, and U♯ = ⟨⟦**0 + 1**⟧ (= s′), {⟦**0**⟧ (= c), ⟦**1**⟧ (= d)}, {c = d}⟩. Intuitively U♯ could encode the term (x + x) and V♯ the term (x + (x + x)). The unification of those two elements is: V♯<sub>1</sub> = ⟨s, {c, b, ⟦**1.0**⟧ (= e)}, R♯⟩ where R♯ = ⟨{c = b, e = b}, {b}, {c, b, e}⟩ and U♯<sub>1</sub> = U♯. Moreover the common environment (𝔠 in the previous definition) is {c}, the V♯ outer-partitioning is {e, b} and the U♯ outer-partitioning is {d}. Hence the numerical component resulting from the join is ⟨{c = d}, {c, d}, {c, d}⟩ ⊔ ⟨{c = b, e = b}, {b}, {c, b, e}⟩, which is ⟨{c = b, e = b, c = d}, ∅, {c, d, e, b}⟩. We see here that using a naive numerical join operator, we would not have been able to get such a precise result (the numerical join would have yielded ⊤).

**Fig. 7.** Widening illustration

**unify_widen**. 𝒞♯(ℛ) contains infinite increasing chains, therefore we need to provide a widening operator. As for the other operators, widening is computed on unified abstract elements. A **unify_widen** operator is defined: it produces over-approximations of its two inputs that have the same number of partitions. Moreover it ensures that each partition of the first intersects exactly one partition of the second. This can be obtained by iteratively merging partitions that overlap in both arguments until the abstract elements have the exact same partitions. Therefore from the result of **unify_widen** we can extract a list of pairs (a, b) where a is a partition from U♯, b is a partition from V♯ and a ∩ b ≠ ∅. This defines a bijection from partitions of U♯ onto partitions of V♯.
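The iterative merging described above can be sketched as follows. This is one possible reading, not the operator of the paper: languages are assumed to be finite sets of path strings, the corresponding **fold** on the numerical component is omitted, partitions that overlap nothing on the other side are simply left unpaired, and `unify_widen_partitions` is a hypothetical name.

```python
def unify_widen_partitions(parts_u, parts_v):
    """Merge partitions until each one overlaps at most one on the other side."""
    parts_u, parts_v = set(parts_u), set(parts_v)
    changed = True
    while changed:
        changed = False
        for own, other in ((parts_u, parts_v), (parts_v, parts_u)):
            for b in list(other):
                hit = [a for a in own if a & b]
                if len(hit) > 1:
                    # b overlaps several partitions of `own`: merge them.
                    own.difference_update(hit)
                    own.add(frozenset().union(*hit))
                    changed = True
    pairs = {(a, b) for a in parts_u for b in parts_v if a & b}
    return parts_u, parts_v, pairs


u_parts = [frozenset({"0"}), frozenset({"10"}), frozenset({"11"})]
v_parts = [frozenset({"0", "10"}), frozenset({"11"})]
merged_u, merged_v, pairs = unify_widen_partitions(u_parts, v_parts)
```

Each merge strictly decreases the number of partitions, so the loop terminates; on this input the two partitions {0} and {10} of the first element are merged so that both elements end up with the same two partitions.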

#### **Algorithm 3. widening** operator

**Input:** U♯, V♯ two abstract elements

1. (s<sub>1</sub>, 𝔭<sub>1</sub>, R♯<sub>1</sub>, s<sub>2</sub>, 𝔭<sub>2</sub>, R♯<sub>2</sub>) ← **unify_widen**(U♯, V♯);
2. s ← s<sub>1</sub> ∇ s<sub>2</sub>;
3. r ← s ∖ (s<sub>1</sub> ∪ s<sub>2</sub>);
4. **foreach** a ∈ 𝔭<sub>1</sub> **do**
5. &emsp;b ← the unique element from 𝔭<sub>2</sub> such that b ∩ a ≠ ∅;
6. &emsp;p ← **compose**(a, b, s<sub>1</sub>, s<sub>2</sub>, r);
7. &emsp;𝔭 ← {p} ∪ 𝔭;
8. &emsp;R♯<sub>1</sub> ← R♯<sub>1</sub>[a ↦ p];
9. &emsp;R♯<sub>2</sub> ← R♯<sub>2</sub>[b ↦ p];
10. &emsp;r ← r ∖ p;
11. **if** 𝔭 = 𝔭<sub>1</sub> **then**
12. &emsp;**return** ⟨s, 𝔭, R♯<sub>1</sub> ∇ R♯<sub>2</sub>⟩;
13. **else**
14. &emsp;**return** ⟨s, 𝔭, R♯<sub>1</sub> ⊔ R♯<sub>2</sub>⟩;

**compose**. In order to ensure stabilization we first need to stabilize the supports on which abstract elements are defined. This is easily done using the automaton widening (s<sub>1</sub> ∇ s<sub>2</sub> in Algorithm 3). Figure 7 illustrates the following simple example: U♯ is an abstract element with support ⟦**0 + 1**⟧, two partitions u = ⟦**0**⟧ and u′ = ⟦**1**⟧, and numerical constraints u = 1 and u′ = 0. V♯ is an abstract element with support ⟦**(ε + 1).(0 + 1)**⟧, two partitions v = ⟦**(ε + 1).0**⟧ and v′ = ⟦**(ε + 1).1**⟧, with the numerical constraints v = 0 and v′ = 1. Supports are unstable, therefore we start by widening them, which yields a new support: ⟦**1**<sup>∗</sup>**.(0 + 1)**⟧. The unification of U♯ and V♯ leaves subpartitionings unchanged and yields the bijection (u ↦ v, u′ ↦ v′). Given this information we now need to provide a new subpartitioning for the result of the widening. We see in this example that we could soundly use the subpartitioning from V♯; this would produce the abstract element Z♯<sub>1</sub> depicted in Fig. 7. However due to the widening of the support, paths of the form ⟨1, 1, 1, 0⟩ are in the support of the result but are left unconstrained as they are not in any of the partitions. Therefore we need to use the opportunity of the extension of the support to place constraints on the newly added paths. In order to do so we would like to force the extension of the existing partitions from U♯ and V♯ into the new support. Therefore we need to define a **compose** operator that produces a sound new partition, given: (1) a pair (a, b) of partitions (such as the one produced by **unify_widen**), (2) the support s<sub>1</sub> (resp. s<sub>2</sub>) in which a (resp. b) lives and (3) a space to occupy r. The following criteria must be verified by the resulting partition p in order to be sound and to terminate: p ∩ s<sub>1</sub> = a, p ∩ s<sub>2</sub> = b and p ∖ (s<sub>1</sub> ∪ s<sub>2</sub>) ⊆ r.

A variety of **compose** operators could be defined; we chose: **compose**(a, b, s<sub>1</sub>, s<sub>2</sub>, r) = a ∪ (b ∩ (s<sub>2</sub> ∖ s<sub>1</sub>)) ∪ ((a ∇ (a ∪ b)) ∩ r). The idea is the following: we keep a (as it is always sound thanks to the definition of the **unify_widen** operator), we keep the part from b that satisfies the soundness condition, and we extend into the space left to occupy according to the automata widening of a and a ∪ b. In our example, considering the pair (u, v), this would translate as: a = ⟦**0**⟧, b ∩ (s<sub>2</sub> ∖ s<sub>1</sub>) = ⟦**1.0**⟧ and (a ∇ (a ∪ b)) ∩ r = (⟦**0**⟧ ∇ ⟦**(ε + 1).0**⟧) ∩ ⟦**1**<sup>≥2</sup>**.(0 + 1)**⟧ = ⟦**1**<sup>≥2</sup>**.0**⟧. We get the new partition ⟦**1**<sup>∗</sup>**.0**⟧. Doing the same with the pair (u′, v′) yields ⟦**1**<sup>∗</sup>**.1**⟧. Finally we get the abstract element Z♯<sub>2</sub> from Fig. 7, which is more precise than Z♯<sub>1</sub>.
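The three criteria on **compose** can be checked mechanically on the example. The sketch below makes several assumptions: regular languages are approximated by their finite slice of words of length at most 5 over {0, 1}, Python `re` patterns stand in for the regular expressions, and the automata widening is not implemented, so its two results (the widened support and a ∇ (a ∪ b)) are supplied by hand from the example. `language` and `compose` are hypothetical names.

```python
import re
from itertools import product


def language(pattern, max_len=5):
    """Finite slice of a regular language: matching words of length <= max_len."""
    rx = re.compile(pattern)
    words = ("".join(w) for n in range(max_len + 1)
             for w in product("01", repeat=n))
    return frozenset(w for w in words if rx.fullmatch(w))


def compose(a, b, s1, s2, r, widened):
    """a, plus the part of b outside s1, plus the widened part that fits in r.

    `widened` stands for the automata widening of a and (a union b),
    which is assumed to be given.
    """
    return a | (b & (s2 - s1)) | (widened & r)


s1 = language(r"0|1")          # support of U
s2 = language(r"1?(0|1)")      # support of V
s = language(r"1*(0|1)")       # widened support, taken from the example
r = s - (s1 | s2)              # space left to occupy
a = language(r"0")             # partition u
b = language(r"1?0")           # partition v
widened = language(r"1*0")     # assumed result of widening a with a | b

p = compose(a, b, s1, s2, r, widened)
```

On this slice the composed partition is the language of 1<sup>∗</sup>.0 and it satisfies the three criteria stated above.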

**Definition 19 (Widening).** *Algorithm 3 provides the definition of a widening operator using the* **unify_widen** *operator and parameterized by a* **compose** *function.*

*Widening Stabilization.* Our abstraction contains three components: (1) a support that describes the set of paths, (2) a subpartitioning of this support, and (3) a numerical component giving constraints on partitions in the subpartitioning. We show how the widening operator stabilizes all three components.


allowed on the subpartitionings are those made by the **unify_widen** operator. Each partition resulting from the operator is the union of input partitions, hence the subpartitioning will stabilize.

– Once subpartitionings are stable (𝔭<sub>1</sub> = 𝔭 in Algorithm 3), numerical widening is applied on the numerical component in order to ensure stabilization.

*Example 18 (Numerical example).* Consider the simple example where ℛ = {f(2)}, U♯ = ⟨⟦**0 + 1**⟧, {⟦**0**⟧, ⟦**1**⟧}, {⟦**1**⟧ = ⟦**0**⟧}⟩ and V♯ = ⟨⟦**0 + 1**⟧, {⟦**0**⟧, ⟦**1**⟧}, {⟦**1**⟧ ≥ ⟦**0**⟧, ⟦**1**⟧ ≤ ⟦**0**⟧ + 1}⟩. U♯ and V♯ have the same shape, therefore widening is performed on the numerical component of the abstraction: U♯ ∇ V♯ = ⟨⟦**0 + 1**⟧, {⟦**0**⟧, ⟦**1**⟧}, {⟦**1**⟧ ≥ ⟦**0**⟧}⟩.
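The numerical step of Example 18 can be reproduced with the standard interval widening, applied to the bounds of the difference between the two partitions. This is only a sketch: tracking the difference as an interval is an assumption that mimics what a relational domain such as octagons would record, and `widen` is a hypothetical helper.

```python
import math


def widen(old, new):
    """Standard interval widening: unstable bounds are pushed to infinity."""
    low = old[0] if old[0] <= new[0] else -math.inf
    high = old[1] if old[1] >= new[1] else math.inf
    return (low, high)


# Bounds on (value at path 1) - (value at path 0).
diff_u = (0, 0)   # in U the two values are equal
diff_v = (0, 1)   # in V the difference lies between 0 and 1

diff_widened = widen(diff_u, diff_v)
```

The lower bound is stable and kept, while the growing upper bound is dropped, which leaves only the constraint that the value at path 1 is at least the value at path 0.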

*Reducing Dimensionality and Improving Precision.* As emphasized by the previous examples, definitions and illustrations, the numerical component of an abstract state is used as a container for constraints on regular expressions: every node in a regular expression must then satisfy all numerical constraints on the underlying regular expression. Therefore when two nodes of a tree satisfy the same constraints, they should be stored in the same partition so as to reduce the dimension of the numerical domain (thus improving efficiency). Moreover the widening operator provided in Algorithm 3 relies (for precision) on the fact that partitions are built by similarity of constraints, therefore partition merging, when it does not result in an over-approximation, also leads to a precision gain. The unification operator defined in Algorithm 2 tends to split partitions whereas the widening operator defined in Algorithm 3 tends to merge them.

In order to reduce dimensionality, we would like to define a **reduce** : 𝒞♯(ℛ) → 𝒞♯(ℛ) operator that folds variables with similar constraints into one. Please note that ∀S ∩ S′ ⊆ {x}, x ∈ S and R♯ ∈ N<sub>S</sub>, we have that R♯ ⊑<sub>N<sub>S</sub></sub> **expand**(**fold**(R♯, x, S′), x, S′). This means that when variables are folded into one, expanding them afterwards would yield a bigger abstract element. For example, consider the octagon R♯ = {x ≥ 2, y ≥ 2, x = y}; then **fold**(R♯, z, {x, y}) = {z ≥ 2} (≜ R′♯) and **expand**(R′♯, z, {x, y}) = {x ≥ 2, y ≥ 2}. However if we consider R♯ = {x ≥ 2, y ≥ 2} then **expand**(**fold**(R♯, z, {x, y}), z, {x, y}) = R♯. Therefore if we assume given a score function **score**(R♯, x, S′) ranging in [0, 1] such that **score**(R♯, x, S′) = 1 ⇔ R♯ = **expand**(**fold**(R♯, x, S′), x, S′), we are able to define a generic **reduce** operator parameterized by a value α. This **reduce** operator merges partitions until no more set of partitions has a high enough score according to the **score** function. Finding a good **score** function is a work in progress. As a first approximation we used the following trivial one: **score**<sub>0</sub>(R♯, x, S′) = 1 when **expand**(**fold**(R♯, x, S′), x, S′) = R♯ and 0 otherwise. This **score**<sub>0</sub> guarantees there is no loss of precision, but can miss opportunities for simplification.
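The behaviour of **score**<sub>0</sub> on the two octagon examples can be replayed with a concrete stand-in for the numerical domain. The sketch below assumes that an abstract element is represented by the finite set of environments it contains over a small universe of values, which replaces the octagon domain of the paper; `env`, `fold`, `expand` and `score0` are hypothetical names.

```python
from itertools import product

VALUES = range(2, 5)   # assumed small universe standing for "values >= 2"


def env(**bindings):
    """An environment as a hashable set of (variable, value) pairs."""
    return frozenset(bindings.items())


def fold(points, x, folded):
    """Each environment yields one copy per folded variable, renamed to x."""
    result = set()
    for p in points:
        d = dict(p)
        rest = {k: v for k, v in d.items() if k not in folded}
        for v in folded:
            result.add(frozenset({**rest, x: d[v]}.items()))
    return result


def expand(points, x, expanded, values):
    """Keep environments whose every expanded variable is a valid value for x."""
    rests = {frozenset((k, v) for k, v in p if k != x) for p in points}
    result = set()
    for rest in rests:
        for combo in product(values, repeat=len(expanded)):
            if all(frozenset(dict(rest, **{x: c}).items()) in points for c in combo):
                result.add(frozenset(dict(rest, **dict(zip(expanded, combo))).items()))
    return result


def score0(points, x, folded, values):
    """1 when folding then expanding loses nothing, 0 otherwise."""
    return 1 if expand(fold(points, x, folded), x, folded, values) == points else 0


equal = {env(x=v, y=v) for v in VALUES}                       # x >= 2, y >= 2, x = y
free = {env(x=a, y=b) for a in VALUES for b in VALUES}        # x >= 2, y >= 2
round_trip = expand(fold(equal, "z", ["x", "y"]), "z", ["x", "y"], VALUES)
```

Folding the element with x = y and expanding it again forgets the equality, so its score is 0, whereas the element without the equality survives the round trip unchanged.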

*Example 19.* Consider the following example: U♯ = ⟨⟦**0 + 1**⟧, {⟦**0**⟧, ⟦**1**⟧}, {⟦**0**⟧ = 0, ⟦**1**⟧ = 0}⟩. Relations on ⟦**0**⟧ and ⟦**1**⟧ can be expressed in one relation using the summarizing variable ⟦**0 + 1**⟧. This yields: **reduce**(U♯) = ⟨⟦**0 + 1**⟧, {⟦**0 + 1**⟧}, {⟦**0 + 1**⟧ = 0}⟩. Note that **expand**({⟦**0 + 1**⟧ = 0}, ⟦**0 + 1**⟧, {⟦**1**⟧, ⟦**0**⟧}) = {⟦**0**⟧ = 0, ⟦**1**⟧ = 0}. Therefore no information is lost.

*Abstract Semantics of Operators.* As for tree automata, the abstract semantics of the operators defined in Sect. 2 can be defined as simple transformations on regular automata. Indeed the make_symbolic(s ∈ ℛ) (resp. get_son) operator amounts to adding (resp. removing) an integer letter to: (1) the partitions in the subpartitioning and (2) the support. make_integer(e ∈ *expr*) amounts to building an abstract element with support ⟦**ε**⟧ and subpartitioning {⟦**ε**⟧}, on which we put the constraint that it is equal to e. is_symbol only needs to split the support and each partition into the two languages L = {ε} and A<sub>n</sub><sup>∗</sup> ∖ L. Indeed in order to restrict to terms having only an integer as root, the support must be reduced to ⟦**ε**⟧. The get_sym_head operator always yields the whole ranked alphabet (as this was abstracted away and will be refined by the automaton abstraction). Finally for get_num_head: (1) if the empty path ε is in the support we produce the set of integers satisfying the numerical constraints on the partition containing ε, and the set of all integers in case no such partition can be found, and (2) otherwise we know that no numerical value is produced.

#### **5.2 Product of Tree Automata and Numerical Constraints**

The abstraction by tree automata defined in Sect. 3 and the abstraction by numerical constraints on tree paths defined in Sect. 5.1 provide incomparable information on the set of terms they abstract. Indeed the former describes precisely the shape of the term but cannot express numerical constraints, whereas the latter abstracts away most of the shape and focuses on numerical constraints. To benefit from both kinds of information, we use a reduced product between the two domains. Both abstractions in the product contain information on potential integer positions. The position of the hole symbol in the tree automaton abstraction and the support in the numerical constraints abstraction both yield this information. We remove the support component from the product as the information can be retrieved from the tree abstraction. The definitions of the abstract operators in Sect. 5.1 require the support to be a regular language. We show in this subsection how to retrieve the support of a tree automaton with holes and that it is regular.

Let (Q, ℛ, Q<sub>f</sub>, δ) be an FTA over a ranked alphabet ℛ with maximum arity n. We assume that every node in Q is reachable. Consider the following system over variables v<sub>p</sub> for p ∈ Q with values in the set of languages over the alphabet A<sub>n</sub> (the dot designates the classical concatenation operator lifted to languages):

$$\left\{ v_p = \bigcup_{(s,(q_1,\ldots,q_m),q)\in\delta \,\mid\, q_i=p} v_q.\{i\} \;\cup\; \begin{cases}\{\epsilon\} & \text{if } p\in Q_f\\ \emptyset & \text{otherwise}\end{cases} \;\middle|\; p\in Q \right\}$$

Every language {i} for i ∈ N is regular and does not contain ε; moreover, ∅ and {ε} are regular languages. By application of Arden's rule (see [18]) and Gaussian elimination we can compute the unique solution of this system; moreover, every v<sub>p</sub> is regular. Variable v<sub>p</sub> is defined so that w ∈ v<sub>p</sub> if and only if there exists a tree t recognized by the automaton such that p ∈ reach(t|<sub>w</sub>). If - ∈ R, the regular language ∪<sub>(-,(),p)∈δ</sub> v<sub>p</sub> represents exactly the potential positions of integers in trees accepted by the tree automaton.
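As a rough illustration of the system above, the following Python sketch enumerates the words of each v<sub>p</sub> up to a length bound by iterating the equations directly. It is a bounded enumeration, not the Arden's rule and Gaussian elimination procedure used in the text. The encoding (transition triples, the string `"hole"` for the integer leaf symbol, the name `support_paths`) is an assumption made for this example; the automaton is the one given for the C example in Sect. 6.2, and every state is assumed reachable.

```python
def support_paths(transitions, finals, hole, max_len):
    """Enumerate integer-leaf positions (as tuples of child indices) of length <= max_len.

    transitions: list of (symbol, children_states, target_state) triples.
    finals: set of final states. hole: the symbol used for integer leaves.
    """
    states = {q for (_, _, q) in transitions}
    states |= {c for (_, children, _) in transitions for c in children}
    v = {p: set() for p in states}
    for p in finals:
        v[p].add(())  # the empty path reaches a final state
    changed = True
    while changed:
        changed = False
        for (_, children, q) in transitions:
            for i, p in enumerate(children):
                for w in list(v[q]):  # copy: v[p] may be the same set as v[q]
                    if len(w) < max_len and w + (i,) not in v[p]:
                        v[p].add(w + (i,))  # v_p contains v_q.{i}
                        changed = True
    result = set()
    for (sym, children, q) in transitions:
        if sym == hole and not children:
            result |= v[q]
    return result


# Automaton of the C example: *(d) -> c, +(c, a) -> d, hole() -> a, p() -> c, final state c.
example_transitions = [
    ("*", ("d",), "c"),
    ("+", ("c", "a"), "d"),
    ("hole", (), "a"),
    ("p", (), "c"),
]
example_support = support_paths(example_transitions, {"c"}, "hole", max_len=4)
```

With the bound 4, the enumeration yields the paths 0.1 and 0.0.0.1, which are the integer positions in ∗(p+4) and ∗(∗(p+4)+4).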

*Height and Size.* The product is enriched with a simple height and size abstraction: numerical variables (encoding heights and sizes) are added to the numerical component of the abstraction.

#### **5.3 Environment Abstraction**

In the previous section, we designed abstractions for sets of trees. However, in order to tackle the examples from the introductory section (Sect. 1), we need an abstraction able to represent maps from a set of variables to natural terms. In Sect. 3 we have shown how to lift abstractions on natural terms to abstractions of environments over a given finite set of finite term variables T. We apply the same mechanism here to lift the product presented in Sect. 5.2. However, lifting the product directly would result in abstract environments being maps from natural term variables to abstractions that each contain their own numerical environment. In order to express numerical relations between two sets of natural terms, or even between numerical program variables and numerical values of natural terms, we factor away the numerical environment so that it is shared by all natural term abstractions in the term environment and by the program variables in the numerical environment. Therefore the final abstraction is a pair (m, R) where: (1) m is a map from T to an abstract element that is a product of the automaton abstraction and the hole positioning abstraction. Moreover, as all the numerical constraints are stored in a common numerical environment, the product abstraction amounts to a pair (A, p) where A is an element of the automaton abstraction and p is a partitioning of its support. (2) R is an element of M binding in the same numerical element: numerical program variables and all partitions found in the mapping m.

### **6 Implementation and Example**

#### **6.1 Implementation**

The analyzer was implemented in OCaml (about 5000 lines of code) in the novel, still-in-development Mopsa framework (see [21]). Mopsa enables a modular development of static analyzers defined by abstract interpretation. An analyzer is built by choosing abstract domains and combining them according to the user specification. Mopsa comes with pre-existing iterators and domains (e.g. interprocedural analysis, loop iterators, numerical domains, ...), and new ones can be added (e.g. the tree abstract domain). A key feature of Mopsa is the ability of an abstract domain to use the abstract knowledge it maintains to dynamically transform expressions into other expressions that can be manipulated more easily by further domains, providing a flexible way to combine relational domains. For instance, assume that a domain abstracts arrays by associating a scalar variable a<sub>0</sub>, a<sub>1</sub>, ..., to each element a[0], a[1], ..., of an array a, and delegating the abstraction of the array contents to a numeric domain for scalars. It can then evaluate E⟦2 ∗ a[i] + i⟧(i → [0, 1]) into the disjunction (2 ∗ a<sub>0</sub> + i, i → [0, 0]) ∨ (2 ∗ a<sub>1</sub> + i, i → [1, 1]), indicating that 2 ∗ a[i] + i is equivalent to 2 ∗ a<sub>0</sub> + i in the sub-environment where i = 0 and to 2 ∗ a<sub>1</sub> + i in the sub-environment where i = 1. Each term of the disjunction contains an array-free expression that can be handled by the scalar domain in the corresponding sub-environment. In the abstract, expressions can be evaluated by induction on the syntax into symbolic expressions to retain the full power of relational domains and disjunctive reasoning (see [21] for more details). We exploit this feature in our implementation to combine our tree abstractions. We implemented (in the Mopsa framework) libraries for regular and tree regular languages that offer the usual lattice interface enriched with a widening operator. These libraries can be reused for the definition of other abstract domains. The overall complexity of the analysis is driven by the complexity of the lattice operations in the regular and tree regular libraries. These are exponential in the number of states of the considered automata, which is bounded by the widening parameter.
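The array example above can be mimicked by a small Python sketch. It is not Mopsa's API: the function name `eval_array_access`, the interval encoding as pairs and the string template are all assumptions made for illustration. The sketch splits the interval of the index variable into singletons and returns one array-free expression per sub-environment.

```python
def eval_array_access(template, array, index_var, env):
    """Evaluate an expression with one access array[index_var] into a disjunction.

    template: format string with placeholders {cell} and {idx} (hypothetical encoding).
    env: maps variable names to integer intervals (lo, hi).
    Returns a list of (array-free expression, sub-environment) pairs.
    """
    lo, hi = env[index_var]
    cases = []
    for k in range(lo, hi + 1):
        sub_env = dict(env)
        sub_env[index_var] = (k, k)  # restrict the index to a single value
        expr = template.format(cell=f"{array}{k}", idx=index_var)
        cases.append((expr, sub_env))
    return cases


disjunction = eval_array_access("2 * {cell} + {idx}", "a", "i", {"i": (0, 1)})
```

Each pair can then be handed to a scalar numeric domain, which never sees the array access itself.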

#### **6.2 Examples of Analysis**

Numerical variables of the form t.x, where t is a natural term variable, represent a variable allocated for tree t. For example: t.r where r is a regular expression is the variable allocated for partition r in tree t.

*C Introductory Example.* Let us consider the introductory example Program 4. The loop invariant inferred by our analysis is the following abstract element: U = (y → (A, { **0**.(**0**.**0**).**1**!(= r)}), R), with A = ({a, b, c, d}, {∗(1), +(2), -(0), p(0)}, {c}, {∗(d) → c, +(c, a) → d, -() → a, p → c}), where R satisfies the constraints {i ≥ 0, i ≤ n, y.r = 4}. This describes precisely the set of terms of the form p, ∗(p+4), ∗(∗(p+4)+4), .... As mentioned in Sect. 6.1, evaluations of tree expressions yield pairs containing an expression and an abstract environment. Tree expressions are pairs (A, p); partitions in p are bound by the adjoined environment. Let us now present the result of the evaluation of the make_integer(4) expression in the abstract environment U. Here we get the expression (A′, { !}) (where A′ recognizes only -) in the environment (y → (A, {r}), R′), where R′ = R ∪ { ! = 4}. This emphasizes how the environment is used to give constraints on the adjoined expression. This transports numerical relations from the leaves of the expression up to the assigned variable t.

*OCaml Introductory Example.* Let us now consider the introductory example Program 5. The inferred loop invariant is the following (with r = (**1**.**1**).**0**! and r′ = (**1**.**1**).**1**.**0**!): (t → (A, {r, r′}), R), where R satisfies the constraints {t.r = x − 1, t.r = t.r + 2, i ≥ 0, i ≤ n} and A = ({a, b, c, d}, {Cons(2), Nil(0), -(0)}, {a}, {Cons(c, a) → d, Cons(c, d) → a, Nil → a, - → c}). Please note that at the end of the while loops the two numerical environments that need to be joined are not defined over the same set of variables (in the environments that have not gone through the loop, variables t.r and t.r′ are not present). However, thanks to the operator, we do not have to lose the numerical relations between these variables and x. Hence we are able to prove that the assertion holds.

The analyzer was able to successfully analyze and infer the expected invariants for both examples.

### **7 Related Works**

Previous work on abstractions of sets of trees [20] was able to recognize larger classes of tree languages than tree automata. However, we focused here on the abstraction of trees labeled with numerical values; therefore the work closest to ours is [12]. Indeed, it defines tree automata where leaves can be elements of a lattice (for example an interval). They are therefore able to represent sets of natural terms, but cannot express numerical relations between the leaves of trees. Moreover, they rely on a partitioning of the leaf lattice for tree automata operations. In [1] (and [2]) tree automata and regular automata are used for the model checking of programs manipulating C pointers and structures. Other uses have been made of tree automata in verification: shape analysis of C programs as in [15], and computation of an over-approximation of terms computable by attackers of cryptographic protocols as in [24]. Widening regular languages by the computation of an equivalence relation of bounded index is also done in [9] and in [11]. As mentioned, variable summarization is often used to represent unbounded memory locations, as in [17] or [14]. Moreover, numerical abstract domains able to handle optional variables have been defined, such as [19]. Finally, termination analyses have been proposed for programs manipulating tree structures (AVL, red-black trees); see [16].

### **8 Conclusion**

In this article we presented a relational abstract environment for sets of trees over a finite algebra, with numerically labeled leaves. We emphasized the potential applications of being able to describe such trees: description of reachable memory zones, tracking symbolic equalities between program variables, and description of tree-like structures. In order to improve the precision of the analysis while not blowing up its cost, we defined a novel abstraction for sets of maps with heterogeneous supports. This numeric abstraction is able to represent optional dimensions in numerical domains without losing relations with optional variables. All domains presented in the article were implemented as a library in the Mopsa framework.

# **References**



# **A Static Higher-Order Dependency Pair Framework**

Carsten Fuhs<sup>1</sup> and Cynthia Kop<sup>2</sup>

<sup>1</sup> Department of Computer Science and Information Systems, Birkbeck, University of London, London, UK

carsten@dcs.bbk.ac.uk

<sup>2</sup> Department of Software Science, Radboud University Nijmegen, Nijmegen, The Netherlands

c.kop@cs.ru.nl

**Abstract.** We revisit the static dependency pair method for proving termination of higher-order term rewriting and extend it in a number of ways: (1) We introduce a new rewrite formalism designed for general applicability in termination proving of higher-order rewriting, Algebraic Functional Systems with Meta-variables. (2) We provide a syntactically checkable soundness criterion to make the method applicable to a large class of rewrite systems. (3) We propose a modular dependency pair *framework* for this higher-order setting. (4) We introduce a fine-grained notion of *formative* and *computable* chains to render the framework more powerful. (5) We formulate several existing and new termination proving techniques in the form of processors within our framework.

The framework has been implemented in the (fully automatic) higher-order termination tool WANDA.

### **1 Introduction**

Term rewriting [3,48] is an important area of logic, with applications in many different areas of computer science [4,11,18,23,25,36,41]. *Higher-order* term rewriting – which extends the traditional *first-order* term rewriting with higher-order types and binders as in the λ-calculus – offers a formal foundation of functional programming and a tool for equational reasoning in higher-order logic. A key question in the analysis of both first- and higher-order term rewriting is *termination*; both for its own sake, and as part of confluence and equivalence analysis.

In first-order term rewriting, a hugely effective method for proving termination (both manually and automatically) is the *dependency pair (DP) approach* [2]. This approach has been extended to the *DP framework* [20,22], a highly modular methodology which new techniques for proving termination *and nontermination* can easily be plugged into in the form of *processors*.

In higher-order rewriting, two DP approaches with distinct costs and benefits are used: *dynamic* [31,45] and *static* [6,32–34,44,46] DPs. Dynamic DPs are more broadly applicable, yet static DPs often enable more powerful analysis techniques. Still, neither approach has the modularity and extendability of the DP framework, nor can they be used to prove non-termination. Also, these approaches consider different styles of higher-order rewriting, which means that for all results certain language features are not available.

In this paper, we address these issues for the *static* DP approach by extending it to a full higher-order *dependency pair framework* for both termination and non-termination analysis. For broad applicability, we introduce a new rewriting formalism, *AFSMs*, to capture several flavours of higher-order rewriting, including *AFSs* [26] (used in the annual Termination Competition [50]) and *pattern HRSs* [37,39] (used in the annual Confluence Competition [10]). To show the versatility and power of this methodology, we define various processors in the framework – both adaptations of existing processors from the literature and entirely new ones.

*Detailed Contributions.* We reformulate the results of [6,32,34,44,46] into a DP framework for AFSMs. In doing so, we instantiate the applicability restriction of [32] by a very liberal syntactic condition, and add two new flags to track properties of DP problems: one completely new, one from an earlier work by the authors for the *first-order* DP framework [16]. We give eight *processors* for reasoning in our framework: four translations of techniques from static DP approaches, three techniques from first-order or dynamic DPs, and one completely new.

This is a *foundational* paper, focused on defining a general theoretical framework for higher-order termination analysis using dependency pairs rather than questions of implementation. We have, however, implemented most of these results in the fully automatic termination analysis tool WANDA [28].

*Related Work.* There is a vast body of work in the first-order setting regarding the DP approach [2] and framework [20,22,24]. We have drawn from the ideas in these works for the core structure of the higher-order framework, but have added some new features of our own and adapted results to the higher-order setting.

There is no true higher-order DP *framework* yet: both static and dynamic approaches actually lie halfway between the original "DP approach" of first-order rewriting and a full DP framework as in [20,22]. Most of these works [30–32,34,46] prove "non-loopingness" or "chain-freeness" of a set P of DPs through a number of theorems. Yet, there is no concept of *DP problems*, and the set R of rules cannot be altered. They also fix assumptions on dependency chains (such as minimality [34] or being "tagged" [31]) which frustrate extendability and are more naturally dealt with in a DP framework using flags.

The static DP approach for higher-order term rewriting is discussed in, e.g., [34,44,46]. The approach is limited to *plain function passing (PFP)* systems. The definition of PFP has been made more liberal in later papers, but always concerns the position of higher-order variables in the left-hand sides of rules. These works include non-pattern HRSs [34,46], which we do not consider, but do not employ formative rules or meta-variable conditions, or consider non-termination, which we do. Importantly, they do not consider strictly positive inductive types, which could be used to significantly broaden the PFP restriction. Such types *are* considered in an early paper which defines a variation of static higher-order dependency pairs [6] based on a computability closure [7,8]. However, this work carries different restrictions (e.g., DPs must be type-preserving and not introduce fresh variables) and considers only one analysis technique (reduction pairs).

Definitions of DP approaches for *functional programming* also exist [32,33], which consider applicative systems with ML-style polymorphism. These works also employ a much broader, semantic definition than PFP, which is actually more general than the syntactic restriction we propose here. However, like the static approaches for term rewriting, they do not truly exploit the computability [47] properties inherent in this restriction: it is only used for the initial generation of dependency pairs. In the present work, we will take advantage of our exact computability notion by introducing a computable flag that can be used by the computable subterm criterion processor (Theorem 63) to handle benchmark systems that would otherwise be beyond the reach of static DPs. Also in these works, formative rules, meta-variable conditions and non-termination are not considered.

Regarding *dynamic* DP approaches, a precursor of the present work is [31], which provides a halfway framework (methodology to prove "chain-freeness") for dynamic DPs, introduces a notion of formative rules, and briefly translates a basic form of static DPs to the same setting. Our formative *reductions* consider the shape of reductions rather than the rules they use, and they can be used as a flag in the framework to gain additional power in other processors. The adaptation of static DPs in [31] was very limited, and did not for instance consider strictly positive inductive types or rules of functional type.

For a more elaborate discussion of both static and dynamic DP approaches in the literature, we refer to [31] and the second author's PhD thesis [29].

*Organisation of the Paper.* Section 2 introduces higher-order rewriting using AFSMs and recapitulates computability. In Sect. 3 we impose restrictions on the input AFSMs for which our framework is soundly applicable. In Sect. 4 we define static DPs for AFSMs, and derive the key results on them. Section 5 formulates the DP framework and a number of DP processors for existing and new termination proving techniques. Section 6 concludes. Detailed proofs for all results in this paper and an experimental evaluation are available in a technical report [17]. In addition, many of the results have been informally published in the second author's PhD thesis [29].

### **2 Preliminaries**

In this section, we first define our notation by introducing the AFSM formalism. Although not one of the standards of higher-order rewriting, AFSMs combine features from various forms of higher-order rewriting and can be seen as a form of IDTSs [5] which includes application. We will finish with a definition of *computability*, a technique often used for higher-order termination methods.

#### **2.1 Higher-Order Term Rewriting Using AFSMs**

Unlike first-order term rewriting, there is no single, unified approach to higher-order term rewriting, but rather a number of similar but not fully compatible systems aiming to combine term rewriting and typed λ-calculi. For generality, we will use *Algebraic Functional Systems with Meta-variables*: a formalism which admits translations from the main formats of higher-order term rewriting.

**Definition 1 (Simple types).** *We fix a set* S *of* sorts*. All sorts are simple types, and if* σ, τ *are simple types, then so is* σ → τ*.*

We let → be right-associative. Note that all types have a unique representation in the form σ<sub>1</sub> → ... → σ<sub>m</sub> → ι with ι ∈ S.
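The unique representation σ<sub>1</sub> → ... → σ<sub>m</sub> → ι can be computed by repeatedly peeling the right-hand side of an arrow. The following Python sketch shows this; the class names `Sort` and `Arrow` and the function `decompose` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Sort:
    name: str


@dataclass(frozen=True)
class Arrow:
    source: "Type"
    target: "Type"


Type = Union[Sort, Arrow]


def decompose(ty: Type) -> Tuple[List[Type], Sort]:
    """Return ([sigma_1, ..., sigma_m], iota) for ty = sigma_1 -> ... -> sigma_m -> iota."""
    args = []
    while isinstance(ty, Arrow):  # right-associativity: follow the target side
        args.append(ty.source)
        ty = ty.target
    return args, ty


nat, lst = Sort("nat"), Sort("list")
# (nat -> nat) -> list -> list, the type of map in Example 6
map_type = Arrow(Arrow(nat, nat), Arrow(lst, lst))
map_args, map_sort = decompose(map_type)
```

Here `map_args` holds the two argument types nat → nat and list, and `map_sort` is the output sort list.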

**Definition 2 (Terms and meta-terms).** *We fix disjoint sets* F *of* function symbols*,* V *of* variables *and* M *of* meta-variables*, each symbol equipped with a type. Each meta-variable is additionally equipped with a natural number. We assume that both* V *and* M *contain infinitely many symbols of all types. The set* T(F, V) *of* terms *over* F, V *consists of expressions* s *where* s : σ *can be derived for some type* σ *by the following clauses:*

*(V)* x : σ *if* x : σ ∈ V

*(@)* s t : τ *if* s : σ → τ *and* t : σ

*(F)* f : σ *if* f : σ ∈ F

*(*Λ*)* λx.s : σ → τ *if* x : σ ∈ V *and* s : τ

Meta-terms *are expressions whose type can be derived by those clauses and:*

*(M)* Z⟨s<sub>1</sub>,...,s<sub>k</sub>⟩ : σ<sub>k+1</sub> → ... → σ<sub>m</sub> → ι *if* Z : (σ<sub>1</sub> → ... → σ<sub>k</sub> → ... → σ<sub>m</sub> → ι, k) ∈ M *and* s<sub>1</sub> : σ<sub>1</sub>,...,s<sub>k</sub> : σ<sub>k</sub>

*The* λ *binds variables as in the* λ*-calculus; unbound variables are called* free*, and FV*(s) *is the set of free variables in* s*. Meta-variables cannot be bound; we write FMV*(s) *for the set of meta-variables occurring in* s*. A meta-term* s *is called* closed *if FV*(s) = ∅ *(even if FMV*(s) ≠ ∅*). Meta-terms are considered modulo* α*-conversion. Application (@) is left-associative; abstractions (*Λ*) extend as far to the right as possible. A meta-term* s has type σ *if* s : σ*; it* has base type *if* σ ∈ S*. We define* head(s) = head(s<sub>1</sub>) *if* s = s<sub>1</sub> s<sub>2</sub>*, and* head(s) = s *otherwise.*

*A (meta-)term* s *has a* sub-(meta-)term t*, notation* s ⊵ t*, if either* s = t *or* s ⊳ t*, where* s ⊳ t *if (a)* s = λx.s′ *and* s′ ⊵ t*, (b)* s = s<sub>1</sub> s<sub>2</sub> *and* s<sub>2</sub> ⊵ t *or (c)* s = s<sub>1</sub> s<sub>2</sub> *and* s<sub>1</sub> ⊵ t*. A (meta-)term* s *has a* fully applied sub-(meta-)term t *if either* s = t *or* t *is a strict fully applied sub-(meta-)term of* s*, where the latter holds if (a)* s = λx.s′ *and* t *is a fully applied sub-(meta-)term of* s′*, (b)* s = s<sub>1</sub> s<sub>2</sub> *and* t *is a fully applied sub-(meta-)term of* s<sub>2</sub>*, or (c)* s = s<sub>1</sub> s<sub>2</sub> *and* t *is a strict fully applied sub-(meta-)term of* s<sub>1</sub> *(so if* s = x s<sub>1</sub> s<sub>2</sub>*, then* x *and* x s<sub>1</sub> *are not fully applied subterms, but* s *and both* s<sub>1</sub> *and* s<sub>2</sub> *are).*

*For* Z : (σ, k) ∈ M*, we call* k *the* arity *of* Z*, notation arity*(Z)*.*
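To make the difference between subterms and fully applied subterms concrete, here is a small Python sketch for plain terms (no meta-variables). The class and function names are hypothetical. The sketch assumes that the application case descends strictly into the function side, which is the reading that matches the remark that x and x s<sub>1</sub> are not fully applied subterms of x s<sub>1</sub> s<sub>2</sub>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fun:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


def head(s):
    """head(s1 s2) = head(s1); head(s) = s otherwise."""
    while isinstance(s, App):
        s = s.fun
    return s


def strict_fully_applied(s):
    """Strict fully applied subterms of s (assumed reading of the definition)."""
    if isinstance(s, Lam):
        return fully_applied(s.body)
    if isinstance(s, App):
        # the function side contributes only its strict fully applied subterms
        return strict_fully_applied(s.fun) + fully_applied(s.arg)
    return []


def fully_applied(s):
    """All fully applied subterms of s, including s itself."""
    return [s] + strict_fully_applied(s)


x, s1, s2 = Var("x"), Var("s1"), Var("s2")
term = App(App(x, s1), s2)  # x s1 s2
```

For `term`, `fully_applied(term)` contains the term itself, s<sub>1</sub> and s<sub>2</sub>, but neither x nor x s<sub>1</sub>, while `head(term)` is x.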

Clearly, all fully applied subterms are subterms, but not all subterms are fully applied. Every term s has a form t s<sub>1</sub> ··· s<sub>n</sub> with n ≥ 0 and t = head(s) a variable, function symbol, or abstraction; in meta-terms t may also be a meta-variable application F⟨s<sub>1</sub>,...,s<sub>k</sub>⟩. *Terms* are the objects that we will rewrite; *meta-terms* are used to define rewrite rules. Note that all our terms (and meta-terms) are, by definition, well-typed. For rewriting, we will employ *patterns*:

**Definition 3 (Patterns).** *A meta-term is a* pattern *if it has one of the forms* Z⟨x<sub>1</sub>,...,x<sub>k</sub>⟩ *with all* x<sub>i</sub> *distinct variables;* λx.ℓ *with* x ∈ V *and* ℓ *a pattern; or* a ℓ<sub>1</sub> ··· ℓ<sub>n</sub> *with* a ∈ F ∪ V *and all* ℓ<sub>i</sub> *patterns (*n ≥ 0*).*

In rewrite rules, we will use meta-variables for *matching* and variables only with *binders*. In terms, variables can occur both free and bound, and meta-variables cannot occur. Meta-variables originate in very early forms of higher-order rewriting (e.g., [1,27]), but have also been used in later formalisms (e.g., [8]). They strike a balance between matching modulo β and syntactic matching. By using meta-variables, we obtain the same expressive power as with Miller patterns [37], but do so without including a reversed β-reduction as part of matching.

*Notational Conventions:* We will use x, y, z for variables, X, Y, Z for meta-variables, b for symbols that could be variables or meta-variables, f, g, h or more suggestive notation for function symbols, and s, t, u, v, q, w for (meta-)terms. Types are denoted σ, τ, and ι, κ are sorts. We will regularly overload notation and write x ∈ V, f ∈ F or Z ∈ M without stating a type (or minimal arity). For meta-terms Z⟨⟩ we will usually omit the brackets, writing just Z.

**Definition 4 (Substitution).** *A* meta-substitution *is a type-preserving function* γ *from variables and meta-variables to meta-terms. Let the* domain *of* γ *be given by:* dom(γ) = {(x : σ) ∈ V | γ(x) ≠ x} ∪ {(Z : (σ, k)) ∈ M | γ(Z) ≠ λy<sub>1</sub> ... y<sub>k</sub>.Z⟨y<sub>1</sub>,...,y<sub>k</sub>⟩}*; this domain is allowed to be infinite. We let* [b<sub>1</sub> := s<sub>1</sub>,...,b<sub>n</sub> := s<sub>n</sub>] *denote the meta-substitution* γ *with* γ(b<sub>i</sub>) = s<sub>i</sub> *and* γ(z) = z *for* (z : σ) ∈ V \ {b<sub>1</sub>,...,b<sub>n</sub>}*, and* γ(Z) = λy<sub>1</sub> ... y<sub>k</sub>.Z⟨y<sub>1</sub>,...,y<sub>k</sub>⟩ *for* (Z : (σ, k)) ∈ M \ {b<sub>1</sub>,...,b<sub>n</sub>}*. We assume there are infinitely many variables* x *of all types such that (a)* x ∉ dom(γ) *and (b) for all* b ∈ dom(γ)*:* x ∉ *FV*(γ(b))*.*

*A* substitution *is a meta-substitution mapping everything in its domain to terms. The result* sγ *of applying a meta-substitution* γ *to a term* s *is obtained by:*

$$\begin{array}{ll} x\gamma = \gamma(x) & \text{if } x \in \mathcal{V} \\ (s\ t)\gamma = (s\gamma)\ (t\gamma) & \\ \mathbf{f}\gamma = \mathbf{f} & \text{if } \mathbf{f} \in \mathcal{F} \\ (\lambda x.s)\gamma = \lambda x.(s\gamma) & \text{if } \gamma(x) = x \land x \notin \bigcup_{y \in \text{dom}(\gamma)} FV(\gamma(y)) \end{array}$$

*For meta-terms, the result* sγ *is obtained by the clauses above and:*

$$\begin{array}{ll} Z\langle s_1,\ldots,s_k\rangle\gamma = \gamma(Z)\langle s_1\gamma,\ldots,s_k\gamma\rangle & \text{if } Z \notin \text{dom}(\gamma) \\ Z\langle s_1,\ldots,s_k\rangle\gamma = \gamma(Z)\langle s_1\gamma,\ldots,s_k\gamma\rangle & \text{if } Z \in \text{dom}(\gamma) \\ (\lambda x_1 \ldots x_k.s)\langle t_1,\ldots,t_k\rangle = s[x_1 := t_1,\ldots,x_k := t_k] & \\ (\lambda x_1 \ldots x_n.s)\langle t_1,\ldots,t_k\rangle = s[x_1 := t_1,\ldots,x_n := t_n]\ t_{n+1} \cdots t_k & \text{if } n < k \text{ and } s \text{ is not an abstraction} \end{array}$$

Note that for fixed k, any term has exactly one of the two forms above (λx<sup>1</sup> ...x<sup>n</sup>.s with n<k and <sup>s</sup> not an abstraction, or λx<sup>1</sup> ...x<sup>k</sup>.s).

Essentially, applying a meta-substitution that has meta-variables in its domain combines a substitution with (possibly several) β-steps. For example, we have that: deriv (λx.sin (F⟨x⟩))[F := λy.plus y x] equals deriv (λz.sin (plus z x)). We also have: X⟨0, nil⟩[X := λx.map (λy.x)] equals map (λy.0) nil.
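The second example can be replayed with a minimal Python sketch of meta-substitution. The tuple encoding and all function names are assumptions made for this illustration. The sketch only handles meta-variables in the domain of γ, and instead of renaming bound variables it raises an error when a capture would occur, so the first example (which needs the renaming of x to z) is outside its scope.

```python
# Meta-terms as tagged tuples (an encoding chosen for this sketch):
#   ("var", x)  ("fun", f)  ("app", s, t)  ("lam", x, body)  ("meta", Z, [args])

def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "fun":
        return set()
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return set().union(*[free_vars(a) for a in t[2]])  # "meta"


def subst_vars(t, mapping):
    """Simultaneous substitution of variables; refuses to capture."""
    tag = t[0]
    if tag == "var":
        return mapping.get(t[1], t)
    if tag == "fun":
        return t
    if tag == "app":
        return ("app", subst_vars(t[1], mapping), subst_vars(t[2], mapping))
    if tag == "lam":
        inner = {x: u for x, u in mapping.items() if x != t[1]}
        if any(t[1] in free_vars(u) for u in inner.values()):
            raise ValueError("capture: rename the bound variable first")
        return ("lam", t[1], subst_vars(t[2], inner))
    return ("meta", t[1], [subst_vars(a, mapping) for a in t[2]])


def instantiate(t, args):
    """(lambda x1...xn. s)<t1,...,tk>: substitute up to k binders, apply to the rest."""
    mapping, n = {}, 0
    while n < len(args) and t[0] == "lam":
        mapping[t[1]] = args[n]
        t, n = t[2], n + 1
    result = subst_vars(t, mapping)
    for extra in args[n:]:
        result = ("app", result, extra)
    return result


def apply_meta(t, gamma):
    """Apply gamma (meta-variable name -> meta-term) to the meta-term t."""
    tag = t[0]
    if tag in ("var", "fun"):
        return t
    if tag == "app":
        return ("app", apply_meta(t[1], gamma), apply_meta(t[2], gamma))
    if tag == "lam":
        if any(t[1] in free_vars(u) for u in gamma.values()):
            raise ValueError("capture: rename the bound variable first")
        return ("lam", t[1], apply_meta(t[2], gamma))
    args = [apply_meta(a, gamma) for a in t[2]]
    if t[1] not in gamma:
        return ("meta", t[1], args)
    return instantiate(gamma[t[1]], args)


zero, nil, map_ = ("fun", "0"), ("fun", "nil"), ("fun", "map")
gamma = {"X": ("lam", "x", ("app", map_, ("lam", "y", ("var", "x"))))}
result = apply_meta(("meta", "X", [zero, nil]), gamma)  # X<0, nil>[X := lambda x. map (lambda y. x)]
```

Here γ(X) has one leading binder but X takes two arguments, so the first argument is substituted for x and the second is passed by ordinary application, giving map (λy.0) nil.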

**Definition 5 (Rules and rewriting).** *Let* F, V, M *be fixed sets of function symbols, variables and meta-variables respectively. A* rule *is a pair* ℓ ⇒ r *of closed meta-terms of the same type such that* ℓ *is a pattern of the form* f ℓ<sub>1</sub> ··· ℓ<sub>n</sub> *with* f ∈ F *and FMV*(r) ⊆ *FMV*(ℓ)*. A set of rules* R *defines a rewrite relation* ⇒<sub>R</sub> *as the smallest monotonic relation on terms which includes:*


*We say* s ⇒<sub>β</sub> t *if* s ⇒<sub>R</sub> t *is derived using a (Beta) step. A term* s *is* terminating *under* ⇒<sub>R</sub> *if there is no infinite reduction* s = s<sub>0</sub> ⇒<sub>R</sub> s<sub>1</sub> ⇒<sub>R</sub> ...*, is* in normal form *if there is no* t *such that* s ⇒<sub>R</sub> t*, and is* β-normal *if there is no* t *with* s ⇒<sub>β</sub> t*. Note that we are allowed to reduce at any position of a term, even below a* λ*. The relation* ⇒<sub>R</sub> *is terminating if all terms over* F, V *are terminating. The set* D ⊆ F *of* defined symbols *consists of those* (f : σ) ∈ F *such that a rule* f ℓ<sub>1</sub> ··· ℓ<sub>n</sub> ⇒ r *exists; all other symbols are called* constructors*.*

Note that R is allowed to be infinite, which is useful for instance to model polymorphic systems. Also, right-hand sides of rules do not have to be in β-normal form. While this is rarely used in practical examples, non-β-normal rules may arise through transformations, and we lose nothing by allowing them.

*Example 6.* Let F ⊇ {0 : nat, s : nat → nat, nil : list, cons : nat → list → list, map : (nat → nat) → list → list} and consider the following rules R:

> map (λx.Z⟨x⟩) nil ⇒ nil
>
> map (λx.Z⟨x⟩) (cons H T) ⇒ cons Z⟨H⟩ (map (λx.Z⟨x⟩) T)

Then map (λy.0) (cons (s 0) nil) ⇒<sub>R</sub> cons 0 (map (λy.0) nil) ⇒<sub>R</sub> cons 0 nil. Note that the bound variable y does not need to occur in the body of λy.0 to match λx.Z⟨x⟩. However, a term like map s (cons 0 nil) *cannot* be reduced, because s does not instantiate λx.Z⟨x⟩. We could alternatively consider the rules:

> map Z nil ⇒ nil
>
> map Z (cons H T) ⇒ cons (Z H) (map Z T)

Where the system before had (Z : (nat → nat, 1)) ∈ M, here we assume (Z : (nat → nat, 0)) ∈ M. Thus, rather than meta-variable application Z⟨H⟩ we use explicit application Z H. Then map s (cons 0 nil) ⇒<sub>R</sub> cons (s 0) (map s nil). However, we will often need explicit β-reductions; e.g., map (λy.0) (cons (s 0) nil) ⇒<sub>R</sub> cons ((λy.0) (s 0)) (map (λy.0) nil) ⇒<sub>β</sub> cons 0 (map (λy.0) nil).

**Definition 7 (AFSM).** *An* AFSM *is a tuple* (F, V, M, R) *of a signature and a set of rules built from meta-terms over* F, V, M*; as types of relevant variables and meta-variables can always be derived from context, we will typically just refer to the AFSM* (F, R)*. An AFSM implicitly defines the abstract reduction system* (T(F, V), ⇒<sub>R</sub>)*: a set of terms and a rewrite relation on this set. An AFSM is terminating if* ⇒<sub>R</sub> *is terminating (on all terms in* T(F, V)*).*

*Discussion:* The two most common formalisms in termination analysis of higher-order rewriting are *algebraic functional systems* [26] (AFSs) and *higher-order rewriting systems* [37,39] (HRSs). AFSs are very similar to our AFSMs, but use variables for matching rather than meta-variables; this is trivially translated to the AFSM format, giving rules where all meta-variables have arity 0, like the "alternative" rules in Example 6. HRSs use matching modulo β/η, but the common restriction of *pattern HRSs* can be directly translated into AFSMs, provided terms are β-normalised after every reduction step. Even without this β-normalisation step, termination of the obtained AFSM implies termination of the original HRS; for second-order systems, termination is equivalent. AFSMs can also naturally encode CRSs [27] and several applicative systems (cf. [29, Chapter 3]).

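*Example 8 (Ordinal recursion).* A running example is the AFSM $(\mathcal{F}, \mathcal{R})$ with $\mathcal{F} \supseteq \{\mathsf{0} : \mathsf{ord},\ \mathsf{s} : \mathsf{ord} \rightarrow \mathsf{ord},\ \mathsf{lim} : (\mathsf{nat} \rightarrow \mathsf{ord}) \rightarrow \mathsf{ord},\ \mathsf{rec} : \mathsf{ord} \rightarrow \mathsf{nat} \rightarrow (\mathsf{ord} \rightarrow \mathsf{nat} \rightarrow \mathsf{nat}) \rightarrow ((\mathsf{nat} \rightarrow \mathsf{ord}) \rightarrow (\mathsf{nat} \rightarrow \mathsf{nat}) \rightarrow \mathsf{nat}) \rightarrow \mathsf{nat}\}$ and $\mathcal{R}$ given below. As all meta-variables have arity 0, this can be seen as an AFS.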

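$$\begin{array}{rcl} \mathsf{rec}\ \mathsf{0}\ K\ F\ G & \Rightarrow & K \\ \mathsf{rec}\ (\mathsf{s}\ X)\ K\ F\ G & \Rightarrow & F\ X\ (\mathsf{rec}\ X\ K\ F\ G) \\ \mathsf{rec}\ (\mathsf{lim}\ H)\ K\ F\ G & \Rightarrow & G\ H\ (\lambda m.\mathsf{rec}\ (H\ m)\ K\ F\ G) \end{array}$$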

Observant readers may notice that by the given constructors, the type nat in Example 8 is not inhabited. However, as the given symbols are only a subset of F, additional symbols (such as constructors for the nat type) may be included. The presence of additional function symbols does not affect termination of AFSMs:

**Theorem 9 (Invariance of termination under signature extensions).** *For an AFSM* $(\mathcal{F}, \mathcal{R})$ *with* $\mathcal{F}$ *at most countably infinite, let* $\mathit{funs}(\mathcal{R}) \subseteq \mathcal{F}$ *be the set of function symbols occurring in some rule of* $\mathcal{R}$*. Then* $(\mathcal{T}(\mathcal{F}, \mathcal{V}), \Rightarrow_{\mathcal{R}})$ *is terminating if and only if* $(\mathcal{T}(\mathit{funs}(\mathcal{R}), \mathcal{V}), \Rightarrow_{\mathcal{R}})$ *is terminating.*

*Proof.* Trivial by replacing all function symbols in $\mathcal{F} \setminus \mathit{funs}(\mathcal{R})$ by corresponding variables of the same type.

Therefore, we will typically only state the types of symbols occurring in the rules, but may safely assume that infinitely many symbols of all types are present (which for instance allows us to select unused constructors in some proofs).

#### **2.2 Computability**

A common technique in higher-order termination is Tait and Girard's *computability* notion [47]. There are several ways to define computability predicates; here we follow, e.g., [5,7–9] in considering *accessible meta-terms* using strictly positive inductive types. The definition presented below is adapted from these works, both to account for the altered formalism and to introduce (and obtain termination of) a relation $\Rrightarrow_C$ that we will use in the "computable subterm criterion processor" of Theorem 63 (a termination criterion that allows us to handle systems that would otherwise be beyond the reach of static DPs). This allows for a minimal presentation that avoids the use of ordinals that would otherwise be needed to obtain $\Rrightarrow_C$ (see, e.g., [7,9]).

To define computability, we use the notion of an *RC-set*:

**Definition 10.** *A* set of reducibility candidates*, or* RC-set*, for a rewrite relation* $\Rightarrow_{\mathcal{R}}$ *of an AFSM is a set* $I$ *of base-type terms* $s$ *such that: every term in* $I$ *is terminating under* $\Rightarrow_{\mathcal{R}}$*;* $I$ *is closed under* $\Rightarrow_{\mathcal{R}}$ *(so if* $s \in I$ *and* $s \Rightarrow_{\mathcal{R}} t$ *then* $t \in I$*); if* $s = x\ s_1 \cdots s_n$ *with* $x \in \mathcal{V}$ *or* $s = (\lambda x.u)\ s_0 \cdots s_n$ *with* $n \geq 0$*, and for all* $t$ *with* $s \Rightarrow_{\mathcal{R}} t$ *we have* $t \in I$*, then* $s \in I$ *(for any* $u, s_0, \ldots, s_n \in \mathcal{T}(\mathcal{F}, \mathcal{V})$*).*

*We define* $I$*-computability for an RC-set* $I$ *by induction on types. For* $s \in \mathcal{T}(\mathcal{F}, \mathcal{V})$*, we say that* $s$ *is* $I$*-computable if either* $s$ *is of base type and* $s \in I$*; or* $s : \sigma \rightarrow \tau$ *and for all* $t : \sigma$ *that are* $I$*-computable,* $s\ t$ *is* $I$*-computable.*

The traditional notion of computability is obtained by taking for $I$ the set of all terminating base-type terms. Then, a term $s$ is computable if and only if (a) $s$ has base type and is terminating; or (b) $s : \sigma \rightarrow \tau$ and for all computable $t : \sigma$ the term $s\ t$ is computable. This choice is simple but, for reasoning, not ideal: we do not have a property like: "if $\mathsf{f}\ s_1 \cdots s_n$ is computable then so is each $s_i$". Such a property would be valuable to have for generalising termination proofs from first-order to higher-order rewriting, as it allows us to use computability where the first-order proof uses termination. While it is not possible to define a computability notion with this property alongside case (b) (as such a notion would not be well-founded), we can come *close* to this property by choosing a different set for $I$. To define this set, we will use the notion of *accessible arguments*, which is used for the same purpose also in the *General Schema* [8], the *Computability Path Ordering* [9], and the *Computability Closure* [7].

**Definition 11 (Accessible arguments).** *We fix a quasi-ordering* $\succeq^{\mathcal{S}}$ *on* $\mathcal{S}$ *with well-founded strict part* $\succ^{\mathcal{S}} \;:=\; \succeq^{\mathcal{S}} \setminus \preceq^{\mathcal{S}}$*.*<sup>1</sup> *For a type* $\sigma \equiv \sigma_1 \rightarrow \ldots \rightarrow \sigma_m \rightarrow \kappa$ *(with* $\kappa \in \mathcal{S}$*) and sort* $\iota$*, let* $\iota \succeq^{\mathcal{S}}_{+} \sigma$ *if* $\iota \succeq^{\mathcal{S}} \kappa$ *and* $\iota \succ^{\mathcal{S}}_{-} \sigma_i$ *for all* $i$*, and let* $\iota \succ^{\mathcal{S}}_{-} \sigma$ *if* $\iota \succ^{\mathcal{S}} \kappa$ *and* $\iota \succeq^{\mathcal{S}}_{+} \sigma_i$ *for all* $i$*.*<sup>2</sup>

*For* $\mathsf{f} : \sigma_1 \rightarrow \ldots \rightarrow \sigma_m \rightarrow \iota \in \mathcal{F}$*, let* $\mathit{Acc}(\mathsf{f}) = \{ i \mid 1 \leq i \leq m \wedge \iota \succeq^{\mathcal{S}}_{+} \sigma_i \}$*. For* $x : \sigma_1 \rightarrow \ldots \rightarrow \sigma_m \rightarrow \iota \in \mathcal{V}$*, let* $\mathit{Acc}(x) = \{ i \mid 1 \leq i \leq m \wedge \sigma_i \text{ has the form } \tau_1 \rightarrow \ldots \rightarrow \tau_n \rightarrow \kappa \text{ with } \iota \succeq^{\mathcal{S}} \kappa \}$*. We write* $s \unrhd_{acc} t$ *if either* $s = t$*, or* $s = \lambda x.s'$ *and* $s' \unrhd_{acc} t$*, or* $s = a\ s_1 \cdots s_n$ *with* $a \in \mathcal{F} \cup \mathcal{V}$ *and* $s_i \unrhd_{acc} t$ *for some* $i \in \mathit{Acc}(a)$ *with* $a \notin \mathit{FV}(s_i)$*.*

With this definition, we will be able to define a set $C$ such that, roughly, $s$ is $C$-computable if and only if (a) $s : \sigma \rightarrow \tau$ and $s\ t$ is $C$-computable for all $C$-computable $t$, or (b) $s$ has base type, is terminating, and if $s = \mathsf{f}\ s_1 \cdots s_m$ then $s_i$ is $C$-computable for all *accessible* $i$ (see Theorem 13 below). The reason that $\mathit{Acc}(x)$ for $x \in \mathcal{V}$ is different is proof-technical: computability of $\lambda x.x\ s_1 \cdots s_m$ implies the computability of more arguments $s_i$ than computability of $\mathsf{f}\ s_1 \cdots s_m$ does, since $x$ can be instantiated by anything.

<sup>1</sup> Well-foundedness is immediate if $\mathcal{S}$ is finite, but we have not imposed that requirement.

<sup>2</sup> Here $\iota \succeq^{\mathcal{S}}_{+} \sigma$ corresponds to "$\iota$ occurs only positively in $\sigma$" in [5,8,9].

*Example 12.* Consider a quasi-ordering $\succeq^{\mathcal{S}}$ such that $\mathsf{ord} \succ^{\mathcal{S}} \mathsf{nat}$. In Example 8, we then have $\mathsf{ord} \succeq^{\mathcal{S}}_{+} \mathsf{nat} \rightarrow \mathsf{ord}$. Thus, $1 \in \mathit{Acc}(\mathsf{lim})$, which gives $\mathsf{lim}\ H \unrhd_{acc} H$.
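The positivity check behind accessible arguments is a simple mutual recursion on types. The sketch below is a hedged illustration, not the authors' implementation: it assumes that the sort quasi-ordering is given by integer ranks (a larger rank means a greater sort, equal ranks mean equated sorts), which covers only total quasi-orderings on finitely many sorts, and the type encoding and all function names are hypothetical.

```python
# A type sigma_1 -> ... -> sigma_m -> kappa is encoded as (args, kappa),
# where args is a tuple of encoded types and kappa is a sort name.
# Assumption: the sort quasi-ordering is represented by integer ranks.

def sort(name):
    return ((), name)

def arrow(*parts):
    """arrow(a, b, c) encodes a -> b -> c (right-associative)."""
    *args, result = parts
    res_args, kappa = result
    return (tuple(args) + res_args, kappa)

def geq_plus(rank, iota, sigma):
    """iota >=+ sigma: iota >= output sort, and iota >- every argument type."""
    args, kappa = sigma
    return rank[iota] >= rank[kappa] and all(gt_minus(rank, iota, a) for a in args)

def gt_minus(rank, iota, sigma):
    """iota >- sigma: iota > output sort, and iota >=+ every argument type."""
    args, kappa = sigma
    return rank[iota] > rank[kappa] and all(geq_plus(rank, iota, a) for a in args)

def acc(rank, f_type):
    """Accessible argument positions (1-based) of a function symbol of type f_type."""
    args, iota = f_type
    return {i for i, a in enumerate(args, start=1) if geq_plus(rank, iota, a)}

nat, ordinal, lst, o = sort("nat"), sort("ord"), sort("list"), sort("o")
lim_type = arrow(arrow(nat, ordinal), ordinal)   # lim : (nat -> ord) -> ord
cons_type = arrow(nat, lst, lst)                 # cons : nat -> list -> list
lm_type = arrow(arrow(o, o), o)                  # lm : (o -> o) -> o

acc_lim_strict = acc({"ord": 1, "nat": 0}, lim_type)   # ord strictly above nat
acc_lim_equal = acc({"ord": 0, "nat": 0}, lim_type)    # ord and nat equated
acc_cons = acc({"nat": 0, "list": 0}, cons_type)
acc_lm = acc({"o": 0}, lm_type)
```

With $\mathsf{ord}$ ranked strictly above $\mathsf{nat}$, argument 1 of $\mathsf{lim}$ is accessible, as in the example; if the two sorts are equated it is not. For a symbol such as `lm : (o -> o) -> o` over a single sort, no rank assignment can make argument 1 accessible, since that would need the sort to be strictly above itself.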

**Theorem 13.** *Let* $(\mathcal{F}, \mathcal{R})$ *be an AFSM. Let* $\mathsf{f}\ s_1 \cdots s_m \Rrightarrow_I s_i\ t_1 \cdots t_n$ *if both sides have base type,* $i \in \mathit{Acc}(\mathsf{f})$*, and all* $t_j$ *are* $I$*-computable. There is an RC-set* $C$ *such that* $C = \{ s \in \mathcal{T}(\mathcal{F}, \mathcal{V}) \mid s \text{ has base type} \wedge s \text{ is terminating under } \Rightarrow_{\mathcal{R}} \cup \Rrightarrow_C \wedge \text{ if } s \Rightarrow^*_{\mathcal{R}} \mathsf{f}\ s_1 \cdots s_m \text{ then } s_i \text{ is } C\text{-computable for all } i \in \mathit{Acc}(\mathsf{f}) \}$*.*

*Proof (sketch).* Note that we cannot *define* C as this set, as the set relies on the notion of C-computability. However, we *can* define C as the fixpoint of a monotone function operating on RC-sets. This follows the proof in, e.g., [8,9].

The complete proof is available in [17, Appendix A].

# **3 Restrictions**

The termination methodology in this paper is restricted to AFSMs that satisfy certain limitations: they must be *properly applied* (a restriction on the number of terms each function symbol is applied to) and *accessible function passing* (a restriction on the positions of variables of a functional type in the left-hand sides of rules). Both are syntactic restrictions that are easily checked by a computer (mostly; the latter requires a search for a sort ordering, but this is typically easy).

### **3.1 Properly Applied AFSMs**

In *properly applied AFSMs*, function symbols are assigned a certain, minimal number of arguments that they must always be applied to.

**Definition 14.** *An AFSM* $(\mathcal{F}, \mathcal{R})$ *is* properly applied *if for every* $\mathsf{f} \in \mathcal{D}$ *there exists an integer* $k$ *such that for all rules* $\ell \Rightarrow r \in \mathcal{R}$*: (1) if* $\ell = \mathsf{f}\ \ell_1 \cdots \ell_n$ *then* $n = k$*; and (2) if* $r \unrhd \mathsf{f}\ r_1 \cdots r_n$ *then* $n \geq k$*. We denote* $\mathit{minar}(\mathsf{f}) = k$*.*

That is, every occurrence of a function symbol in the *right-hand* side of a rule has at least as many arguments as the occurrences in the *left-hand* sides of rules. This means that partially applied functions are often not allowed: an AFSM with rules such as $\mathsf{double}\ X \Rightarrow \mathsf{plus}\ X\ X$ and $\mathsf{doublelist}\ L \Rightarrow \mathsf{map}\ \mathsf{double}\ L$ is not properly applied, because $\mathsf{double}$ is applied to one argument in the left-hand side of some rule, and to zero in the right-hand side of another.
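The check described here is easy to mechanise. The following Python sketch is an illustration under stated assumptions: terms are encoded in spine form (a head applied to a list of arguments), every left-hand side is assumed to be headed by a function symbol, only fully applied occurrences are counted, and the names `occurrences` and `minimal_arities` are hypothetical. The second rule set replaces the partial application of $\mathsf{double}$ by a λ-abstraction.

```python
# Terms in spine form (an illustrative encoding):
#   ("app", head, [args])  for  head s1 ... sn  (head: symbol, variable or meta-variable name)
#   ("lam", x, body)       for  lambda x. body

def app(head, *args):
    return ("app", head, list(args))

def lam(x, body):
    return ("lam", x, body)

def occurrences(term):
    """Yield (head, number_of_arguments) for every fully applied occurrence in term."""
    if term[0] == "lam":
        yield from occurrences(term[2])
    else:
        _, head, args = term
        yield head, len(args)
        for a in args:
            yield from occurrences(a)

def minimal_arities(rules):
    """Return {f: minar(f)} if the rules are properly applied, otherwise None."""
    minar = {}
    for lhs, _ in rules:                       # condition (1): one fixed arity per root symbol
        _, f, args = lhs
        if minar.setdefault(f, len(args)) != len(args):
            return None
    for _, rhs in rules:                       # condition (2): at least minar(f) arguments on the right
        for head, n in occurrences(rhs):
            if head in minar and n < minar[head]:
                return None
    return minar

double_rule = (app("double", app("X")), app("plus", app("X"), app("X")))
partial = [double_rule,
           (app("doublelist", app("L")), app("map", app("double"), app("L")))]
expanded = [double_rule,
            (app("doublelist", app("L")),
             app("map", lam("x", app("double", app("x"))), app("L")))]

result_partial = minimal_arities(partial)     # double occurs with 0 arguments on the right
result_expanded = minimal_arities(expanded)   # double always has 1 argument
```

Here `result_partial` is `None`, while `result_expanded` assigns minimal arity 1 to both defined symbols.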

This restriction is not as severe as it may initially seem since partial applications can be replaced by λ-abstractions; e.g., the rules above can be made properly applied by replacing the second rule by: $\mathsf{doublelist}\ L \Rightarrow \mathsf{map}\ (\lambda x.\mathsf{double}\ x)\ L$. By using η-expansion, we can transform any AFSM to satisfy this restriction:

**Definition 15 (**$\mathcal{R}^{\uparrow}$**).** *Given a set of rules* $\mathcal{R}$*, let their* η-expansion *be given by* $\mathcal{R}^{\uparrow} = \{ (\ell\ Z_1 \cdots Z_m){\uparrow^{\eta}} \Rightarrow (r\ Z_1 \cdots Z_m){\uparrow^{\eta}} \mid \ell \Rightarrow r \in \mathcal{R} \text{ with } r : \sigma_1 \rightarrow \ldots \rightarrow \sigma_m \rightarrow \iota,\ \iota \in \mathcal{S}, \text{ and } Z_1, \ldots, Z_m \text{ fresh meta-variables} \}$*, where*


Note that $\ell{\uparrow^{\eta}}$ is a pattern if $\ell$ is. By [29, Thm. 2.16], a relation $\Rightarrow_{\mathcal{R}}$ is terminating if $\Rightarrow_{\mathcal{R}^{\uparrow}}$ is terminating, which allows us to transpose any methods to prove termination of properly applied AFSMs to all AFSMs.

However, there is a caveat: this transformation can introduce non-termination in some special cases, e.g., the terminating rule $\mathsf{f}\ X \Rightarrow \mathsf{g}\ \mathsf{f}$ with $\mathsf{f} : \mathsf{o} \rightarrow \mathsf{o}$ and $\mathsf{g} : (\mathsf{o} \rightarrow \mathsf{o}) \rightarrow \mathsf{o}$, whose η-expansion $\mathsf{f}\ X \Rightarrow \mathsf{g}\ (\lambda x.(\mathsf{f}\ x))$ is non-terminating. Thus, for a properly applied AFSM the methods in this paper apply directly. For an AFSM that is not properly applied, we can use the methods to prove *termination* (but not non-termination) by first η-expanding the rules. Of course, if this analysis leads to a *counterexample* for termination, we may still be able to verify whether this counterexample applies in the original, untransformed AFSM.

*Example 16.* Both AFSMs in Example 6 and the AFSM in Example 8 are properly applied.

*Example 17.* Consider an AFSM $(\mathcal{F}, \mathcal{R})$ with $\mathcal{F} \supseteq \{\mathsf{sin}, \mathsf{cos} : \mathsf{real} \rightarrow \mathsf{real},\ \mathsf{times} : \mathsf{real} \rightarrow \mathsf{real} \rightarrow \mathsf{real},\ \mathsf{deriv} : (\mathsf{real} \rightarrow \mathsf{real}) \rightarrow \mathsf{real} \rightarrow \mathsf{real}\}$ and $\mathcal{R} = \{\mathsf{deriv}\ (\lambda x.\mathsf{sin}\ F\langle x\rangle) \Rightarrow \lambda y.\mathsf{times}\ (\mathsf{deriv}\ (\lambda x.F\langle x\rangle)\ y)\ (\mathsf{cos}\ F\langle y\rangle)\}$. Although the one rule has a functional output type ($\mathsf{real} \rightarrow \mathsf{real}$), this AFSM is properly applied, with $\mathsf{deriv}$ always having at least 1 argument. Therefore, we do not need to use $\mathcal{R}^{\uparrow}$. However, if $\mathcal{R}$ were to additionally include some rules that did not satisfy the restriction (such as the $\mathsf{double}$ and $\mathsf{doublelist}$ rules above), then η-expanding *all* rules, including this one, would be necessary. We have: $\mathcal{R}^{\uparrow} = \{\mathsf{deriv}\ (\lambda x.\mathsf{sin}\ F\langle x\rangle)\ Y \Rightarrow (\lambda y.\mathsf{times}\ (\mathsf{deriv}\ (\lambda x.F\langle x\rangle)\ y)\ (\mathsf{cos}\ F\langle y\rangle))\ Y\}$. Note that the right-hand side of the η-expanded $\mathsf{deriv}$ rule is not β-normal.

#### **3.2 Accessible Function Passing AFSMs**

In *accessible function passing* AFSMs, variables of functional type may not occur at arbitrary places in the left-hand sides of rules: their positions are restricted using the sort ordering $\succeq^{\mathcal{S}}$ and accessibility relation $\unrhd_{acc}$ from Definition 11.

**Definition 18 (Accessible function passing).** *An AFSM* $(\mathcal{F}, \mathcal{R})$ *is* accessible function passing (AFP) *if there exists a sort ordering* $\succeq^{\mathcal{S}}$ *following Definition 11 such that: for all* $\mathsf{f}\ \ell_1 \cdots \ell_n \Rightarrow r \in \mathcal{R}$ *and all* $Z \in \mathit{FMV}(r)$*: there are variables* $x_1, \ldots, x_k$ *and some* $i$ *such that* $\ell_i \unrhd_{acc} Z\langle x_1, \ldots, x_k\rangle$*.*

The key idea of this definition is that computability of each $\ell_i$ implies computability of all meta-variables in $r$. This excludes cases like Example 20 below. Many common examples satisfy this restriction, including those we saw before:

*Example 19.* Both systems from Example 6 are AFP: choosing the sort ordering $\succeq^{\mathcal{S}}$ that equates $\mathsf{nat}$ and $\mathsf{list}$, we indeed have $\mathsf{cons}\ H\ T \unrhd_{acc} H$ and $\mathsf{cons}\ H\ T \unrhd_{acc} T$ (as $\mathit{Acc}(\mathsf{cons}) = \{1, 2\}$) and both $\lambda x.Z\langle x\rangle \unrhd_{acc} Z\langle x\rangle$ and $Z \unrhd_{acc} Z$. The AFSM from Example 8 is AFP because we can choose $\mathsf{ord} \succ^{\mathcal{S}} \mathsf{nat}$ and have $\mathsf{lim}\ H \unrhd_{acc} H$ following Example 12 (and also $\mathsf{s}\ X \unrhd_{acc} X$ and $K \unrhd_{acc} K$, $F \unrhd_{acc} F$, $G \unrhd_{acc} G$). The AFSM from Example 17 is AFP, because $\lambda x.\mathsf{sin}\ F\langle x\rangle \unrhd_{acc} F\langle x\rangle$ for any $\succeq^{\mathcal{S}}$: $\lambda x.\mathsf{sin}\ F\langle x\rangle \unrhd_{acc} F\langle x\rangle$ because $\mathsf{sin}\ F\langle x\rangle \unrhd_{acc} F\langle x\rangle$ because $1 \in \mathit{Acc}(\mathsf{sin})$.

In fact, *all* first-order AFSMs (where all fully applied sub-meta-terms of the left-hand side of a rule have base type) are AFP via the sort ordering $\succeq^{\mathcal{S}}$ that equates all sorts. Also (with the same sort ordering), an AFSM $(\mathcal{F}, \mathcal{R})$ is AFP if, for all rules $\mathsf{f}\ \ell_1 \cdots \ell_k \Rightarrow r \in \mathcal{R}$ and all $1 \leq i \leq k$, we can write: $\ell_i = \lambda x_1 \ldots x_{n_i}.\ell'$ where $n_i \geq 0$ and all fully applied sub-meta-terms of $\ell'$ have base type.

This covers many practical systems, although for Example 8 we need a non-trivial sort ordering. Also, there are AFSMs that cannot be handled with *any* $\succeq^{\mathcal{S}}$.

*Example 20 (Encoding the untyped* λ*-calculus).* Consider an AFSM with $\mathcal{F} \supseteq \{\mathsf{ap} : \mathsf{o} \rightarrow \mathsf{o} \rightarrow \mathsf{o},\ \mathsf{lm} : (\mathsf{o} \rightarrow \mathsf{o}) \rightarrow \mathsf{o}\}$ and $\mathcal{R} = \{\mathsf{ap}\ (\mathsf{lm}\ F) \Rightarrow F\}$ (note that the only rule has type $\mathsf{o} \rightarrow \mathsf{o}$). This AFSM is not accessible function passing, because $\mathsf{lm}\ F \unrhd_{acc} F$ cannot hold for any $\succeq^{\mathcal{S}}$ (as this would require $\mathsf{o} \succ^{\mathcal{S}} \mathsf{o}$).

Note that this example is also not terminating. With $t = \mathsf{lm}\ (\lambda x.\mathsf{ap}\ x\ x)$, we get this self-loop as evidence: $\mathsf{ap}\ t\ t \Rightarrow_{\mathcal{R}} (\lambda x.\mathsf{ap}\ x\ x)\ t \Rightarrow_{\beta} \mathsf{ap}\ t\ t$.
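The self-loop can be checked step by step. The Python sketch below is an illustration only: the tuple encoding of terms, the helper names, and the hard-coded matching of the single rule are assumptions made for this example, and the naive substitution is adequate only because the substituted term $t$ is closed.

```python
# Terms: ("fun", name) | ("var", name) | ("app", s, t) | ("lam", x, body).
# Illustrative encoding; types are not tracked.

def fun(name): return ("fun", name)
def var(name): return ("var", name)
def lam(x, body): return ("lam", x, body)

def app(s, *ts):
    """Left-nested application: app(f, a, b) encodes (f a) b."""
    for t in ts:
        s = ("app", s, t)
    return s

def subst(term, x, value):
    """Replace free occurrences of variable x by value (value assumed closed)."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == x else term
    if kind == "fun":
        return term
    if kind == "app":
        return ("app", subst(term[1], x, value), subst(term[2], x, value))
    _, y, body = term
    return term if y == x else ("lam", y, subst(body, x, value))

def rule_step(term):
    """Rewrite (ap (lm F)) u to F u, i.e. apply ap (lm F) => F in function position."""
    if term[0] == "app":
        f = term[1]
        if (f[0] == "app" and f[1] == fun("ap")
                and f[2][0] == "app" and f[2][1] == fun("lm")):
            return ("app", f[2][2], term[2])
    return None

def beta_root(term):
    """Contract a beta-redex at the root, or return None."""
    if term[0] == "app" and term[1][0] == "lam":
        _, x, body = term[1]
        return subst(body, x, term[2])
    return None

t = app(fun("lm"), lam("x", app(fun("ap"), var("x"), var("x"))))
start = app(fun("ap"), t, t)
after_rule = rule_step(start)        # (λx.ap x x) t
after_beta = beta_root(after_rule)   # ap t t
```

After one rule step and one β-step the term is syntactically identical to the starting term, so the two steps can be repeated forever.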

Intuitively: in an accessible function passing AFSM, meta-variables of a higher type may occur only in "safe" places in the left-hand sides of rules. Rules like the ones in Example 20, where a higher-order meta-variable is lifted out of a base-type term, are not admitted (unless the base type is greater than the higher type).

In the remainder of this paper, we will refer to a *properly applied, accessible function passing* AFSM as a PA-AFP AFSM.

*Discussion:* This definition is strictly more liberal than the notions of "plain function passing" in both [34] and [46] as adapted to AFSMs. The notion in [46] largely corresponds to AFP if $\succeq^{\mathcal{S}}$ equates all sorts, and the HRS formalism guarantees that rules are properly applied (in fact, all fully applied sub-meta-terms of both left- and right-hand sides of rules have base type). The notion in [34] is more restrictive. The current restriction of PA-AFP AFSMs lets us handle examples like ordinal recursion (Example 8) which are not covered by [34,46]. However, note that [34,46] consider a different formalism, which does take rules whose left-hand side is not a pattern into account (which we do not consider). Our restriction also quite resembles the "admissible" rules in [6] which are defined using a pattern computability closure [5], but that work carries additional restrictions.

In later work [32,33], Kusakari extends the static DP approach to forms of polymorphic functional programming, with a very liberal restriction: the definition is parametrised with an *arbitrary* RC-set and corresponding accessibility ("safety") notion. Our AFP restriction is actually an instance of this condition (although a more liberal one than the example RC-set used in [32,33]). We have chosen a specific instance because it allows us to use dedicated techniques for the RC-set; for example, our *computable subterm criterion processor* (Theorem 63).

### **4 Static Higher-Order Dependency Pairs**

To obtain sufficient criteria for both termination and non-termination of AFSMs, we will now transpose the definition of static dependency pairs [6,33,34,46] to AFSMs. In addition, we will add the new features of *meta-variable conditions*, *formative reductions*, and *computable chains*. Complete versions of all proof sketches in this section are available in [17, Appendix B].

Although we retain the first-order terminology of dependency *pairs*, the setting with meta-variables makes it more suitable to define DPs as *triples*.

**Definition 21 ((Static) Dependency Pair).** *A* dependency pair (DP) *is a triple* $\ell \Rrightarrow p\ (A)$*, where* $\ell$ *is a closed pattern* $\mathsf{f}\ \ell_1 \cdots \ell_k$*,* $p$ *is a closed meta-term* $\mathsf{g}\ p_1 \cdots p_n$*, and* $A$ *is a set of* meta-variable conditions*: pairs* $Z : i$ *indicating that* $Z$ *regards its* $i$*-th argument. A DP is* conservative *if* $\mathit{FMV}(p) \subseteq \mathit{FMV}(\ell)$*.*

*A substitution* $\gamma$ respects *a set of meta-variable conditions* $A$ *if for all* $Z : i$ *in* $A$ *we have* $\gamma(Z) = \lambda x_1 \ldots x_j.t$ *with either* $i > j$*, or* $i \leq j$ *and* $x_i \in \mathit{FV}(t)$*. DPs will be used only with substitutions that respect their meta-variable conditions.*

*For* $\ell \Rrightarrow p\ (\emptyset)$ *(so a DP whose set of meta-variable conditions is empty), we often omit the third component and just write* $\ell \Rrightarrow p$*.*
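The "respects" condition on substitutions can be tested directly. The Python sketch below is a hedged illustration: the term encoding and the function names are hypothetical, types are ignored, all leading abstractions of $\gamma(Z)$ are peeled off to obtain $\lambda x_1 \ldots x_j.t$, and the bound variable names $x_1, \ldots, x_j$ are assumed to be pairwise distinct.

```python
# Terms: ("fun", name) | ("var", name) | ("app", s, t) | ("lam", x, body).
# Illustrative encoding; a substitution is a dict from meta-variable names to terms.

def fun(name): return ("fun", name)
def var(name): return ("var", name)
def lam(x, body): return ("lam", x, body)

def app(s, *ts):
    """Left-nested application: app(f, a, b) encodes (f a) b."""
    for t in ts:
        s = ("app", s, t)
    return s

def free_vars(term):
    """Set of names of the free variables of term."""
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind == "fun":
        return set()
    if kind == "app":
        return free_vars(term[1]) | free_vars(term[2])
    return free_vars(term[2]) - {term[1]}

def respects(gamma, conditions):
    """Check every condition (Z, i): gamma(Z) = λx1...xj.t with i > j, or xi free in t."""
    for z, i in conditions:
        binders, body = [], gamma[z]
        while body[0] == "lam":          # peel off the leading abstractions (assumed distinct names)
            binders.append(body[1])
            body = body[2]
        if i <= len(binders) and binders[i - 1] not in free_vars(body):
            return False
    return True

conditions = {("G", 2)}                                   # G regards its 2nd argument
uses_second = {"G": lam("f", lam("g", app(var("g"), fun("0"))))}
ignores_second = {"G": lam("f", lam("g", var("f")))}
too_few_binders = {"G": lam("f", fun("c"))}

ok_uses = respects(uses_second, conditions)
ok_ignores = respects(ignores_second, conditions)
ok_short = respects(too_few_binders, conditions)
```

The first substitution respects the condition because the second bound variable occurs in the body, the second one does not, and the third respects it through the case $i > j$.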

Like the first-order setting, the static DP approach employs *marked function symbols* to obtain meta-terms whose instances cannot be reduced at the root.

**Definition 22 (Marked symbols).** *Let* $(\mathcal{F}, \mathcal{R})$ *be an AFSM. Define* $\mathcal{F}^{\sharp} := \mathcal{F} \uplus \{ \mathsf{f}^{\sharp} : \sigma \mid \mathsf{f} : \sigma \in \mathcal{D} \}$*. For a meta-term* $s = \mathsf{f}\ s_1 \cdots s_k$ *with* $\mathsf{f} \in \mathcal{D}$ *and* $k = \mathit{minar}(\mathsf{f})$*, we let* $s^{\sharp} = \mathsf{f}^{\sharp}\ s_1 \cdots s_k$*; for* $s$ *of other forms* $s^{\sharp}$ *is not defined.*

Moreover, we will consider *candidates*. In the first-order setting, candidate terms are subterms of the right-hand sides of rules whose root symbol is a defined symbol. Intuitively, these subterms correspond to function calls. In the current setting, we have to consider also meta-variables as well as rules whose right-hand side is not β-normal (which might arise for instance due to η-expansion).

**Definition 23 (**β**-reduced-sub-meta-term,** $\unrhd_{\beta}$**,** $\unrhd_{A}$**).** *A meta-term* $s$ *has a fully applied* β-reduced-sub-meta-term $t$ *(shortly,* BRSMT*), notation* $s \unrhd_{\beta} t$*, if there exists a set of meta-variable conditions* $A$ *with* $s \unrhd_{A} t$*. Here* $s \unrhd_{A} t$ *holds if:*

- $s = t$*, or*
- $s = \lambda x.u$ *and* $u \unrhd_{A} t$*, or*
- $s = (\lambda x.u)\ s_0 \cdots s_n$ *and some* $s_i \unrhd_{A} t$*, or* $u[x := s_0]\ s_1 \cdots s_n \unrhd_{A} t$*, or*
- $s = a\ s_1 \cdots s_n$ *with* $a \in \mathcal{F} \cup \mathcal{V}$ *and some* $s_i \unrhd_{A} t$*, or*
- $s = Z\langle t_1, \ldots, t_k\rangle\ s_1 \cdots s_n$ *and some* $s_i \unrhd_{A} t$*, or*
- $s = Z\langle t_1, \ldots, t_k\rangle\ s_1 \cdots s_n$ *and* $t_i \unrhd_{A} t$ *for some* $i \in \{1, \ldots, k\}$ *with* $(Z : i) \in A$*.*

Essentially, $s \unrhd_{A} t$ means that $t$ can be reached from $s$ by taking β-reductions at the root and "subterm"-steps, where $Z : i$ is in $A$ whenever we pass into argument $i$ of a meta-variable $Z$. BRSMTs are used to generate *candidates*:

**Definition 24 (Candidates).** *For a meta-term* $s$*, the set* $\mathit{cand}(s)$ *of* candidates of $s$ *consists of those pairs* $t\ (A)$ *such that (a)* $t$ *has the form* $\mathsf{f}\ s_1 \cdots s_k$ *with* $\mathsf{f} \in \mathcal{D}$ *and* $k = \mathit{minar}(\mathsf{f})$*, and (b) there are* $s_{k+1}, \ldots, s_n$ *(with* $n \geq k$*) such that* $s \unrhd_{A} t\ s_{k+1} \cdots s_n$*, and (c)* $A$ *is minimal: there is no subset* $A' \subsetneq A$ *with* $s \unrhd_{A'} t$*.*

*Example 25.* In AFSMs where all meta-variables have arity 0 and the right-hand sides of rules are β-normal, the set $\mathit{cand}(s)$ for a meta-term $s$ consists exactly of the pairs $t\ (\emptyset)$ where $t$ has the form $\mathsf{f}\ s_1 \cdots s_{\mathit{minar}(\mathsf{f})}$ and $t$ occurs as part of $s$. In Example 8, we thus have $\mathit{cand}(G\ H\ (\lambda m.\mathsf{rec}\ (H\ m)\ K\ F\ G)) = \{\ \mathsf{rec}\ (H\ m)\ K\ F\ G\ (\emptyset)\ \}$.

If some of the meta-variables *do* take arguments, then the meta-variable conditions matter: candidates of $s$ are pairs $t\ (A)$ where $A$ contains exactly those pairs $Z : i$ for which we pass through the $i$-th argument of $Z$ to reach $t$ in $s$.

*Example 26.* Consider an AFSM with the signature from Example 8 but a rule using meta-variables with larger arities:

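$$\mathsf{rec}\ (\mathsf{lim}\ (\lambda n.H\langle n\rangle))\ K\ (\lambda x.\lambda n.F\langle x, n\rangle)\ (\lambda f.\lambda g.G\langle f, g\rangle) \Rightarrow G\langle \lambda n.H\langle n\rangle,\ \lambda m.\mathsf{rec}\ H\langle m\rangle\ K\ (\lambda x.\lambda n.F\langle x, n\rangle)\ (\lambda f.\lambda g.G\langle f, g\rangle)\rangle$$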

The right-hand side has one candidate:

$$\mathsf{rec}\ H\langle m\rangle\ K\ (\lambda x.\lambda n.F\langle x, n\rangle)\ (\lambda f.\lambda g.G\langle f, g\rangle)\ \ (\{G : 2\})$$
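The candidate sets of Examples 25 and 26 can be computed by a traversal that records which meta-variable arguments are entered. The Python sketch below is an illustration under stated assumptions: meta-terms are encoded in spine form and assumed to be β-normal (so the β-reduction clause of the BRSMT definition is never needed), minimality is decided by comparing the condition sets of all occurrences of the same candidate, and the encoding and names (`fn`, `meta`, `reach`, `cand`) are hypothetical.

```python
# Meta-terms in spine form (illustrative encoding, assumed beta-normal):
#   ("fn", a, args)            a s1 ... sn, with a a function symbol or variable
#   ("lam", x, body)           lambda x. body
#   ("meta", Z, margs, args)   Z<t1,...,tk> s1 ... sn

def fn(a, *args): return ("fn", a, tuple(args))
def lam(x, body): return ("lam", x, body)
def meta(z, margs=(), *args): return ("meta", z, tuple(margs), tuple(args))

def reach(term, conds=frozenset()):
    """Yield (u, A): sub-meta-terms u reachable while passing through conditions A."""
    yield term, conds
    kind = term[0]
    if kind == "lam":
        yield from reach(term[2], conds)
    elif kind == "fn":
        for a in term[2]:
            yield from reach(a, conds)
    else:
        _, z, margs, args = term
        for a in args:
            yield from reach(a, conds)
        for i, m in enumerate(margs, start=1):    # entering argument i of Z records Z : i
            yield from reach(m, conds | {(z, i)})

def cand(term, minar):
    """Candidates (t, A) of term, for defined symbols with minimal arities minar."""
    found = set()
    for u, conds in reach(term):
        if u[0] == "fn" and u[1] in minar and len(u[2]) >= minar[u[1]]:
            found.add((("fn", u[1], u[2][:minar[u[1]]]), conds))
    # keep only pairs whose condition set is minimal for that candidate
    return {(t, a) for (t, a) in found
            if not any(t2 == t and a2 < a for (t2, a2) in found)}

minar = {"rec": 4}

# Example 25: G H (λm. rec (H m) K F G), all meta-variables of arity 0
rec25 = fn("rec", meta("H", (), fn("m")), meta("K"), meta("F"), meta("G"))
rhs25 = meta("G", (), meta("H"), lam("m", rec25))
cand25 = cand(rhs25, minar)

# Example 26: G<λn.H<n>, λm. rec H<m> K (λx.λn.F<x,n>) (λf.λg.G<f,g>)>
rec26 = fn("rec", meta("H", [fn("m")]), meta("K"),
           lam("x", lam("n", meta("F", [fn("x"), fn("n")]))),
           lam("f", lam("g", meta("G", [fn("f"), fn("g")]))))
rhs26 = meta("G", [lam("n", meta("H", [fn("n")])), lam("m", rec26)])
cand26 = cand(rhs26, minar)
```

In the first case the single candidate carries the empty condition set; in the second, the only path to the $\mathsf{rec}$ call passes through the second argument of $G$, which yields the condition $G : 2$.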

The original static approaches define DPs as pairs $\ell^{\sharp} \Rrightarrow p^{\sharp}$ where $\ell \Rightarrow r$ is a rule and $p$ a subterm of $r$ of the form $\mathsf{f}\ r_1 \cdots r_m$ (as their rules are built using terms, not meta-terms). This can set variables bound in $r$ free in $p$. In the current setting, we use candidates with their meta-variable conditions and implicit β-steps rather than subterms, and we replace such variables by meta-variables.

**Definition 27 (***SDP***).** *Let* $s$ *be a meta-term and* $(\mathcal{F}, \mathcal{R})$ *be an AFSM. Let* $\mathit{metafy}(s)$ *denote* $s$ *with all free variables replaced by corresponding meta-variables. Now* $\mathit{SDP}(\mathcal{R}) = \{ \ell^{\sharp} \Rrightarrow \mathit{metafy}(p^{\sharp})\ (A) \mid \ell \Rightarrow r \in \mathcal{R} \wedge p\ (A) \in \mathit{cand}(r) \}$*.*

Although static DPs always have a pleasant form $\mathsf{f}^{\sharp}\ \ell_1 \cdots \ell_k \Rrightarrow \mathsf{g}^{\sharp}\ p_1 \cdots p_n\ (A)$ (as opposed to the *dynamic* DPs of, e.g., [31], whose right-hand sides can have a meta-variable at the head, which complicates various techniques in the framework), they have two important complications not present in first-order DPs: the right-hand side $p$ of a DP $\ell \Rrightarrow p\ (A)$ may contain meta-variables that do not occur in the left-hand side (traditional analysis techniques are not really equipped for this), and the left- and right-hand sides may have different types. In Sect. 5 we will explore some methods to deal with these features.

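*Example 28.* For the non-η-expanded rules of Example 17, the set $\mathit{SDP}(\mathcal{R})$ has one element: $\mathsf{deriv}^{\sharp}\ (\lambda x.\mathsf{sin}\ F\langle x\rangle) \Rrightarrow \mathsf{deriv}^{\sharp}\ (\lambda x.F\langle x\rangle)$. (As $\mathsf{times}$ and $\mathsf{cos}$ are not defined symbols, they do not generate dependency pairs.) The set $\mathit{SDP}(\mathcal{R}^{\uparrow})$ for the η-expanded rules is $\{\mathsf{deriv}^{\sharp}\ (\lambda x.\mathsf{sin}\ F\langle x\rangle)\ Y \Rrightarrow \mathsf{deriv}^{\sharp}\ (\lambda x.F\langle x\rangle)\ Y\}$. To obtain the relevant candidate, we used the β-reduction step of BRSMTs.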

*Example 29.* The AFSM from Example 8 is AFP following Example 19; here *SDP*(R) is:

$$\begin{array}{rcl} \mathsf{rec}^{\sharp}\ (\mathsf{s}\ X)\ K\ F\ G & \Rrightarrow & \mathsf{rec}^{\sharp}\ X\ K\ F\ G\ \ (\emptyset) \\ \mathsf{rec}^{\sharp}\ (\mathsf{lim}\ H)\ K\ F\ G & \Rrightarrow & \mathsf{rec}^{\sharp}\ (H\ M)\ K\ F\ G\ \ (\emptyset) \end{array}$$

Note that the right-hand side of the second DP contains a meta-variable that is not on the left. As we will see in Example 64, that is not problematic here.

Termination analysis using dependency pairs importantly considers the notion of a *dependency chain*. This notion is fairly similar to the first-order setting:

**Definition 30 (Dependency chain).** *Let* $\mathcal{P}$ *be a set of DPs and* $\mathcal{R}$ *a set of rules. A (finite or infinite)* $(\mathcal{P}, \mathcal{R})$-dependency chain *(or just* $(\mathcal{P}, \mathcal{R})$*-chain) is a sequence* $[(\ell_0 \Rrightarrow p_0\ (A_0), s_0, t_0), (\ell_1 \Rrightarrow p_1\ (A_1), s_1, t_1), \ldots]$ *where each* $\ell_i \Rrightarrow p_i\ (A_i) \in \mathcal{P}$ *and all* $s_i, t_i$ *are terms, such that for all* $i$*:*


*Example 31.* In the (first) AFSM from Example 6, we have $\mathit{SDP}(\mathcal{R}) = \{\mathsf{map}^{\sharp}\ (\lambda x.Z\langle x\rangle)\ (\mathsf{cons}\ H\ T) \Rrightarrow \mathsf{map}^{\sharp}\ (\lambda x.Z\langle x\rangle)\ T\}$. An example of a finite dependency chain is $[(\rho, s_1, t_1), (\rho, s_2, t_2)]$ where $\rho$ is the one DP, $s_1 = \mathsf{map}^{\sharp}\ (\lambda x.\mathsf{s}\ x)\ (\mathsf{cons}\ \mathsf{0}\ (\mathsf{cons}\ (\mathsf{s}\ \mathsf{0})\ (\mathsf{map}\ (\lambda x.x)\ \mathsf{nil})))$ and $t_1 = \mathsf{map}^{\sharp}\ (\lambda x.\mathsf{s}\ x)\ (\mathsf{cons}\ (\mathsf{s}\ \mathsf{0})\ (\mathsf{map}\ (\lambda x.x)\ \mathsf{nil}))$ and $s_2 = \mathsf{map}^{\sharp}\ (\lambda x.\mathsf{s}\ x)\ (\mathsf{cons}\ (\mathsf{s}\ \mathsf{0})\ \mathsf{nil})$ and $t_2 = \mathsf{map}^{\sharp}\ (\lambda x.\mathsf{s}\ x)\ \mathsf{nil}$.

Note that here $t_1$ reduces to $s_2$ in a single step ($\mathsf{map}\ (\lambda x.x)\ \mathsf{nil} \Rightarrow_{\mathcal{R}} \mathsf{nil}$).

We have the following key result:

**Theorem 32.** *Let* $(\mathcal{F}, \mathcal{R})$ *be a PA-AFP AFSM. If* $(\mathcal{F}, \mathcal{R})$ *is non-terminating, then there is an infinite* $(\mathit{SDP}(\mathcal{R}), \mathcal{R})$*-dependency chain.*

*Proof (sketch).* The proof is an adaptation of the one in [34], altered for the more permissive definition of *accessible function passing* over *plain function passing* as well as the meta-variable conditions; it also follows from Theorem 37 below.

By this result we can use dependency pairs to prove termination of a given properly applied and AFP AFSM: if we can prove that there is no infinite $(\mathit{SDP}(\mathcal{R}), \mathcal{R})$-chain, then termination follows immediately. Note, however, that the reverse result does *not* hold: it is possible to have an infinite $(\mathit{SDP}(\mathcal{R}), \mathcal{R})$-dependency chain even for a terminating PA-AFP AFSM.

*Example 33.* Let $\mathcal{F} \supseteq \{\mathsf{0}, \mathsf{1} : \mathsf{nat},\ \mathsf{f} : \mathsf{nat} \rightarrow \mathsf{nat},\ \mathsf{g} : (\mathsf{nat} \rightarrow \mathsf{nat}) \rightarrow \mathsf{nat}\}$ and $\mathcal{R} = \{\mathsf{f}\ \mathsf{0} \Rightarrow \mathsf{g}\ (\lambda x.\mathsf{f}\ x),\ \mathsf{g}\ (\lambda x.F\langle x\rangle) \Rightarrow F\langle \mathsf{1}\rangle\}$. This AFSM is PA-AFP, with $\mathit{SDP}(\mathcal{R}) = \{\mathsf{f}^{\sharp}\ \mathsf{0} \Rrightarrow \mathsf{g}^{\sharp}\ (\lambda x.\mathsf{f}\ x),\ \mathsf{f}^{\sharp}\ \mathsf{0} \Rrightarrow \mathsf{f}^{\sharp}\ X\}$; the second rule does not cause the addition of any dependency pairs. Although $\Rightarrow_{\mathcal{R}}$ is terminating, there is an infinite $(\mathit{SDP}(\mathcal{R}), \mathcal{R})$-chain $[(\mathsf{f}^{\sharp}\ \mathsf{0} \Rrightarrow \mathsf{f}^{\sharp}\ X,\ \mathsf{f}^{\sharp}\ \mathsf{0},\ \mathsf{f}^{\sharp}\ \mathsf{0}), (\mathsf{f}^{\sharp}\ \mathsf{0} \Rrightarrow \mathsf{f}^{\sharp}\ X,\ \mathsf{f}^{\sharp}\ \mathsf{0},\ \mathsf{f}^{\sharp}\ \mathsf{0}), \ldots]$.

The problem in Example 33 is the *non-conservative* DP $\mathsf{f}^{\sharp}\ \mathsf{0} \Rrightarrow \mathsf{f}^{\sharp}\ X$, with $X$ on the right but not on the left. Such DPs arise from *abstractions* in the right-hand sides of rules. Unfortunately, abstractions are introduced by the restricted η-expansion (Definition 15) that we may need to make an AFSM properly applied. Even so, often all DPs are conservative, like Examples 6 and 17. There, we do have the inverse result:

**Theorem 34.** *For any AFSM* $(\mathcal{F}, \mathcal{R})$*: if there is an infinite* $(\mathit{SDP}(\mathcal{R}), \mathcal{R})$*-chain* $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$ *with all* $\rho_i$ *conservative, then* $\Rightarrow_{\mathcal{R}}$ *is non-terminating.*

*Proof (sketch).* If $\mathit{FMV}(p_i) \subseteq \mathit{FMV}(\ell_i)$, then we can see that $s_i \Rightarrow_{\mathcal{R}} \cdot \Rightarrow^{*}_{\beta} t'_i$ for some term $t'_i$ of which $t_i$ is a subterm. Since also each $t_i \Rightarrow^{*}_{\mathcal{R}} s_{i+1}$, the infinite chain induces an infinite reduction $s_0 \Rightarrow^{+}_{\mathcal{R}} t'_0 \Rightarrow^{*}_{\mathcal{R}} s'_1 \Rightarrow^{+}_{\mathcal{R}} t'_1 \Rightarrow^{*}_{\mathcal{R}} \ldots$.

The core of the dependency pair *framework* is to systematically simplify a set of pairs $(\mathcal{P}, \mathcal{R})$ to prove either absence or presence of an infinite $(\mathcal{P}, \mathcal{R})$-chain, thus showing termination or non-termination as appropriate. By Theorems 32 and 34 we can do so, although with some conditions on the non-termination result. We can do better by tracking certain properties of dependency chains.

**Definition 35 (Minimal and Computable chains).** *Let* $(\mathcal{F}, \mathcal{U})$ *be an AFSM and* $C_{\mathcal{U}}$ *an RC-set satisfying the properties of Theorem 13 for* $(\mathcal{F}, \mathcal{U})$*. Let* $\mathcal{F}$ *contain, for every type* $\sigma$*, at least countably many symbols* $\mathtt{f} : \sigma$ *not used in* $\mathcal{U}$*.*

*A* $(\mathcal{P}, \mathcal{R})$*-chain* $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$ *is* $\mathcal{U}$-computable *if:* $\Rightarrow_{\mathcal{U}}\ \supseteq\ \Rightarrow_{\mathcal{R}}$*, and for all* $i \in \mathbb{N}$ *there exists a substitution* $\gamma_i$ *such that* $\rho_i = \ell_i \Rrightarrow p_i\ (A_i)$ *with* $s_i = \ell_i\gamma_i$ *and* $t_i = p_i\gamma_i$*, and* $(\lambda x_1 \ldots x_n.v)\gamma_i$ *is* $C_{\mathcal{U}}$*-computable for all* $v$ *and* $B$ *such that* $p_i \unrhd_B v$*,* $\gamma_i$ *respects* $B$*, and* $\mathit{FV}(v) = \{x_1, \ldots, x_n\}$*.*

*A chain is* minimal *if the strict subterms of all* $t_i$ *are terminating under* $\Rightarrow_{\mathcal{R}}$*.*

In the first-order DP framework, *minimal* chains give access to several powerful techniques to prove absence of infinite chains, such as the *subterm criterion* [24] and *usable rules* [22,24]. *Computable* chains go a step further, by building on the computability inherent in the proof of Theorem 32 and the notion of *accessible function passing* AFSMs. In computable chains, we can require that (some of) the subterms of all $t_i$ are *computable* rather than merely *terminating*. This property will be essential in the *computable subterm criterion processor* (Theorem 63).

Another property of dependency chains is the use of *formative rules*, which has proven very useful for dynamic DPs [31]. Here we go further and consider *formative reductions*, which were introduced for the first-order DP framework in [16]. This property will be essential in the *formative rules processor* (Theorem 58).

**Definition 36 (Formative chain, formative reduction).** *A* $(\mathcal{P}, \mathcal{R})$*-chain* $[(\ell_0 \Rrightarrow p_0\ (A_0), s_0, t_0), (\ell_1 \Rrightarrow p_1\ (A_1), s_1, t_1), \ldots]$ *is* formative *if for all* $i$*, the reduction* $t_i \Rightarrow_{\mathcal{R}}^{*} s_{i+1}$ *is* $\ell_{i+1}$*-formative. Here, for a pattern* $\ell$*, substitution* $\gamma$ *and term* $s$*, a reduction* $s \Rightarrow_{\mathcal{R}}^{*} \ell\gamma$ *is* $\ell$-formative *if one of the following holds:*


The idea of a formative reduction is to avoid redundant steps: if $s \Rightarrow_{\mathcal{R}}^{*} \ell\gamma$ by an $\ell$-formative reduction, then this reduction takes only the steps needed to obtain an instance of $\ell$. Suppose that we have rules $\mathtt{plus}\ \mathtt{0}\ Y \Rightarrow Y$ and $\mathtt{plus}\ (\mathtt{s}\ X)\ Y \Rightarrow \mathtt{s}\ (\mathtt{plus}\ X\ Y)$. Let $\ell := \mathtt{g}\ \mathtt{0}\ X$ and $t := \mathtt{plus}\ \mathtt{0}\ \mathtt{0}$. Then the reduction $\mathtt{g}\ t\ t \Rightarrow_{\mathcal{R}} \mathtt{g}\ \mathtt{0}\ t$ is $\ell$-formative: we must reduce the first argument to get an instance of $\ell$. The reduction $\mathtt{g}\ t\ t \Rightarrow_{\mathcal{R}} \mathtt{g}\ t\ \mathtt{0} \Rightarrow_{\mathcal{R}} \mathtt{g}\ \mathtt{0}\ \mathtt{0}$ is not $\ell$-formative, because the reduction in the second argument does not contribute to the non-meta-variable positions of $\ell$. This matters when we consider $\ell$ as the left-hand side of a rule, say $\mathtt{g}\ \mathtt{0}\ X \Rightarrow \mathtt{0}$: if we reduce $\mathtt{g}\ t\ t \Rightarrow_{\mathcal{R}} \mathtt{g}\ t\ \mathtt{0} \Rightarrow_{\mathcal{R}} \mathtt{g}\ \mathtt{0}\ \mathtt{0} \Rightarrow_{\mathcal{R}} \mathtt{0}$, then the first step was redundant: removing this step gives a shorter reduction to the same result: $\mathtt{g}\ t\ t \Rightarrow_{\mathcal{R}} \mathtt{g}\ \mathtt{0}\ t \Rightarrow_{\mathcal{R}} \mathtt{0}$. In an infinite reduction, redundant steps may also be postponed indefinitely.
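The two reductions compared in this paragraph can be replayed mechanically. The following Python sketch is only an illustration under simplifying assumptions: terms are nested tuples, the two `plus` rules and the rule g 0 X ⇒ 0 are hard-coded, and the helper names (`step_root`, `step_at`, `reduce_along`) are hypothetical rather than taken from the paper or from any tool. It confirms that both reductions reach 0 and that the one with the redundant step is longer; it does not decide formativeness in general.

```python
ZERO = ("0",)

def step_root(term):
    """Apply one of the three hard-coded rules at the root, or return None."""
    head = term[0]
    if head == "plus":
        x, y = term[1], term[2]
        if x == ZERO:                      # plus 0 Y => Y
            return y
        if x[0] == "s":                    # plus (s X) Y => s (plus X Y)
            return ("s", ("plus", x[1], y))
    if head == "g" and term[1] == ZERO:    # g 0 X => 0
        return ZERO
    return None

def step_at(term, pos):
    """Rewrite the subterm at position pos (a tuple of argument indices, 1-based)."""
    if not pos:
        result = step_root(term)
        if result is None:
            raise ValueError("no rule applies at this position")
        return result
    parts = list(term)
    parts[pos[0]] = step_at(term[pos[0]], pos[1:])
    return tuple(parts)

def reduce_along(term, positions):
    """Return the list of terms visited when rewriting at the given positions."""
    trace = [term]
    for pos in positions:
        term = step_at(term, pos)
        trace.append(term)
    return trace

T = ("plus", ZERO, ZERO)                   # t := plus 0 0
START = ("g", T, T)                        # g t t

# g t t => g 0 t => 0: only the first argument is reduced before the root step
formative = reduce_along(START, [(1,), ()])
# g t t => g t 0 => g 0 0 => 0: the first step (second argument) is redundant
redundant = reduce_along(START, [(2,), (1,), ()])
```

Both traces end in the same term, and dropping the step at position `(2,)` from the second trace yields exactly the first one.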

We can now strengthen the result of Theorem 32 with two new properties.

**Theorem 37.** *Let* $(\mathcal{F}, \mathcal{R})$ *be a* properly applied*,* accessible function passing *AFSM. If* $(\mathcal{F}, \mathcal{R})$ *is non-terminating, then there is an infinite* $\mathcal{R}$*-computable formative* $(\mathit{SDP}(\mathcal{R}), \mathcal{R})$*-dependency chain.*

*Proof (sketch).* We select a *minimal non-computable (MNC)* term $s := \mathtt{f}\ s_1 \cdots s_k$ (where all $s_i$ are $C_{\mathcal{R}}$-computable) and an infinite reduction starting in $s$. Then we stepwise build an infinite dependency chain, as follows. Since $s$ is non-computable but each $s_i$ terminates (as computability implies termination), there exist a rule $\mathtt{f}\ \ell_1 \cdots \ell_k \Rightarrow r$ and a substitution $\gamma$ such that each $s_i \Rightarrow_{\mathcal{R}}^{*} \ell_i\gamma$ and $r\gamma$ is non-computable. We can then identify a candidate $t\ (A)$ of $r$ such that $\gamma$ respects $A$ and $t\gamma$ is an MNC subterm of $r\gamma$; we continue the process with $t\gamma$ (or a term at its head). For the *formative* property, we note that if $u \Rightarrow_{\mathcal{R}}^{*} \ell\gamma$ and $u$ is terminating, then $u \Rightarrow_{\mathcal{R}}^{*} \ell\delta$ by an $\ell$-formative reduction for a substitution $\delta$ such that each $\delta(Z) \Rightarrow_{\mathcal{R}}^{*} \gamma(Z)$. This follows by postponing those reduction steps not needed to obtain an instance of $\ell$. The resulting infinite chain is $\mathcal{R}$-computable because we can show, by induction on the definition of $\unrhd_{\mathit{acc}}$, that if $\ell \Rightarrow r$ is an AFP rule and $\ell\gamma$ is an MNC term, then $\gamma(Z)$ is $C_{\mathcal{R}}$-computable for all $Z \in \mathit{FMV}(r)$.

As it is easily seen that all $C_{\mathcal{U}}$-computable terms are $\Rightarrow_{\mathcal{U}}$-terminating and therefore $\Rightarrow_{\mathcal{R}}$-terminating, every $\mathcal{U}$-computable $(\mathcal{P}, \mathcal{R})$-dependency chain is also minimal. The notions of $\mathcal{R}$-computable and formative chains still do not suffice to obtain a true inverse result, however (i.e., to prove that termination implies the absence of an infinite $\mathcal{R}$-computable chain over $\mathit{SDP}(\mathcal{R})$): the infinite chain in Example 33 is $\mathcal{R}$-computable.

To see why the two restrictions that the AFSM must be *properly applied* and *accessible function passing* are necessary, consider the following examples.

*Example 38.* Consider $\mathcal{F} \supseteq \{\mathtt{fix} : ((\mathsf{o} \to \mathsf{o}) \to \mathsf{o} \to \mathsf{o}) \to \mathsf{o} \to \mathsf{o}\}$ and $\mathcal{R} = \{\mathtt{fix}\ F\ X \Rightarrow F\ (\mathtt{fix}\ F)\ X\}$. This AFSM is not properly applied; it is also not terminating, as can be seen by instantiating $F$ with $\lambda y.y$. However, it does not have any static DPs, since $\mathtt{fix}\ F$ is not a candidate. Even if we altered the definition of static DPs to admit a dependency pair $\mathtt{fix}^{\sharp}\ F\ X \Rrightarrow \mathtt{fix}^{\sharp}\ F$, this pair could not be used to build an infinite dependency chain.

Note that the problem does not arise if we study the η-expanded rules $\mathcal{R}^{\uparrow} = \{\mathtt{fix}\ F\ X \Rightarrow F\ (\lambda z.\mathtt{fix}\ F\ z)\ X\}$, as the dependency pair $\mathtt{fix}^{\sharp}\ F\ X \Rrightarrow \mathtt{fix}^{\sharp}\ F\ Z$ does admit an infinite chain. Unfortunately, as this dependency pair does not satisfy the conditions of Theorem 34, we cannot use it to prove non-termination.

*Example 39.* The AFSM from Example 20 is not accessible function passing, since *Acc*(lm) = ∅. This is good because the set *SDP*(R) is empty, which would lead us to falsely conclude termination without the restriction.

*Discussion:* Theorem 37 transposes the work of [34,46] to AFSMs and extends it by using a more liberal restriction, by limiting interest to *formative*, $\mathcal{R}$-*computable* chains, and by including meta-variable conditions. Both of these new properties of chains will support new termination techniques within the DP framework.

The relationship with the works for functional programming [32,33] is less clear: they define a different form of chains suited well to polymorphic systems, but which requires more intricate reasoning for non-polymorphic systems, as DPs can be used for reductions at the head of a term. It is not clear whether there are non-polymorphic systems that can be handled with one and not the other. The notions of formative and R-computable chains are not considered there; meta-variable conditions are not relevant to their λ-free formalism.

### **5 The Static Higher-Order DP Framework**

In first-order term rewriting, the DP *framework* [20] is an extendable framework to prove termination and non-termination. As observed in the introduction, DP analyses in higher-order rewriting typically go beyond the initial DP *approach* [2], but fall short of the full *framework*. Here, we define the latter for static DPs. Complete versions of all proof sketches in this section are in [17, Appendix C].

We have now reduced the problem of termination to non-existence of certain chains. In the DP framework, we formalise this in the notion of a *DP problem*:

**Definition 40 (DP problem).** *A* DP problem *is a tuple* $(\mathcal{P}, \mathcal{R}, m, f)$ *with* $\mathcal{P}$ *a set of DPs,* $\mathcal{R}$ *a set of rules,* $m \in \{\mathsf{minimal}, \mathsf{arbitrary}\} \cup \{\mathsf{computable}_{\mathcal{U}} \mid \text{any set of rules } \mathcal{U}\}$*, and* $f \in \{\mathsf{formative}, \mathsf{all}\}$*.*<sup>3</sup>

*A DP problem* $(\mathcal{P}, \mathcal{R}, m, f)$ *is* finite *if there exists no infinite* $(\mathcal{P}, \mathcal{R})$*-chain that is* $\mathcal{U}$*-computable if* $m = \mathsf{computable}_{\mathcal{U}}$*, is minimal if* $m = \mathsf{minimal}$*, and is formative if* $f = \mathsf{formative}$*. It is* infinite *if* $\mathcal{R}$ *is non-terminating, or if there exists an infinite* $(\mathcal{P}, \mathcal{R})$*-chain where all DPs used in the chain are conservative.*

*To capture the levels of permissiveness in the* $m$ *flag, we use a transitive-reflexive relation* $\succeq$ *generated by* $\mathsf{computable}_{\mathcal{U}} \succeq \mathsf{minimal} \succeq \mathsf{arbitrary}$*.*

Thus, the combination of Theorems 34 and 37 can be rephrased as: an AFSM $(\mathcal{F}, \mathcal{R})$ is terminating if $(\mathit{SDP}(\mathcal{R}), \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$ is finite, and is non-terminating if $(\mathit{SDP}(\mathcal{R}), \mathcal{R}, m, f)$ is infinite for some $m \in \{\mathsf{computable}_{\mathcal{U}}, \mathsf{minimal}, \mathsf{arbitrary}\}$ and $f \in \{\mathsf{formative}, \mathsf{all}\}$.<sup>4</sup>

The core idea of the DP framework is to iteratively simplify a set of DP problems via *processors* until nothing remains to be proved:

**Definition 41 (Processor).** *A* dependency pair processor *(or just* processor*) is a function that takes a DP problem and returns either* $\mathsf{NO}$ *or a set of DP problems. A processor* $\mathit{Proc}$ *is* sound *if a DP problem* $M$ *is finite whenever* $\mathit{Proc}(M) \neq \mathsf{NO}$ *and all elements of* $\mathit{Proc}(M)$ *are finite. A processor* $\mathit{Proc}$ *is* complete *if a DP problem* $M$ *is infinite whenever* $\mathit{Proc}(M) = \mathsf{NO}$ *or* $\mathit{Proc}(M)$ *contains an infinite element.*

To prove finiteness of a DP problem $M$ with the DP framework, we proceed analogously to the first-order DP framework [22]: we repeatedly apply sound DP processors starting from $M$ until no DP problems remain. That is, we execute the following rough procedure: (1) let $A := \{M\}$; (2) while $A \neq \emptyset$: select a problem $Q \in A$ and a sound processor $\mathit{Proc}$ with $\mathit{Proc}(Q) \neq \mathsf{NO}$, and let $A := (A \setminus \{Q\}) \cup \mathit{Proc}(Q)$. If this procedure terminates, then $M$ is a finite DP problem.
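The rough procedure above is a standard worklist loop. The Python sketch below shows one way it might be organised; it is an illustration, not the strategy of any particular tool. The assumptions are: DP problems are hashable values, a processor is a function returning either the string `"NO"` or a set of problems, and a processor that returns `{Q}` unchanged is treated as inapplicable. The two processors `remove_trivial` and `drop_smallest` are hypothetical toy stand-ins that operate on sets of labels and are not sound processors from the paper.

```python
NO = "NO"

def prove_finite(initial, processors, max_steps=1000):
    """Return True if the worklist empties, None if no processor makes progress."""
    pending = {initial}                        # (1) A := {M}
    for _ in range(max_steps):
        if not pending:                        # (2) loop while A is non-empty
            return True
        problem = next(iter(pending))          # select some Q in A
        for proc in processors:
            result = proc(problem)
            # use proc only if Proc(Q) != NO and it actually changes Q
            if result != NO and result != {problem}:
                pending = (pending - {problem}) | result
                break
        else:
            return None                        # stuck: finiteness not established
    return None

# Toy stand-ins (hypothetical): a "problem" is a frozenset of DP labels.
def remove_trivial(problem):
    """A problem without DPs admits no chain, so it needs no further work."""
    return set() if not problem else {problem}

def drop_smallest(problem):
    """Pretend some ordering argument removed one DP (illustration only)."""
    return {problem - {min(problem)}} if problem else {problem}
```

With these stand-ins, `prove_finite(frozenset({1, 2, 3}), [remove_trivial, drop_smallest])` empties the worklist, whereas using `remove_trivial` alone on a non-empty problem gets stuck and reports `None`. How to pick the problem and the processor is exactly the non-determinism that a tool or user must resolve.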

<sup>3</sup> Our framework is implicitly parametrised by the signature <sup>F</sup> used for term formation. As none of the processors we present modify this component (as indeed there is no need to by Theorem 9), we leave it implicit.

<sup>4</sup> The processors in this paper do not *alter* the flag m, but some *require* minimality or computability. We include the minimal option and the subscript <sup>U</sup> for the sake of future generalisations, and for reuse of processors in the *dynamic* approach of [31].

To prove termination of an AFSM $(\mathcal{F}, \mathcal{R})$, we would use as initial DP problem $(\mathit{SDP}(\mathcal{R}), \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$, provided that $\mathcal{R}$ is properly applied and accessible function passing (where η-expansion following Definition 15 may be applied first). If the procedure terminates (so that finiteness of $M$ is proved by the definition of soundness), then Theorem 37 provides termination of $\Rightarrow_{\mathcal{R}}$.

Similarly, we can use the DP framework to prove infiniteness: (1) let $A := \{M\}$; (2) while $A \neq \mathsf{NO}$: select a problem $Q \in A$ and a complete processor $\mathit{Proc}$, and let $A := \mathsf{NO}$ if $\mathit{Proc}(Q) = \mathsf{NO}$, or $A := (A \setminus \{Q\}) \cup \mathit{Proc}(Q)$ otherwise. For non-termination of $(\mathcal{F}, \mathcal{R})$, the initial DP problem should be $(\mathit{SDP}(\mathcal{R}), \mathcal{R}, m, f)$, where $m, f$ can be any flag (see Theorem 34). Note that the two algorithms coincide as long as only processors are used that are both sound *and* complete. In a tool, automation (or the user) must resolve the non-determinism and select suitable processors.

Below, we will present a number of processors within the framework. We will typically present processors by writing "for a DP problem M satisfying X, Y , Z, *Proc*(M) = ... ". In these cases, we let *Proc*(M) = {M} for any problem M not satisfying the given properties. Many more processors are possible, but we have chosen to present a selection which touches on all aspects of the DP framework:


### **5.1 The Dependency Graph**

We can leverage reachability information to *decompose* DP problems. In first-order rewriting, a graph structure is used to track which DPs can possibly follow one another in a chain [2]. Here, we define this *dependency graph* as follows.

**Definition 42 (Dependency graph).** *A DP problem* $(\mathcal{P}, \mathcal{R}, m, f)$ *induces a graph structure* $\mathit{DG}$*, called its* dependency graph*, whose nodes are the elements of* $\mathcal{P}$*. There is a (directed) edge from* $\rho_1$ *to* $\rho_2$ *in* $\mathit{DG}$ *iff there exist* $s_1, t_1, s_2, t_2$ *such that* $[(\rho_1, s_1, t_1), (\rho_2, s_2, t_2)]$ *is a* $(\mathcal{P}, \mathcal{R})$*-chain with the properties for* $m, f$*.*

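*Example 43.* Consider an AFSM with $\mathcal{F} \supseteq \{\mathtt{f} : (\mathsf{nat} \to \mathsf{nat}) \to \mathsf{nat} \to \mathsf{nat}\}$ and $\mathcal{R} = \{\mathtt{f}\ (\lambda x.F\langle x\rangle)\ (\mathtt{s}\ Y) \Rightarrow F\langle \mathtt{f}\ (\lambda x.\mathtt{0})\ (\mathtt{f}\ (\lambda x.F\langle x\rangle)\ Y)\rangle\}$. Let $\mathcal{P} := \mathit{SDP}(\mathcal{R}) =$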

$$\left\{ \begin{array}{ll} (1) & \mathtt{f}^{\sharp}\ (\lambda x.F\langle x\rangle)\ (\mathtt{s}\ Y) \Rrightarrow \mathtt{f}^{\sharp}\ (\lambda x.\mathtt{0})\ (\mathtt{f}\ (\lambda x.F\langle x\rangle)\ Y) \quad (\{F : 1\}) \\ (2) & \mathtt{f}^{\sharp}\ (\lambda x.F\langle x\rangle)\ (\mathtt{s}\ Y) \Rrightarrow \mathtt{f}^{\sharp}\ (\lambda x.F\langle x\rangle)\ Y \quad (\{F : 1\}) \end{array} \right\}$$

The dependency graph of (P, <sup>R</sup>, minimal, formative) is:

There is no edge from (1) to itself or (2) because there is no substitution $\gamma$ such that $(\lambda x.\mathtt{0})\gamma$ can be reduced to a term $(\lambda x.F\langle x\rangle)\delta$ where $\delta(F)$ regards its first argument (as $\Rightarrow_{\mathcal{R}}^{*}$ cannot introduce new variables).

In general, the dependency graph of a given DP problem is not computable, which is why we consider *approximations*.

**Definition 44 (Dependency graph approximation** [31]**).** *A finite graph* $G_\theta$ approximates $\mathit{DG}$ *if* $\theta$ *is a function that maps the nodes of* $\mathit{DG}$ *to the nodes of* $G_\theta$ *such that, whenever* $\mathit{DG}$ *has an edge from* $\rho_1$ *to* $\rho_2$*,* $G_\theta$ *has an edge from* $\theta(\rho_1)$ *to* $\theta(\rho_2)$*. (*$G_\theta$ *may have edges that have no corresponding edge in* $\mathit{DG}$*.)*

Note that this definition allows for an *infinite* graph to be approximated by a *finite* one; infinite graphs may occur if R is infinite (e.g., the union of all simply-typed instances of polymorphic rules).

If $\mathcal{P}$ is finite, we can take a graph approximation $G_{\mathrm{id}}$ with the same nodes as $\mathit{DG}$. A simple approximation may have an edge from $\ell_1 \Rrightarrow p_1\ (A_1)$ to $\ell_2 \Rrightarrow p_2\ (A_2)$ whenever both $p_1$ and $\ell_2$ have the form $\mathtt{f}\ s_1 \cdots s_k$ for the same $\mathtt{f}$ and $k$. However, one can also take the meta-variable conditions into account, as we did in Example 43.
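To make the simple head-symbol approximation concrete, here is a small Python sketch. It assumes each DP is summarised by the head symbol and argument count of its two sides (a deliberately lossy, hypothetical encoding; the names `head_symbol_graph` and `EXAMPLE_43` are not from the paper) and adds an edge whenever the right-hand side of one DP and the left-hand side of another agree on both.

```python
from itertools import product

def head_symbol_graph(dps):
    """dps maps a DP label to ((lhs_head, lhs_argc), (rhs_head, rhs_argc)).

    Returns the edge set of the approximation that only compares head symbols
    and argument counts, ignoring meta-variable conditions entirely.
    """
    edges = set()
    for (a, (_, rhs_a)), (b, (lhs_b, _)) in product(dps.items(), repeat=2):
        if rhs_a == lhs_b:
            edges.add((a, b))
    return edges

# The two DPs of Example 43: every side is f# applied to two arguments.
EXAMPLE_43 = {
    1: (("f#", 2), ("f#", 2)),
    2: (("f#", 2), ("f#", 2)),
}

edges_43 = head_symbol_graph(EXAMPLE_43)
```

On Example 43 this yields all four possible edges, including edges leaving (1). The analysis in Example 43, which inspects the meta-variable condition, rules those out; so the head-symbol approximation is correct but coarser there.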

**Theorem 45 (Dependency graph processor).** *The processor* $\mathit{Proc}_{G_\theta}$ *that maps a DP problem* $M = (\mathcal{P}, \mathcal{R}, m, f)$ *to* $\{(\{\rho \in \mathcal{P} \mid \theta(\rho) \in C_i\}, \mathcal{R}, m, f) \mid 1 \leq i \leq n\}$ *if* $G_\theta$ *is an approximation of the dependency graph of* $M$ *and* $C_1, \ldots, C_n$ *are the (nodes of the) non-trivial strongly connected components (SCCs) of* $G_\theta$*, is both sound and complete.*

*Proof (sketch).* In an infinite $(\mathcal{P}, \mathcal{R})$-chain $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$, there is always a path from $\rho_i$ to $\rho_{i+1}$ in $\mathit{DG}$. Since $G_\theta$ is finite, every infinite path in $\mathit{DG}$ eventually remains in a cycle in $G_\theta$. This cycle is part of an SCC.

*Example 46.* Let $\mathcal{R}$ be the set of rules from Example 43 and $G$ be the graph given there. Then $\mathit{Proc}_G(\mathit{SDP}(\mathcal{R}), \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative}) = \{(\{\mathtt{f}^{\sharp}\ (\lambda x.F\langle x\rangle)\ (\mathtt{s}\ Y) \Rrightarrow \mathtt{f}^{\sharp}\ (\lambda x.F\langle x\rangle)\ Y\ (\{F : 1\})\}, \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})\}$.

*Example 47.* Let $\mathcal{R}$ consist of the rules for $\mathtt{map}$ from Example 6 along with $\mathtt{f}\ L \Rightarrow \mathtt{map}\ (\lambda x.\mathtt{g}\ x)\ L$ and $\mathtt{g}\ X \Rightarrow X$. Then $\mathit{SDP}(\mathcal{R}) = \{(1)\ \mathtt{map}^{\sharp}\ (\lambda x.Z\langle x\rangle)\ (\mathtt{cons}\ H\ T) \Rrightarrow \mathtt{map}^{\sharp}\ (\lambda x.Z\langle x\rangle)\ T,\ (2)\ \mathtt{f}^{\sharp}\ L \Rrightarrow \mathtt{map}^{\sharp}\ (\lambda x.\mathtt{g}\ x)\ L,\ (3)\ \mathtt{f}^{\sharp}\ L \Rrightarrow \mathtt{g}^{\sharp}\ X\}$. DP (3) is not conservative, but it is not on any cycle in the graph approximation $G_{\mathrm{id}}$ obtained by considering head symbols as described above:

As (1) is the only DP on a cycle, $\mathit{Proc}_{G_{\mathrm{id}}}(\mathit{SDP}(\mathcal{R}), \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative}) = \{(\{(1)\}, \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})\}$.
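The decomposition in Example 47 can be reproduced with a few lines of Python. This is a sketch under the same lossy encoding assumption as before (each DP is reduced to the head symbol and argument count of its two sides); `reachable`, `nontrivial_sccs` and `EXAMPLE_47` are hypothetical names, and the quadratic reachability computation is chosen for brevity, not efficiency.

```python
def reachable(edges, start):
    """Nodes reachable from start by a path of at least one edge."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def nontrivial_sccs(nodes, edges):
    """SCCs that contain at least one edge (a self-loop on one node counts)."""
    reach = {n: reachable(edges, n) for n in nodes}
    sccs = set()
    for n in nodes:
        if n in reach[n]:                      # n lies on a cycle
            sccs.add(frozenset(m for m in nodes
                               if m in reach[n] and n in reach[m]))
    return sccs

# DPs (1)-(3) of Example 47 as (lhs head, argc), (rhs head, argc).
EXAMPLE_47 = {
    1: (("map#", 2), ("map#", 2)),
    2: (("f#", 1), ("map#", 2)),
    3: (("f#", 1), ("g#", 1)),
}

# Head-symbol approximation: edge a -> b if rhs of a matches lhs of b.
edges_47 = {(a, b)
            for a, (_, rhs) in EXAMPLE_47.items()
            for b, (lhs, _) in EXAMPLE_47.items()
            if rhs == lhs}

components_47 = nontrivial_sccs(EXAMPLE_47.keys(), edges_47)
```

Under this encoding the only edges are (1)→(1) and (2)→(1), so the single non-trivial SCC is {(1)}, and the non-conservative DP (3) disappears from the resulting DP problem.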

*Discussion:* The dependency graph is a powerful tool for simplifying DP problems, used since early versions of the DP approach [2]. Our notion of a dependency graph approximation, taken from [31], strictly generalises the original notion in [2], which uses a graph on the same node set as $\mathit{DG}$ with possibly further edges. One can get this notion here by using a graph $G_{\mathrm{id}}$. The advantage of our definition is that it ensures soundness of the dependency graph processor also for *infinite* sets of DPs. This overcomes a restriction in the literature [34, Corollary 5.13] to dependency graphs without non-cyclic infinite paths.

### **5.2 Processors Based on Reduction Triples**

At the heart of most DP-based approaches to termination proving lie well-founded orderings to delete DPs (or rules). For this, we use *reduction triples* [24,31].

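**Definition 48 (Reduction triple).** *A* reduction triple $(\succsim, \succcurlyeq, \succ)$ *consists of two quasi-orderings* $\succsim$ *and* $\succcurlyeq$ *and a well-founded strict ordering* $\succ$ *on meta-terms such that* $\succsim$ *is monotonic, all of* $\succsim, \succcurlyeq, \succ$ *are meta-stable (that is,* $\ell \succsim r$ *implies* $\ell\gamma \succsim r\gamma$ *if* $\ell$ *is a closed pattern and* $\gamma$ *a substitution on domain* $\mathit{FMV}(\ell) \cup \mathit{FMV}(r)$*, and the same for* $\succcurlyeq$ *and* $\succ$*),* $\Rightarrow_\beta\ \subseteq\ \succsim$*, and both* $\succsim \circ \succ\ \subseteq\ \succ$ *and* $\succcurlyeq \circ \succ\ \subseteq\ \succ$*.*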

In the first-order DP framework, the reduction pair processor [20] seeks to orient all rules with $\succsim$ and all DPs with either $\succsim$ or $\succ$; if this succeeds, those pairs oriented with $\succ$ may be removed. Using reduction *triples* rather than pairs, we obtain the following extension to the higher-order setting:

**Theorem 49 (Basic reduction triple processor).** *Let* $M = (\mathcal{P}_1 \uplus \mathcal{P}_2, \mathcal{R}, m, f)$ *be a DP problem. If* $(\succsim, \succcurlyeq, \succ)$ *is a reduction triple such that*

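1. *for all* $\ell \Rightarrow r \in \mathcal{R}$*, we have* $\ell \succsim r$*;*
2. *for all* $\ell \Rrightarrow p\ (A) \in \mathcal{P}_1$*, we have* $\ell \succ p$*;*
3. *for all* $\ell \Rrightarrow p\ (A) \in \mathcal{P}_2$*, we have* $\ell \succcurlyeq p$*;*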

*then the processor that maps* $M$ *to* $\{(\mathcal{P}_2, \mathcal{R}, m, f)\}$ *is both sound and complete.*

*Proof (sketch).* For an infinite $(\mathcal{P}_1 \uplus \mathcal{P}_2, \mathcal{R})$-chain $[(\rho_0, s_0, t_0), (\rho_1, s_1, t_1), \ldots]$ the requirements provide that, for all $i$: (a) $s_i \succ t_i$ if $\rho_i \in \mathcal{P}_1$; (b) $s_i \succcurlyeq t_i$ if $\rho_i \in \mathcal{P}_2$; and (c) $t_i \succsim s_{i+1}$. Since $\succ$ is well-founded, only finitely many of the $\rho_i$ can be in $\mathcal{P}_1$, so a tail of the chain is actually an infinite $(\mathcal{P}_2, \mathcal{R}, m, f)$-chain.

*Example 50.* Let (F, <sup>R</sup>) be the (non-η-expanded) rules from Example 17, and *SDP*(R) the DPs from Example 28. From Theorem 49, we get the following ordering requirements:

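$$\begin{array}{rcl} \mathtt{deriv}\ (\lambda x.\mathtt{sin}\ F\langle x\rangle) & \succsim & \lambda y.\mathtt{times}\ (\mathtt{deriv}\ (\lambda x.F\langle x\rangle)\ y)\ (\mathtt{cos}\ F\langle y\rangle) \\ \mathtt{deriv}^{\sharp}\ (\lambda x.\mathtt{sin}\ F\langle x\rangle) & \succ & \mathtt{deriv}^{\sharp}\ (\lambda x.F\langle x\rangle) \end{array}$$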

We can handle both requirements by using a polynomial interpretation $\mathcal{J}$ to $\mathbb{N}$ [15,43], by choosing $\mathcal{J}_{\mathtt{sin}}(n) = n + 1$, $\mathcal{J}_{\mathtt{cos}}(n) = 0$, $\mathcal{J}_{\mathtt{times}}(n_1, n_2) = n_1$, and $\mathcal{J}_{\mathtt{deriv}}(f) = \mathcal{J}_{\mathtt{deriv}^{\sharp}}(f) = \lambda n.f(n)$. Then the requirements are evaluated to $\lambda n.f(n) + 1 \geq \lambda n.f(n)$ and $\lambda n.f(n) + 1 > \lambda n.f(n)$, which hold on $\mathbb{N}$.
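For readers who want to see where these two inequalities come from, the evaluation can be unfolded step by step. The derivation below is a sketch: it assumes that a meta-variable application $F\langle x\rangle$ is interpreted as function application, writes $f$ for the function assigned to $F$, and uses $[\![\cdot]\!]$ as ad hoc notation for the value of a meta-term under $\mathcal{J}$. None of these conventions is spelled out in this section.

$$\begin{aligned}
[\![\mathtt{deriv}\ (\lambda x.\mathtt{sin}\ F\langle x\rangle)]\!] &= \mathcal{J}_{\mathtt{deriv}}(\lambda x.f(x) + 1) = \lambda n.f(n) + 1 \\
[\![\lambda y.\mathtt{times}\ (\mathtt{deriv}\ (\lambda x.F\langle x\rangle)\ y)\ (\mathtt{cos}\ F\langle y\rangle)]\!] &= \lambda y.\mathcal{J}_{\mathtt{times}}(f(y), 0) = \lambda y.f(y) \\
[\![\mathtt{deriv}^{\sharp}\ (\lambda x.\mathtt{sin}\ F\langle x\rangle)]\!] &= \mathcal{J}_{\mathtt{deriv}^{\sharp}}(\lambda x.f(x) + 1) = \lambda n.f(n) + 1 \\
[\![\mathtt{deriv}^{\sharp}\ (\lambda x.F\langle x\rangle)]\!] &= \mathcal{J}_{\mathtt{deriv}^{\sharp}}(\lambda x.f(x)) = \lambda n.f(n)
\end{aligned}$$

Comparing the functions pointwise, $f(n) + 1 \geq f(n)$ and $f(n) + 1 > f(n)$ for every $n \in \mathbb{N}$, which gives the weak requirement for the rule and the strict requirement for the DP.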

Theorem 49 is not ideal since, by definition, the left- and right-hand side of a DP may have different types. Such DPs are hard to handle with traditional techniques such as HORPO [26] or polynomial interpretations [15,43], as these methods compare only (meta-)terms of the same type (modulo renaming of sorts).

*Example 51.* Consider the toy AFSM with $\mathcal{R} = \{\mathtt{f}\ (\mathtt{s}\ X)\ Y \Rightarrow \mathtt{g}\ X\ Y,\ \mathtt{g}\ X \Rightarrow \lambda z.\mathtt{f}\ X\ z\}$ and $\mathit{SDP}(\mathcal{R}) = \{\mathtt{f}^{\sharp}\ (\mathtt{s}\ X)\ Y \Rrightarrow \mathtt{g}^{\sharp}\ X,\ \mathtt{g}^{\sharp}\ X \Rrightarrow \mathtt{f}^{\sharp}\ X\ Z\}$. If $\mathtt{f}$ and $\mathtt{g}$ both have type $\mathsf{nat} \to \mathsf{nat} \to \mathsf{nat}$, then in the first DP, the left-hand side has type $\mathsf{nat}$ while the right-hand side has type $\mathsf{nat} \to \mathsf{nat}$. In the second DP, the left-hand side has type $\mathsf{nat} \to \mathsf{nat}$ and the right-hand side has type $\mathsf{nat}$.

To be able to handle examples like the one above, we adapt [31, Thm. 5.21] by altering the ordering requirements to have base type.

**Theorem 52 (Reduction triple processor).** *Let* $\mathsf{Bot}$ *be a set* $\{\bot_\sigma : \sigma \mid \sigma \text{ a type}\} \subseteq \mathcal{F}$ *of unused constructors,* $M = (\mathcal{P}_1 \uplus \mathcal{P}_2, \mathcal{R}, m, f)$ *a DP problem and* $(\succsim, \succcurlyeq, \succ)$ *a reduction triple such that: (a) for all* $\ell \Rightarrow r \in \mathcal{R}$*, we have* $\ell \succsim r$*; and (b) for all* $\ell \Rrightarrow p\ (A) \in \mathcal{P}_1 \uplus \mathcal{P}_2$ *with* $\ell : \sigma_1 \to \ldots \to \sigma_m \to \iota$ *and* $p : \tau_1 \to \ldots \to \tau_n \to \kappa$ *we have, for fresh meta-variables* $Z_1 : \sigma_1, \ldots, Z_m : \sigma_m$*:*

- $\ell\ Z_1 \cdots Z_m \succ p\ \bot_{\tau_1} \cdots \bot_{\tau_n}$ *if* $\ell \Rrightarrow p\ (A) \in \mathcal{P}_1$
- $\ell\ Z_1 \cdots Z_m \succcurlyeq p\ \bot_{\tau_1} \cdots \bot_{\tau_n}$ *if* $\ell \Rrightarrow p\ (A) \in \mathcal{P}_2$

*Then the processor that maps* $M$ *to* $\{(\mathcal{P}_2, \mathcal{R}, m, f)\}$ *is both sound and complete.*

*Proof (sketch).* If $(\succsim, \succcurlyeq, \succ)$ is such a triple, then for $R \in \{\succcurlyeq, \succ\}$ define $R'$ as follows: for $s : \sigma_1 \to \ldots \to \sigma_m \to \iota$ and $t : \tau_1 \to \ldots \to \tau_n \to \kappa$, let $s\ R'\ t$ if for all $u_1 : \sigma_1, \ldots, u_m : \sigma_m$ there exist $w_1 : \tau_1, \ldots, w_n : \tau_n$ such that $s\ u_1 \cdots u_m\ R\ t\ w_1 \cdots w_n$. Now apply Theorem 49 with the triple $(\succsim, \succcurlyeq', \succ')$.

Here, the elements of Bot take the role of minimal terms for the ordering. We use them to flatten the type of the right-hand sides of ordering requirements, which makes it easier to use traditional methods to generate a reduction triple.

While $\succcurlyeq$ and $\succ$ may still have to orient meta-terms of distinct types, these are always *base* types, which we could collapse to a single sort. The only relation required to be monotonic, $\succsim$, regards pairs of meta-terms of the *same* type. This makes it feasible to apply orderings like HORPO or polynomial interpretations.

Both the basic and non-basic reduction triple processor are difficult to use for *non-conservative* DPs, which generate ordering requirements whose right-hand side contains a meta-variable not occurring on the left. This is typically difficult for traditional techniques, although possible to overcome, by choosing triples that do not regard such meta-variables (e.g., via an argument filtering [35,46]):

*Example 53.* We apply Theorem 52 on the DP problem $(\mathit{SDP}(\mathcal{R}), \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$ of Example 51. This gives for instance the following ordering requirements:

$$\begin{array}{ll} \mathtt{f}\ (\mathtt{s}\ X)\ Y \succsim \mathtt{g}\ X\ Y & \qquad \mathtt{f}^{\sharp}\ (\mathtt{s}\ X)\ Y \succ \mathtt{g}^{\sharp}\ X\ \bot_{\mathsf{nat}} \\ \mathtt{g}\ X \succsim \lambda z.\mathtt{f}\ X\ z & \qquad \mathtt{g}^{\sharp}\ X\ Y \succcurlyeq \mathtt{f}^{\sharp}\ X\ Z \end{array}$$

The right-hand side of the last DP uses a meta-variable $Z$ that does not occur on the left. As neither $\succcurlyeq$ nor $\succ$ is required to be monotonic (only $\succsim$ is), function symbols do not have to regard all their arguments. Thus, we can use a polynomial interpretation $\mathcal{J}$ to $\mathbb{N}$ with $\mathcal{J}_{\bot_{\mathsf{nat}}} = 0$, $\mathcal{J}_{\mathtt{s}}(n) = n + 1$ and $\mathcal{J}_{\mathtt{h}}(n_1, n_2) = n_1$ for $\mathtt{h} \in \{\mathtt{f}, \mathtt{f}^{\sharp}, \mathtt{g}, \mathtt{g}^{\sharp}\}$. The ordering requirements then translate to $X + 1 \geq X$ and $\lambda y.X \geq \lambda z.X$ for the rules, and $X + 1 > X$ and $X \geq X$ for the DPs. All these inequalities on $\mathbb{N}$ are clearly satisfied, so we can remove the first DP. The remaining problem is quickly disposed of with the dependency graph processor.
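The four inequalities can be spot-checked numerically. The Python sketch below evaluates both sides of each requirement under the interpretation just described, for a small range of sample values; the names `J_s`, `J_h`, `J_BOT_NAT` and `check` are hypothetical, and functional values such as $\lambda y.X$ are compared pointwise on the sample arguments. Sampling is of course not a proof; here the inequalities are simple enough to be verified by inspection.

```python
from itertools import product

J_BOT_NAT = 0                      # interpretation of the constructor bot_nat

def J_s(n):
    return n + 1

def J_h(n1, n2):
    # shared interpretation of f, f#, g and g#: ignore the second argument
    return n1

SAMPLES = range(6)

def check(X, Y, Z):
    """Evaluate the four ordering requirements of Example 53 at X, Y, Z."""
    # f (s X) Y  >=  g X Y
    rule1 = J_h(J_s(X), Y) >= J_h(X, Y)
    # g X  >=  lambda z. f X z, compared pointwise as functions of one argument
    rule2 = all(J_h(X, a) >= J_h(X, a) for a in SAMPLES)
    # f# (s X) Y  >  g# X bot_nat
    dp1 = J_h(J_s(X), Y) > J_h(X, J_BOT_NAT)
    # g# X Y  >=  f# X Z, where Z does not occur on the left
    dp2 = J_h(X, Y) >= J_h(X, Z)
    return rule1, rule2, dp1, dp2

all_hold = all(all(check(X, Y, Z)) for X, Y, Z in product(SAMPLES, repeat=3))
```

The last requirement holds for every value of `Z` precisely because `J_h` disregards its second argument, which is the point made about non-monotonic orderings in the paragraph above.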

#### **5.3 Rule Removal Without Search for Orderings**

While processors often simplify only $\mathcal{P}$, they can also simplify $\mathcal{R}$. One of the most powerful techniques in first-order DP approaches that can do this is *usable rules*. The idea is that for a given set $\mathcal{P}$ of DPs, we only need to consider a *subset* $\mathit{UR}(\mathcal{P}, \mathcal{R})$ of $\mathcal{R}$. Combined with the dependency graph processor, this makes it possible to split a large term rewriting system into a number of small problems.

In the higher-order setting, simple versions of usable rules have also been defined [31,46]. We can easily extend these definitions to AFSMs:

**Theorem 54.** *Given a DP problem* $M = (\mathcal{P}, \mathcal{R}, m, f)$ *with* $m \succeq \mathsf{minimal}$ *and* $\mathcal{R}$ *finite, let* $\mathit{UR}(\mathcal{P}, \mathcal{R})$ *be the smallest subset of* $\mathcal{R}$ *such that:*


*Then the processor that maps* $M$ *to* $\{(\mathcal{P}, \mathit{UR}(\mathcal{P}, \mathcal{R}), \mathsf{arbitrary}, \mathsf{all})\}$ *is sound.*

For the proof we refer to the very similar proofs in [31,46].

*Example 55.* For the set *SDP*(R) of the ordinal recursion example (Examples 8 and 29), all rules are usable due to the occurrence of H M in the second DP. For the set *SDP*(R) of the map example (Examples 6 and 31), there are no usable rules, since the one DP contains no defined function symbols or applied meta-variables.

This higher-order processor is much less powerful than its first-order version: if any DP or usable rule has a sub-meta-term of the form $F\ s$ or $F\langle s_1, \ldots, s_k\rangle$ with $s_1, \ldots, s_k$ not all distinct variables, then *all* rules are usable. Since applying a higher-order meta-variable to some argument is extremely common in higher-order rewriting, the technique is usually not applicable. Also, this processor imposes a heavy price on the flags: minimality (at least) is required, but is lost; the formative flag is also lost. Thus, usable rules are often combined with reduction triples to temporarily disregard rules, rather than as a way to permanently remove rules.

To address these weaknesses, we consider a processor that uses similar ideas to usable rules, but operates from the *left-hand* sides of rules and DPs rather than the right. This adapts the technique from [31] that relies on the new *formative* flag. As in the first-order case [16], we use a semantic characterisation of formative rules. In practice, we then work with over-approximations of this characterisation, analogous to the use of dependency graph approximations in Theorem 45.

**Definition 56.** *A function* $\mathit{FR}$ *that maps a pattern* $\ell$ *and a set of rules* $\mathcal{R}$ *to a set* $\mathit{FR}(\ell, \mathcal{R}) \subseteq \mathcal{R}$ *is a* formative rules approximation *if for all* $s$ *and* $\gamma$*: if* $s \Rightarrow_{\mathcal{R}}^{*} \ell\gamma$ *by an* $\ell$*-formative reduction, then this reduction can be done using only rules in* $\mathit{FR}(\ell, \mathcal{R})$*.*

*We let* $\mathit{FR}(\mathcal{P}, \mathcal{R}) = \bigcup\{\mathit{FR}(\ell_i, \mathcal{R}) \mid \mathtt{f}\ \ell_1 \cdots \ell_n \Rrightarrow p\ (A) \in \mathcal{P} \wedge 1 \leq i \leq n\}$*.*

Thus, a formative rules approximation is a subset of $\mathcal{R}$ that is *sufficient* for a formative reduction: if $s \Rightarrow_{\mathcal{R}}^{*} \ell\gamma$, then $s \Rightarrow_{\mathit{FR}(\ell, \mathcal{R})}^{*} \ell\gamma$. It is allowed for there to exist other formative reductions that do use additional rules.

*Example 57.* We define a simple formative rules approximation: (1) $\mathit{FR}(Z, \mathcal{R}) = \emptyset$ if $Z$ is a meta-variable; (2) $\mathit{FR}(\mathtt{f}\ \ell_1 \cdots \ell_m, \mathcal{R}) = \mathit{FR}(\ell_1, \mathcal{R}) \cup \cdots \cup \mathit{FR}(\ell_m, \mathcal{R})$ if $\mathtt{f} : \sigma_1 \to \ldots \to \sigma_m \to \iota$ and no rules have type $\iota$; (3) $\mathit{FR}(s, \mathcal{R}) = \mathcal{R}$ otherwise. This is a formative rules approximation: if $s \Rightarrow_{\mathcal{R}}^{*} Z\gamma$ by a $Z$-formative reduction, then $s = Z\gamma$, and if $s \Rightarrow_{\mathcal{R}}^{*} (\mathtt{f}\ \ell_1 \cdots \ell_m)\gamma$ and no rules have the same output type as $s$, then $s = \mathtt{f}\ s_1 \cdots s_m$ and each $s_i \Rightarrow_{\mathcal{R}}^{*} \ell_i\gamma$ (by an $\ell_i$-formative reduction).
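The three cases of this approximation translate directly into a recursive function. The Python sketch below is one possible reading, under explicit assumptions: patterns are tuples `("meta", name)` or `("fun", f, [args])`, the signature records for each symbol its number of arguments and base output type, and each rule is reduced to its output type. The toy signature and rule names are hypothetical and unrelated to the examples of the paper.

```python
def formative_rules(pattern, signature, rules):
    """Simple formative rules approximation in the style of Example 57.

    pattern   : ("meta", name) or ("fun", f, [sub-patterns]); other shapes hit case (3)
    signature : maps f to (number of arguments m, base output type iota)
    rules     : maps a rule name to the output type of that rule
    Returns the set of rule names that may be needed.
    """
    if pattern[0] == "meta":                                   # case (1)
        return set()
    if pattern[0] == "fun":
        _, f, args = pattern
        arity, out_type = signature[f]
        # case (2): f is fully applied and no rule produces its output type
        if len(args) == arity and out_type not in rules.values():
            needed = set()
            for arg in args:
                needed |= formative_rules(arg, signature, rules)
            return needed
    return set(rules)                                          # case (3)

# Hypothetical toy system: all rules have output type nat, none has type list.
SIGNATURE = {"cons": (2, "list"), "s": (1, "nat")}
RULES = {"plus_zero": "nat", "plus_succ": "nat"}

only_metas = ("fun", "cons", [("meta", "Z"), ("meta", "L")])
with_succ = ("fun", "cons", [("fun", "s", [("meta", "Z")]), ("meta", "L")])

fr_only_metas = formative_rules(only_metas, SIGNATURE, RULES)
fr_with_succ = formative_rules(with_succ, SIGNATURE, RULES)
```

For the pattern `cons Z L` no rules are needed, since no rule can create a term of type `list`. For `cons (s Z) L` the argument `s Z` has type `nat`, for which rules exist, so this coarse approximation falls back to the full rule set.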

The following result follows directly from the definition of formative rules.

**Theorem 58 (Formative rules processor).** *For a formative rules approximation* $\mathit{FR}$*, the processor* $\mathit{Proc}_{\mathit{FR}}$ *that maps a DP problem* $(\mathcal{P}, \mathcal{R}, m, \mathsf{formative})$ *to* $\{(\mathcal{P}, \mathit{FR}(\mathcal{P}, \mathcal{R}), m, \mathsf{formative})\}$ *is both sound and complete.*

*Proof (sketch).* A processor that only removes rules (or DPs) is always complete. For soundness, if the chain is formative then each step $t_i \Rightarrow_{\mathcal{R}}^{*} s_{i+1}$ can be replaced by $t_i \Rightarrow_{\mathit{FR}(\mathcal{P}, \mathcal{R})}^{*} s_{i+1}$. Thus, the chain can be seen as a $(\mathcal{P}, \mathit{FR}(\mathcal{P}, \mathcal{R}))$-chain.

*Example 59.* For our ordinal recursion example (Examples 8 and 29), *none* of the rules are included when we use the approximation of Example 57, since all rules have output type $\mathsf{ord}$. Thus, $Proc_{FR}$ maps $(SDP(\mathcal{R}), \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$ to $(SDP(\mathcal{R}), \emptyset, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$. *Note:* this example can also be completed without formative rules (see Example 64). Here we illustrate that, even with a simple formative rules approximation, we can often delete all rules of a given type.

Formative rules are introduced in [31], and the definitions can be adapted to a more powerful formative rules approximation than the one sketched in Example 57. Several examples and deeper intuition for the first-order setting are given in [16].

### **5.4 Subterm Criterion Processors**

Reduction triple processors are powerful, but they exact a computational price: we must orient all rules in $\mathcal{R}$. The subterm criterion processor allows us to remove DPs without considering $\mathcal{R}$ at all. It is based on a *projection function* [24], whose higher-order counterpart [31,34,46] is the following:

**Definition 60.** *For $\mathcal{P}$ a set of DPs, let $\mathsf{heads}(\mathcal{P})$ be the set of all symbols $f$ that occur as the head of a left- or right-hand side of a DP in $\mathcal{P}$. A* projection function *for $\mathcal{P}$ is a function $\nu : \mathsf{heads}(\mathcal{P}) \to \mathbb{N}$ such that for all DPs $\ell \Rrightarrow p\ (A) \in \mathcal{P}$, the function $\overline{\nu}$ with $\overline{\nu}(f\,s_1 \cdots s_n) = s_{\nu(f)}$ is well-defined both for $\ell$ and for $p$.*

**Theorem 61 (Subterm criterion processor).** *The processor $Proc_{\mathsf{subcrit}}$ that maps a DP problem $(\mathcal{P}_1 \uplus \mathcal{P}_2, \mathcal{R}, m, f)$ with $m \succeq \mathsf{minimal}$ to $\{(\mathcal{P}_2, \mathcal{R}, m, f)\}$ if a projection function $\nu$ exists such that $\overline{\nu}(\ell) \rhd \overline{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in \mathcal{P}_1$ and $\overline{\nu}(\ell) = \overline{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in \mathcal{P}_2$, is sound and complete.*

*Proof (sketch).* If the conditions are satisfied, every infinite $(\mathcal{P}, \mathcal{R})$-chain induces an infinite $\unrhd \cdot \Rightarrow^{\ast}_{\mathcal{R}}$ sequence that starts in a strict subterm of $t_1$, contradicting minimality unless all but finitely many steps are equalities. Since every occurrence of a pair in $\mathcal{P}_1$ results in a strict $\rhd$ step, a tail of the chain lies in $\mathcal{P}_2$.

*Example 62.* Using $\nu(\mathsf{map}) = 2$, $Proc_{\mathsf{subcrit}}$ maps the DP problem $(\{(1)\}, \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$ from Example 47 to $(\emptyset, \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})$.
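
The check behind the subterm criterion can be sketched in a few lines. The sketch below makes several assumptions that are not from the paper: terms are nested tuples `(head, arg1, ..., argn)` with variables as strings, the subterm relation is the plain syntactic one (the higher-order relation is more refined), and the dependency pair shown is a hypothetical `map`-like pair, not necessarily DP (1) of Example 47.

```python
def project(term, nu):
    """Apply the projection: pick argument number nu[head] (1-indexed)."""
    head, *args = term
    return args[nu[head] - 1]

def subterms(term):
    """Enumerate all subterms of a term, including the term itself."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def strict_subterm(s, t):
    """True if t is a proper subterm of s."""
    return s != t and t in subterms(s)

def subterm_criterion(dps, nu):
    """Return the DPs that remain, or None if the criterion does not apply."""
    remaining = []
    for lhs, rhs in dps:
        l, r = project(lhs, nu), project(rhs, nu)
        if strict_subterm(l, r):
            continue                      # strictly decreasing pair: removed
        if l == r:
            remaining.append((lhs, rhs))  # weakly oriented pair: kept
        else:
            return None                   # pair not oriented at all
    return remaining

# Hypothetical DP: map# F (cons X Y)  =>  map# F Y
dp = (("map#", "F", ("cons", "X", "Y")), ("map#", "F", "Y"))
result = subterm_criterion([dp], {"map#": 2})
```

With the projection on the second argument, `cons X Y` strictly contains `Y`, so the only pair is removed and `result` is the empty list. Projecting on the first argument would leave the pair in place, since both sides project to `F`.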

The subterm criterion can be strengthened, following [34,46], to also handle DPs like the one in Example 28. Here, we focus on a new idea. For *computable* chains, we can build on the idea of the subterm criterion to get something more.

**Theorem 63 (Computable subterm criterion processor).** *The processor $Proc_{\mathsf{statcrit}}$ that maps a DP problem $(\mathcal{P}_1 \uplus \mathcal{P}_2, \mathcal{R}, \mathsf{computable}_U, f)$ to $\{(\mathcal{P}_2, \mathcal{R}, \mathsf{computable}_U, f)\}$ if a projection function $\nu$ exists such that $\overline{\nu}(\ell) \sqsupset \overline{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in \mathcal{P}_1$ and $\overline{\nu}(\ell) = \overline{\nu}(p)$ for all $\ell \Rrightarrow p\ (A) \in \mathcal{P}_2$, is sound and complete. Here, $\sqsupset$ is the relation on base-type terms with $s \sqsupset t$ if $s \neq t$ and (a) $s \unrhd_{\mathsf{acc}} t$ or (b) a meta-variable $Z$ exists with $s \unrhd_{\mathsf{acc}} Z\langle x_1, \ldots, x_k \rangle$ and $t = Z\langle t_1, \ldots, t_k \rangle\, s_1 \cdots s_n$.*

*Proof (sketch).* By the conditions, every infinite (P, <sup>R</sup>)-chain induces an infinite (<sup>C</sup><sup>U</sup> ∪ ⇒β)<sup>∗</sup>· ⇒<sup>∗</sup> <sup>R</sup> sequence (where <sup>C</sup><sup>U</sup> is defined following Theorem 13). This contradicts computability unless there are only finitely many inequality steps. As pairs in P<sup>1</sup> give rise to a strict decrease, they may occur only finitely often.

*Example 64.* Following Examples 8 and 29, consider the projection function $\nu$ with $\nu(\mathsf{rec}) = 1$. As $\mathsf{s}\,X \unrhd_{\mathsf{acc}} X$ and $\mathsf{lim}\,H \unrhd_{\mathsf{acc}} H$, both $\mathsf{s}\,X \sqsupset X$ and $\mathsf{lim}\,H \sqsupset H\,M$ hold. Thus $Proc_{\mathsf{statcrit}}(\mathcal{P}, \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative}) = \{(\emptyset, \mathcal{R}, \mathsf{computable}_{\mathcal{R}}, \mathsf{formative})\}$. By the dependency graph processor, the AFSM is terminating.

The computable subterm criterion processor fundamentally relies on the new $\mathsf{computable}_U$ flag, so it has no counterpart in the literature so far.

#### **5.5 Non-termination**

While (most of) the processors presented so far are complete, none of them can actually return NO. We have not yet implemented such a processor; however, we can already provide a general specification of a *non-termination processor*.

**Theorem 65 (Non-termination processor).** *Let $M = (\mathcal{P}, \mathcal{R}, m, f)$ be a DP problem. The processor that maps $M$ to NO if it determines that a sufficient criterion for non-termination of $\Rightarrow_{\mathcal{R}}$ or for existence of an infinite conservative $(\mathcal{P}, \mathcal{R})$-chain according to the flags $m$ and $f$ holds is sound and complete.*

*Proof.* Obvious.

This is a very general processor, which does not tell us *how* to determine such a sufficient criterion. However, it allows us to conclude non-termination as part of the framework by identifying a suitable infinite chain.

*Example 66.* Suppose we can find a finite $(\mathcal{P}, \mathcal{R})$-chain $[(\rho_0, s_0, t_0), \ldots, (\rho_n, s_n, t_n)]$ with $t_n = s_0\gamma$ for some substitution $\gamma$, which uses only conservative DPs, is formative if $f = \mathsf{formative}$, and is $U$-computable if $m = \mathsf{computable}_U$. Such a chain is clearly a sufficient criterion: there is an infinite chain $[(\rho_0, s_0, t_0), \ldots, (\rho_0, s_0\gamma, t_0\gamma), \ldots, (\rho_0, s_0\gamma\gamma, t_0\gamma\gamma), \ldots]$. If $m = \mathsf{minimal}$ and we find such a chain that is not minimal, then $\Rightarrow_{\mathcal{R}}$ is non-terminating, which also suffices.

For example, for a DP problem $(\mathcal{P}, \mathcal{R}, \mathsf{minimal}, \mathsf{all})$ with $\mathcal{P} = \{\mathsf{f}\,F\,X \Rrightarrow \mathsf{g}\,(F\,X),\ \mathsf{g}\,X \Rrightarrow \mathsf{f}\,\mathsf{h}\,X\}$, there is a finite dependency chain: $[(\mathsf{f}\,F\,X \Rrightarrow \mathsf{g}\,(F\,X),\ \mathsf{f}\,\mathsf{h}\,x,\ \mathsf{g}\,(\mathsf{h}\,x)),\ (\mathsf{g}\,X \Rrightarrow \mathsf{f}\,\mathsf{h}\,X,\ \mathsf{g}\,(\mathsf{h}\,x),\ \mathsf{f}\,\mathsf{h}\,(\mathsf{h}\,x))]$. As $\mathsf{f}\,\mathsf{h}\,(\mathsf{h}\,x)$ is an instance of $\mathsf{f}\,\mathsf{h}\,x$, the processor maps this DP problem to NO.
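
The final step of this example, checking that the last term of the chain is an instance of the first, amounts to syntactic matching. The following sketch assumes a hypothetical encoding (variables as strings, applications and constants as tuples `(head, args...)`) and ignores types, binders and meta-variable applications, all of which a real higher-order implementation would need to handle.

```python
def match(pattern, term, subst=None):
    """Return a substitution g with pattern*g == term, or None if none exists."""
    subst = dict(subst or {})
    if isinstance(pattern, str):                  # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if (not isinstance(term, tuple)
            or len(term) != len(pattern)
            or term[0] != pattern[0]):
        return None                               # head symbol or arity clash
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

h = ("h",)                                        # h used as an argument
s0 = ("f", h, "x")                                # f h x
tn = ("f", h, ("h", "x"))                         # f h (h x)
gamma = match(s0, tn)
```

Here `gamma` binds `x` to `h x`, which is the substitution that closes the loop in the chain above.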

To instantiate Theorem 65, we can borrow non-termination criteria from first-order rewriting [13,21,42], with minor adaptations to the typed setting. Of course, it is worthwhile to also investigate dedicated higher-order non-termination criteria.

### **6 Conclusions and Future Work**

We have built on the static dependency pair approach [6,33,34,46] and formulated it in the language of the DP *framework* from first-order rewriting [20,22]. Our formulation is based on AFSMs, a dedicated formalism designed to make termination proofs transferable to various higher-order rewriting formalisms.

This framework has two important additions over existing higher-order DP approaches in the literature. First, we consider not only arbitrary and minimally non-terminating dependency chains, but also minimally *non-computable* chains; this is tracked by the $\mathsf{computable}_U$ flag. Using the flag, a dedicated processor allows us to efficiently handle rules like those of Example 8. This flag has no counterpart in the first-order setting. Second, we have generalised the idea of formative rules in [31] to a notion of formative *chains*, tracked by a $\mathsf{formative}$ flag. This makes it possible to define a corresponding processor that permanently removes rules.

*Implementation and Experiments.* To provide a strong formal groundwork, we have presented several processors in a general way, using semantic definitions of, e.g., the dependency graph approximation and formative rules rather than syntactic definitions using functions like *TCap* [21]. Even so, most parts of the DP framework for AFSMs have been implemented in the open-source termination prover WANDA [28], alongside a dynamic DP framework [31] and a mechanism to delegate some ordering constraints to a first-order tool [14]. For reduction triples, polynomial interpretations [15] and a version of HORPO [29, Ch. 5] are used. To solve the constraints arising in the search for these orderings, and also to determine sort orderings (for the accessibility relation) and projection functions (for the subterm criteria), WANDA employs an external SAT-solver. WANDA has won the higher-order category of the International Termination Competition [50] four times. In the International Confluence Competition [10], the tools ACPH [40] and CSI^ho [38] use WANDA as their "oracle" for termination proofs on HRSs.

We have tested WANDA on the *Termination Problems Data Base* [49], using AProVE [19] and MiniSat [12] as back-ends. When no additional features are enabled, WANDA proves termination of 124 (out of 198) benchmarks with static DPs, versus 92 with only a search for reduction orderings; a 34% increase. When all features except static DPs are enabled, WANDA succeeds on 153 benchmarks, versus 166 with also static DPs; an 8% increase, or alternatively, a 29% decrease in failure rate. The full evaluation is available in [17, Appendix D].
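
The percentages quoted above can be re-derived from the reported benchmark counts. The snippet below is only an arithmetic check; the variable names are illustrative, and the comparison suggests that the reported 34% and 8% are rounded down.

```python
TOTAL = 198  # benchmarks in the evaluation

def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return 100 * (after - before) / before

basic = percent_increase(92, 124)     # static DPs vs. reduction orderings only
full = percent_increase(153, 166)     # all features, without vs. with static DPs

failures_before = TOTAL - 153         # unsolved without static DPs
failures_after = TOTAL - 166          # unsolved with static DPs
failure_drop = 100 * (failures_before - failures_after) / failures_before
```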

*Future Work.* While the static and the dynamic DP approaches each have their own strengths, there has thus far been little progress on a *unified* approach, which could take advantage of the syntactic benefits of both styles. We plan to combine the present work with the ideas of [31] into such a unified DP framework.

In addition, we plan to extend the higher-order DP framework to rewriting with *strategies*, such as implicit β-normalisation or strategies inspired by functional programming languages like OCaml and Haskell. Other natural directions are dedicated automation to detect non-termination, and reducing the number of term constraints solved by the reduction triple processor via a tighter integration with usable and formative rules with respect to argument filterings.

# **References**



# **Coinduction in Uniform: Foundations for Corecursive Proof Search with Horn Clauses**

Henning Basold<sup>1</sup>, Ekaterina Komendantskaya<sup>2</sup>, and Yue Li<sup>2</sup>

<sup>1</sup> CNRS, ENS Lyon, Lyon, France, henning.basold@ens-lyon.fr

<sup>2</sup> Heriot-Watt University, Edinburgh, UK, {ek19,yl55}@hw.ac.uk

**Abstract.** We establish proof-theoretic, constructive and coalgebraic foundations for proof search in coinductive Horn clause theories. Operational semantics of coinductive Horn clause resolution is cast in terms of *coinductive uniform proofs*; its constructive content is exposed via soundness relative to an intuitionistic first-order logic with recursion controlled by the later modality; and soundness of both proof systems is proven relative to a novel coalgebraic description of complete Herbrand models.

**Keywords:** Horn clause logic · Coinduction · Uniform proofs · Intuitionistic logic · Coalgebra · Fibrations · Löb modality

# **1 Introduction**

*Horn clause logic* is a Turing complete and constructive fragment of first-order logic that plays a central role in verification [22], automated theorem proving [52,53,57] and type inference. Examples of the latter can be traced from the Hindley-Milner type inference algorithm [55,73] to more recent uses of Horn clauses in Haskell type classes [26,51] and in refinement types [28,43]. Its popularity can be attributed to well-understood fixed point semantics and an efficient semi-decidable resolution procedure for automated proof search.

According to the standard fixed point semantics [34,52], given a set P of Horn clauses, the *least Herbrand model* for P is the set of all (finite) ground atomic formulae *inductively entailed* by P. For example, the two clauses below define the set of natural numbers in the least Herbrand model.

$$\kappa_{\mathbf{nat}0} : \mathbf{nat}\ 0 \qquad\qquad \kappa_{\mathbf{nat}s} : \forall x.\ \mathbf{nat}\ x \to \mathbf{nat}\ (s\ x)$$

This work is supported by the European Research Council (ERC) under the EU's Horizon 2020 programme (CoVeCe, grant agreement No. 678157) and by the EPSRC research grants EP/N014758/1, EP/K031864/1-2.

Formally, the least Herbrand model for the above two clauses is the set of ground atomic formulae obtained by taking a (forward) closure of the above two clauses. The model for **nat** is given by $\mathcal{N} = \{\mathbf{nat}\ 0, \mathbf{nat}\ (s\ 0), \mathbf{nat}\ (s\ (s\ 0)), \ldots\}$.
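
The forward closure can be illustrated by iterating the two clauses from the empty set. The sketch below is a finite approximation only, since the least Herbrand model itself is infinite; the tuple encoding of atoms and the function names are assumptions made for illustration.

```python
def step(facts):
    """One forward step: apply both clauses to the facts derived so far."""
    new_facts = set(facts)
    new_facts.add(("nat", "0"))                 # clause nat 0
    for _, term in facts:                       # clause nat x -> nat (s x)
        new_facts.add(("nat", ("s", term)))
    return new_facts

def approximate_model(rounds):
    """Facts derivable within the given number of forward steps."""
    facts = set()
    for _ in range(rounds):
        facts = step(facts)
    return facts

three_rounds = approximate_model(3)
```

After three rounds, `three_rounds` contains exactly the atoms for 0, s 0 and s (s 0); each further round adds one more numeral, and the union over all rounds is the least Herbrand model.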

We can also view Horn clauses coinductively. The *greatest complete Herbrand model* for a set P of Horn clauses is the largest set of finite and infinite ground atomic formulae *coinductively entailed* by P. For example, the greatest complete Herbrand model for the above two clauses is the set

$$\mathcal{N}^{\infty} = \mathcal{N} \cup \{ \mathbf{nat} \,(s \,(s \,(\cdot \cdot \cdot))) \},$$

obtained by taking a backward closure of the above two inference rules on the set of all finite and infinite ground atomic formulae. The *greatest Herbrand model* is the largest set of *finite* ground atomic formulae *coinductively entailed* by P. In our example, it would be given by $\mathcal{N}$ already. Finally, one can also consider the *least complete Herbrand model*, which interprets entailment inductively but over potentially infinite terms. In the case of **nat**, this interpretation does not differ from $\mathcal{N}$. However, finite paths in coinductive structures like transition systems, for example, require such semantics.

The need for coinductive semantics of Horn clauses arises in several scenarios: the Horn clause theory may explicitly define a coinductive data structure or a coinductive relation. However, it may also happen that a Horn clause theory, which is not explicitly intended as coinductive, nevertheless gives rise to infinite inference by resolution and has an interesting coinductive model. This commonly happens in type inference. We will illustrate all these cases by means of examples.

*Horn Clause Theories as Coinductive Data Type Declarations.* The following clause defines, together with $\kappa_{\mathbf{nat}0}$ and $\kappa_{\mathbf{nat}s}$, the type of streams over natural numbers.

$$\kappa_{\mathbf{stream}} : \forall x\,y.\ \mathbf{nat}\ x \wedge \mathbf{stream}\ y \to \mathbf{stream}\ (\mathrm{scons}\ x\ y)$$

This Horn clause does not have a meaningful inductive, i.e. least fixed point, model. The greatest complete Herbrand model of the clauses is given by

$$\mathcal{S} = \mathcal{N}^{\infty} \cup \{ \mathbf{stream}\,(\mathrm{scons}\ x_0\ (\mathrm{scons}\ x_1\ \cdots)) \mid \mathbf{nat}\ x_0, \mathbf{nat}\ x_1, \ldots \in \mathcal{N}^{\infty} \}$$

In trying to prove, for example, the goal (**stream** x), a goal-directed proof search may try to find a substitution for x that will make (**stream** x) valid relative to the coinductive model of this set of clauses. This search by resolution may proceed by means of an infinite reduction

$$\underline{\mathbf{stream}\ x} \stackrel{\kappa_{\mathbf{stream}}:[\mathrm{scons}\ y\ x'/x]}{\leadsto} \mathbf{nat}\ y \wedge \mathbf{stream}\ x' \stackrel{\kappa_{\mathbf{nat}0}:[0/y]}{\leadsto} \underline{\mathbf{stream}\ x'} \stackrel{\kappa_{\mathbf{stream}}:[\mathrm{scons}\ y'\ x''/x']}{\leadsto} \cdots,$$

thereby generating a stream Z of zeros via composition of the computed substitutions: $Z = (\mathrm{scons}\ 0\ x')[\mathrm{scons}\ 0\ x''/x'] \cdots$. Above, we annotated each resolution step with the label of the clause it resolves against and the computed substitution. A method to compute an answer for this infinite sequence of reductions was given by Gupta et al. [41] and Simon et al. [69]: the underlined loop gives rise to the circular unifier $x = \mathrm{scons}\ 0\ x$ that corresponds to the infinite term Z. It is proven that, if a loop and a corresponding circular unifier are detected, they provide an answer that is sound relative to the greatest complete Herbrand model of the clauses. This approach is known under the name of CoLP.
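
To see how a circular unifier denotes an infinite term, one can unfold it lazily. The sketch below is not CoLP's implementation; it assumes a hypothetical representation of a unifier as a dictionary whose bindings all have the shape `scons head tail`, with `tail` again a bound variable.

```python
from itertools import islice

def unfold(unifier, var):
    """Lazily list the heads of the stream denoted by `var`."""
    while True:
        _, head, tail = unifier[var]   # binding has the shape scons head tail
        yield head
        var = tail                     # follow the (possibly circular) binding

# The circular unifier x = scons 0 x from the text.
zeros = list(islice(unfold({"x": ("scons", "0", "x")}, "x"), 5))

# A second, hypothetical regular term: x = scons 0 y, y = scons 1 x.
alternating = list(islice(unfold({"x": ("scons", "0", "y"),
                                  "y": ("scons", "1", "x")}, "x"), 4))
```

Any finite system of such bindings unfolds to a regular (eventually periodic) stream, which is why this representation cannot express irregular terms.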

*Horn Clause Theories in Type Inference.* The clauses below give the typing rules of the simply typed λ-calculus, and may be used for type inference or type checking:

$$\begin{aligned}
\kappa_{t1} &: \forall x\,\Gamma\,a.\ \mathbf{var}\ x \wedge \mathbf{find}\ \Gamma\ x\ a \to \mathbf{typed}\ \Gamma\ x\ a \\
\kappa_{t2} &: \forall x\,\Gamma\,a\,m\,b.\ \mathbf{typed}\ [x : a \mid \Gamma]\ m\ b \to \mathbf{typed}\ \Gamma\ (\lambda\,x\,m)\ (a \to b) \\
\kappa_{t3} &: \forall \Gamma\,a\,m\,n\,b.\ \mathbf{typed}\ \Gamma\ m\ (a \to b) \wedge \mathbf{typed}\ \Gamma\ n\ a \to \mathbf{typed}\ \Gamma\ (\mathrm{app}\ m\ n)\ b
\end{aligned}$$

It is well known that the Y -combinator is not typable in the simply-typed λ-calculus and, in particular, self-application λx. x x is not typable either. However, by switching off the occurs-check in Prolog or by allowing circular unifiers in CoLP [41,69], we can resolve the goal "**typed** [] (λ x (app x x)) a" and would compute the circular substitution: a = b → c, b = b → c suggesting that an infinite, or circular, type may be able to type this λ-term. A similar trick would provide a typing for the Y -combinator. Thus, a coinductive interpretation of the above Horn clauses yields a theory of infinite types, while an inductive interpretation corresponds to the standard type system of the simply typed λ-calculus.

*Horn Clause Theories in Type Class Inference.* Haskell type class inference does not require circular unifiers but may require a cyclic resolution inference [37,51]. Consider, for example, the following mutually defined data structures in Haskell.

    data OddList a = OCons a (EvenList a)
    data EvenList a = Nil | ECons a (OddList a)

This type declaration gives rise to the following equality class instance declarations, where we omit the bodies, which are irrelevant here.

```
instance (Eq a , Eq (EvenList a)) => Eq (OddList a) where
instance (Eq a , Eq (OddList a )) => Eq (EvenList a) where
```
The above two type class instance declarations have the shape of Horn clauses. Since the two declarations mutually refer to each other, an instance inference for, e.g., **Eq** (OddList **Int**) will give rise to an infinite resolution that alternates between the subgoals **Eq** (OddList **Int**) and **Eq** (EvenList **Int**). The solution is to terminate the computation as soon as the cycle is detected [51], and this method has been shown sound relative to the greatest Herbrand models in [36]. We will demonstrate this later in the proof systems proposed in this paper.
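
Terminating on a detected cycle can be sketched as a depth-first resolution that treats a goal already on the current path as proven. This is an illustrative model only, not GHC's algorithm: the `INSTANCES` table, the `BASE` set and the tuple encoding of goals are assumptions made for the example.

```python
# Each instance maps a type constructor to the subgoals of its context.
INSTANCES = {
    "OddList": lambda a: [("Eq", a), ("Eq", ("EvenList", a))],
    "EvenList": lambda a: [("Eq", a), ("Eq", ("OddList", a))],
}
BASE = {("Eq", "Int")}  # goals assumed to be solved by built-in instances

def resolve(goal, path=frozenset()):
    """Resolve `goal`, closing a branch as soon as a cycle is detected."""
    if goal in path:                 # cycle: the goal is its own hypothesis
        return True
    if goal in BASE:
        return True
    _, ty = goal
    if isinstance(ty, tuple) and ty[0] in INSTANCES:
        subgoals = INSTANCES[ty[0]](ty[1])
        return all(resolve(g, path | {goal}) for g in subgoals)
    return False                     # no instance applies

odd_int = resolve(("Eq", ("OddList", "Int")))
odd_bool = resolve(("Eq", ("OddList", "Bool")))
```

The goal for `OddList Int` succeeds because the subgoal for `EvenList Int` leads back to a goal already on the path. The goal for `OddList Bool` fails, since no instance for `Eq Bool` is assumed in this toy setting.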

The diversity of these coinductive examples in the existing literature shows that there is a practical demand for coinductive methods in Horn clause logic, but it also shows that no unifying proof-theoretic approach exists to allow for a generic use of these methods. This causes several problems.

**Problem 1. The existing proof-theoretic coinductive interpretations of cycle and loop detection are unclear, incomplete and not uniform.**


**Table 1.** Examples of greatest (complete) Herbrand models for Horn clauses $\gamma_1$, $\gamma_2$, $\gamma_3$. The signatures are $\{a\}$ for the clause $\gamma_1$ and $\{a, f\}$ for the others.

To see this, consider Table 1, which exemplifies three kinds of circular phenomena in Horn clauses. The clause $\gamma_1$ is the easiest case: its coinductive models are given by the finite set $\{p\,a\}$. At the other extreme is the clause $\gamma_3$ that, just like $\kappa_{\mathbf{stream}}$, admits only an infinite formula in its coinductive model. The intermediate case is $\gamma_2$, which could be interpreted by an infinite set of finite formulae in its greatest Herbrand model, or may admit an infinite formula in its greatest complete Herbrand model. Examples like $\gamma_1$ appear in Haskell type class resolution [51], and examples like $\gamma_2$ in its experimental extensions [37]. Cycle detection would only cover computations for $\gamma_1$, whereas $\gamma_2$ and $\gamma_3$ require some form of loop detection<sup>1</sup>. However, CoLP's loop detection gives confusing results here. It correctly fails to infer $p\,a$ from $\gamma_3$ (no unifier for subgoals $p\,a$ and $p\,(f\,a)$ exists), but incorrectly fails to infer $p\,a$ from $\gamma_2$ (also failing to unify $p\,a$ and $p\,(f\,a)$). The latter failure is misleading, bearing in mind that $p\,a$ is in fact in the coinductive model of $\gamma_2$. Vice versa, if we interpret the CoLP answer $x = f\,x$ as a declaration of an infinite term $(f\,(f \ldots))$ in the model, then CoLP's answer for $\gamma_3$ and $p\,x$ is exactly correct; however, the same answer is badly incomplete for the query involving $p\,x$ and $\gamma_2$, because $\gamma_2$ in fact admits other, finite, formulae in its models. And in some applications, e.g. in Haskell type class inference, a finite formula would be the only acceptable answer for any query to $\gamma_2$.

This set of examples shows that loop detection is too coarse a tool to give an operational semantics to a diversity of coinductive models.

**Problem 2. Constructive interpretation of coinductive proofs in Horn clause logic is unclear.** Horn clause logic is known to be a constructive fragment of FOL. Some applications of Horn clauses rely on this property in a crucial way. For example, inference in Haskell type class resolution is constructive: when a certain formula F is inferred, the Haskell compiler in fact constructs a proof term that inhabits F seen as type. In our earlier example **Eq** (OddList **Int**) of the Haskell type classes, Haskell in fact captures the cycle by a fixpoint term t and proves that t inhabits the type **Eq** (OddList **Int**).

<sup>1</sup> We follow the standard terminology of [74] and say that two formulae F and G form a cycle if F = G, and a loop if F[θ] = G[θ] for some (possibly circular) unifier θ.

**Fig. 1.** Cube of logics covered by CUP

Although we know from [36] that these computations are sound relative to greatest Herbrand models of Horn clauses, the results of [36] do not extend to Horn clauses like $\gamma_3$ or $\kappa_{\mathbf{stream}}$, or generally to Horn clauses modelled by the greatest *complete* Herbrand models. This shows that there is not just a need for coinductive proofs in Horn clause logic, but for *constructive* coinductive proofs.

**Problem 3. Incompleteness of circular unification for irregular coinductive data structures.** Table 1 already showed some issues with incompleteness of circular unification. A more famous consequence of it is the failure of circular unification to capture irregular terms. This is illustrated by the following Horn clause, which defines the infinite stream of successive natural numbers.

$$\kappa_{\mathbf{from}} : \forall x\,y.\ \mathbf{from}\ (s\ x)\ y \to \mathbf{from}\ x\ (\mathrm{scons}\ x\ y)$$

The reductions for **from** 0 y consist only of irregular (non-unifiable) formulae:

$$\mathbf{from}\ 0\ y \stackrel{\kappa_{\mathbf{from}}:[\mathrm{scons}\ 0\ y'/y]}{\leadsto} \mathbf{from}\ (s\ 0)\ y' \stackrel{\kappa_{\mathbf{from}}:[\mathrm{scons}\ (s\ 0)\ y''/y']}{\leadsto} \cdots$$

The composition of the computed substitutions would suggest an infinite term as answer: **from** 0 (scons 0 (scons (s 0) ...)). However, circular unification no longer helps to compute this answer, and CoLP fails. Thus, there is a need for more general operational semantics that allows irregular coinductive structures.

#### **A New Theory of Coinductive Proof Search in Horn Clause Logic**

In this paper, we aim to give a principled and *general* theory that resolves the three problems above. This theory establishes a *constructive* foundation for coinductive resolution and allows us to give proof-theoretic characterisations of the approaches that have been proposed throughout the literature.

To solve Problem 1, we follow in the footsteps of the *uniform proofs* by Miller et al. [53,54], who gave a general proof-theoretic account of resolution in first-order Horn clause logic (*fohc*) and three extensions: first-order hereditary Harrop clauses (*fohh*), higher-order Horn clauses (*hohc*), and higher-order hereditary Harrop clauses (*hohh*). In Sect. 3, we extend uniform proofs with a general coinduction proof principle. The resulting framework is called *coinductive uniform proofs (CUP)*. We show how the coinductive extensions of the four logics of Miller et al., which we name *co-fohc*, *co-fohh*, *co-hohc* and *co-hohh*, give a precise proof-theoretic characterisation to the different kinds of coinduction described in the literature. For example, coinductive proofs involving the clauses $\gamma_1$ and $\gamma_2$ belong to *co-fohc* and *co-fohh*, respectively. However, proofs involving clauses like $\gamma_3$ or $\kappa_{\mathbf{stream}}$ require in addition fixed point terms to express infinite data. These extensions are denoted by *co-fohc*<sub>fix</sub>, *co-fohh*<sub>fix</sub>, *co-hohc*<sub>fix</sub> and *co-hohh*<sub>fix</sub>.

Section 3 shows that this yields the cube in Fig. 1, where the arrows show the increase in logical strength. The invariant search for regular infinite objects done in CoLP is fully described by the logic *co-fohc*<sub>fix</sub>, including proofs for clauses like $\gamma_3$ and $\kappa_{\mathbf{stream}}$. An important consequence is that CUP is complete for $\gamma_1$, $\gamma_2$, and $\gamma_3$, e.g. $p\,a$ is provable from $\gamma_2$ in CUP, but not in CoLP.

In tackling Problem 3, we will find that the irregular proofs, such as those for $\kappa_{\mathbf{from}}$, can be given in *co-hohh*<sub>fix</sub>. The stream of successive numbers can be defined as a higher-order fixed point term $s_{fr} = \mathrm{fix}\ f.\ \lambda x.\ \mathrm{scons}\ x\ (f\ (s\ x))$, and the proposition $\forall x.\ \mathbf{from}\ x\ (s_{fr}\ x)$ is provable in *co-hohh*<sub>fix</sub>. This requires the use of higher-order syntax, fixed point terms and the goals of universal shape, which become available in the syntax of Hereditary Harrop logic.
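
The fixed point term for the stream of successive numbers has a direct computational reading. As a rough analogy only, the sketch below models it as a recursive Python generator over machine integers (an assumption; the paper works with unary numerals and terms, not generators), where the recursive call plays the role of the bound variable f.

```python
from itertools import islice

def sfr(x):
    """Stream x, x+1, x+2, ...: mirrors fix f. \\x. scons x (f (s x))."""
    yield x                      # head of the stream: scons x ...
    yield from sfr(x + 1)        # tail of the stream: f (s x)

prefix = list(islice(sfr(0), 5))

# The clause for `from` relates the tail of (sfr x) to (sfr (s x)):
tail_of_zero = list(islice(sfr(0), 1, 5))
from_one = list(islice(sfr(1), 4))
```

The stream has no repeating suffix, which is the sense in which it is irregular: no finite set of circular bindings can describe it, whereas the parameterised fixed point does.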

In order to solve Problem 2 and to expose the constructive nature of the resulting proof systems, we present in Sect. 4 a coinductive extension of first-order intuitionistic logic and its sequent calculus. This extension (**iFOL**<sub>▶</sub>) is based on the so-called later modality (or Löb modality) known from provability logic [16,71], type theory [8,58] and domain theory [20]. However, our way of using the later modality to control recursion in first-order proofs is new and builds on [13,14]. In the same section we also show that CUP is sound relative to **iFOL**<sub>▶</sub>, which gives us a handle on the constructive content of CUP. This yields, among other consequences, a constructive interpretation of CoLP proofs.

Section 5 is dedicated to showing soundness of both coinductive proof systems relative to *complete Herbrand models* [52]. The construction of these models is carried out by using coalgebras and category theory. This frees us from having to use topological methods and will simplify future extensions of the theory to, e.g., encompass typed logic programming. It also makes it possible to give original and constructive proofs of soundness for both CUP and **iFOL**<sub>▶</sub> in Sect. 5. We finish the paper with a discussion of related and future work.

#### **Originality of the Contribution**

The results of this paper give a comprehensive characterisation of coinductive Horn clause theories from the point of view of proof search (by expressing coinductive proof search and resolution as coinductive uniform proofs), constructive proof theory (via a translation into an intuitionistic sequent calculus), and coalgebraic semantics (via coinductive Herbrand models and constructive soundness results). Several of the presented results have never appeared before: the coinductive extension of uniform proofs; characterisation of coinductive properties of Horn clause theories in higher-order logic with and without fixed point operators; coalgebraic and fibrational view on complete Herbrand models; and soundness of an intuitionistic logic with later modality relative to complete Herbrand models.

# **2 Preliminaries: Terms and Formulae**

In this section, we set up notation and terminology for the rest of the paper. Most of it is standard, and blends together the notation used in [53] and [11].

**Definition 1.** We define the sets $\mathbb{T}$ of *types* and $\mathbb{P}$ of *proposition types* by the following grammars, where $\iota$ and $o$ are the *base type* and *base proposition type*.

$$\mathbb{T} \ni \sigma, \tau ::= \iota \mid \sigma \to \tau \qquad\qquad \mathbb{P} \ni \rho ::= o \mid \sigma \to \rho, \quad \sigma \in \mathbb{T}$$

We adopt the usual convention that → associates to the right.


**Fig. 2.** Well-formed terms

**Fig. 3.** Well-formed formulae

**Definition 2.** A *term signature* $\Sigma$ is a set of pairs $c : \tau$, where $\tau \in \mathbb{T}$, and a *predicate signature* is a set $\Pi$ of pairs $p : \rho$ with $\rho \in \mathbb{P}$. The elements in $\Sigma$ and $\Pi$ are called *term symbols* and *predicate symbols*, respectively. Given term and predicate signatures $\Sigma$ and $\Pi$, we refer to the pair $(\Sigma, \Pi)$ as a *signature*. Let Var be a countable set of variables, the elements of which we denote by $x, y, \ldots$ We call a finite list $\Gamma$ of pairs $x : \tau$ of variables and types a *context*. The set $\Lambda_\Sigma$ of *(well-typed) terms* over $\Sigma$ is the collection of all $M$ with $\Gamma \vdash M : \tau$ for some context $\Gamma$ and type $\tau \in \mathbb{T}$, where $\Gamma \vdash M : \tau$ is defined inductively in Fig. 2. A term is called *closed* if $\vdash M : \tau$, otherwise it is called *open*. Finally, we let $\Lambda^{-}_\Sigma$ denote the set of all terms $M$ that do not involve fix.

**Definition 3.** Let (Σ,Π) be a signature. We say that ϕ is a *(first-order) formula* in context Γ, if Γ ϕ is inductively derivable from the rules in Fig. 3.

**Definition 4.** The *reduction relation* $\longrightarrow$ on terms in $\Lambda_\Sigma$ is given as the compatible closure (reduction under applications and binders) of β- and fix-reduction:

$$(\lambda x.M)N \longrightarrow M\left[N/x\right] \qquad \text{fix } x.M \longrightarrow M\left[\text{fix } x.M/x\right]$$

We denote the reflexive, transitive closure of $\longrightarrow$ by $\twoheadrightarrow$. Two terms $M$ and $N$ are called *convertible* if $M \equiv N$, where $\equiv$ is the equivalence closure of $\longrightarrow$. Conversion of terms extends to formulae in the obvious way: if $M_k \equiv M'_k$ for $k = 1, \ldots, n$, then $p\,M_1 \cdots M_n \equiv p\,M'_1 \cdots M'_n$.

In the following, we will use that the above calculus features subject reduction and confluence, cf. [61]: if $\Gamma \vdash M : \tau$ and $M \equiv N$, then $\Gamma \vdash N : \tau$; and $M \equiv N$ iff there is a term $P$ such that $M \twoheadrightarrow P$ and $N \twoheadrightarrow P$.

The *order* of a type τ ∈ T is given as usual by ord(ι) = 0 and ord(σ → τ) = max{ord(σ) + 1, ord(τ)}. If ord(τ) ≤ 1, then the arity of τ is given by ar(ι) = 0 and ar(ι → τ) = ar(τ) + 1. A signature Σ is called *first-order*, if for all f : τ ∈ Σ we have ord(τ) ≤ 1. We let the arity of f then be ar(τ) and denote it by ar(f).
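As a small illustration of these two definitions, the following Python sketch computes ord and ar on a toy encoding of simple types. The encoding (the string `"i"` for ι and a pair `(σ, τ)` for σ → τ) and all function names are assumptions made for this example only.

```python
# Toy encoding (an assumption of this sketch): the base type ι is the
# string "i", and an arrow type σ → τ is the pair (σ, τ).
IOTA = "i"


def arrow(*types):
    """Right-nested arrow type: arrow(a, b, c) encodes a → (b → c)."""
    *args, result = types
    for a in reversed(args):
        result = (a, result)
    return result


def order(t):
    """ord(ι) = 0 and ord(σ → τ) = max(ord(σ) + 1, ord(τ))."""
    if t == IOTA:
        return 0
    sigma, tau = t
    return max(order(sigma) + 1, order(tau))


def arity(t):
    """ar(ι) = 0 and ar(ι → τ) = ar(τ) + 1, defined only if ord(t) ≤ 1."""
    if order(t) > 1:
        raise ValueError("arity is only defined for types of order at most 1")
    n = 0
    while t != IOTA:
        _, t = t  # drop one argument of type ι
        n += 1
    return n


scons_type = arrow(IOTA, IOTA, IOTA)            # ι → ι → ι
second_order = arrow(arrow(IOTA, IOTA), IOTA)   # (ι → ι) → ι
```

With this encoding, `scons_type` has order 1 and arity 2, whereas `second_order` has order 2 and therefore no arity.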

**Definition 5.** The set of *guarded base terms* over a first-order signature Σ is given by the following type-driven rules.

$$\frac{x : \tau \in \Gamma \qquad \operatorname{ord}(\tau) \leq 1}{\Gamma \vdash\_{g} x : \tau} \qquad \frac{f : \tau \in \Sigma}{\Gamma \vdash\_{g} f : \tau} \qquad \frac{\Gamma \vdash\_{g} M : \sigma \to \tau \qquad \Gamma \vdash\_{g} N : \sigma}{\Gamma \vdash\_{g} M N : \tau}$$

$$\frac{f : \sigma \in \Sigma \qquad \operatorname{ord}(\tau) \leq 1 \qquad \Gamma, x : \tau, \vec{y} : \vec{\iota} \vdash\_{g} M\_{i} : \iota \quad (1 \leq i \leq \operatorname{ar}(f))}{\Gamma \vdash\_{g} \text{fix}\, x.\, \lambda \vec{y}.\, f\, \vec{M} : \tau}$$

General *guarded terms* are terms M, such that all fix-subterms are guarded base terms, which means that they are generated by the following grammar.

$$G ::= M \text{ (with} \vdash\_g M : \tau \text{ for some type } \tau\text{) } \mid c \in \Sigma \mid x \in \text{Var} \mid G \, G \mid \lambda x. G$$

Finally, M is a *first-order* term over Σ with Γ ⊢ M : τ if ord(τ) ≤ 1 and the types of all variables occurring in Γ are of order 0. We denote the set of guarded first-order terms M with Γ ⊢ M : ι by Λ<sup>G,1</sup><sub>Σ</sub>(Γ) and the set of guarded terms in Γ by Λ<sup>G</sup><sub>Σ</sub>(Γ). If Γ is empty, we just write Λ<sup>G,1</sup><sub>Σ</sub> and Λ<sup>G</sup><sub>Σ</sub>, respectively.

Note that an important aspect of guarded terms is that no free variable occurs under a fix-operator. *Guarded base terms* should be seen as specific fixed point terms that we will be able to unfold into potentially infinite trees. *Guarded terms* close guarded base terms under operations of the simply typed λ-calculus.

**Example 6.** Let us provide a few examples that illustrate (first-order) guarded terms. We use the first-order signature Σ = {scons: ι → ι → ι, s: ι → ι, 0 : ι}.

1. Let s<sub>fr</sub> = fix f. λx. scons x (f (s x)) be the function that computes the stream of numerals starting at the given argument. It is easy to show that ⊢<sub>g</sub> s<sub>fr</sub> : ι → ι and so s<sub>fr</sub> 0 ∈ Λ<sup>G,1</sup><sub>Σ</sub>.
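The unfolding behaviour of this fixed point term can be mimicked with a lazy stream. The sketch below is only an analogy in Python: numerals are represented by machine integers and each scons cell by one value yielded from a generator, both of which are assumptions of the example rather than part of the calculus.

```python
from itertools import islice


def s_fr(x):
    """Mimics fix f. λx. scons x (f (s x)).

    Each iteration exposes one scons cell (the yielded head) and then
    continues with the successor of x, so the stream is produced lazily
    and is never built in full.
    """
    while True:
        yield x      # head of the current scons cell
        x = x + 1    # the tail is the same stream started at s x


def take(n, stream):
    """First n elements of a (possibly infinite) stream, as a list."""
    return list(islice(stream, n))
```

For instance, `take(5, s_fr(0))` inspects only the first five cells of the stream that starts at 0.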


The purpose of guarded terms is that these are productive, that is, we can reduce them to a term that either has a function symbol at the root or is just a variable. In other words, guarded terms have head normal forms: We say that a term M is in *head normal form*, if M = f N⃗ for some f ∈ Σ or if M = x for some variable x. The following lemma is a technical result that is needed to show in Lemma 8 that all guarded terms have a head normal form.

**Lemma 7.** *Let* M *and* N *be guarded base terms with* Γ, x : σ ⊢<sub>g</sub> M : τ *and* Γ ⊢<sub>g</sub> N : σ*. Then* M[N/x] *is a guarded base term with* Γ ⊢<sub>g</sub> M[N/x] : τ*.*

**Lemma 8.** *If* M *is a first-order guarded term with* M ∈ Λ<sup>G,1</sup><sub>Σ</sub>(Γ)*, then* M *reduces to a unique head normal form. This means that either (i) there is a unique* f ∈ Σ *and terms* N<sub>1</sub>, ..., N<sub>ar(f)</sub> *with* Γ ⊢<sub>g</sub> N<sub>k</sub> : ι *and* M ↠ f N⃗*, and for all* L⃗ *if* M ↠ f L⃗*, then* N⃗ ≡ L⃗*; or (ii)* M ↠ x *for some* x : ι ∈ Γ*.*

We end this section by introducing the notion of an atom and refinements thereof. This will enable us to define the different logics and thereby to analyse the strength of coinduction hypotheses, which we promised in the introduction.

**Definition 9.** A formula ϕ of the shape ⊤ or p M<sub>1</sub> ··· M<sub>n</sub> is an *atom* and a


First-order, guarded and simple atoms are denoted by At<sub>1</sub>, At<sup>g</sup><sub>ω</sub> and At<sup>s</sup><sub>ω</sub>. We denote conjunctions of these predicates by At<sup>g</sup><sub>1</sub> = At<sub>1</sub> ∩ At<sup>g</sup><sub>ω</sub> and At<sup>s</sup><sub>1</sub> = At<sub>1</sub> ∩ At<sup>s</sup><sub>ω</sub>.

Note that the restriction for At<sup>g</sup><sub>ω</sub> only applies to fixed point terms. Hence, any formula that contains terms without fix is already in At<sup>g</sup><sub>ω</sub> and At<sup>g</sup><sub>ω</sub> ∩ At<sup>s</sup><sub>ω</sub> = At<sup>s</sup><sub>ω</sub>. Since these notions are rather subtle, we give a few examples.

**Example 10.** We list three examples of first-order atoms.


# **3 Coinductive Uniform Proofs**

This section introduces the eight logics of the coinductive uniform proof framework announced and motivated in the introduction. The major difference between uniform proofs and, say, a sequent calculus is the "uniformity" property, which means that the choice of the application of each proof rule is deterministic and all proofs are in normal form (cut free). This subsumes the operational semantics of resolution, in which the proof search is always goal directed. Hence, the main challenge that we set out to solve in this section is to extend the uniform proof framework with coinduction, while preserving this valuable operational property.

We begin by introducing the different goal formulae and definite clauses that determine the logics that were presented in the cube for coinductive uniform proofs in the introduction. These clauses and formulae correspond directly to those of the original work on uniform proofs [53] with the only difference being that we need to distinguish atoms with and without fixed point terms. The general idea is that goal formulae (G-formulae) occur on the right of a sequent, thus are the *goal* to be proved. Definite clauses (D-formulae), on the other hand, are selected from the context as assumptions. This will become clear once we introduce the proof system for coinductive uniform proofs.

**Definition 11.** Let D<sub>i</sub> be generated by the following grammar with i ∈ {1, ω}.

$$D\_i ::= \operatorname{At}\_i^s \mid G \to D \mid D \land D \mid \forall x : \tau. D$$



The sets of definite clauses (D-formulae) and goals (G-formulae) of the four logics *co-fohc*, *co-fohh*, *co-hohc*, *co-hohh* are the well-formed formulae of the corresponding shapes defined in Table 2. For the variations *co-fohh*<sub>fix</sub> etc. of these logics with fixed point terms, we replace upper index "s" with "g" everywhere in Table 2. A D-formula of the shape ∀x⃗. A<sub>1</sub> ∧ ··· ∧ A<sub>n</sub> → A<sub>0</sub> is called *H-formula* or *Horn clause* if A<sub>k</sub> ∈ At<sup>s</sup><sub>1</sub>, and *H<sup>g</sup>-formula* if A<sub>k</sub> ∈ At<sup>g</sup><sub>1</sub>. Finally, a *logic program* (or *program*) P is a set of H-formulae. Note that any set of D-formulae in *fohc* can be transformed into an intuitionistically equivalent set of H-formulae [53].

We are now ready to introduce the coinductive uniform proofs. Such proofs are composed of two parts: an outer coinduction that has to be at the root of a proof tree, and the usual uniform proofs by Miller et al. [54]. The latter are restated in Fig. 4. Of special note is the rule decide that mimics the operational behaviour of resolution in logic programming, by choosing a clause D from the given program to resolve against. The coinduction is started by the rule co-fix in Fig. 5. Our proof system mimics the typical recursion with a guard condition found in coinductive programs and proofs [5,8,19,31,40]. This guardedness condition is formalised by applying the guarding modality ⟨_⟩ on the formula being proven by coinduction and the proof rules that allow us to distribute the guard over certain logical connectives, see Fig. 5. The guarding modality may be discharged only if the guarded goal was resolved against a clause in the initial program or any hypothesis, except for the coinduction hypotheses. This is reflected in the rule decide⟨⟩, where we may only pick a clause from P, and is in contrast to the rule decide, in which we can pick *any* hypothesis. The proof may only terminate with the initial step if the goal is no longer guarded.

Note that the co-fix rule introduces a goal as a new hypothesis. Hence, we have to require that this goal is also a definite clause. Since coinduction hypotheses play such an important role, they deserve a separate definition.

**Definition 12.** Given a language L from Table 2, a formula ϕ is a *coinduction goal* of L if ϕ simultaneously is a D- and a G-formula of L.

Note that the coinduction goals of *co-fohc* and *co-fohh* can be transformed into equivalent H- or H<sup>g</sup>-formulae, since any coinduction goal is a D-formula.

Let us now formally introduce the coinductive uniform proof system.

**Fig. 4.** Uniform proof rules

$$\frac{\Sigma; P; \varphi \Longrightarrow \langle\varphi\rangle}{\Sigma; P \looparrowright \varphi}\ \text{co-fix}$$

$$\frac{\Sigma; P; \Delta \stackrel{D}{\Rightarrow} A \qquad D \in P}{\Sigma; P; \Delta \Longrightarrow \langle A \rangle} \qquad \frac{c : \tau, \Sigma; P; \Delta \Longrightarrow \langle \varphi[c/x] \rangle \qquad c : \tau \notin \Sigma}{\Sigma; P; \Delta \Longrightarrow \langle \forall x : \tau.\, \varphi \rangle}$$

**Fig. 5.** Coinductive uniform proof rules

**Definition 13.** Let P and Δ be finite sets of, respectively, definite clauses and coinduction goals, over the signature Σ, and suppose that G is a goal and ϕ is a coinduction goal. A *sequent* is either a *uniform provability sequent* of the form Σ; P; Δ ⟹ G or Σ; P; Δ ⟹<sup>D</sup> A as defined in Fig. 4, or it is a *coinductive uniform provability sequent* of the form Σ; P ↬ ϕ as defined in Fig. 5. Let L be a language from Table 2. We say that ϕ is *coinductively provable* in L, if P is a set of D-formulae in L, ϕ is a coinduction goal in L and Σ; P ↬ ϕ holds.

The logics we have introduced impose different syntactic restrictions on D- and G-formulae, and will therefore admit coinduction goals of different strength. This ability to explicitly use stronger coinduction hypotheses within a goal-directed search was missing in CoLP, for example. And it allows us to account for different coinductive properties of Horn clauses as described in the introduction. We finish this section by illustrating this strengthening.

The first example is one for the logic *co-fohc*, in which we illustrate the framework on the problem of type class resolution.

**Example 14.** Let us restate the Haskell type class inference problem discussed in the introduction in terms of Horn clauses:

$$\begin{aligned} \kappa\_{\mathbf{i}} &: \mathbf{eq} \text{ i } \\ \kappa\_{\text{odd}} &: \forall x. \mathbf{eq} \ x \land \mathbf{eq} \text{ (even } x) \to \mathbf{eq} \text{ (odd } x) \\ \kappa\_{\text{even}} &: \forall x. \mathbf{eq} \ x \land \mathbf{eq} \text{ (odd } x) \to \mathbf{eq} \text{ (even } x) \end{aligned}$$

To prove **eq** (odd i) for this set of Horn clauses, it is sufficient to use this formula directly as coinduction hypothesis, as shown in Fig. 6. Note that this formula is indeed a coinduction goal of *co-fohc*, hence we find ourselves in the simplest scenario of coinductive proof search. In Table 1, γ<sub>1</sub> is a representative for this kind of coinductive proofs with simplest atomic goals.
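This simplest scenario can be imitated in a few lines of Python. The following is a rough sketch under simplifying assumptions, not the proof system itself: goals are ground atoms **eq** t, terms are encoded as `"i"` or as pairs such as `("odd", t)`, and the names `clause_bodies`, `prove` and `fuel` are invented for this example.

```python
def clause_bodies(t):
    """Subgoals produced by resolving eq t against the matching clause.

    Returns the list of terms t' for which eq t' must be proven next,
    or None if no clause head matches eq t.
    """
    if t == "i":
        return []                      # κ_i : eq i
    if not isinstance(t, tuple):
        return None
    tag, x = t
    if tag == "odd":
        return [x, ("even", x)]        # κ_odd
    if tag == "even":
        return [x, ("odd", x)]         # κ_even
    return None


def prove(goal, hypothesis, guarded=True, fuel=50):
    """Goal-directed search for eq goal with one coinduction hypothesis.

    While guarded is True the hypothesis must not be used. The guard is
    discharged by the first resolution step against a program clause.
    The fuel bound only makes the sketch terminate on failing searches.
    """
    if not guarded and goal == hypothesis:
        return True                    # close the branch with the hypothesis
    if fuel == 0:
        return False
    body = clause_bodies(goal)
    if body is None:
        return False
    return all(prove(g, hypothesis, False, fuel - 1) for g in body)


def coinductively_provable(goal):
    """Use the goal itself as the (initially guarded) coinduction hypothesis."""
    return prove(goal, goal)
```

Calling `coinductively_provable(("odd", "i"))` resolves against κ<sub>odd</sub> and κ<sub>even</sub> and then closes the remaining branch with the hypothesis, which is the only place where the hypothesis is used.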

**Fig. 6.** The *co-fohc* proof for Horn clauses arising from Haskell type class examples. ϕ abbreviates the coinduction hypothesis **eq** (odd i). Note its use in the branch ♠.

It was pointed out in [37] that Haskell's type class inference can also give rise to irregular corecursion. Such cases may require the more general coinduction hypothesis (e.g. universal and/or implicative) of *co-fohh* or *co-hohh*. The following set of Horn clauses is a simplified representation of a problem given in [37]:

$$\begin{aligned} \kappa\_i &: \mathbf{eq} \, \mathrm{i} \\ \kappa\_s &: \forall x. (\mathbf{eq} \, x) \land \mathbf{eq} \, (s \, (g \, x)) \to \mathbf{eq} \, (s \, x) \\ \kappa\_g &: \forall x. \, \mathbf{eq} \, x & \to \mathbf{eq} \, (g \, x) \end{aligned}$$

Trying to prove **eq** (s i) by using **eq** (s i) directly as a coinduction hypothesis is doomed to fail, as the coinductive proof search is irregular and this coinduction hypothesis would not be applicable in any guarded context. But it is possible to prove **eq** (s i) as a corollary of another theorem: ∀x.(**eq** x) → **eq** (s x). Using this formula as coinduction hypothesis leads to a successful proof, which we omit here. From this more general goal, we can derive the original goal by instantiating the quantifier with i and eliminating the implication with κ<sub>i</sub>. This second derivation is sound with respect to the models, as we show in Theorem 34.
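To see why the atomic hypothesis never becomes applicable, one can list the goals of the shape **eq** (s \_) that arise when κ<sub>s</sub> is applied repeatedly. The Python sketch below does this for a hypothetical tuple encoding of terms (`"i"`, `("s", t)`, `("g", t)`); the helper names are assumptions of the example.

```python
def next_s_goal(t):
    """Resolving eq (s x) against κ_s produces the new goal eq (s (g x))."""
    tag, x = t
    if tag != "s":
        raise ValueError("expected a term of the shape s x")
    return ("s", ("g", x))


def s_goals(start, n):
    """The first n goals of the shape eq (s _) met along the search."""
    goals, t = [], start
    for _ in range(n):
        goals.append(t)
        t = next_s_goal(t)
    return goals


def size(t):
    """Number of symbols in a term built from i and unary symbols."""
    return 1 if t == "i" else 1 + size(t[1])
```

Starting from `("s", "i")`, every further goal contains one more occurrence of g than its predecessor, so no later goal coincides with the initial one. This is the irregularity that the universally quantified hypothesis ∀x.(**eq** x) → **eq** (s x) absorbs.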

We encounter γ<sub>2</sub> from Table 1 in a similar situation: To prove p a, we first have to prove ∀x. p x in *co-fohh*, and then obtain p a as a corollary by appealing to Theorem 34. The next example shows that we can cover all cases in Table 1 by providing a proof in *co-hohh*<sub>fix</sub> that involves irregular recursive terms.

**Example 15.** Recall the clause ∀x y. **from** (s x) y → **from** x (scons x y) that we named κ<sub>**from**</sub> in the introduction. Proving ∃y. **from** 0 y is again not possible directly. Instead, we can use the term s<sub>fr</sub> = fix f. λx. scons x (f (s x)) from Example 6 and prove ∀x. **from** x (s<sub>fr</sub> x) coinductively, as shown in Fig. 7. This formula gives a coinduction hypothesis of sufficient generality. Note that the correct coinduction hypothesis now requires the fixed point definition of an infinite stream of successive numbers and universal quantification in the goal. Hence the need for the richer language of *co-hohh*<sub>fix</sub>. From this more general goal we can derive our initial goal ∃y. **from** 0 y by instantiating y with s<sub>fr</sub> 0.

**Fig. 7.** The *co-hohh*<sub>fix</sub> proof for ϕ = ∀x. **from** x (s<sub>fr</sub> x). Note that the last step of the leftmost branch involves **from** c (scons c (s<sub>fr</sub> (s c))) ≡ **from** c (s<sub>fr</sub> c).

There are examples of coinductive proofs that require a fixed point definition of an infinite stream, but do not require the syntax of higher-order terms or hereditary Harrop formulae. Such proofs can be performed in the *co-fohc*<sub>fix</sub> logic. A good example is a proof that the stream of zeros satisfies the Horn clause theory defining the predicate **stream** in the introduction. The goal **stream** s<sub>0</sub>, with s<sub>0</sub> = fix x. scons 0 x, can be proven directly by coinduction. Similarly, one can type self-application with the infinite type a = fix t. t → b for some given type b. The proof for **typed** [x : a] (app x x) b is then in *co-fohc*<sub>fix</sub>. Finally, the clause γ<sub>3</sub> is also in this group. More generally, circular unifiers obtained from CoLP's [41] loop detection yield immediately guarded fixed point terms, and thus CoLP corresponds to coinductive proofs in the logic *co-fohc*<sub>fix</sub>. A general discussion of Horn clause theories that describe infinite objects was given in [48], where the above logic programs were identified as being productive.

# **4 Coinductive Uniform Proofs and Intuitionistic Logic**

In the last section, we introduced the framework of coinductive uniform proofs, which gives an operational account to proofs for coinductively interpreted logic programs. Having this framework at hand, we need to position it in the existing ecosystem of logical systems. The goal of this section is to prove that coinductive uniform proofs are in fact constructive. We show this by first introducing an extension of intuitionistic first-order logic that allows us to deal with recursive proofs for coinductive predicates. Afterwards, we show that coinductive uniform proofs are sound relative to this logic by means of a proof tree translation. The model-theoretic soundness proofs for both logics will be provided in Sect. 5.

**Fig. 8.** Intuitionistic rules for standard connectives

We begin by introducing an extension of intuitionistic first-order logic with the so-called *later modality*, written ▶. This modality is the essential ingredient that allows us to equip proofs with a controlled form of recursion. The later modality stems originally from provability logic, which characterises transitive, well-founded Kripke frames [30,72], and thus allows one to carry out induction without an explicit induction scheme [16]. Later, the later modality was picked up by the type-theoretic community to control recursion in coinductive programming [8,9,21,56,58], mostly with the intent to replace syntactic guardedness checks for coinductive definitions by type-based checks of well-definedness.

Formally, the logic **iFOL**<sub>▶</sub> is given by the following definition.

**Definition 16.** The formulae of **iFOL**<sub>▶</sub> are given by Definition 3 and the rule:

$$\frac{\Gamma \Vdash \varphi}{\Gamma \Vdash \blacktriangleright \varphi}$$

Conversion extends to these formulae in the obvious way. Let ϕ be a formula and Δ a sequence of formulae in **iFOL**<sub>▶</sub>. We say ϕ is *provable in context* Γ *under the assumptions* Δ in **iFOL**<sub>▶</sub>, if Γ | Δ ⊢ ϕ holds. The *provability relation* ⊢ is thereby given inductively by the rules in Figs. 8 and 9.


**Fig. 9.** Rules for the later modality

The rules in Fig. 8 are the usual rules for intuitionistic first-order logic and should come as no surprise. More interesting are the rules in Fig. 9, where the rule **(Löb)** introduces recursion into the proof system. Furthermore, the rule **(Mon)** allows us to distribute the later modality over implication, and consequently over conjunction and universal quantification. This is essential in the translation in Theorem 18 below. Finally, the rule **(Next)** gives us the possibility to proceed without any recursion, if necessary.
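For orientation, rules with these three names are commonly stated in the following shape for a later modality. This is a sketch of the standard formulation, assumed here on the basis of the description above rather than copied from Fig. 9, whose exact presentation may differ.

$$\frac{\Gamma \mid \Delta \vdash \varphi}{\Gamma \mid \Delta \vdash \blacktriangleright \varphi}\ \text{(Next)} \qquad \frac{\Gamma \mid \Delta \vdash \blacktriangleright(\varphi \to \psi)}{\Gamma \mid \Delta \vdash \blacktriangleright\varphi \to \blacktriangleright\psi}\ \text{(Mon)} \qquad \frac{\Gamma \mid \Delta, \blacktriangleright\varphi \vdash \varphi}{\Gamma \mid \Delta \vdash \varphi}\ \text{(L\"ob)}$$

Read bottom-up, the last rule adds the goal itself as an assumption, but only under the later modality, so it cannot be used before some step has removed that modality.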

Note that so far it is not possible to use the assumption ▶ϕ introduced in the **(Löb)**-rule. The idea is that the formulae of a logic program provide us with the obligations that we have to prove, possibly by recursion, in order to prove a coinductive predicate. This is cast in the following definition.

**Definition 17.** Given an H<sup>g</sup>-formula ϕ of the shape ∀x⃗.(A<sub>1</sub> ∧ ··· ∧ A<sub>n</sub>) → ψ, we define its *guarding* ϕ̅ to be ∀x⃗.(▶A<sub>1</sub> ∧ ··· ∧ ▶A<sub>n</sub>) → ψ. For a logic program P, we define its guarding P̅ by guarding each formula in P.
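The guarding operation is a purely syntactic transformation, which the following Python sketch makes explicit. The representation of clauses (atoms as plain strings, a `Horn` record with a tuple of body atoms) is an assumption chosen for brevity and is not taken from the definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Later:
    """The formula ▶A for an atom A (atoms are plain strings here)."""
    atom: str


@dataclass(frozen=True)
class Horn:
    """The clause ∀ variables. body_1 ∧ ... ∧ body_n → head."""
    variables: Tuple[str, ...]
    body: Tuple[object, ...]
    head: str


def guard(clause):
    """Put ▶ in front of every body atom and leave the head unchanged."""
    return Horn(clause.variables,
                tuple(Later(a) for a in clause.body),
                clause.head)


def guard_program(program):
    """Guard every clause of a logic program."""
    return [guard(c) for c in program]


# The clauses κ_i and κ_odd of Example 14 in this encoding.
k_i = Horn((), (), "eq i")
k_odd = Horn(("x",), ("eq x", "eq (even x)"), "eq (odd x)")
```

A fact such as `k_i` has an empty body and is therefore left unchanged, whereas both body atoms of `k_odd` end up under the later modality.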

The translation given in Definition 17 of a logic program into formulae that admit recursion corresponds to unfolding a coinductive predicate, cf. [14]. We show now how to transform a coinductive uniform proof tree into a proof tree in **iFOL**<sub>▶</sub>, such that the recursion and guarding mechanisms in both logics match up.

**Theorem 18.** *If* P *is a logic program over a first-order signature* Σ *and the sequent* Σ; P ↬ ϕ *is provable in co-hohh*<sub>fix</sub>*, then* P̅ ⊢ ϕ *is provable in* **iFOL**<sub>▶</sub>*.*

To prove this theorem, one uses that each coinductive uniform proof tree starts with an initial tree that has an application of the co-fix-rule at the root and that eliminates the guard by using the rules in Fig. 5. At the leaves of this tree, one finds proof trees that proceed only by means of the rules in Fig. 4. The initial tree is then translated into a proof tree in **iFOL**<sub>▶</sub> that starts with an application of the **(Löb)**-rule, which corresponds to the co-fix-rule, and that simultaneously transforms the coinduction hypothesis and applies introduction rules for conjunctions etc. This ensures that we can match the coinduction hypothesis with the guarded formulae of the program P.

The results of this section show that it is irrelevant whether the guarding modality is used on the right (CUP-style) or on the left (**iFOL**<sub>▶</sub>-style), as the former can be translated into the latter. However, CUP uses the guarding on the right to preserve proof uniformity, whereas **iFOL**<sub>▶</sub> extends a general sequent calculus. Thus, to obtain the reverse translation, we would have to have an admissible cut rule in CUP. The main ingredient to such a cut rule is the ability to prove several coinductive statements simultaneously. This is possible in CUP by proving the conjunction of these statements. Unfortunately, we cannot eliminate such a conjunction into one of its components, since this would require nondeterministic guessing in the proof construction, which in turn breaks uniformity. Thus, we leave a solution of this problem for future work.

# **5 Herbrand Models and Soundness**

In Sect. 4 we showed that coinductive uniform proofs are sound relative to the intuitionistic logic **iFOL**<sub>▶</sub>. This gives us a handle on the constructive nature of coinductive uniform proofs. Since **iFOL**<sub>▶</sub> is a non-standard logic, we still need to provide semantics for that logic. We do this by interpreting in Sect. 5.4 the formulae of **iFOL**<sub>▶</sub> over the well-known (complete) Herbrand models and prove the soundness of the accompanying proof system with respect to these models. Although we obtain soundness of coinductive uniform proofs over Herbrand models from this, this proof is indirect and does not give a lot of information about the models captured by the different calculi *co-fohc* etc. For this reason, we will give in Sect. 5.3 a direct soundness proof for coinductive uniform proofs. We also obtain coinduction invariants from this proof for each of the calculi, which allows us to describe their proof strength.

### **5.1 Coinductive Herbrand Models and Semantics of Terms**

Before we come to the soundness proofs, we introduce in this section (complete) Herbrand models by using the terminology of final coalgebras. We then utilise this description to give operational and denotational semantics to guarded terms. These semantics show that guarded terms allow the description and computation of potentially infinite trees.

The coalgebraic approach has proven very successful both in logic and programming [1,75,76]. We will only require very little category theoretical vocabulary and assume that the reader is familiar with the category **Set** of sets and functions, and functors, see for example [12,25,50]. The terminology of algebras and coalgebras [4,47,64,65] is given by the following definition.

**Definition 19.** A *coalgebra* for a functor F : **Set** → **Set** is a map c : X → F X. Given coalgebras d : Y → F Y and c : X → F X, we say that a map h : Y → X is a *homomorphism* d → c if F h ∘ d = c ∘ h. We call a coalgebra c : X → F X *final*, if for every coalgebra d there is a unique homomorphism h: d → c. We will refer to h as the *coinductive extension* of d.
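A familiar instance may help to fix intuition. For the functor F X = A × X, the final coalgebra is the set of streams over A, and the coinductive extension of a coalgebra d : Y → A × Y sends a state to the stream it generates. The Python sketch below illustrates this with generators standing in for streams. The choice of functor, the Fibonacci coalgebra and all names are assumptions of the example.

```python
from itertools import islice


def unfold(d, y):
    """Coinductive extension of d : Y -> A x Y for the functor F X = A x X.

    The result is the stream of outputs observed when d is run from the
    state y: the head is the first component of d(y) and the tail is the
    stream generated from the second component.
    """
    while True:
        a, y = d(y)
        yield a


def fib_step(state):
    """A coalgebra on pairs of integers: emit a, then move to (b, a + b)."""
    a, b = state
    return a, (b, a + b)


def prefix(stream, n):
    """First n elements of a stream, as a list."""
    return list(islice(stream, n))
```

The homomorphism condition can be checked on finite prefixes: the stream `unfold(fib_step, y)` starts with the output of `fib_step(y)` and continues as `unfold(fib_step, y2)`, where `y2` is the successor state.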

The idea of (complete) Herbrand models is that a set of Horn clauses determines for each predicate symbol a set of potentially infinite terms. Such terms are (potentially infinite) trees, whose nodes are labelled by function symbols and whose branching is given by the arity of these function symbols. To be able to deal with open terms, we will allow such trees to have leaves labelled by variables. Such trees are a final coalgebra for a functor determined by the signature.

**Definition 20.** Let Σ be a first-order signature. The *extension* of Σ is a (polynomial) functor [38] ⟦Σ⟧ : **Set** → **Set** given by

$$[\![\Sigma]\!](X) = \coprod\_{f \in \Sigma} X^{\operatorname{ar}(f)},$$

where ar: Σ → ℕ is defined in Sect. 2 and X<sup>n</sup> is the n-fold product of X. We define for a set V a functor ⟦Σ⟧ + V : **Set** → **Set** by (⟦Σ⟧ + V)(X) = ⟦Σ⟧(X) + V, where + is the coproduct (disjoint union) in **Set**.

To make sense of the following definition, we note that we can view Π as a signature and we thus obtain its extension ⟦Π⟧. Moreover, we note that the final coalgebra of ⟦Σ⟧ + V exists because ⟦Σ⟧ is a polynomial functor.

**Definition 21.** Let Σ be a first-order signature. The *coterms* over Σ are the final coalgebra root<sub>V</sub> : Σ<sup>∞</sup>(V) → ⟦Σ⟧(Σ<sup>∞</sup>(V)) + V. For brevity, we denote the coterms with no variables, i.e. Σ<sup>∞</sup>(∅), by root: Σ<sup>∞</sup> → ⟦Σ⟧(Σ<sup>∞</sup>), and call it the *(complete) Herbrand universe* and its elements *ground* coterms. Finally, we let the *(complete) Herbrand base* B<sup>∞</sup> be the set ⟦Π⟧(Σ<sup>∞</sup>).

The construction Σ<sup>∞</sup>(V) gives rise to a functor Σ<sup>∞</sup> : **Set** → **Set**, called the *free completely iterative monad* [5]. If there is no ambiguity, we will drop the injections κ<sub>i</sub> when describing elements of Σ<sup>∞</sup>(V). Note that Σ<sup>∞</sup>(V) is final with the property that for every s ∈ Σ<sup>∞</sup>(V) either there are f ∈ Σ and t⃗ ∈ (Σ<sup>∞</sup>(V))<sup>ar(f)</sup> with root<sub>V</sub>(s) = f(t⃗), or there is x ∈ V with root<sub>V</sub>(s) = x. Finality allows us to specify unique maps into Σ<sup>∞</sup>(V) by giving a coalgebra X → ⟦Σ⟧(X) + V. In particular, one can define for each θ : V → Σ<sup>∞</sup> the substitution t[θ] of variables in the coterm t by θ as the coinductive extension of the following coalgebra.

$$
\Sigma^{\infty}(V) \xrightarrow{\text{root}\_V} [\![\Sigma]\!](\Sigma^{\infty}(V)) + V \xrightarrow{[\text{id},\, \text{root} \circ \theta]} [\![\Sigma]\!](\Sigma^{\infty}(V))
$$
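The case distinction in this coalgebra can be played through on lazily represented trees. In the following Python sketch, a coterm is assumed to be a zero-argument function that returns its root, either `("var", x)` or a pair of a function symbol and a list of children. The encoding, the reserved tag `"var"` and the helper `cut` for finite approximations are all assumptions of the example.

```python
def var(x):
    """A coterm consisting of the variable x only."""
    return lambda: ("var", x)


def node(f, *children):
    """A coterm with root symbol f and the given child coterms."""
    return lambda: (f, list(children))


def subst(t, theta):
    """Lazily replace every variable x in t by the ground coterm theta[x]."""
    def root():
        r = t()
        if r[0] == "var":
            return theta[r[1]]()   # variable case: take the root of θ(x)
        f, children = r
        # Symbol case: keep f and continue corecursively in the children.
        return (f, [subst(c, theta) for c in children])
    return root


def cut(t, depth):
    """Finite approximation of a coterm, with "..." below the given depth."""
    if depth == 0:
        return "..."
    r = t()
    if r[0] == "var":
        return r
    f, children = r
    return (f, [cut(c, depth - 1) for c in children])


def xs():
    """The infinite coterm scons x (scons x (...)) with free variable x."""
    return ("scons", [var("x"), xs])


zero = node("0")
```

Substituting `zero` for x in `xs` never terminates as a whole, but every finite approximation such as `cut(subst(xs, {"x": zero}), 2)` is computed in finitely many steps, which is the practical content of defining substitution by finality.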

Now that we have set up the basic terminology of coalgebras, we can give semantics to guarded terms from Definition 5. The idea is that guarded terms guarantee that we can always compute with them so far that we find a function symbol in head position, see Lemma 8. This function symbol determines then the label and branching of a node in the tree generated by a guarded term. If the computation reaches a constant or a variable, then we stop creating the tree at the present branch. This idea is captured by the following lemma.

**Lemma 22.** *There is a map* ⟦−⟧<sub>1</sub> : Λ<sup>G,1</sup><sub>Σ</sub>(Γ) → Σ<sup>∞</sup>(Γ) *that is unique with*


*Proof (sketch).* By Lemma 8, we can define a coalgebra on the quotient of guarded terms by convertibility c : Λ<sup>G,1</sup><sub>Σ</sub>(Γ)/≡ → ⟦Σ⟧(Λ<sup>G,1</sup><sub>Σ</sub>(Γ)/≡) + Γ with c[M] = f[N⃗] if M ↠ f N⃗ and c[M] = x if M ↠ x. This yields a homomorphism h: Λ<sup>G,1</sup><sub>Σ</sub>(Γ)/≡ → Σ<sup>∞</sup>(Γ) and we can define ⟦−⟧<sub>1</sub> = h ∘ [−]. The rest follows from uniqueness of h.

### **5.2 Interpretation of Basic Intuitionistic First-Order Formulae**

In this section, we give an interpretation of the formulae in Definition 3, in which we restrict ourselves to guarded terms. This interpretation will be relative to models in the complete Herbrand universe. Since we later extend these models to Kripke models to be able to handle the later modality, we formulate these models already now in the language of fibrations [17,46].

**Definition 23.** Let p: **E** → **B** be a functor. Given an object I ∈ **B**, the *fibre* **E**<sub>I</sub> above I is the category of objects A ∈ **E** with p(A) = I and morphisms f : A → B with p(f) = id<sub>I</sub>. The functor p is a *(split) fibration* if for every morphism u: I → J in **B** there is a functor u<sup>∗</sup> : **E**<sub>J</sub> → **E**<sub>I</sub>, such that id<sup>∗</sup><sub>I</sub> = Id<sub>**E**<sub>I</sub></sub> and (v ∘ u)<sup>∗</sup> = u<sup>∗</sup> ∘ v<sup>∗</sup>. We call u<sup>∗</sup> the *reindexing along* u.

To give an interpretation of formulae, consider the following category **Pred**.

$$\mathbf{Pred} = \begin{cases} \text{objects}: & (X, P) \text{ with } X \in \mathbf{Set} \text{ and } P \subseteq X \\ \text{ morphisms}: f: (X, P) \to (Y, Q) \text{ is a map } f: X \to Y \text{ with } f(P) \subseteq Q \end{cases}$$

The functor P: **Pred** → **Set** with P(X, P) = X and P(f) = f is a split fibration, see [46], where the reindexing functor for f : X → Y is given by taking preimages: f<sup>∗</sup>(Q) = f<sup>−1</sup>(Q). Note that each fibre **Pred**<sub>X</sub> is isomorphic to the complete lattice of predicates over X ordered by set inclusion. Thus, we refer to this fibration as the *predicate fibration*.
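On finite sets, reindexing in the predicate fibration can be computed directly. The Python sketch below represents predicates as sets and checks the two split-fibration equations on one small instance. The concrete sets and maps are invented for the example.

```python
def reindex(f, X, Q):
    """Reindexing along f : X -> Y, that is, the preimage of Q under f."""
    return {x for x in X if f(x) in Q}


def is_pred_morphism(f, P, Q):
    """f : (X, P) -> (Y, Q) is a morphism of Pred iff f maps P into Q."""
    return all(f(x) in Q for x in P)


X = set(range(6))
Y = {0, 1, 2}


def u(x):
    """u : X -> Y, the remainder modulo 3."""
    return x % 3


def v(y):
    """v : Y -> {"zero", "nonzero"}."""
    return "zero" if y == 0 else "nonzero"


R = {"zero"}  # a predicate on the codomain of v
```

Here `reindex(v, Y, R)` is `{0}`, and reindexing this along `u` gives the same subset of `X` as reindexing `R` along the composite of `u` and `v` in one step, as required by (v ∘ u)<sup>∗</sup> = u<sup>∗</sup> ∘ v<sup>∗</sup>.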

Let us now expose the logical structure of the predicate fibration. This will allow us to conveniently interpret first-order formulae over this fibration, but it comes at the cost of having to introduce a good amount of category theoretical language. However, doing so will pay off in Sect. 5.4, where we will construct another fibration out of the predicate fibration. We can then use category theoretical results to show that this new fibration admits the same logical structure and allows the interpretation of the later modality.

The first notion we need is that of fibred products, coproducts and exponents, which will allow us to interpret conjunction, disjunction and implication.

**Definition 24.** A fibration p: **E** → **B** has *fibred finite products* (**1**, ×), if each fibre **E**<sub>I</sub> has finite products (**1**<sub>I</sub>, ×<sub>I</sub>) and these are preserved by reindexing: for all f : I → J, we have f<sup>∗</sup>(**1**<sub>J</sub>) = **1**<sub>I</sub> and f<sup>∗</sup>(A ×<sub>J</sub> B) = f<sup>∗</sup>(A) ×<sub>I</sub> f<sup>∗</sup>(B). Fibred finite coproducts and exponents are defined analogously.

The fibration ℙ is a so-called first-order fibration, which allows us to interpret first-order logic, see [46, Def. 4.2.1].

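**Definition 25.** A fibration p: **E** → **B** is a *first-order fibration* if<sup>2</sup>

- **B** has finite products and the fibres of p are preorders;
- p has fibred finite products (⊤, ∧) and fibred finite coproducts (⊥, ∨) that distribute;
- p has fibred exponents →; and
- p has existential and universal quantifiers ∃<sub>I,J</sub> ⊣ π<sup>∗</sup><sub>I,J</sub> ⊣ ∀<sub>I,J</sub> for all projections π<sub>I,J</sub> : I × J → I.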

A *first-order λ-fibration* is a first-order fibration with Cartesian closed base **B**.

<sup>2</sup> Technically, the quantifiers should also fulfil the Beck-Chevalley and Frobenius conditions, and the fibration should admit equality. Since these are fulfilled in all our models and we do not need equality, we will not discuss them here.


The fibration ℙ: **Pred** → **Set** is a first-order λ-fibration, as all its fibres are posets and **Set** is Cartesian closed; ℙ has fibred finite products (⊤, ∩), given by ⊤<sub>X</sub> = X and intersection; fibred distributive coproducts (∅, ∪); fibred exponents ⇒, given by (P ⇒ Q) = {t⃗ | if t⃗ ∈ P, then t⃗ ∈ Q}; and universal and existential quantifiers given for P ∈ **Pred**<sub>X×Y</sub> by

$$\forall\_{X, Y} P = \{ x \in X \mid \forall y \in Y. (x, y) \in P \} \qquad \exists\_{X, Y} P = \{ x \in X \mid \exists y \in Y. (x, y) \in P \}.$$
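To make the two quantifier formulas concrete, here is a small sketch over finite sets. The sets `X`, `Y`, the predicate `P` and the helper names are hypothetical examples. The sketch also checks the standard adjunction characterisation of quantifiers, namely that ∃ is left adjoint and ∀ is right adjoint to reindexing along the projection X × Y → X (called `weaken` below).

```python
from itertools import combinations, product

X = frozenset({0, 1, 2})
Y = frozenset({"a", "b"})
XY = frozenset(product(X, Y))

def forall(p):
    """All x such that (x, y) is in p for every y in Y."""
    return frozenset(x for x in X if all((x, y) in p for y in Y))

def exists(p):
    """All x such that (x, y) is in p for some y in Y."""
    return frozenset(x for x in X if any((x, y) in p for y in Y))

def weaken(q):
    """Reindexing along the projection X x Y -> X (preimage of q)."""
    return frozenset((x, y) for (x, y) in XY if x in q)

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

P = frozenset({(0, "a"), (0, "b"), (1, "a")})

# Adjunction check over every predicate Q on X.
adjunctions_hold = all(
    ((weaken(q) <= P) == (q <= forall(P)))
    and ((exists(P) <= q) == (P <= weaken(q)))
    for q in subsets(X)
)
```

For this `P`, only 0 is related to every element of Y, while 0 and 1 are each related to at least one element.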

The purpose of first-order fibrations is to capture the essentials of first-order logic, while the λ-part takes care of higher-order features of the term language. In the following, we interpret types, contexts, guarded terms and formulae in the fibration ℙ: **Pred** → **Set**: We define for types τ and contexts Γ sets ⟦τ⟧ and ⟦Γ⟧; for guarded terms M with Γ ⊢<sub>g</sub> M : τ we define a map ⟦M⟧ : ⟦Γ⟧ → ⟦τ⟧ in **Set**; and for a formula Γ ⊩ ϕ we give a predicate ⟦ϕ⟧ ∈ **Pred**<sub>⟦Γ⟧</sub>.

The semantics of types and contexts are given inductively in the Cartesian closed category **Set**, where the base type ι is interpreted as coterms, as follows.

$$\begin{aligned} \llbracket \iota \rrbracket &= \Sigma^{\infty} & \quad \llbracket \emptyset \rrbracket &= \mathbf{1} \\ \llbracket \tau \to \sigma \rrbracket &= \llbracket \sigma \rrbracket^{\llbracket \tau \rrbracket} & \quad \llbracket \Gamma, x:\tau \rrbracket &= \llbracket \Gamma \rrbracket \times \llbracket \tau \rrbracket \end{aligned}$$

We note that a coterm t ∈ Σ<sup>∞</sup>(V) can be seen as a map (Σ<sup>∞</sup>)<sup>V</sup> → Σ<sup>∞</sup> by applying a substitution in (Σ<sup>∞</sup>)<sup>V</sup> to t: σ ↦ t[σ]. In particular, the semantics of a guarded first-order term M ∈ Λ<sup>G,1</sup><sub>Σ</sub>(Γ) is equivalently a map ⟦M⟧<sub>1</sub> : ⟦Γ⟧ → Σ<sup>∞</sup>. We can now extend this map inductively to ⟦M⟧ : ⟦Γ⟧ → ⟦τ⟧ for all guarded terms M ∈ Λ<sup>G</sup><sub>Σ</sub>(Γ) with Γ ⊢<sub>g</sub> M : τ by

$$\begin{aligned} \llbracket M \rrbracket(\gamma)(\overrightarrow{t}) &= \llbracket M \; \overrightarrow{x} \rrbracket\_1\left(\left[\overrightarrow{x} \mapsto \overrightarrow{t}\right]\right) && \vdash\_g M : \tau \text{ with } \mathrm{ar}(\tau) = \left|\overrightarrow{t}\right| = \left|\overrightarrow{x}\right| \\ \llbracket c \rrbracket(\gamma)(\overrightarrow{t}) &= c \; \overrightarrow{t} \\ \llbracket x \rrbracket(\gamma) &= \gamma(x) \\ \llbracket M \; N \rrbracket(\gamma) &= \llbracket M \rrbracket(\gamma)\left(\llbracket N \rrbracket(\gamma)\right) \\ \llbracket \lambda x.M \rrbracket(\gamma)(t) &= \llbracket M \rrbracket(\gamma[x \mapsto t]) \end{aligned}$$

**Lemma 26.** *The mapping* ⟦−⟧ *is a well-defined function from guarded terms to functions, such that* Γ ⊢<sub>g</sub> M : τ *implies* ⟦M⟧ : ⟦Γ⟧ → ⟦τ⟧*.*

Since ℙ: **Pred** → **Set** is a first-order fibration, we can interpret inductively all logical connectives of the formulae from Definition 3 in this fibration. The only case that is missing is the base case of predicate symbols. Their interpretation will be given over a Herbrand model that is constructed as the largest fixed point of an operator over all predicate interpretations in the Herbrand base. Both the operator and the fixed point are the subjects of the following definition.

**Definition 27.** We let the set of *interpretations* ℐ be the powerset 𝒫(ℬ<sup>∞</sup>) of the complete Herbrand base. For I ∈ ℐ and p ∈ Π, we denote by I|<sub>p</sub> the interpretation of p in I (the fibre of I above p)

$$I|\_p = \left\{ \overrightarrow{t} \in \left( \Sigma^{\infty} \right)^{\mathrm{ar}(p)} \;\middle|\; p(\overrightarrow{t}) \in I \right\}.$$

Given a set P of H<sub>g</sub>-formulae, we define a monotone map Φ<sub>P</sub> : ℐ → ℐ by

$$\Phi\_P(I) = \left\{ \llbracket \psi \rrbracket\_1[\theta] \;\middle|\; \left(\forall \overrightarrow{x}. \bigwedge\_{k=1}^n \varphi\_k \to \psi\right) \in P,\ \theta \colon |\overrightarrow{x}| \to \Sigma^\infty,\ \forall k.\ \llbracket \varphi\_k \rrbracket\_1[\theta] \in I \right\},$$

where ⟦−⟧<sub>1</sub>[θ] is the extension of semantics and substitution from coterms to the Herbrand base by functoriality of ⟦Π⟧. The *(complete) Herbrand model* ℳ<sub>P</sub> of P is the largest fixed point of Φ<sub>P</sub>, which exists by the Knaster-Tarski theorem because ℐ is a complete lattice and Φ<sub>P</sub> is monotone.
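The shape of Φ<sub>P</sub> and of its largest fixed point can be illustrated on a toy program. The sketch below makes two simplifying assumptions that are not in the paper: the program is already ground (so no substitutions θ are needed) and the Herbrand base is a finite set `BASE`. The clause list `PROGRAM` is a hypothetical example. The largest fixed point is computed by the Knaster-Tarski characterisation as the union of all I with I ⊆ Φ(I).

```python
from itertools import combinations

# Hypothetical ground program: each clause is (body_atoms, head_atom).
PROGRAM = [
    (("p",), "p"),   # p <- p   (a loop, provable only coinductively)
    (("r",), "q"),   # q <- r   (r has no clause)
    ((), "s"),       # s        (a fact)
]
BASE = frozenset({"p", "q", "r", "s"})  # finite stand-in for the Herbrand base

def phi(interp):
    """One step of the operator: heads of clauses whose body lies in interp."""
    return frozenset(head for body, head in PROGRAM
                     if all(atom in interp for atom in body))

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def greatest_fixed_point():
    """Union of all post-fixed points I, i.e. all I with I <= phi(I)."""
    result = frozenset()
    for interp in subsets(BASE):
        if interp <= phi(interp):
            result |= interp
    return result

model = greatest_fixed_point()
```

In this example the atom `p` belongs to the largest fixed point although it has no finite derivation, whereas `q` is excluded because its body atom `r` is never the head of a clause.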

Given a formula ϕ with Γ ⊩ ϕ that contains only guarded terms, we define the semantics of ϕ in **Pred** from an interpretation I ∈ ℐ inductively as follows.

$$\begin{aligned} \llbracket \Gamma \Vdash p \, \overrightarrow{M} \rrbracket\_{I} &= \left(\overrightarrow{\llbracket M \rrbracket}\right)^{\*} (I|\_{p}) \\ \llbracket \Gamma \Vdash \top \rrbracket\_{I} &= \top\_{\llbracket \Gamma \rrbracket} \\ \llbracket \Gamma \Vdash \varphi \mathbin{\square} \psi \rrbracket\_{I} &= \llbracket \Gamma \Vdash \varphi \rrbracket\_{I} \mathbin{\square} \llbracket \Gamma \Vdash \psi \rrbracket\_{I} && \square \in \{\land, \lor, \rightarrow\} \\ \llbracket \Gamma \Vdash Qx : \tau.\varphi \rrbracket\_{I} &= Q\_{\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket} \, \llbracket \Gamma, x : \tau \Vdash \varphi \rrbracket\_{I} && Q \in \{\forall, \exists\} \end{aligned}$$

**Lemma 28.** *The mapping* ⟦−⟧<sub>I</sub> *is a well-defined function from formulae to predicates, such that* Γ ⊩ ϕ *implies* ⟦ϕ⟧<sub>I</sub> ⊆ ⟦Γ⟧ *or, equivalently,* ⟦ϕ⟧<sub>I</sub> ∈ **Pred**<sub>⟦Γ⟧</sub>*.*

This concludes the semantics of types, terms and formulae. We now turn to show that coinductive uniform proofs are sound for this interpretation.

#### **5.3 Soundness of Coinductive Uniform Proofs for Herbrand Models**

In this section, we give a direct proof of soundness for the coinductive uniform proof system from Sect. 3. Later, we will obtain another soundness result by combining the proof translation from Theorem 18 with the soundness of **iFOL**<sub>▶</sub> (Theorems 39 and 42). The purpose of giving a direct soundness proof for uniform proofs is that it allows the extraction of a coinduction invariant, see Lemma 32.

The main idea is as follows. Given a formula ϕ and a uniform proof π for Σ; P ↬ ϕ, we construct an interpretation I ∈ ℐ that validates ϕ, i.e. ⟦ϕ⟧<sub>I</sub> = ⊤, and that is contained in the complete Herbrand model ℳ<sub>P</sub>. Combining these two facts, we obtain that ⟦ϕ⟧<sub>ℳ<sub>P</sub></sub> = ⊤, and thus the soundness of uniform proofs.

To show that the constructed interpretation I is contained in ℳ<sub>P</sub>, we use the usual coinduction proof principle, as it is given in the following definition.

**Definition 29.** An *invariant for* K ∈ ℐ is a set I ∈ ℐ, such that K ⊆ I and I is a Φ<sub>P</sub>-invariant, that is, I ⊆ Φ<sub>P</sub>(I). If K has an invariant, then K ⊆ ℳ<sub>P</sub>.
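The coinduction principle of Definition 29 amounts to two subset checks. The following sketch again assumes a finite ground program in place of the paper's setting; `PROGRAM`, `K` and `I` are hypothetical examples. It shows a set K that is not itself Φ-invariant but becomes covered by an invariant once it is enlarged.

```python
# Hypothetical ground program: each clause is (body_atoms, head_atom).
PROGRAM = [
    (("b",), "a"),   # a <- b
    (("a",), "b"),   # b <- a
    (("d",), "c"),   # c <- d   (d has no clause)
]

def phi(interp):
    """Heads of all clauses whose body atoms all lie in interp."""
    return frozenset(head for body, head in PROGRAM
                     if all(atom in interp for atom in body))

def is_invariant_for(k, i):
    """Definition 29: K is contained in I, and I is contained in phi(I)."""
    return k <= i and i <= phi(i)

K = frozenset({"a"})
I = frozenset({"a", "b"})

k_alone_works = is_invariant_for(K, K)       # phi({a}) = {b}, so this fails
enlarged_works = is_invariant_for(K, I)      # phi({a, b}) = {a, b}
c_attempt = is_invariant_for(frozenset({"c"}), frozenset({"c", "d"}))
```

The need to enlarge K to I mirrors the role of the invariant in the soundness proof: the atoms of the goal alone are usually not closed under Φ<sub>P</sub>, so further atoms have to be collected.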

Thus, our goal is now to construct an interpretation together with an invariant. This invariant will essentially collect and iterate all the substitutions that appear in a proof. For this we need the ability to compose substitutions of coterms, which we derive from the monad [5] (Σ<sup>∞</sup>, η, μ) with μ: Σ<sup>∞</sup>Σ<sup>∞</sup> ⇒ Σ<sup>∞</sup>.

**Definition 30.** A *(Kleisli-)substitution* θ from V to W, written θ : V ⇸ W, is a map V → Σ<sup>∞</sup>(W). Composition of θ : V ⇸ W and δ : U ⇸ V is given by

$$
\theta \odot \delta = U \xrightarrow{\delta} \Sigma^{\infty}(V) \xrightarrow{\Sigma^{\infty}(\theta)} \Sigma^{\infty}(\Sigma^{\infty}(W)) \xrightarrow{\mu\_W} \Sigma^{\infty}(W).
$$
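Kleisli composition can be sketched in a few lines if one assumes finite terms in place of coterms (a simplification that is not made in the paper). In the sketch, a variable is a string, a term with function symbol `f` is a tuple `("f", arg1, ...)`, and a substitution is a dictionary; all of these representation choices and names are illustrative. The function `apply` plays the role of μ ∘ Σ<sup>∞</sup>(θ).

```python
def apply(theta, term):
    """Apply the substitution theta to every variable occurring in term."""
    if isinstance(term, str):            # a variable
        return theta[term]
    head, *args = term                   # a function symbol with arguments
    return (head, *[apply(theta, a) for a in args])

def compose(theta, delta):
    """Kleisli composite of delta: U -> Term(V) followed by theta: V -> Term(W)."""
    return {u: apply(theta, t) for u, t in delta.items()}

def unit(variables):
    """The monad unit: every variable is mapped to itself."""
    return {v: v for v in variables}

delta = {"u": ("s", "x")}      # u |-> s(x)
theta = {"x": ("s", "y")}      # x |-> s(y)
rho = {"y": ("z",)}            # y |-> z  (a constant)

composite = compose(theta, delta)

# Monad laws specialised to substitutions.
left_unit = compose(unit(["y"]), theta) == theta
right_unit = compose(theta, unit(["x"])) == theta
associative = (compose(rho, compose(theta, delta))
               == compose(compose(rho, theta), delta))
```

Note the order: in `compose(theta, delta)` the substitution `delta` is applied first, matching the diagram above, where δ is the first arrow.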

The notions in the following definition will allow us to easily organise and iterate the substitutions that occur in a uniform proof.

**Definition 31.** Let S be a set with S = {1, ..., n} for some n ∈ ℕ. We call the set S<sup>∗</sup> of lists over S the set of *substitution identifiers*. Suppose that we have substitutions θ<sub>0</sub> : V ⇸ ∅ and θ<sub>k</sub> : V ⇸ V for each k ∈ S. Then we can define a map Θ: S<sup>∗</sup> → (Σ<sup>∞</sup>)<sup>V</sup>, which turns each substitution identifier into a substitution, by iteration from the right:

$$
\Theta(\varepsilon) = \theta\_0 \quad \text{and} \quad \Theta(w:k) = \Theta(w) \odot \theta\_k
$$
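As a small worked instance of this definition (the choice S = {1, 2} is ours and only serves as an illustration), unfolding Θ on short identifiers gives

$$\begin{aligned} \Theta(\varepsilon) &= \theta\_0 \\ \Theta(1) &= \Theta(\varepsilon) \odot \theta\_1 = \theta\_0 \odot \theta\_1 \\ \Theta(1:2) &= \Theta(1) \odot \theta\_2 = (\theta\_0 \odot \theta\_1) \odot \theta\_2 \end{aligned}$$

By Definition 30, the right-most factor of a composite acts first. Hence Θ(1:2) first applies θ<sub>2</sub> to a variable in V, then θ<sub>1</sub> to the resulting coterm, and finally closes the result with θ<sub>0</sub>, so that every Θ(w) maps V to closed coterms.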

After introducing these notations, we can give the outline of the soundness proof for uniform proofs relative to the complete Herbrand model. Given an H<sub>g</sub>-formula ∀x⃗.ϕ, we note that a uniform proof π for Σ; P ↬ ∀x⃗.ϕ starts with

$$\dfrac{\dfrac{\overrightarrow{c}: \iota, \Sigma; P; \Delta \Longrightarrow \langle \varphi[\overrightarrow{c}/\overrightarrow{x}] \rangle \qquad \overrightarrow{c}: \iota \notin \Sigma}{\Sigma; P; \forall \overrightarrow{x}. \varphi \Longrightarrow \langle \forall \overrightarrow{x}. \varphi \rangle}\ \forall R \langle \rangle}{\Sigma; P \looparrowright \forall \overrightarrow{x}. \varphi}\ \text{CO-FIX}$$

where the eigenvariables in c⃗ are all distinct. Let Σ<sup>c</sup> be the signature c⃗ : ι, Σ and C the set of variables in c⃗. Suppose the following is a valid subtree of π.

$$\dfrac{\dfrac{\Sigma^c; P; \Delta \stackrel{\varphi[\overrightarrow{N}/\overrightarrow{x}]}{\Longrightarrow} A}{\Sigma^c; P; \Delta \stackrel{\forall \overrightarrow{x}. \varphi}{\Longrightarrow} A}\ \forall L \qquad \forall \overrightarrow{x}. \varphi \in \Delta}{\Sigma^c; P; \Delta \Longrightarrow A}\ \text{DECIDE}$$

This proof tree gives rise to a substitution δ : C ⇸ C by δ(c) = ⟦N<sub>c</sub>⟧, which we call an *agent* of π. We let D ⊆ At<sup>g</sup><sub>1</sub> be the set of atoms that are proven in π:

$$D = \{ A \mid \Sigma^c; P; \Delta \Longrightarrow \langle A \rangle \text{ or } \Sigma^c; P; \Delta \Longrightarrow A \text{ appears in } \pi \} $$

From the agents and atoms in π we extract an invariant for the goal formula.

**Lemma 32.** *Suppose that* ϕ *is an* H<sub>g</sub>*-formula of the form* ∀x⃗. A<sub>1</sub> ∧ ··· ∧ A<sub>n</sub> → A<sub>0</sub> *and that there is a proof* π *for* Σ; P ↬ ϕ*. Let* D *be the proven atoms in* π *and* θ<sub>0</sub>, ..., θ<sub>s</sub> *be the agents of* π*. Define* A<sup>c</sup><sub>k</sub> = A<sub>k</sub>[c⃗/x⃗] *and suppose further that* I<sub>1</sub> *is an invariant for* {A<sup>c</sup><sub>k</sub>[Θ(ε)] | 1 ≤ k ≤ n}*. If we put*

$$I\_2 = \bigcup\_{w \in S^\*} D\left[\Theta\left(w\right)\right]$$

*then* I<sub>1</sub> ∪ I<sub>2</sub> *is an invariant for* A<sup>c</sup><sub>0</sub>[Θ(ε)]*.*

Once we have Lemma 32, the following soundness theorem is easily proven.

**Theorem 33.** *If* ϕ *is an* H<sub>g</sub>*-formula and* Σ; P ↬ ϕ*, then* ⟦ϕ⟧<sub>ℳ<sub>P</sub></sub> = ⊤*.*

Finally, we show that extending logic programs with coinductively proven lemmas is sound. This follows easily by coinduction.

**Theorem 34.** *Let* ϕ *be an* H<sub>g</sub>*-formula of the shape* ∀x⃗. ψ<sub>1</sub> → ψ<sub>2</sub>*, such that, for all substitutions* θ*, if* ⟦ψ<sub>1</sub>⟧<sub>1</sub>[θ] ∈ ℳ<sub>P,ϕ</sub>*, then* ⟦ψ<sub>1</sub>⟧<sub>1</sub>[θ] ∈ ℳ<sub>P</sub>*. Then* Σ; P ↬ ϕ *implies* ℳ<sub>P∪{ϕ}</sub> = ℳ<sub>P</sub>*, that is,* P ∪ {ϕ} *is a conservative extension of* P *with respect to the Herbrand model.*

As a corollary we obtain that, if there is a proof for Σ; P ↬ ϕ, then a proof for Σ; P, ϕ ↬ ψ is sound with respect to ℳ<sub>P</sub>. Indeed, by Theorem 34 we have that ℳ<sub>P</sub> = ℳ<sub>P∪{ϕ}</sub> and by Theorem 33 that Σ; P, ϕ ↬ ψ is sound with respect to ℳ<sub>P∪{ϕ}</sub>. Thus, the proof of Σ; P, ϕ ↬ ψ is also sound with respect to ℳ<sub>P</sub>. We use this property implicitly in our running examples, and refer the reader to [15,49] for proofs, further examples and discussion.

#### **5.4 Soundness of iFOL<sub>▶</sub> over Herbrand Models**

In this section, we demonstrate how the logic **iFOL**<sub>▶</sub> can be interpreted over Herbrand models. Recall that we obtained a fixed point model from the monotone map Φ<sub>P</sub> on interpretations. In what follows, it is crucial that we construct the greatest fixed point of Φ<sub>P</sub> by iteration, cf. [6,32,77]: Let **Ord** be the class of all ordinals equipped with their (well-founded) order. We denote by **Ord**<sup>op</sup> the class of ordinals with their reversed order and define a monotone function Φ⃖<sub>P</sub> : **Ord**<sup>op</sup> → ℐ, where we write the argument ordinal in the subscript, by

$$\left(\overleftarrow{\Phi\_P}\right)\_\alpha = \bigcap\_{\beta < \alpha} \Phi\_P \left(\left(\overleftarrow{\Phi\_P}\right)\_\beta\right).$$

Note that this map is well-defined because < is well-founded and because Φ<sub>P</sub> is monotone, see [14]. Since ℐ is a complete lattice, there is an ordinal α such that (Φ⃖<sub>P</sub>)<sub>α</sub> = Φ<sub>P</sub>((Φ⃖<sub>P</sub>)<sub>α</sub>), at which point (Φ⃖<sub>P</sub>)<sub>α</sub> is the largest fixed point ℳ<sub>P</sub> of Φ<sub>P</sub>. In what follows, we will utilise this construction to give semantics to **iFOL**<sub>▶</sub>.
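On a finite base the descending chain can be computed directly, since only finitely many steps are needed before it stabilises. The sketch below assumes a finite ground program (the clauses in `PROGRAM` are hypothetical) and restricts the ordinals to natural numbers; it follows the defining equation literally, taking the intersection over all earlier stages.

```python
# Hypothetical ground program: each clause is (body_atoms, head_atom).
PROGRAM = [
    (("p",), "p"),     # p  <- p
    (("q1",), "q0"),   # q0 <- q1
    (("q2",), "q1"),   # q1 <- q2
    (("q3",), "q2"),   # q2 <- q3   (q3 has no clause)
]
BASE = frozenset({"p", "q0", "q1", "q2", "q3"})

def phi(interp):
    """Heads of all clauses whose body atoms all lie in interp."""
    return frozenset(head for body, head in PROGRAM
                     if all(atom in interp for atom in body))

def chain(steps):
    """Stages 0 .. steps-1 of the descending chain, for finite ordinals only."""
    stages = []
    for alpha in range(steps):
        level = BASE                      # the empty intersection is the top
        for beta in range(alpha):
            level = level & phi(stages[beta])
        stages.append(level)
    return stages

stages = chain(6)
```

Each stage removes the atoms whose justification has run out: `q2` disappears first, then `q1`, then `q0`, while the looping atom `p` survives every stage and is the only member of the fixed point that the chain reaches.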

The fibration ℙ: **Pred** → **Set** gives rise to another fibration as follows. We let $\overline{\mathbf{Pred}}$ be the category of functors (monotone maps) with fixed predicate domain:

$$\overline{\mathbf{Pred}} = \begin{cases} \text{objects:} & u \colon \mathbf{Ord}^{\text{op}} \to \mathbf{Pred}, \text{ such that } \mathbb{P} \circ u \text{ is constant} \\ \text{morphisms:} & u \to v \text{ are natural transformations } f \colon u \Rightarrow v, \\ & \text{such that } \mathbb{P}f \colon \mathbb{P} \circ u \Rightarrow \mathbb{P} \circ v \text{ is the identity} \end{cases}$$

The fibration $\overline{\mathbb{P}}$: $\overline{\mathbf{Pred}}$ → **Set** is defined by evaluation at any ordinal (here 0), i.e. by $\overline{\mathbb{P}}$(u) = ℙ(u(0)) and $\overline{\mathbb{P}}$(f) = (ℙf)<sub>0</sub>, and reindexing along f : X → Y by applying the reindexing of ℙ point-wise, i.e. by f<sup>#</sup>(u)<sub>α</sub> = f<sup>∗</sup>(u<sub>α</sub>).

Note that there is a (full) embedding K : **Pred** → $\overline{\mathbf{Pred}}$ that is given by K(X, P) = (X, $\overline{P}$) with $\overline{P}$<sub>α</sub> = P. One can show [14] that $\overline{\mathbb{P}}$ is again a first-order fibration and that it models the later modality, as in the following theorem.

**Theorem 35.** *The fibration* $\overline{\mathbb{P}}$ *is a first-order fibration. If necessary, we denote the first-order connectives by* ⊤̇*,* ∧̇ *etc. to distinguish them from those in* **Pred***. Otherwise, we drop the dots. Finite (co)products and quantifiers are given pointwise, while for* X ∈ **Set** *and* u, v ∈ $\overline{\mathbf{Pred}}$<sub>X</sub> *exponents are given by*

$$(v \Rightarrow u)\_{\alpha} = \bigcap\_{\beta \le \alpha} (v\_{\beta} \Rightarrow u\_{\beta}).$$

*There is a fibred functor* ▶: $\overline{\mathbf{Pred}}$ → $\overline{\mathbf{Pred}}$ *with* $\overline{\mathbb{P}}$ ◦ ▶ = $\overline{\mathbb{P}}$ *given on objects by*

$$(\blacktriangleright u)\_{\alpha} = \bigcap\_{\beta < \alpha} u\_{\beta}$$

*and a natural transformation* next: Id ⇒ ▶ *from the identity functor to* ▶*. The functor* ▶ *preserves reindexing, products, exponents and universal quantification:* ▶(f<sup>#</sup>u) = f<sup>#</sup>(▶u)*,* ▶(u ∧ v) = ▶u ∧ ▶v*,* ▶(u<sup>v</sup>) → (▶u)<sup>▶v</sup>*,* ▶(∀<sub>n</sub>u) = ∀<sub>n</sub>(▶u)*. Finally, for all* X ∈ **Set** *and* u ∈ $\overline{\mathbf{Pred}}$<sub>X</sub>*, there is* löb: (▶u ⇒̇ u) → u *in* $\overline{\mathbf{Pred}}$<sub>X</sub>*.*
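The formulas for ▶ and for exponents can be checked exhaustively on a tiny example. The sketch below is an illustration under assumptions of our own: chains are truncated to three stages (finite ordinals 0, 1, 2), the carrier is the two-element set `X`, and the function names are hypothetical. It verifies that `next` and `löb` exist for every descending chain, which in a fibre of posets means that the corresponding inclusions hold at every stage.

```python
from itertools import combinations, product

X = frozenset({0, 1})
LENGTH = 3  # number of stages kept in this truncated illustration

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def later(u):
    """Stage alpha of the later modality: intersection of all earlier stages."""
    out = []
    for alpha in range(len(u)):
        level = X
        for beta in range(alpha):
            level = level & u[beta]
        out.append(level)
    return out

def implies(p, q):
    """Exponent in a single fibre: all x with (x in p) implies (x in q)."""
    return (X - p) | q

def exponent(v, u):
    """Stage alpha of v => u: intersection of stagewise implications up to alpha."""
    out = []
    for alpha in range(len(u)):
        level = X
        for beta in range(alpha + 1):
            level = level & implies(v[beta], u[beta])
        out.append(level)
    return out

def is_descending(u):
    return all(u[i + 1] <= u[i] for i in range(len(u) - 1))

chains = [list(c) for c in product(subsets(X), repeat=LENGTH)
          if is_descending(c)]

# next: u is below (later u) at every stage.
next_holds = all(u[a] <= later(u)[a] for u in chains for a in range(LENGTH))

# loeb: (later u => u) is below u at every stage.
loeb_holds = all(exponent(later(u), u)[a] <= u[a]
                 for u in chains for a in range(LENGTH))
```

The Löb inclusion holds because stage 0 of ▶u is all of X, so the implication at stage 0 already forces membership in u<sub>0</sub>, and each later stage then follows from the previous ones.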

Using the above theorem, we can extend the interpretation of formulae to **iFOL**<sub>▶</sub> as follows. Let u: **Ord**<sup>op</sup> → ℐ be a descending sequence of interpretations. As before, we define the restriction of u to a predicate symbol p ∈ Π by (u|<sub>p</sub>)<sub>α</sub> = u<sub>α</sub>|<sub>p</sub> = {t⃗ | p(t⃗) ∈ u<sub>α</sub>}. The semantics of formulae in **iFOL**<sub>▶</sub> as objects in $\overline{\mathbf{Pred}}$ is given by the following iterative definition.

$$\begin{aligned} \llbracket \Gamma \Vdash p \, \overrightarrow{M} \rrbracket\_{u} &= \left(\overrightarrow{\llbracket M \rrbracket}\right)^{\#} (u|\_{p}) \\ \llbracket \Gamma \Vdash \top \rrbracket\_{u} &= \dot{\top}\_{\llbracket \Gamma \rrbracket} \\ \llbracket \Gamma \Vdash \varphi \mathbin{\square} \psi \rrbracket\_{u} &= \llbracket \Gamma \Vdash \varphi \rrbracket\_{u} \mathbin{\dot{\square}} \llbracket \Gamma \Vdash \psi \rrbracket\_{u} && \square \in \{\land, \lor, \rightarrow\} \\ \llbracket \Gamma \Vdash Qx : \tau.\varphi \rrbracket\_{u} &= \dot{Q}\_{\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket} \, \llbracket \Gamma, x : \tau \Vdash \varphi \rrbracket\_{u} && Q \in \{\forall, \exists\} \\ \llbracket \Gamma \Vdash \blacktriangleright \varphi \rrbracket\_{u} &= \blacktriangleright \llbracket \Gamma \Vdash \varphi \rrbracket\_{u} \end{aligned}$$

The following lemma is the analogue of Lemma 28, which treated the interpretation of formulae without the later modality.

**Lemma 36.** *The mapping* ⟦−⟧<sub>u</sub> *is a well-defined map from formulae in* **iFOL**<sub>▶</sub> *to sequences of predicates, such that* Γ ⊩ ϕ *implies* ⟦ϕ⟧<sub>u</sub> ∈ $\overline{\mathbf{Pred}}$<sub>⟦Γ⟧</sub>*.*

**Lemma 37.** *All rules of* **iFOL**<sub>▶</sub> *are sound with respect to the interpretation* ⟦−⟧<sub>u</sub> *of formulae in* $\overline{\mathbf{Pred}}$*, that is, if* Γ | Δ ⊢ ϕ*, then* (⋀̇<sub>ψ∈Δ</sub> ⟦ψ⟧<sub>u</sub> ⇒̇ ⟦ϕ⟧<sub>u</sub>) = ⊤̇*. In particular,* Γ ⊢ ϕ *implies* ⟦ϕ⟧<sub>u</sub> = ⊤̇*.*

The following lemma shows that the guarding of a set of formulae is valid in the chain model that they generate.

**Lemma 38.** *If* ϕ *is an* H*-formula in* P*, then* ⟦ϕ⟧<sub>Φ⃖<sub>P</sub></sub> = ⊤̇*.*

Combining this with soundness from Lemma 37, we obtain that provability in **iFOL**<sub>▶</sub> relative to a logic program P is sound for the model of P.

**Theorem 39.** *For all logic programs* P*, if* Γ | P ⊢ ϕ *then* ⟦ϕ⟧<sub>Φ⃖<sub>P</sub></sub> = ⊤̇*.*

The final result of this section is to show that the descending chain model, which we used to interpret formulae of **iFOL**<sub>▶</sub>, is sound and complete for the fixed point model, which we used to interpret the formulae of coinductive uniform proofs. This will be proved in Theorem 42 below. The easiest way to prove this result is by establishing a functor $\overline{\mathbf{Pred}}$ → **Pred** that maps the chain Φ⃖<sub>P</sub> to the model ℳ<sub>P</sub>, and that preserves and reflects truth of first-order formulae (Proposition 41). We will phrase the preservation of truth of first-order formulae by a functor by appealing to the following notion of fibration maps, cf. [46, Def. 4.3.1].

**Definition 40.** Let p: **E** → **B** and q : **D** → **A** be fibrations. A *fibration map* p → q is a pair (F : **E** → **D**, G: **B** → **A**) of functors, such that q ◦ F = G ◦ p and F preserves Cartesian morphisms: if f : X → Y in **E** is Cartesian over p(f), then F(f) is Cartesian over G(p(f)). (F, G) is a map of *first-order (λ-)fibrations*, if p and q are first-order (λ-)fibrations, and F and G preserve this structure.

Let us now construct a first-order λ-fibration map $\overline{\mathbf{Pred}}$ → **Pred**. We note that since every fibre of the predicate fibration is a complete lattice, for every chain u ∈ $\overline{\mathbf{Pred}}$<sub>X</sub> there exists an ordinal α at which u stabilises. This means that there is a limit lim u of u in **Pred**<sub>X</sub>, which is the largest subset of X, such that ∀α. lim u ⊆ u<sub>α</sub>. This allows us to define a map L: $\overline{\mathbf{Pred}}$ → **Pred** by

$$\begin{aligned} L(X, u) &= (X, \lim u) \\ L(f \colon (X, u) \to (Y, v)) &= f. \end{aligned}$$

In the following proposition, we show that L gives us the ability to express first-order properties of limits equivalently through their approximating chains. This, in turn, provides soundness and completeness for the interpretation of the logic **iFOL**<sub>▶</sub> over descending chains with respect to the largest Herbrand model.

**Proposition 41.** L: $\overline{\mathbf{Pred}}$ → **Pred***, as defined above, is a map of first-order fibrations. Furthermore,* L *is right-adjoint to the embedding* K : **Pred** → $\overline{\mathbf{Pred}}$*. Finally, for each* p ∈ Π *and* u ∈ $\overline{\mathbf{Pred}}$<sub>ℬ<sup>∞</sup></sub>*, we have* L(u|<sub>p</sub>) = L(u)|<sub>p</sub>*.*
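The adjunction between K and L can be illustrated inside a single fibre. The sketch below rests on assumptions of our own: chains are truncated to three stages over a two-element set `X`, so that lim u is simply the intersection of all stages, and all names are hypothetical. In a poset fibre, K ⊣ L says that the constant chain on P lies stagewise below u exactly when P is contained in lim u.

```python
from itertools import combinations, product

X = frozenset({0, 1})
LENGTH = 3  # number of stages kept in this truncated illustration

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def lim(u):
    """L on a chain: the largest subset contained in every stage."""
    result = X
    for level in u:
        result = result & level
    return result

def const(p):
    """K on a predicate: the constant chain with value p."""
    return [p] * LENGTH

def below(u, v):
    """Order in the fibre of chains: stagewise inclusion."""
    return all(a <= b for a, b in zip(u, v))

def is_descending(u):
    return all(u[i + 1] <= u[i] for i in range(len(u) - 1))

chains = [list(c) for c in product(subsets(X), repeat=LENGTH)
          if is_descending(c)]

# Adjoint correspondence: K(P) <= u  if and only if  P <= L(u).
adjunction_holds = all(below(const(p), u) == (p <= lim(u))
                       for p in subsets(X) for u in chains)

# L after K is the identity on predicates.
retraction_holds = all(lim(const(p)) == p for p in subsets(X))
```

The second check reflects that K is an embedding: passing to the constant chain and then taking its limit returns the original predicate.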

We get from Proposition 41 soundness and completeness of Φ⃖<sub>P</sub> for Herbrand models. More precisely, if ϕ is a formula of plain first-order logic (▶-free), then its interpretation in the coinductive Herbrand model is true if and only if its interpretation over the chain approximation of the Herbrand model is true.

**Theorem 42.** *If* ϕ *is* ▶*-free (Definition 3) then* ⟦ϕ⟧<sub>Φ⃖<sub>P</sub></sub> = ⊤̇ *if and only if* ⟦ϕ⟧<sub>ℳ<sub>P</sub></sub> = ⊤*.*

*Proof (sketch).* First, one shows for all ▶-free formulae ϕ that L(⟦ϕ⟧<sub>Φ⃖<sub>P</sub></sub>) = ⟦ϕ⟧<sub>ℳ<sub>P</sub></sub> by induction on ϕ and using Proposition 41. Using this identity and K ⊣ L, the result is then obtained from the following adjoint correspondence.

$$\dfrac{\dot{\top} = K(\top) \longrightarrow \llbracket \varphi \rrbracket\_{\overleftarrow{\Phi\_P}} \quad \text{in } \overline{\mathbf{Pred}}}{\top \longrightarrow L\left(\llbracket \varphi \rrbracket\_{\overleftarrow{\Phi\_P}}\right) = \llbracket \varphi \rrbracket\_{\mathcal{M}\_P} \quad \text{in } \mathbf{Pred}}$$

# **6 Conclusion, Related Work and the Future**

In this paper, we provided a comprehensive theory of resolution in coinductive Horn-clause theories and coinductive logic programs. This theory comprises a uniform proof system that features a form of guarded recursion and that provides operational semantics for proofs of coinductive predicates. Further, we showed how to translate proofs in this system into proofs for an extension of intuitionistic FOL with guarded recursion, and we provided sound semantics for both proof systems in terms of coinductive Herbrand models. The Herbrand models and semantics were thereby presented in a modern style that utilises coalgebras and fibrations to provide a conceptual view on the semantics.

*Related Work.* It may be surprising that automated *proof search for coinductive predicates* in first-order logic does not have a coherent and comprehensive theory, even after three decades [3,60], despite all the attention that it received as programming [2,29,42,44] and proof [33,35,39,40,45,59,64–67] method. The work that comes close to algorithmic proof search is the system CIRC [63], but it cannot handle general coinductive predicates and corecursive programming. Inductive and coinductive data types are also being added to SMT solvers [24,62]. However, both CIRC and SMT solving are inherently based on classical logic and are therefore not suited to situations where proof objects are relevant, like programming, type class inference or (dependent) type theory. Moreover, the proposed solutions, just like those in [41,69], can only deal with regular data, while our approach also works for irregular data, as we saw in the **from**-example.

This paper subsumes Haskell type class inference [37,51] and exposes that the inference presented in those papers corresponds to coinductive proofs in *co-fohc* and *co-hohh*. Given that the proof systems proposed in this paper are constructive and that uniform proofs provide proofs (type inhabitants) in normal form, we could give a propositions-as-types interpretation to all eight coinductive uniform proof systems. This was done for *co-fohc* and *co-hohh* in [37], but we leave the remaining cube from the introduction for future work.

*Future Work.* There are several directions that we wish to pursue in the future. First, we know that CUP is incomplete for the presented models, as it is intuitionistic and it lacks an admissible cut rule. The first can be solved by moving to Kripke/Beth-models, as done by Clouston and Goré [30] for the propositional part of **iFOL**<sub>▶</sub>. However, the admissible cut rule is more delicate. To obtain such a rule one has to be able to prove several propositions simultaneously by coinduction, as discussed at the end of Sect. 4. In general, completeness of recursive proof systems depends largely on the theory they are applied to, see [70] and [18]. However, techniques from cyclic proof systems [27,68] may help. We also aim to extend our ideas to other situations like higher-order Horn clauses [28,43] and interactive proof assistants [7,10,23,31], typed logic programming, and logic programming that mixes inductive and coinductive predicates.

**Acknowledgements.** We would like to thank Damien Pous and the anonymous reviewers for their valuable feedback.

# **References**


