# **Programming Languages and Systems**

**33rd European Symposium on Programming, ESOP 2024 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024 Luxembourg City, Luxembourg, April 6–11, 2024 Proceedings, Part II**

# Lecture Notes in Computer Science 14577

Founding Editors

Gerhard Goos, Germany
Juris Hartmanis, USA

# Editorial Board Members

Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Moti Yung, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at https://link.springer.com/bookseries/558

Stephanie Weirich Editor

# Programming Languages and Systems

33rd European Symposium on Programming, ESOP 2024 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024 Luxembourg City, Luxembourg, April 6–11, 2024 Proceedings, Part II

Editor

Stephanie Weirich
University of Pennsylvania
Philadelphia, PA, USA

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-031-57266-1 ISBN 978-3-031-57267-8 (eBook)
https://doi.org/10.1007/978-3-031-57267-8

© The Editor(s) (if applicable) and The Author(s) 2024. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.

# ETAPS Foreword

Welcome to the 27th ETAPS! ETAPS 2024 took place in Luxembourg City, the beautiful capital of Luxembourg.

ETAPS 2024 is the 27th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organising these conferences in a coherent, highly synchronised conference programme enables researchers to participate in an exciting event, with the opportunity to meet many colleagues working in different directions in the field and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe.

ETAPS 2024 received 352 submissions in total, 117 of which were accepted, yielding an overall acceptance rate of 33%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2024 featured the unifying invited speakers Sandrine Blazy (University of Rennes, France) and Lars Birkedal (Aarhus University, Denmark), and the invited speakers Ruzica Piskac (Yale University, USA) for TACAS and Jérôme Leroux (Laboratoire Bordelais de Recherche en Informatique, France) for FoSSaCS. Invited tutorials were provided by Tamar Sharon (Radboud University, the Netherlands) on computer ethics and David Monniaux (Verimag, France) on abstract interpretation.

As part of the programme we had the first ETAPS industry day. The goal of this day was to bring industrial practitioners into the heart of the research community and to catalyze the interaction between industry and academia. The day was organized by Nikolai Kosmatov (Thales Research and Technology, France) and Andrzej Wąsowski (IT University of Copenhagen, Denmark).

ETAPS 2024 was organized by the SnT - Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg. The University of Luxembourg was founded in 2003. The university is one of the best and most international young universities, with 6,000 students from 130 countries and 1,500 academics from all over the globe. The local organisation team consisted of Peter Y.A. Ryan (general chair), Peter B. Roenne (organisation chair), Maxime Cordy and Renzo Gaston Degiovanni (workshop chairs), Magali Martin and Isana Nascimento (event managers), Marjan Skrobot (publicity chair), and Afonso Arriaga (local proceedings chair). This team also organised the online edition of ETAPS 2021, and we are happy that they agreed to also organise a physical edition of ETAPS.

ETAPS 2024 is further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology).

The ETAPS Steering Committee consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Marieke Huisman (Twente, chair), Andrzej Wąsowski (Copenhagen), Thomas Noll (Aachen), Jan Kofroň (Prague), Barbara König (Duisburg), Arnd Hartmanns (Twente), Caterina Urban (Inria), Jan Křetínský (Munich), Elizabeth Polgreen (Edinburgh), and Lenore Zuck (Chicago).

Other members of the steering committee are: Maurice ter Beek (Pisa), Dirk Beyer (Munich), Artur Boronat (Leicester), Luís Caires (Lisboa), Ana Cavalcanti (York), Ferruccio Damiani (Torino), Bernd Finkbeiner (Saarland), Gordon Fraser (Passau), Arie Gurfinkel (Waterloo), Reiner Hähnle (Darmstadt), Reiko Heckel (Leicester), Marijn Heule (Pittsburgh), Joost-Pieter Katoen (Aachen and Twente), Delia Kesner (Paris), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovács (Vienna), Mark Lawford (Hamilton), Tiziana Margaria (Limerick), Claudio Menghi (Hamilton and Bergamo), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Peter Y.A. Ryan (Luxembourg), Don Sannella (Edinburgh), Viktor Vafeiadis (Kaiserslautern), Stephanie Weirich (Pennsylvania), Anton Wijs (Eindhoven), and James Worrell (Oxford).

I would like to take this opportunity to thank all authors, keynote speakers, attendees, organizers of the satellite workshops, and Springer Nature for their support. ETAPS 2024 was also generously supported by a RESCOM grant from the Luxembourg National Research Foundation (project 18015543). I hope you all enjoyed ETAPS 2024.

Finally, a big thanks to both Peters, Magali and Isana and their local organization team for all their enormous efforts to make ETAPS a fantastic event.

April 2024

Marieke Huisman
ETAPS SC Chair
ETAPS e.V. President

# Preface

These proceedings volumes contain papers that were presented at the 33rd European Symposium on Programming (ESOP 2024), held during April 6–11 in Luxembourg City, Luxembourg, along with associated artifact reports. ESOP is part of the European Joint Conferences on Theory and Practice of Software (ETAPS) and promotes the specification, design, analysis and implementation of programming languages and systems.

In total, these two volumes include 25 research papers, one "fresh perspective" and four "artifact reports". The latter two paper categories are new to ESOP. In addition to standard research papers, the ESOP 2024 call-for-papers included the new submission categories: "fresh perspectives" that provide new insights in a particularly elegant way and "experience reports" that describe tools and systems used in practice. Furthermore, authors of accepted papers were allowed to submit short "artifact reports", to appear together with their research papers, that describe associated software, tools, data sets, or machine checked proofs to substantiate the claims made in their papers.

The papers in this volume were selected from 66 papers submitted in the research paper category and 6 papers submitted in the "fresh perspectives" category. There were no submissions for "experience reports". While papers in these new categories had strict formatting requirements, ESOP 2024 allowed research papers to be submitted in any format, of any length, under the advisement that the final paper should be formatted to fit this volume. Fourteen submissions took advantage of this flexibility.

Each submitted paper received at least three reviews from members of the ESOP program committee. The median PC member was assigned eight papers to review over the seven-week review period. In some cases, PC members solicited additional reviews to aid the decision-making process. In total, 39 external reviewers added their insight to the paper selection process. ESOP employed full double-blind reviewing, and author identities were revealed to reviewers only upon paper acceptance. Authors were also given a chance to respond to their reviews before the program was selected through a two-week online, asynchronous PC meeting, facilitated by the EasyChair system. The program chair had no conflicts with any submitted paper.

ESOP 2024 also employed an artifact evaluation process. Nineteen of the 26 accepted papers elected to make their artifacts available on the archive sites Zenodo and figshare. The committee awarded the badge "Functional" to five of these and the badge "Functional and reusable" to the remaining fourteen. Four accepted papers in this volume are accompanied by artifact reports. These reports were all accepted following a light review by both the program committee and the ESOP/FASE/FoSSaCS joint artifact evaluation committee.

My sincere thanks go to all who worked together to produce this event and its proceedings. Foremost, to the authors, who provided the technical content of the meeting. Also to the program committee, artifact evaluation committee, and external reviewers, who provided their well-reasoned and detailed judgments, sometimes on short notice. Tobias Kappé, as the representative for ESOP among the artifact evaluation committee co-chairs, deserves particular thanks. I would also like to thank the ETAPS steering committee and its chair Marieke Huisman, the proceedings coordinator Barbara König, the local proceedings chair Afonso Delerue Arriaga, and the webmaster Jan Kofroň for their assistance in fitting ESOP together with the entire ETAPS meeting. Finally, thanks are due to the members of the ESOP steering committee. In particular, Luís Caires, as chair of the SC, was a constant source of support, encouragement, information and guidance.

April 2024

Stephanie Weirich
ESOP PC Chair

# Organization

# ESOP Steering Committee


# Program Chair

Stephanie Weirich, University of Pennsylvania, USA
# Program Committee

Ana Bove, Chalmers University of Technology, Sweden
Loris D'Antoni, University of Wisconsin-Madison, USA
Ugo Dal Lago, Università di Bologna and Inria Sophia Antipolis, Italy
Ornela Dardha, University of Glasgow, UK
Mike Dodds, University of York, UK
Sophia Drossopoulou, Imperial College London, UK
Robby Findler, Northwestern University, USA
Amir Goharshady, Hong Kong University of Science and Technology, China
Andrew D. Gordon, Microsoft Research and University of Edinburgh, UK
Alexey Gotsman, IMDEA Software Institute, Spain
Limin Jia, Carnegie Mellon University, USA
Josh Ko, Institute of Information Science, Academia Sinica, Taiwan
András Kovács, Eötvös Loránd University, Hungary
Kazutaka Matsuda, Tohoku University, Japan
Anders Miltner, Simon Fraser University, Canada
Santosh Nagarakatte, Rutgers University, USA
Dominic Orchard, University of Kent, UK
Frank Pfenning, Carnegie Mellon University, USA
Clément Pit-Claudel, EPFL, Switzerland
François Pottier, Inria, France
Matija Pretnar, University of Ljubljana, Slovenia
Azalea Raad, Imperial College London, UK


# ESOP/FASE/FoSSaCS Joint Artifact Evaluation Committee

# AEC Co-chairs

Tobias Kappé, Open Universiteit and ILLC, University of Amsterdam, The Netherlands
Ryosuke Sato, University of Tokyo, Japan
Stefan Winter, LMU Munich, Germany

# AEC Members

Levente Bajczi, Budapest University of Technology and Economics, Hungary
James Baxter, University of York, UK
Matthew Alan Le Brun, University of Glasgow, UK
Laura Bussi, University of Pisa, Italy
Gustavo Carvalho, Universidade Federal de Pernambuco, Brazil
Chanhee Cho, Carnegie Mellon University, USA
Ryan Doenges, Northeastern University, USA
Zainab Fatmi, University of Oxford, UK
Luke Geeson, University College London, UK
Hans-Dieter Hiep, Leiden University, The Netherlands
Philipp Joram, Tallinn University of Technology, Estonia
Ulf Kargén, Linköping University, Sweden
Hiroyuki Katsura, University of Tokyo, Japan
Calvin Santiago Lee, Reykjavík University, Iceland
Livia Lestingi, Politecnico di Milano, Italy
Nuno Macedo, University of Porto and INESC TEC, Portugal
Kristóf Marussy, Budapest University of Technology and Economics, Hungary
Ivan Nikitin, University of Glasgow, UK
Hugo Pacheco, University of Porto, Portugal
Lucas Sakizloglou, Brandenburgische Technische Universität Cottbus-Senftenberg, Germany
Michael Schröder, TU Wien, Austria
Michael Schwarz, TU Munich, Germany
Wenjia Ye, University of Hong Kong, China

# Additional Reviewers

Thorsten Altenkirch, Carlo Angiuli, Martin Avanzini, Aurèle Barrière, Clément Blaudeau, Timothy Bourke, Marco Carbone, Tej Chajed, John Cyphert, Francesco Dagnino, Hoang-Hai Dang, Jana Dunfield, Peter Dybjer, Oskar Eriksson, Simon Fowler, Jose Fragoso Santos, Lorenzo Gheri, Raymond Hu, Patrik Jansson, Delia Kesner, Jinwoo Kim, Robbert Krebbers, Ivan Lanese, Sam Lindley, Peter Ljunglöf, Kenji Maillard, Daniel Marshall, Stephen Mell, Yasuhiko Minamide, Hugo Moeneclaey, Alexandre Moine, Charlie Murphy, Shaan Nagy, Ulf Norell, Mário Pereira, Alejandro Russo, Bernardo Toninho, Paulo Torrens, Ruifeng Xie

# Contents – Part II



Program Analysis



# Contents – Part I

#### Effects and Modal Types


#### Dependent Types



# **Quantum Programming/Domain-Specific Languages**

# Circuit Width Estimation via Effect Typing and Linear Dependency

Andrea Colledan<sup>1,2</sup>(B) and Ugo Dal Lago<sup>1,2</sup>

> <sup>1</sup> University of Bologna, Bologna, Italy
> <sup>2</sup> INRIA Sophia Antipolis, Valbonne, France
> {andrea.colledan,ugo.dallago}@unibo.it

Abstract. Circuit description languages are a class of quantum programming languages in which programs are classical and produce a description of a quantum computation, in the form of a quantum circuit. Since these programs can leverage all the expressive power of high-level classical languages, circuit description languages have been successfully used to describe complex and practical quantum algorithms, whose circuits, however, may involve many more qubits and gate applications than current quantum architectures can actually muster. In this paper, we present Proto-Quipper-R, a circuit description language endowed with a linear dependent type-and-effect system capable of deriving parametric upper bounds on the width of the circuits produced by a program. We prove both the standard type safety results and that the resulting resource analysis is correct with respect to a big-step operational semantics. We also show that our approach is expressive enough to verify realistic quantum algorithms.

Keywords: Effect Typing · Lambda Calculus · Quantum Computing · Quipper

# 1 Introduction

With the promise of providing efficient algorithmic solutions to many problems [11,27,31], some of which are traditionally believed to be intractable [54], quantum computing is the subject of intense investigation by various research communities within computer science, not least that of programming language theory [24,43,51]. Various proposals for idioms capable of tapping into this new computing paradigm have appeared in the literature since the late 1990s. Some of these approaches turn out to be fundamentally new [1,49,52], while many others are strongly inspired by classical languages and traditional programming paradigms [44,48,53,63].

One of the major obstacles to the practical adoption of quantum algorithmic solutions is the fact that, despite huge efforts by scientists and engineers alike, reliable quantum hardware, contrary to its classical counterpart, does not seem to scale easily: although quantum architectures with up to a couple hundred qubits have recently seen the light [9,10,38], it is not yet clear whether the so-called quantum advantage [45] is a concrete possibility, given the tremendous challenges posed by the quantum decoherence problem [50].


This entails that software which makes use of quantum hardware must be designed with great care: whenever part of a computation has to be run on quantum hardware, the amount of resources it needs, and in particular the number of qubits it uses, should be kept to a minimum. More generally, fine control over the low-level aspects of the computation, something that we willingly abstract from when dealing with most forms of classical computation, should be exposed to the programmer in the quantum case. This, in turn, has led to the development and adoption of many domain-specific programming languages and libraries in which the programmer explicitly manipulates qubits and quantum circuits, while still making use of all the features of a high-level classical programming language. This is the case for the Qiskit and Cirq libraries [17], but also for the Quipper language [25,26].

At the fundamental level, Quipper is a circuit description language embedded in Haskell. Because of this, Quipper inherits all the expressiveness of the high level, higher-order functional programming language that is its host, but for the same reason it also lacks a formal semantics. Nonetheless, over the past few years, a number of calculi, collectively known as the Proto-Quipper language family, have been developed to formalize interesting fragments and extensions of Quipper in a type-safe manner [46,48]. Extensions include, among others, dynamic lifting [8,21,35] and dependent types [20,22], but resource analysis is still a rather unexplored research direction in the Proto-Quipper community [56].

The goal of this work is to show that type systems indeed enable the possibility of reasoning about the size of the circuits produced by a Proto-Quipper program. Specifically, we show how linear dependent types in the style of Gaboardi and Dal Lago [12,14,15,23] can be adapted to Proto-Quipper, allowing us to derive upper bounds on circuit widths that are parametric in the number of input wires to the circuit, be they classical or quantum. This enables a form of static analysis of the resource consumption of circuit families and, consequently, of the quantum algorithms described in the language. Technically, a key ingredient of this analysis, besides linear dependency, is a novel form of effect typing in which the quantitative information coming from linear dependency informs the effect system and allows it to keep circuit widths under control.

The rest of the paper is organized as follows. Section 2 informally explores the problem of estimating the width of circuits produced by Quipper, while also introducing the language. Section 3 provides a more formal definition of the Proto-Quipper language. In particular, it gives an overview of the system of simple types due to Rios and Selinger [46], which, however, is not meant to reason about the size of circuits. We then move on to the most important technical contribution of this work, namely the linear dependent and effectful type system, which is introduced in Section 4 and proven to guarantee both type safety and a form of total correctness in Section 5. Section 6 is dedicated to an example of a practical application of our type-and-effect system, that is, a program that builds the Quantum Fourier Transform (QFT) circuit [11,39] and which is verified to do so without any ancillary qubits.

To conclude this introduction, we wish to emphasize that while it is true that quantum computing can be a difficult and intimidating subject, the class of languages analyzed in this work focuses on circuit construction, which is an entirely classical process, paying little to no attention to the actual quantum semantics of circuit execution. Because of this, and due to space constraints, we refrain from providing a general introduction to quantum computing in this paper. Instead, we refer the interested reader to the excellent works of Nielsen and Chuang [39], Yanofsky and Mannucci [60], and Mingsheng [61].

# 2 An Overview of Circuit Width Estimation

Quipper allows programmers to describe quantum circuits in a high-level and elegant way, using both gate-by-gate and circuit-transformation approaches. Quipper also supports hierarchical and parametric circuits, thus promoting a view in which circuits become first-class citizens. Quipper has been shown to be scalable, in the sense that it has been effectively used to describe complex quantum algorithms that easily translate to circuits involving trillions of gates applied to millions of qubits. The language allows the programmer to optimize the circuit, e.g. by using ancilla qubits for the sake of reducing the circuit depth, or by recycling qubits that are no longer needed.

One feature that Quipper lacks is a methodology for statically proving that important parameters of the underlying circuit, such as its width, stay below certain limits, which of course would need to be parametric in the input size of the circuit. If this kind of analysis were available, then it would be possible to derive bounds on the number of qubits needed to solve any instance of a problem, and ultimately to know in advance how large an instance can possibly be solved given a fixed number of qubits.

In order to illustrate the kind of scenario we are reasoning about, this section offers some simple examples of Quipper programs, showing in what sense we can capture the quantitative information we are interested in through type-and-effect systems and linear dependency. We proceed at a very high level for now, without any ambition of formality.

Let us start with the example of Figure 1. The Quipper function on the left builds the structure on the right, which we call a quantum circuit. For the purposes of this work, it suffices to say that horizontal lines represent qubits, while other symbols represent elementary operations applied to them, e.g. initializations, gate applications, and so on. Time flows from left to right. The specific circuit in Figure 1 is an (admittedly contrived) implementation of the quantum not operation. The dumbNot function implements negation using a controlled not gate and an ancillary qubit a, which is initialized and discarded within the body of the function. This qubit does not appear in the interface of the circuit, but it clearly adds to its overall width, which is 2.
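In Quipper, dumbNot can be written along the following lines. This is a sketch of ours, assuming Quipper's standard qinit, qnot, controlled and qdiscard primitives, not necessarily the authors' exact listing from Figure 1.

```haskell
import Quipper

-- Negate q via a controlled-not whose control is an ancilla
-- initialized to |1>; the ancilla is discarded afterwards.
dumbNot :: Qubit -> Circ Qubit
dumbNot q = do
  a <- qinit True            -- ancilla in state |1>
  q <- qnot q `controlled` a -- controlled-not targeting q
  qdiscard a                 -- the ancilla never appears in the interface
  return q
```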

Fig. 1. A contrived implementation of the quantum not operation using an ancilla

Consider now the higher-order function in Figure 2. This function takes as input a circuit-building function f and an integer n, and describes the circuit obtained by applying f's circuit n times to the input qubit q. It is easy to see that the width of the circuit produced in output by iter dumbNot n is equal to 2, even though, overall, the number of qubits initialized during the computation is equal to n. The point is that each ancilla is created only after the previous one has been discarded, thus enabling a form of qubit recycling.

Fig. 2. A higher-order function which iterates a circuit-building function f on an input qubit q and the result of its application to the dumbNot function from Figure 1
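A sketch of what iter might look like in Quipper (our rendering, under the same assumptions as above, rather than the paper's exact listing):

```haskell
import Quipper

-- Apply the circuit-building function f to q, n times in sequence.
iter :: (Qubit -> Circ Qubit) -> Int -> Qubit -> Circ Qubit
iter f n q
  | n <= 0    = return q
  | otherwise = do
      q' <- f q
      iter f (n - 1) q'
```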

Is it possible to statically analyze the width of the circuit produced in output by iter dumbNot n so as to conclude that it is constant and equal to 2? What techniques can we use? Certainly, the presence of higher-order types complicates an already non-trivial problem. The approach we propose in this paper is based on two ingredients. The first is the so-called effect typing [40]. In this context, the effect produced by the program is nothing more than the circuit, and therefore it is natural to think of an effect system in which the width of such a circuit, and only that, is exposed. Therefore, the arrow type A → B should be decorated with an expression indicating the width of the circuit produced by the corresponding function when applied to an argument of type A. Of course, the width of an individual circuit is a natural number, so it would make sense to annotate the arrow with such a number. For technical reasons, however, it will also be necessary to keep track of another natural number, corresponding to the number of wire resources that the function captures from the surrounding environment. This necessity stems from a need to keep track of wires even in the presence of data hiding, and will be explained in further detail in Section 4.

Under these premises, the dumbNot function would receive type Qubit →<sup>2,0</sup> Qubit, meaning that it takes as input a qubit and produces a circuit of width 2 which outputs a qubit. Note that the second annotation is 0, since we do not capture anything from the function's environment, let alone a wire. Consequently,

because iter iterates in sequence and because the ancillary qubit in dumbNot can be reused, the type of iter dumbNot n would also be Qubit →<sup>2,0</sup> Qubit.


Fig. 3. The hadamardN function implements a circuit family where circuits have width linear in their input size.
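In Quipper, such a function amounts to mapping the hadamard gate over the input list. A minimal sketch of ours, assuming the standard hadamard primitive, in place of the original listing of Figure 3:

```haskell
import Quipper

-- Apply a Hadamard gate to every qubit in the input list.
hadamardN :: [Qubit] -> Circ [Qubit]
hadamardN = mapM hadamard
```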

Let us now consider a slightly different situation, in which the width of the produced circuit is not constant, but rather increases proportionally to the circuit's input size. Figure 3 shows a Quipper function that returns a circuit on n qubits in which the Hadamard gate is applied to each qubit, a common preprocessing step in many quantum algorithms. It is obvious that this function works on inputs of arbitrary size, and therefore we can interpret it as a circuit family, parametrized on the length of the input list of qubits. This quantity, although certainly a natural number, is unknown statically and corresponds precisely to the width of the produced circuit. It is thus natural to wonder whether the kind of effect typing we briefly hinted at in the previous paragraph is capable of dealing with such a function. Certainly, the expressions used to annotate arrows cannot be, as in the previous case, mere constants, as they clearly depend on the size of the input list. Is there a way to reflect this dependency in types? Certainly, one could go towards a fully-fledged notion of dependent types, like the ones proposed in [22], but a simpler approach, in the style of Dal Lago and Gaboardi's linear dependent types [12,14,15,23], turns out to be enough for this purpose. This is precisely the route that we follow in this paper. In this approach, terms can indeed appear in types, but that is only true for a very restricted class of terms, disjoint from the ordinary ones, called index terms. As an example, the type of the function hadamardN above could become List<sup>i</sup> Qubit →<sup>i,0</sup> List<sup>i</sup> Qubit, where i is an index variable. The meaning of the type would thus be that hadamardN takes as input any list of qubits of length i and produces a circuit of width at most i which outputs i qubits. Indices are better explained in Section 4, but in general we can say that they consist of arithmetical expressions over natural numbers and index variables, and can thus express non-trivial dependencies between input sizes and corresponding circuit widths.


# 3 The Proto-Quipper Language

This section aims at introducing the Proto-Quipper family of calculi to the non-specialist, without any form of resource analysis. At its core, Proto-Quipper is a linear lambda calculus with bespoke constructs to build and manipulate circuits. Circuits are built as a side effect of a computation, behind the scenes, but they can also appear and be manipulated as data in the language.


Fig. 4. The Proto-Quipper types

The types of Proto-Quipper are given in Figure 4. Speaking at a high level, we can say that Proto-Quipper employs a linear-nonlinear typing discipline. In particular, w ∈ {Bit, Qubit} is a wire type and is linear, while ⊸ is the linear arrow constructor. A subset of types, called parameter types, represents the values of the language that are not linear and that can therefore be copied. Any term of type A can be lifted into a duplicable parameter of type !A if its type derivation does not require the use of linear resources.


Fig. 5. The Proto-Quipper syntax

The syntax of Proto-Quipper is given in Figure 5. At a very high level, we are dealing with an effectful lambda calculus with bespoke constructs for manipulating circuits. A return expression turns a value into a trivial computation, while a let expression is used to sequence computations. Note that let is associative and that return acts as its identity. Now, let us informally dissect the domain-specific aspects of this language, starting with the language of values. The constructs of greatest interest are labels and boxed circuits. A label ℓ represents a reference to a free wire of the underlying circuit being built and is attributed a wire type w ∈ {Bit, Qubit}. Due to the no-cloning property of quantum states [39], labels have to be treated linearly. Arbitrary structures of labels form a subset of values which we call wire bundles and which are given bundle types. On the other hand, a boxed circuit (ℓ̄, C, k̄) represents a circuit object C as a datum within the language, together with its input and output interfaces ℓ̄ and k̄. Such a value is given parameter type Circ(T, U), where bundle types T and U are the input and output types of the circuit, respectively. Boxed circuits can be copied, manipulated by primitive functions and, more importantly, applied to the underlying circuit. This last operation, which lies at the core of Proto-Quipper's circuit-building capabilities, is possible thanks to the apply operator. This operator takes as first argument a boxed circuit (ℓ̄, C, k̄) and appends C to the underlying circuit D. How does apply know where exactly in D to apply C? Thanks to a second argument: a bundle of wires t̄ coming from the free output wires of D, which identify the exact location where C is supposed to be attached.

The language is expected to be endowed with constant boxed circuits corresponding to fundamental gates and operations (e.g. Hadamard, CNOT, initialization, etc.), but the programmer can also introduce their own boxed circuits via the box operator. Intuitively, box takes as input a circuit-building function and executes it in a sandboxed environment, on dummy arguments, in a way that leaves the underlying circuit unchanged. Said function produces a standalone circuit C, which is then returned by the box operator as a boxed circuit (ℓ̄, C, k̄).

Figure 6 shows the Proto-Quipper term corresponding to the Quipper program in Figure 1, as an example of the use of the language. Note that let ⟨x, y⟩ = M in N is syntactic sugar for let z = M in let ⟨x, y⟩ = z in N. The dumbNot function is given type Qubit ⊸ Qubit and builds the circuit shown in Figure 1 when applied to an argument.

```
dumbNot ≜ λq^Qubit. let a = apply(INIT1, ∗) in
                    let ⟨q, a⟩ = apply(CNOT, ⟨q, a⟩) in
                    let _ = apply(DISCARD, a) in
                    return q
```
Fig. 6. An example Proto-Quipper program. INIT1, CNOT and DISCARD are primitive boxed circuits implementing the corresponding elementary operations.

On the classical side of things, it is worth mentioning that Proto-Quipper as presented in this section does not support general recursion. A limited form of recursion on lists is instead provided via a primitive fold constructor, which takes as arguments a (copiable) step function of type !((B⊗A) ⊸ B) and an initial value of type B, and constructs a function of type List A ⊸ B. Although this workaround is not enough to recover the full power of general recursion, it appears to be enough to describe many quantum algorithms. Figure 7 shows an example of the use of fold to reverse a list. Note that λ⟨x, y⟩<sup>A⊗B</sup>.M is syntactic sugar for λz<sup>A⊗B</sup>.let ⟨x, y⟩ = z in M.

rev ≜ fold lift(λ⟨revList, q⟩<sup>List Qubit⊗Qubit</sup>.return (cons q revList)) nil

Fig. 7. An example of the use of fold: the function that reverses a list
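For readers more at home in Haskell, fold plays the role of a left fold restricted to lists; ignoring linearity and the lifting of the step function, rev mirrors this classical one-liner (our analogy, not the paper's):

```haskell
-- Reverse as a left fold: each step conses the head onto the accumulator.
rev :: [a] -> [a]
rev = foldl (flip (:)) []
```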

To conclude this section, we just remark that all of the Quipper programs shown in Section 2 can be encoded in Proto-Quipper. However, Proto-Quipper's system of simple types is unable to tell us anything about the resource consumption of these programs. Of course, one could run hadamardN on a concrete input and examine the size of the circuit produced at run-time, but this amounts to testing, not verifying, the program, and lacks the qualities of staticity and parametricity that we seek.

# 4 Incepting Linear Dependency and Effect Typing

We are now ready to expand on the informal definition of the Proto-Quipper language given in Section 3, to reach a formal definition of Proto-Quipper-R: a linearly and dependently typed language whose type system supports the derivation of upper bounds on the width of the circuits produced by programs.

#### 4.1 Types and Syntax of Proto-Quipper-R


Fig. 8. Proto-Quipper-R syntax and types

The types and syntax of Proto-Quipper-R are given in Figure 8. As we mentioned, one of the key ingredients of our type system is the index terms with which we annotate standard Proto-Quipper types. These indices provide quantitative information about the elements of the resulting types, in a manner reminiscent of refinement types [18,47]. In our case, we are primarily concerned with circuit width, which means that the natural starting point of our extension of Proto-Quipper is precisely the circuit type: Circ<sup>I</sup>(T, U) has as elements the boxed circuits of input type T, output type U, and width bounded by I. The term I is precisely what we call an index, that is, an arithmetical expression denoting a natural number. Looking at the grammar for indices, their interpretation is fairly straightforward, with a few notes: n is a natural number, i is an index variable, I − J denotes natural subtraction, such that I − J = 0 whenever I ≤ J, and lastly max<sub>i<I</sub> J is the maximum, for i going from 0 (included) to I (excluded), of J, where i can occur in J. Note that I = 0 implies max<sub>i<I</sub> J = 0. While the index in a circuit type denotes an upper bound, the index in a type of the form List<sup>I</sup> A denotes the exact length of the lists of that type. While this refinement might seem quite restrictive in a generic scenario, it allows us to include lists of labels among wire bundles, something that was not possible with simple lists. This is due to the fact that sized lists are effectively isomorphic to finite tensors, and therefore a sized list of labels represents a wire bundle of known size, whereas the same is not true for a simple list of labels. Lastly, as we anticipated in Section 2, an arrow type A ⊸<sup>I,J</sup> B is annotated with two indices: I is an upper bound on the width of the circuit built by the function once it is applied to an argument of type A, while J describes the exact number of wires captured in the function's closure. The utility of this last annotation will become clearer in Section 4.3.
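To fix intuition, here are a few concrete instances of these conventions (our own):

$$2 - 5 = 0, \qquad \max_{i<3}(i+1) = \max(1, 2, 3) = 3, \qquad \max_{i<0}(i+1) = 0.$$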

The languages of terms and values are almost the same as in Proto-Quipper, with the minor difference that the fold operator now binds the index variable name i within its first argument. This variable appears locally in the type of the step function, in such a way as to allow each iteration of the fold to contribute to the overall circuit width in a different way.

#### 4.2 A Formal Language for Circuits

The type system of Proto-Quipper-R is designed to allow for reasoning about the width of circuits. Therefore, before we formally introduce the type system in Section 4.3, we ought to introduce circuits themselves in a formal way. So far, we have only spoken of circuits at a very high and intuitive level, and we have represented them only graphically. Looking at the circuits in Section 2, what do they have in common? At the fundamental level, they are made up of elementary operations applied to specific wires. Of course, the order of these operations matters, as does the order of wires that they are applied to. In the existing literature on Proto-Quipper, circuits are usually interpreted as morphisms in a symmetric monoidal category [46], but this approach makes it particularly hard to reason about their intensional properties, such as width. For this reason, we opt for a concrete model of wires and circuits, rather than an abstract one.

Luckily, we already have a datatype modeling ordered structures of wires, that is, the wire bundles that we introduced in the previous sections. We use them as the basis upon which we build circuits.

That being said, Figure 9 introduces the Circuit Representation Language (CRL) which we use as the target for circuit building in Proto-Quipper-R. Wire bundles are exactly as in Figure 8 and represent arbitrary structures of wires,


Fig. 9. CRL syntax and types

while circuits themselves are defined very simply as sequences of elementary operations applied to said structures. We call Q a label context and define it as a mapping from label names to wire types. We use label contexts as a means to keep track of the set of labels available in a computation, alongside their respective types. The circuit id<sub>Q</sub> represents the identity circuit taking as input the labels in Q and returning them unchanged, while C; g(ℓ̄) → k̄ represents the application of the elementary operation g to the wires identified by ℓ̄ among the outputs of C. Operation g outputs the wire bundle k̄, whose labels become part of the outputs of the overall circuit. Note that an "elementary operation" is usually the application of a gate, but it could also be a measurement, or the initialization or discarding of a wire. Although semantically very different, from the perspective of circuit building these operations are just elementary building blocks in the construction of a more complex structure, and it makes no sense to distinguish between them syntactically. Circuits are amenable to a form of concatenation. We write the concatenation of C and D as C :: D and define it in the natural way, that is, as C followed by all the operations occurring in D.

Circuit Typing Naturally, not all circuits built from the CRL syntax make sense. For example, id<sub>ℓ:Qubit</sub>; H(k) → k and id<sub>ℓ:Qubit</sub>; CNOT(⟨ℓ, ℓ⟩) → ⟨k, t⟩ are both syntactically correct, but the first applies a gate to a non-existing wire, while the second violates the no-cloning theorem by duplicating ℓ. To rule out such ill-formed circuits, we employ a rudimentary type system for circuits which allows us to derive judgments of the form C : Q → L, which informally read "circuit C is well-typed with input label context Q and output label context L".

The typing rules for CRL are given in Figure 10. We call Q ⊢<sub>w</sub> ℓ̄ : T a wire judgment, and we use it to give a structured type to an otherwise unordered label context, by means of a wire bundle. Most rules are straightforward, except those for lists, which rely on a judgment of the form ⊨ I = J. This is to be intended as a semantic judgment asserting that I and J are closed and equal when interpreted as natural numbers. Within the rules, this reflects the idea that there are many ways to syntactically represent the length of a list. For example, nil can be given type List<sup>0</sup> T, but also List<sup>1−1</sup> T or List<sup>0×5</sup> T. This kind of flexibility might seem unwarranted for such a simple language, but it is useful to effectively interface CRL with the more complex Proto-Quipper-R. Speaking of the actual circuit judgments, the seq rule tells us that the application of an elementary operation g is well typed whenever g only acts on labels occurring in the outputs of C (those in ℓ̄, that is, in H), produces in output labels that do not clash with the remaining outputs of C (since L, K denotes the disjoint

$$\begin{array}{c}
\textit{unit}\ \dfrac{}{\varnothing \vdash_{w} \star : \mathbb{1}} \qquad
\textit{lab}\ \dfrac{}{\ell : w \vdash_{w} \ell : w} \qquad
\textit{nil}\ \dfrac{\vDash I = 0}{\varnothing \vdash_{w} \mathsf{nil} : \mathsf{List}^{I}\, T} \\[2.5ex]
\textit{pair}\ \dfrac{Q_{1} \vdash_{w} \bar{\ell} : T \qquad Q_{2} \vdash_{w} \bar{k} : U}{Q_{1}, Q_{2} \vdash_{w} \langle \bar{\ell}, \bar{k} \rangle : T \otimes U} \qquad
\textit{cons}\ \dfrac{Q_{1} \vdash_{w} \bar{\ell} : T \qquad Q_{2} \vdash_{w} \bar{k} : \mathsf{List}^{J}\, T \qquad \vDash I = J + 1}{Q_{1}, Q_{2} \vdash_{w} \mathsf{cons}\ \bar{\ell}\ \bar{k} : \mathsf{List}^{I}\, T} \\[2.5ex]
\textit{id}\ \dfrac{}{id_{Q} : Q \to Q} \qquad
\textit{seq}\ \dfrac{\mathcal{C} : Q \to L, H \qquad H \vdash_{w} \bar{\ell} : T \qquad K \vdash_{w} \bar{k} : U \qquad g \in \mathcal{G}(T, U)}{\mathcal{C};\ g(\bar{\ell}) \to \bar{k} : Q \to L, K}
\end{array}$$

Fig. 10. The CRL type system

union of the two label contexts) and is of the right type. This last requirement is expressed as g ∈ G(T, U), where G(T, U) is the subset of elementary operations that can be applied to an input of type T to obtain an output of type U. For example, the Hadamard gate, which acts on a single qubit, is in G(Qubit, Qubit).

Circuit Width Among the many properties of circuits, we are interested in width, so we conclude this section by giving a formal status to this quantity.

Definition 1 (Circuit Width). We define the width of a CRL circuit C, written width(C), as follows:

$$\text{width}(id_Q) = |Q|,\tag{1}$$

$$\text{width}(\mathcal{C}; g(\bar{\ell}) \to \bar{k}) = \text{width}(\mathcal{C}) + \max(0, \text{new}(g) - \text{discarded}(\mathcal{C})),\tag{2}$$

where |Q| is the number of labels in Q, new(g) represents the net number of new wires initialized by g, and discarded(C) is the number of wires that have been effectively discarded by the end of C, obtained as the difference between C's width and the number of its outputs. Note that one expects new(g) to be equal to the difference between the number of labels in k̄ and those in ℓ̄. The overarching idea behind this definition is that whenever we require new wires in our computation, we first try to reuse as many previously discarded wires as possible. As long as we can do this (new(g) ≤ discarded(C)), the initializations do not add to the total width of the circuit. Otherwise (new(g) > discarded(C)) we must actually create new wires, increasing the overall width of the circuit.
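As a concrete instance, here is our own computation tracing Definition 1 over the dumbNot circuit of Figure 1, with the elementary operations named as in Figure 6 (so new(INIT1) = 1, new(CNOT) = 0 and new(DISCARD) = −1):

$$\begin{aligned} \mathrm{width}(id_{\ell:\mathrm{Qubit}}) &= 1,\\ \mathrm{width}(\,\cdot\,;\ \mathsf{INIT1}(\star) \to a) &= 1 + \max(0,\ 1 - 0) = 2,\\ \mathrm{width}(\,\cdot\,;\ \mathsf{CNOT}(\langle \ell, a \rangle) \to \langle k, t \rangle) &= 2 + \max(0,\ 0 - 0) = 2,\\ \mathrm{width}(\,\cdot\,;\ \mathsf{DISCARD}(t) \to \star) &= 2 + \max(0,\ {-1} - 0) = 2, \end{aligned}$$

matching the width 2 observed for dumbNot in Section 2.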

Now that we have a formal definition of circuit types and width, we can state a fundamental property of the concatenation of well-typed circuits, which is illustrated in Figure 11 and proven in Theorem 1. We use this result pervasively in proving the correctness of Proto-Quipper-R in Section 5.

Theorem 1 (CRL). Given C : Q → L, H and D : H → K such that the labels shared by C and D are all and only those in H, we have

1. C :: D : Q → L, K,


2. width(C :: D) ≤ max(width(C), width(D) + |L|).

Proof. By induction on the derivation of D : H → K.

Fig. 11. The kind of scenario described by Theorem 1

#### 4.3 Typing Programs

Going back to Proto-Quipper-R, we have already seen how the standard Proto-Quipper types are refined with quantitative information. However, decorating types is not enough for the purposes of width estimation. Recall that, in general, a Proto-Quipper program produces a circuit as a side effect of its evaluation. If we want to reason about the width of said circuit, it is not enough to rely on a regular linear type system, even a dependent one. Rather, we have to introduce the second ingredient of our analysis and turn to a type-and-effect system [40], revolving around a type judgment of the form

$$
\Theta; \Gamma; Q \vdash_c M : A; I, \tag{3}
$$

which intuitively reads "for all natural values of the index variables in Θ, under typing context Γ and label context Q, term M has type A and produces a circuit of width at most I". Therefore, Θ is a collection of index variables which are universally quantified in the rest of the judgment, while Γ is a typing context for parameter and linear variables alike. When a typing context contains exclusively parameter variables, we write it as Φ. In this judgment, I plays the role of an effect annotation, describing a relevant aspect of the side effect produced by the evaluation of M (i.e. the width of the produced circuit). The attentive reader might wonder why this annotation consists of only one index, whereas when we discussed arrow types in previous sections we needed two. The reason is that the second index, which we use to keep track of the number of wires captured by a function, is redundant in a typing judgment, where the same quantity can be inferred directly from the environments Γ and Q. A similar typing judgment of the form Θ; Γ; Q ⊢<sub>v</sub> V : A is introduced for values, which are effect-free.

The rules for deriving typing judgments are those in Figure 12, where Γ<sub>1</sub>, Γ<sub>2</sub> denotes the union of two contexts with disjoint domains. A well-formedness judgment of the form Θ ⊢ I means that all the free index variables occurring in I are in Θ. Well-formedness is lifted to types and typing contexts in the


Fig. 12. Proto-Quipper-R type system

natural way. Among the interesting typing rules, we can see how the circ rule bridges between CRL and Proto-Quipper-R. A boxed circuit (ℓ̄, C, k̄) is well typed with type Circ<sup>I</sup>(T, U) when C is no wider than the quantity denoted by I, C : Q → L, and ℓ̄, k̄ contain all and only the labels in Q and L, respectively, acting as a language-level interface to C.

The two main constructs that interact with circuits are apply and box. The apply rule is the foremost place where effects enter the type derivation: V represents some boxed circuit of width at most I, so its application to an appropriate wire bundle W produces exactly a circuit of width at most I. The box rule, on the other hand, works approximately in the opposite direction. If V is a circuit-building function that, once applied to an input of type T, would build a circuit of output type U and width at most I, then boxing it means turning it into a boxed circuit with the same characteristics. Note that the box rule requires that the typing context be devoid of linear variables. This reflects the idea that V is meant to be executed in complete isolation, to build a standalone, replicable circuit, and therefore it should not capture any linear resource (e.g. a label) from the surrounding environment.

Wire Count Notice that many rules rely on an operator written #(·), which we call the wire count operator. Intuitively, this operator returns the number of wire resources (in our case, bits or qubits) represented by a type or context. To understand why this is important, consider the return rule. The return operator turns a value V into a trivial computation that evaluates immediately to V, and therefore it would be tempting to give it an effect annotation of 0. However, V is not necessarily a closed value. In fact, it might very well contain many bits and qubits, coming both from the typing context Γ and the label context Q. Although nothing happens to these bits and qubits, they still correspond to wires in the underlying circuit, and these wires have a width which must be accounted for in the judgment for the otherwise trivial computation. The return rule therefore produces an effect annotation of the form #(Γ; Q), which is shorthand for #(Γ) + #(Q) and corresponds exactly to this quantity. A formal definition of the wire count operator on types is given in the following definition, which is lifted to contexts in the natural way.

Definition 2 (Wire Count). We define the wire count of a type A, written #(A), as a function #(·) : TYPE → INDEX such that

$$\#(\mathbb{1}) = \#(!A) = \#(\text{Circ}^I(T, U)) = 0, \qquad \#(w) = 1,$$

$$\#(A \otimes B) = \#(A) + \#(B), \qquad \#(A \multimap^{I,J} B) = J, \qquad \#(\text{List}^I A) = I \times \#(A).$$

This definition is fairly straightforward, except for the arrow case. By itself, an arrow type does not give us any information about the number of qubits or bits captured in the corresponding closure. This is precisely where the second index J, which keeps track of exactly this quantity, comes into play. This annotation is introduced by the abs rule and allows our analysis to circumvent data hiding.
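For instance, by Definition 2 we have #(Qubit ⊗ List<sup>i</sup> Qubit) = 1 + i, while a function of type Qubit ⊸<sup>2,1</sup> Qubit counts as exactly one wire (the qubit captured in its closure), irrespective of the width 2 it generates once applied.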

The let rule is another rule in which wire counts are essential. The two terms M and N in let x = M in N build the circuits C<sub>M</sub> and C<sub>N</sub>, whose widths are bounded by I and J, respectively. Once again, it might be tempting to conclude that the overall circuit built by the let construct has width bounded by max(I, J), but this fails to take into account the fact that while M is building C<sub>M</sub> starting from the wires contained in Γ<sub>1</sub> and Q<sub>1</sub>, we must keep aside the wires contained in Γ<sub>2</sub> and Q<sub>2</sub>, which will be used by N to build C<sub>N</sub>. These wires must flow alongside C<sub>M</sub>, and their width, i.e. #(Γ<sub>2</sub>; Q<sub>2</sub>), adds up to the total width of the left-hand side of the let construct, leading to an overall width upper bound of max(I + #(Γ<sub>2</sub>; Q<sub>2</sub>), J). This situation is better illustrated in Figure 13.

Fig. 13. The shape of a circuit built by a let construct

The last rule that makes substantial use of wire counts is fold, arguably the most complex rule of the system. The main ingredient of the fold rule is the bound index variable i, which occurs in the accumulator type B and is used to keep track of the number of steps performed by the fold. Let (·){I/i} denote the capture-avoiding substitution of the index term I for the index variable i inside an index, type, context, value or term, just as (·)[V/x] denotes the capture-avoiding substitution of the value V for the variable x. Intuitively, if the accumulator initially has type B{0/i} and each application of the step function increases i by one, then when we fold over a list of length I we get an output of type B{I/i}. The index E is the upper bound on the width of the overall circuit built by the fold: if the input list is empty, then the width of the circuit is just the number of wires contained in the initial accumulator, that is, #(Γ; Q). If the input list is non-empty, on the other hand, things get slightly more complicated. At each step i, the step function builds a circuit C<sub>i</sub> of width bounded by J, where J might depend on i. This circuit takes as input all the wires in the accumulator, as well as the wires contained in the first element of the input list, which are #(A). The wires contained in the remaining I − 1 − i elements have to flow alongside C<sub>i</sub>, giving a width upper bound of J + (I − 1 − i) × #(A) at each step i. The overall width upper bound is then the maximum, for i going from 0 to I − 1, of this quantity, i.e. precisely max<sub>i<I</sub> (J + (I − 1 − i) × #(A)). Once again, a graphical representation of this scenario is given in Figure 14.

Fig. 14. The shape of a circuit built by a fold applied to an input list of type List<sup>I</sup> A
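As a sanity check, consider our own instantiation of this bound, assuming hadamardN from Section 2 is written as a fold whose accumulator has type List<sup>i</sup> Qubit, so that #(A) = 1, #(Γ; Q) = 0 and the step at index i builds a circuit of width J = i + 1 (the i qubits already in the accumulator plus the one being processed):

$$\max_{i<I}\big((i+1) + (I-1-i) \times 1\big) = \max_{i<I} I = I,$$

which recovers the width i announced for hadamardN's type in Section 2.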

Subtyping Notice that Proto-Quipper-R's type system includes two subsumption rules, which are effectively the same rule for terms and values, respectively: csub and vsub. We mentioned that our type system resembles a refinement type system, and all such systems induce a subtyping relation between types, where A is a subtype of B whenever the former is "at least as refined" as the latter. In our case, a subtyping judgment such as Θ ⊢<sub>s</sub> A <: B means that for all natural values of the index variables in Θ, A is a subtype of B.

$$\begin{array}{c}
\textit{unit}\ \dfrac{}{\Theta \vdash_{s} \mathbb{1} <: \mathbb{1}} \qquad
\textit{wire}\ \dfrac{}{\Theta \vdash_{s} w <: w} \qquad
\textit{bang}\ \dfrac{\Theta \vdash_{s} A <: B}{\Theta \vdash_{s}\ !A <:\ !B} \\[2.5ex]
\textit{tensor}\ \dfrac{\Theta \vdash_{s} A_{1} <: A_{2} \qquad \Theta \vdash_{s} B_{1} <: B_{2}}{\Theta \vdash_{s} A_{1} \otimes B_{1} <: A_{2} \otimes B_{2}} \\[2.5ex]
\textit{arrow}\ \dfrac{\Theta \vdash_{s} A_{2} <: A_{1} \qquad \Theta \vdash_{s} B_{1} <: B_{2} \qquad \Theta \vDash I_{1} \leq I_{2} \qquad \Theta \vDash J_{1} = J_{2}}{\Theta \vdash_{s} A_{1} \multimap^{I_{1},J_{1}} B_{1} <: A_{2} \multimap^{I_{2},J_{2}} B_{2}} \\[2.5ex]
\textit{list}\ \dfrac{\Theta \vdash_{s} A <: B \qquad \Theta \vDash I = J}{\Theta \vdash_{s} \mathsf{List}^{I} A <: \mathsf{List}^{J} B} \qquad
\textit{circ}\ \dfrac{\Theta \vdash_{s} T_{1} <:> T_{2} \qquad \Theta \vdash_{s} U_{1} <:> U_{2} \qquad \Theta \vDash I \leq J}{\Theta \vdash_{s} \mathsf{Circ}^{I}(T_{1}, U_{1}) <: \mathsf{Circ}^{J}(T_{2}, U_{2})}
\end{array}$$

Fig. 15. Proto-Quipper-R subtyping rules

We derive this kind of judgment by the rules in Figure 15. Note that Θ ⊢<sub>s</sub> A <:> B is shorthand for "Θ ⊢<sub>s</sub> A <: B and Θ ⊢<sub>s</sub> B <: A". Subtyping relies in turn on a judgment of the form Θ ⊨ I ≤ J, which is a generalization of the semantic judgment that we used in the CRL type system in Section 4.2. Such a judgment asserts that for all natural values of the index variables in Θ, I is less than or equal to J. Consequently, Θ ⊨ I = J is shorthand for "Θ ⊨ I ≤ J and Θ ⊨ J ≤ I". We purposefully leave the decision procedure for this kind of judgment unspecified, with the prospect that, from a more practical perspective, it could be delegated to an SMT solver [7].
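For example, since Θ ⊨ 2 ≤ 3 holds trivially, the circ rule allows a width bound to be weakened: Θ ⊢<sub>s</sub> Circ<sup>2</sup>(Qubit, Qubit) <: Circ<sup>3</sup>(Qubit, Qubit), so a boxed circuit with a tighter bound can be used wherever a looser bound is expected.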

#### 4.4 Operational Semantics

Operationally speaking, it does not make sense, in the Proto-Quipper languages, to speak of the semantics of a term in isolation: a term is always evaluated in the context of an underlying circuit that supplies all of the term's free labels. We therefore define the operational semantics of Proto-Quipper-R as a big-step evaluation relation ⇓ on configurations, i.e. circuits paired with either terms or values. Intuitively, (C, M) ⇓ (D, V) means that M evaluates to V and updates C to D as a side effect.


Fig. 16. Proto-Quipper-R big-step operational semantics

The rules for evaluating configurations are given in Figure 16, where C, D and E are circuits, M and N are terms, while V, W, X, Y and Z are values. Most evaluation rules are straightforward, with the exception perhaps of apply, box and fold-step. Being the fundamental building block of circuit construction, the semantics of apply lies almost entirely in the way it updates the underlying circuit. The concatenation of the underlying circuit C and the applicand D is delegated entirely to the append function, which is given in Definition 4. Before we examine the append function, however, consider that when we deal with circuit objects we are not really interested in the concrete labels that occur in them, but rather in the structure that they convey. For this reason, we introduce the following notion of circuit equivalence.

Definition 3 (Circuit Equivalence). We say that two boxed circuits (ℓ̄, C, k̄) and (t̄, D, q̄) are equivalent, and we write (ℓ̄, C, k̄) ≅ (t̄, D, q̄), when there exists a renaming ρ of labels such that ρ(ℓ̄) = t̄, ρ(k̄) = q̄ and ρ(C) = D.

We can now move on to the definition of append, where the notion of circuit equivalence is used to instantiate the generic input interface of a boxed circuit with the actual labels that it is going to be appended to, and to ensure that there are no name clashes between the appended circuit and the underlying circuit.

Definition 4 (append). We define the append of (ℓ̄, D, k̄) to C on t̄, written append(C, t̄, (ℓ̄, D, k̄)), as the function that performs the following steps:


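The itemized steps of the definition are given in the original figure; the following Python sketch records one plausible reading of Definition 4 and the surrounding prose (ours: circuits are lists of (gate, labels) pairs, labels are integers, and this append is a hypothetical stand-in for the formal definition). It instantiates the input interface ℓ̄ with the target labels t̄ and renames the remaining labels of D freshly, avoiding clashes with C:

```
def append(C, t, boxed):
    # Append boxed circuit (l, D, k) to circuit C on C's output labels t.
    l, D, k = boxed
    rho = dict(zip(l, t))                        # instantiate the input interface
    used = {lab for _, labs in C for lab in labs} | set(t)
    fresh = (x for x in range(10**6) if x not in used)
    for lab in [lab for _, labs in D for lab in labs] + list(k):
        if lab not in rho:                       # rename to avoid name clashes
            rho[lab] = next(fresh)
    renamed = [(g, tuple(rho[lab] for lab in labs)) for g, labs in D]
    return C + renamed, tuple(rho[lab] for lab in k)

C = [("H", (5,))]
print(append(C, (5, 7), ((0, 1), [("CNOT", (0, 1))], (0, 1))))
# ([('H', (5,)), ('CNOT', (5, 7))], (5, 7))
```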
On the other hand, the semantics of a term of the form box<sup>T</sup> (lift M) relies on the freshlabels function. What freshlabels does is take as input a bundle type T and instantiate fresh Q, ℓ̄ such that Q ⊢<sup>w</sup> ℓ̄ : T. The wire bundle ℓ̄ is then used as a dummy argument to V, the circuit-building function resulting from the evaluation of M. This function application is evaluated in the context of the identity circuit id<sup>Q</sup> and eventually produces a circuit D, together with its output labels k̄. Finally, ℓ̄ and k̄ become respectively the input and output interfaces of the boxed circuit (ℓ̄, D, k̄), which is the result of the evaluation of box<sup>T</sup> (lift M).

Note, at this point, that T controls how many labels are initialized by the freshlabels function. Because T can contain indices (e.g. it could be that T ≡ List<sup>3</sup> Qubit), it follows that in Proto-Quipper-R indices are not only relevant to typing, but they also have operational value. For this reason, the semantics of Proto-Quipper-R is well-defined only on terms closed both in the sense of regular variables and index variables, since a circuit-building function of input type, say, List<sup>i</sup> Qubit does not correspond to any individual circuit, and therefore it makes no sense to box it. This aspect of the semantics is also apparent in the fold-step rule, where the index variable i occurring free in M is instantiated to 0 before evaluating M to obtain the step function Y. Then, before evaluating the next fold, i is replaced with i + 1 in M, increasing the index for the next iteration. A sketch of a freshlabels-like function follows.
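As an illustration, here is a minimal Python sketch (ours; the nested-tuple encoding of bundle types is an assumption made purely for the example) of a freshlabels-like function generating one fresh label per wire of T:

```
import itertools

_counter = itertools.count()

def freshlabels(T):
    # Returns a label context Q (a dict) and a wire bundle for bundle type T.
    if T in ("Qubit", "Bit"):
        lab = f"l{next(_counter)}"
        return {lab: T}, lab
    if isinstance(T, tuple) and T[0] == "Tensor":
        Q1, b1 = freshlabels(T[1])
        Q2, b2 = freshlabels(T[2])
        return {**Q1, **Q2}, (b1, b2)
    if isinstance(T, tuple) and T[0] == "List":  # List^n with n a closed index
        n, elem = T[1], T[2]
        pairs = [freshlabels(elem) for _ in range(n)]
        return {k: v for Q, _ in pairs for k, v in Q.items()}, [b for _, b in pairs]
    raise ValueError(f"not a bundle type: {T}")

print(freshlabels(("List", 3, "Qubit")))
# ({'l0': 'Qubit', 'l1': 'Qubit', 'l2': 'Qubit'}, ['l0', 'l1', 'l2'])
```

Note how the closed index 3 determines the number of labels, which is precisely why a function with open index variables cannot be boxed.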

# 5 Type Safety and Correctness

Because the operational semantics of Proto-Quipper-R is based on configurations, we ought to adopt a notion of well-typedness which is also based on configurations. The following definition of well-typed configuration is thus central to our type-safety analysis.

Definition 5 (Well-typed Configuration). We say that configuration (C, M) is well-typed with input Q, type A, width I and output L, and we write Q ⊢ (C, M) : A; I; L, whenever C : Q → L, H for some H such that ∅; ∅; H ⊢<sup>c</sup> M : A; I. We write Q ⊢ (C, V ) : A; L whenever C : Q → L, H for some H such that ∅; ∅; H ⊢<sup>v</sup> V : A.

The three results that we want to show in this section are that any well-typed term configuration Q ⊢ (C, M) : A; I; L evaluates to some configuration (D, V ), that Q ⊢ (D, V ) : A; L, and that D is obtained from C by extending it with a sub-circuit of width at most I. These claims correspond to the subject reduction and total correctness properties that we prove at the end of this section. However, both results rely on a central lemma and on the mutual notions of realization and reducibility, which we first give formally.

Definition 6 (Realization). We define V ⊩<sup>Q</sup> A, which reads "V realizes A under Q", as the smallest relation such that


Definition 7 (Reducibility). We say that M is reducible under Q with type A and width I, and we write M ⊩<sup>I</sup><sub>Q</sub> A, if, for all C such that C : L → Q, H, there exist D, V such that

1. (C, M) ⇓ (C :: D, V ),
2. ⊨ width(D) ≤ I,
3. D : Q → K for some K such that V ⊩<sup>K</sup> A.

Both relations, and in particular reducibility, are given in the form of unary logical relations [55]. The intuition is fairly straightforward: a term is reducible with width I if it evaluates correctly when paired with any circuit C that provides its free labels, and if it extends C with a sub-circuit D whose width is bounded by I. Realization, on the other hand, is less immediate. For most cases, realizing type A loosely corresponds to being closed and well-typed with type A, but a value realizes an arrow type A ⊸<sub>I,J</sub> B when its application to a value realizing A is reducible with type B and width I.

By themselves, realization and reducibility are defined only on terms and values closed in the sense both of regular and index variables. To extend these notions to open terms and values, we adopt the standard approach of reasoning explicitly about the substitutions that would render them closed. A closing value substitution γ is a function that turns an open term M into a closed term γ(M) by substituting a value for each free variable occurring in M. We say that γ implements a typing context Γ using label context Q, and we write γ ⊨<sup>Q</sup> Γ, when it replaces every variable x<sub>i</sub> in the domain of Γ with a value V<sub>i</sub> such that V<sub>i</sub> ⊩<sup>Q<sub>i</sub></sup> Γ(x<sub>i</sub>), where Q is the disjoint union of the Q<sub>i</sub> for x<sub>i</sub> ∈ dom(Γ). A closing index substitution θ is similar, only it substitutes closed indices for index variables and can be applied to indices, types, contexts, values and terms alike. We say that θ implements an index context Θ, and we write θ ⊨ Θ, when it replaces every index variable in Θ with a closed index term. This allows us to give the following fundamental lemma, which will be used while proving all other claims.

Lemma 1 (Core Correctness). Let Π be a type derivation. For all θ ⊨ Θ and γ ⊨<sup>Q</sup> θ(Γ), we have that

$$\begin{aligned} \Pi \rhd \Theta; \varGamma; L \vdash\_c M: A; I &\implies \gamma(\theta(M)) \Vdash\_{Q,L}^{\theta(I)} \theta(A), \\ \Pi \rhd \Theta; \varGamma; L \vdash\_v V: A &\implies \gamma(\theta(V)) \Vdash\_{Q,L} \theta(A). \end{aligned}$$

Proof. By induction on the size of Π, making use of Theorem 1.

Lemma 1 tells us that any well-typed term (resp. value) is reducible (resp. realizes its type) when we instantiate its free variables according to its contexts. Now that we have Lemma 1, we can proceed to proving the aforementioned results of subject reduction and total correctness. We start with the former, which unsurprisingly requires the following substitution lemmata.

Lemma 2 (Index Substitution). Let Π be a type derivation and let I be an index such that Θ ⊢ I. We have that

$$\begin{aligned} \Pi \rhd \Theta, i; \Gamma; Q \vdash\_c M: A; J &\implies \Theta; \Gamma \{ I/i \}; Q \vdash\_c M \{ I/i \} : A \{ I/i \}; J \{ I/i \}, \\ \Pi \rhd \Theta, i; \Gamma; Q \vdash\_v V: A &\implies \Theta; \Gamma \{ I/i \}; Q \vdash\_v V \{ I/i \} : A \{ I/i \}. \end{aligned}$$

Proof. By induction on the size of Π.

Lemma 3 (Value Substitution). Let Π be a type derivation and let V be a value such that Θ; Φ, Γ<sub>1</sub>; Q<sub>1</sub> ⊢<sup>v</sup> V : A. We have that

$$\begin{aligned} \Pi \rhd \Theta; \Phi, \Gamma_2, x:A; Q_2 \vdash_c M:B; I &\implies \Theta; \Phi, \Gamma_1, \Gamma_2; Q_1, Q_2 \vdash_c M[V/x]:B; I, \\ \Pi \rhd \Theta; \Phi, \Gamma_2, x:A; Q_2 \vdash_v W:B &\implies \Theta; \Phi, \Gamma_1, \Gamma_2; Q_1, Q_2 \vdash_v W[V/x]:B. \end{aligned}$$

Proof. By induction on the size of Π.

Theorem 2 (Subject Reduction). If Q ⊢ (C, M) : A; I;L and (C, M) ⇓ (D, V ), then Q ⊢ (D, V ) : A;L.

Proof. By induction on the derivation of (C, M) ⇓ (D, V ) and case analysis on the last rule used in its derivation. Lemma 3 is essential to the app, dest and let cases, while Lemma 2 is used in the fold-step case. Lemma 1 is essential to the box case, as it is the only case in which the side effect of the evaluation (the circuit built by the function being boxed), whose preservation is a matter of correctness, becomes a value (the resulting boxed circuit).

Of course, type soundness is not enough: we also want the resource analysis carried out by our type system to be correct, as stated in the following theorem.

Theorem 3 (Total Correctness). If Q ⊢ (C, M) : A; I;L, then there exist D, V such that (C, M) ⇓ (C :: D, V ) and ⊨ width(D) ≤ I.

Proof. By definition, Q ⊢ (C, M) : A; I; L entails that C : Q → L, H and ∅; ∅; H ⊢<sup>c</sup> M : A; I. Since an empty context is trivially implemented by an empty closing substitution, by Lemma 1 we get M ⊩<sup>I</sup><sub>H</sub> A, which by definition entails that there exist D, V such that (C, M) ⇓ (C :: D, V ) and ⊨ width(D) ≤ I.

# 6 A Practical Example

This section provides an example of how Proto-Quipper-R can be used to verify the resource usage of realistic quantum algorithms. In particular, we use our language to implement the QFT algorithm [11,39] and verify that the circuits it produces have width no greater than the size of their input, i.e. that the QFT algorithm does not use any additional ancillary qubits overall.

```
qft ≜ fold_j qftStep nil

qftStep ≜ lift(return λ⟨qs, q⟩^(List^j Qubit ⊗ Qubit).
      let ⟨n, qs⟩ = qlen qs in
      let revQs = rev qs in
      let ⟨q, qs⟩ = (fold_e (lift(rotate n)) ⟨q, nil⟩) revQs in
      let q = apply(H, q) in
      return (cons q qs))

rotate ≜ λn^Nat. return λ⟨⟨q, cs⟩, c⟩^((Qubit ⊗ List^e Qubit) ⊗ Qubit).
      let ⟨m, cs⟩ = qlen cs in
      let rgate = makeRGate (n + 1 − m) in
      let ⟨q, c⟩ = apply(rgate, ⟨q, c⟩) in
      return ⟨q, cons c cs⟩
```
Fig. 17. A Proto-Quipper-R implementation of the Quantum Fourier Transform circuit family. The usual syntactic sugar is employed.

The Proto-Quipper-R implementation of the QFT algorithm is given in Figure 17. As we walk through the various parts of the program, be aware that we will focus on the resource aspects of the algorithm, ignoring much of its actual meaning. Starting bottom-up, we assume that we have an encoding of naturals in the language and that we can perform arithmetic on them. We also assume some primitive gates and gate families: H is the boxed circuit corresponding to the Hadamard gate and has type Circ<sup>1</sup>(Qubit, Qubit), whereas the makeRGate function has type Nat ⊸<sub>0,0</sub> Circ<sup>2</sup>(Qubit ⊗ Qubit, Qubit ⊗ Qubit) and produces instances of the parametric controlled R<sub>n</sub> gate. On the other hand, qlen and rev stand for regular language terms which implement the linear list length and reverse functions, respectively. In our type system they have types qlen : List<sup>i</sup> Qubit ⊸<sub>i,0</sub> (Nat ⊗ List<sup>i</sup> Qubit) and rev : List<sup>i</sup> Qubit ⊸<sub>i,0</sub> List<sup>i</sup> Qubit.

We now turn our attention to the actual QFT algorithm. Function qftStep builds a single step of the QFT circuit. The width of the circuit produced at step j is dominated by the folding of the rotate n function, which applies controlled rotations between appropriate pairs of qubits and has type

$$(\mathrm{Qubit} \otimes \mathrm{List}^{e}\,\mathrm{Qubit}) \otimes \mathrm{Qubit} \multimap_{e+2,0} \mathrm{Qubit} \otimes \mathrm{List}^{e+1}\,\mathrm{Qubit},\tag{4}$$

meaning that rotate n rearranges the structure of its inputs, but overall does not introduce any new wires. We fold this function starting from an accumulator ⟨q, nil⟩, meaning that we can give fold<sub>e</sub> (lift(rotate n)) ⟨q, nil⟩ a type as follows:

$$\dfrac{\begin{array}{c} i,j,e;\ n:\mathsf{Nat};\ \emptyset \vdash_v \mathsf{lift}(rotate\ n) : {!}((\mathsf{Q} \otimes \mathrm{List}^{e}\,\mathsf{Q}) \otimes \mathsf{Q} \multimap_{e+2,0} \mathsf{Q} \otimes \mathrm{List}^{e+1}\,\mathsf{Q}) \\ i,j;\ q:\mathsf{Q};\ \emptyset \vdash_v \langle q, \mathsf{nil}\rangle : \mathsf{Q} \otimes \mathrm{List}^{0}\,\mathsf{Q} \qquad i,j \vdash j \qquad i,j \vdash \mathsf{Q} \end{array}}{i,j;\ n:\mathsf{Nat}, q:\mathsf{Q};\ \emptyset \vdash_v \mathsf{fold}_e\,(\mathsf{lift}(rotate\ n))\ \langle q, \mathsf{nil}\rangle : \mathrm{List}^{j}\,\mathsf{Q} \multimap_{j+1,1} \mathsf{Q} \otimes \mathrm{List}^{j}\,\mathsf{Q}}\tag{5}$$

where Q is shorthand for Qubit and where we implicitly use the fact that i, j ⊨ max(1, max<sub>e<j</sub>(e + 2 + (j − 1 − e) × 1)) = j + 1 to simplify the arrow's width annotation using vsub and the arrow subtyping rule. Next, we fold over revQs, which has the same elements as qs and thus has length j, and we obtain that the fold produces a circuit whose width is bounded by j + 1. Therefore, qftStep has type

$${!}((\mathrm{List}^{j}\,\mathrm{Qubit} \otimes \mathrm{Qubit}) \multimap_{j+1,0} \mathrm{List}^{j+1}\,\mathrm{Qubit}),\tag{6}$$

which entails that when we pass it, together with nil, as an argument to the topmost fold, we can conclude the type of the qft function as follows:

$$\dfrac{\begin{array}{c} i,j;\ \emptyset;\ \emptyset \vdash_v qftStep : {!}((\mathrm{List}^{j}\,\mathrm{Qubit} \otimes \mathrm{Qubit}) \multimap_{j+1,0} \mathrm{List}^{j+1}\,\mathrm{Qubit}) \\ i;\ \emptyset;\ \emptyset \vdash_v \mathsf{nil} : \mathrm{List}^{0}\,\mathrm{Qubit} \qquad i \vdash i \qquad i \vdash \mathrm{Qubit} \end{array}}{i;\ \emptyset;\ \emptyset \vdash_v \mathsf{fold}_j\ qftStep\ \mathsf{nil} : \mathrm{List}^{i}\,\mathrm{Qubit} \multimap_{i,0} \mathrm{List}^{i}\,\mathrm{Qubit}}\tag{7}$$

where we once again implicitly simplify the arrow type using the fact that i ⊨ max(0, max<sub>j<i</sub>(j + 1 + (i − 1 − j) × 1)) = i. This concludes our analysis, and the resulting type tells us that qft produces a circuit of width at most i on inputs of size i, without using any additional wires overall. If we instantiate i to 3, for example, we can apply qft to a list of 3 qubits to obtain the circuit shown in Figure 18, whose width is exactly 3. The two index identities used above can also be checked numerically, as in the sketch below.
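As a quick sanity check (ours, purely numeric), the simplifications max(1, max<sub>e<j</sub>(e + 2 + (j − 1 − e))) = j + 1 and max(0, max<sub>j<i</sub>(j + 1 + (i − 1 − j))) = i can be sampled over small index values:

```
for j in range(1, 50):
    assert max([1] + [e + 2 + (j - 1 - e) for e in range(j)]) == j + 1
for i in range(1, 50):
    assert max([0] + [j + 1 + (i - 1 - j) for j in range(i)]) == i
print("both identities hold on the sampled range")
```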

To conclude this section, note that for ease of exposition qft actually produces the reversed QFT circuit. This is not a problem, since the two circuits are equivalent resource-wise, and the actual QFT circuit can be recovered by boxing the result of qft and reversing it via a primitive operator. Besides, note that Quipper's internal implementation of the QFT is also reversed [16].

Fig. 18. The circuit of input size 3 produced by qft (cons q<sub>1</sub> (cons q<sub>2</sub> (cons q<sub>3</sub> nil)))

# 7 Related Work

The metatheory of quantum circuit description languages, and in particular of Quipper-style languages, has been the subject of a fair amount of work in recent years, starting with Ross's thesis on Proto-Quipper-S [48] and continuing with Selinger and Rios's Proto-Quipper-M [46]. In the last five years, some proposals have also appeared for more expressive type systems or for language extensions that can handle non-standard language features, such as the so-called dynamic lifting [8,21,35], available in the Quipper language, or dependent types [22]. Although some embryonic contributions in the direction of analyzing the size of circuits produced using Quipper have been given [56], no contribution tackles the problem of deriving resource bounds parametric in the size of the input. In this respect, the ability to have types which depend on the input, certainly a feature of Proto-Quipper-D [22], is not useful for the analysis of intensional attributes of the underlying circuit, simply because such attributes are not visible in types.

If we broaden the horizon to quantum programming languages other than Quipper, we come across, for example, the recent works of Avanzini et al. [5] and Liu et al. [36] on adapting the classic weakest precondition technique to the cost analysis of quantum programs, which however focus on programs in an imperative language. The work of Dal Lago et al. [13] on a quantum language characterizing complexity classes for quantum polynomial time should also be mentioned: even though the language allows the use of higher-order functions, the manipulation of quantum data occurs directly and not through circuits. Similar considerations hold for the recent work of Hainry et al. [29] and for Yamakami's algebra of functions [59] in the style of Bellantoni and Cook [6], both characterizing quantum polynomial time.

If we broaden our scope further and become interested in the analysis of the cost of classical or probabilistic programs, we face a vast literature, with contributions employing a variety of techniques on heterogeneous languages and calculi: from functional programs [2,32,33] and term rewriting systems [3,4,41] to probabilistic [34] and object-oriented programs [19,28]. In this context, the resource under analysis is often assumed to be computation time, which is relatively easy to analyze given its strictly monotonic nature. Circuit width, although monotonically non-decreasing, evolves in a way that depends on a non-monotonic quantity, i.e. the number of wires discarded by a circuit. As a result, width has the flavor of space, and its analysis is less straightforward.

It is also worth mentioning that linear dependent types can be seen as a specialized version of refinement types [18], which have been used extensively in the literature to automatically verify interesting properties of programs [37,62]. In particular, the work of Vazou et al. on Liquid Haskell [57,58] has been a source of inspiration, on account of Quipper being embedded precisely in Haskell. The liquid type system [47] of Liquid Haskell relies on SMT solvers to discharge proof obligations and has been used fruitfully to reason about both the correctness and the resource consumption (mainly time complexity) of concrete, practical programs [30].

# 8 Generalization to Other Resource Types

This work focuses on estimating the width of the circuits produced by Quipper programs. This choice is dictated by the fact that the width of a circuit corresponds to the maximum number of distinct wires, and therefore individual qubits, required to execute it. Nowadays, this is considered one of the most precious resources in quantum computing, and as such it must be kept under control. However, this does not mean that our system could not be adapted to the estimation of other parameters. This section outlines how this might be done.

First, estimating strictly monotonic resources, such as the total number of gates in a circuit, is possible and in fact simpler than estimating width. A single index term I measuring the number of gates in the circuit built by a computation would suffice to carry out this analysis. This index would be increased appropriately every time an apply instruction is executed, while sequencing two terms via let would simply add the respective indices together.

If we were instead interested in the depth of a circuit, then we would need a slightly different approach. Although in principle it would still be possible to rely on a single index I, this would give rise to a very coarse approximation, effectively collapsing the analysis of depth into a gate count analysis. A more precise approximation could instead be obtained by keeping track of depth locally. More specifically, it would be sufficient to decorate each occurrence of a wire type w with an index term I, so that if a label ℓ were typed with w<sup>I</sup>, it would mean that the sub-circuit rooted in ℓ has depth at most I. A minimal sketch of this bookkeeping is given below.
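The following Python sketch (ours; the wire-annotation dictionary is an assumption made for illustration) shows the local bookkeeping such annotations would reflect: applying a gate sets every output wire's depth to one plus the maximum depth of its inputs, and the circuit's depth is the maximum annotation.

```
def apply_gate(depths, inputs, outputs):
    # depths maps each wire label to the depth of the sub-circuit rooted in it
    d = 1 + max((depths[w] for w in inputs), default=0)
    for w in outputs:
        depths[w] = d
    return depths

depths = {"q0": 0, "q1": 0}                     # fresh wires have depth 0
apply_gate(depths, ["q0"], ["q0"])              # H on q0   -> depth 1
apply_gate(depths, ["q0", "q1"], ["q0", "q1"])  # CNOT      -> depth 2
print(max(depths.values()))                     # circuit depth: 2
```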

Finally, it should be mentioned that the resources considered, i.e. the depth, width and gate count of a circuit, can be further refined so as to take into account only some kinds of wires and gates. For instance, one could want to keep track of the maximum number of qubits needed, ignoring the number of classical bits, or at least distinguish the two parameters, which of course have distinct levels of criticality in current quantum hardware.

# 9 Conclusion and Future Work

In this paper we introduced a linear dependent type system based on index refinements and effect typing for the paradigmatic calculus Proto-Quipper, with the purpose of using it to derive upper bounds on the width of the circuits produced by programs. We proved not only the classic type safety properties, but also that the upper bounds derived via the system are correct. We also showed how our system can verify a realistic quantum algorithm and elaborated on some ideas on how our technique could be adapted to other crucial resource types, like gate count and circuit depth. Ours is the first type system designed specifically for the purpose of resource analysis to target circuit description languages such as Quipper. Technically, the main novelties are the smooth combination of effect typing and index refinements, but also the proof of correctness, in which reducibility and effects are shown to play well together.

Among topics for further work, we can identify three main research directions. First and foremost, it would be valuable to investigate the ideas presented in this paper from a more practical perspective, that is, to provide a prototype implementation of the language with its type-checking procedure. The necessity to count the wires present in the context (e.g. when typing abstractions) makes it difficult to embed Proto-Quipper-R into existing languages, even those that, in principle, seem like ideal hosts, like Liquid Haskell [57] or Granule [42]. Because of this, we think that it would be better to produce a standalone implementation of Proto-Quipper-R that interfaces directly with SMT solvers to discharge the semantic judgments that are used pervasively in the typing rules.

Staying instead on the theoretical side of things, on one hand we have the prospect of denotational semantics: most incarnations of Proto-Quipper are endowed with categorical semantics that model both circuits and the terms of the language that build them [21,22,35,46]. We already mentioned how the intensional nature of the quantity under analysis renders the formulation of an abstract categorical semantics for Proto-Quipper-R and its circuits a nontrivial task, but we believe that such a semantics would help Proto-Quipper-R fit better into the Proto-Quipper landscape.

On the other hand, in Section 8 we briefly discussed how our system could be modified to handle the analysis of different resource types. It would be interesting to explore this path and to investigate the possibility of actually generalizing our resource analysis, that is, of making it parametric in the kind of resource being analyzed. This would allow the same program in the same language to be amenable to different forms of verification, in a very flexible fashion.

Acknowledgments The research leading to these results has received funding from the European Union - NextGenerationEU through the Italian Ministry of University and Research under PNRR - M4C2 - I1.4 Project CN00000013 "National Centre for HPC, Big Data and Quantum Computing".

# References



# On the Hardness of Analyzing Quantum Programs Quantitatively

Martin Avanzini<sup>1</sup>, Georg Moser<sup>2</sup>, Romain Péchoux<sup>3(B)</sup>, and Simon Perdrix<sup>3</sup>

<sup>1</sup> Centre Inria d'Université Côte d'Azur, Valbonne, France
martin.avanzini@inria.fr
<sup>2</sup> Universität Innsbruck, Innsbruck, Austria
georg.moser@uibk.ac.at
<sup>3</sup> CNRS-Inria-Université de Lorraine, LORIA, Nancy, France
{romain.pechoux,simon.perdrix}@loria.fr

Abstract. In this paper, we study quantitative properties of quantum programs. Properties of interest include (positive) almost-sure termination, expected runtime or expected cost, that is, for example, the expected number of applications of a given quantum gate, etc. After studying the completeness of these problems in the arithmetical hierarchy over the Clifford+T fragment of quantum mechanics, we express these problems using a variation of a quantum pre-expectation transformer, a weakest pre-condition based technique that allows one to symbolically compute these quantitative properties. Under a smooth restriction, namely a restriction to polynomials of bounded degree over a real closed field, we show that the quantitative problem, which consists in finding an upper bound to the pre-expectation, can be decided in time double-exponential in the size of a program, thus providing, despite its great complexity, one of the first decidability results on the analysis and verification of quantum programs. Finally, we sketch how the latter can be transformed into an efficient synthesis method.

# 1 Introduction

Motivations. Quantum computation is a promising and emerging computational paradigm which can efficiently solve problems considered to be intractable on classical computers [41,20]. However, the unintuitive nature of quantum mechanics poses challenging questions for the design and analysis of quantum programs. Indeed, quantum program dynamics are considerably more complicated than the behavior of classical or probabilistic programs. Therefore, formal reasoning requires the development of novel methods and tools, a development that has already started and recently gathered momentum in various areas, like design automation [43,22], programming languages [39,2,31,23,15], verification [36,11], etc.

Among these formal methods, those that allow us to obtain quantitative properties of quantum programs are particularly interesting. They can be used to obtain relevant information about the computations of a quantum program, such as the number of qubits used and the number of unitary operators applied, thus enabling the corresponding compiled quantum circuit to be optimized (for example, by minimizing the use of gates that are hard to make fault-tolerant, or by reducing the number of qubits) or to avoid undesirable behavior such as non-termination. Another quantitative property of interest is whether or not a program terminates almost-surely, that is, whether its probability of non-termination is zero. Similarly, we could aim to capture the expected values of (classical) program variables upon program termination. The latter can also be employed to reason about the expected runtime or the expected cost of quantum programs, if we suitably instrument the code with counter variables.

Fig. 1. Repeat-until-success program RUS and step-circuit.

To illustrate this, the program of Figure 1 implements a Repeat-Until-Success algorithm that can be used to simulate quantum unitary operators on an input qubit q<sub>1</sub> by using repeated measurements. The quantum step-circuit on the right corresponds to one iteration of the loop. Variable i in the program simply acts as a counter for T-gates. Hence an analysis of the expected value of variable i can be used to infer an upper bound on the expected T-count, i.e., the expected number of times a T-gate is used in the fully compiled quantum circuit. Such an approach offers the advantage of allowing the programmer to implement quantum programs using fewer T-gates, which are costly to implement fault-tolerantly [10,16]; it therefore provides a simple quantum program illustrating why the study of quantitative properties is paramount.

In [6,30], new methodologies named quantum expectation transformers, based on predicate transformers [13,28] and expectation transformers [32,17], have been put forward to naturally express and study the quantitative properties of quantum programs. However, no attempt was made to automate the corresponding techniques or to delineate how complicated such an automation could be. Automation of these formal verification techniques in the context of quantum programs is a particularly difficult problem. Indeed, the consideration of Hilbert spaces as a mathematical framework for describing principles and laws of quantum mechanics makes it seemingly impossible to reason fully automatically about quantitative properties of quantum programs: they involve computational objects of exponential dimension (in the number of qubits) with scalars ranging over an uncountable domain (i.e., the complex numbers C). This problem is directly linked to the fact that the set C includes non-computable numbers [42] and that testing the inequality ≤ or the equality = of two real numbers is not decidable, even if one restricts attention to computable real numbers. Consequently, the particular nature of quantum programs and of their semantic domain, Hilbert spaces, makes it impossible to directly apply the results obtained in the classical and probabilistic setting [37,24].

Contributions. In this paper, we study the hardness of the quantitative properties of mixed classical-quantum programs and provide a first step towards their (full) automation using quantum expectation transformers.

To this end, we restrict the considered quantum gates to the Clifford+T fragment, which is known to be the simplest approximately universal fragment of quantum mechanics [1]. Clifford+T makes it possible to only consider quantum states with algebraic amplitudes, thus restricting the study to a countable domain. It implies that our results can accommodate quantum gates employed in actual hardware, recently used to claim quantum advantage, cf. [3]. Moreover, the obtained results are very general, as they can be extended to any set of gates with algebraic coefficients.

As motivated, our first contribution concerns the general hardness of deciding quantitative properties of mixed classical-quantum programs. For a given input state, we study properties such as (positive) almost-sure termination, (P)Ast for short; testing problems, Test<sub>R</sub>, which consist in comparing a quantum expectation (for example, the mean value of a variable) with a given value (an algebraic and positive real number) wrt the relation R; and the finiteness problem, Test<sub>≠∞</sub>, which consists in checking that a quantum expectation is finite. For each of those problems, we also study the related universal problem, which consists in checking the corresponding property for every input. We establish a precise mapping (Theorem 1) of the inherent complexity of each problem in the arithmetical hierarchy [34], summarized in Table 1 (provided in Section 3). E.g., Ast is Π<sup>0</sup><sub>2</sub>-complete while Past is Σ<sup>0</sup><sub>2</sub>-complete.

Our second contribution aims to overcome the aforementioned undecidability results. For that, we study approximations. More precisely, we focus on inferring bounding functions (in general depending on the input) on the expected values of classical program variables upon termination. The decision problem is thus altered into an inference problem. Further, we restrict the set of potential bounding functions. As a suitable class of functions, we consider polynomials over the real closed field of the algebraic numbers. The restriction to algebraic numbers guarantees that comparison operations between real numbers remain decidable. On the other hand, for any real closed field, quantifier elimination for formulas over polynomials is decidable, that is, there exists a double-exponential algorithm computing a quantifier-free formula equivalent to the original formula [21]. This recasting of the problem and restriction of the solution space suffices to render the problem decidable. The inference algorithm established remains double-exponential (Theorem 4), thus of similar complexity to the underlying quantifier elimination procedure.

Finally, our last contribution (Section 5) studies effective automation of the inference of upper bounds on the expected values of program variables. To improve upon the double-exponential complexity, we further restrict the class of polynomials considered, that is, to degree-2 polynomials, and sketch how techniques from optimization theory can be employed. Several simple quantum algorithms such as program RUS can be analyzed using this approach (Example 6). This further reduction in expressivity allows the encoding of the problem in SMT and thus paves the way towards (full) automation. A toy instance of such an encoding is sketched below.
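As a toy illustration (ours, using the z3 Python bindings; the single constraint abstracts a loop that, like RUS, retries with probability 1/2, and is not the paper's full encoding), one can ask a solver for a constant upper invariant u on the expected number of iterations:

```
from z3 import Real, Solver, sat

u = Real("u")              # candidate bound on the expected iteration count
s = Solver()
s.add(u >= 1 + u / 2)      # one iteration, then loop again with probability 1/2
s.add(u <= 10)             # keep the search space bounded
if s.check() == sat:
    print(s.model()[u])    # any model, e.g. 2, is a sound upper bound
```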

Related Work. Predicate transformers [13,28], on which our work is based, were introduced as a method for reasoning about the semantics of imperative programs. They have been adapted to the probabilistic setting, leading to the notion of expectation transformer [32,17], which has been used to reason about expected values [26,8], runtimes [27,33], and costs [7,4,33], and to the quantum paradigm, leading to the notion of quantum pre-expectation transformer [35,30,6].

The problem of studying the difficulty of analyzing quantitative program properties has been deeply studied in the classical setting. To mention a few, [14] and [37] study termination properties and runtime/derivational properties of first-order programs, respectively. Further, in [24] completeness results for various quantitative properties of (pure) probabilistic programs have been established. The inference problem for expectation transformers, i.e., establishing an implementation that automates the search for pre-expectations, has been studied extensively. Examples of successful implementations are presented in [33,7,8]. Up to now, however, no practical, feasible studies have been carried out on quantum languages. Among the techniques using quantum expectation transformers, we believe [6] to be the most amenable to automation. Indeed, by lifting the upper invariants of [27] to the quantum setting, it enables approximate reasoning and eliminates the need to reason about fixpoints or limits stemming from the semantics of loops.

# 2 Quantum Programming Language

In this section, we introduce the syntax and operational semantics of the mixed classical-quantum imperative programming language under consideration.

Syntax. We make use of three basic datatypes B, N and Q for Boolean, number (non-negative integer), and qubit data, respectively. Let K be an arbitrary classical type in {B, N}. Each program variable comes with a fixed datatype and can optionally be annotated with its type as a superscript. In what follows, we will use x, x′, y, . . . to denote classical variables of type K and q, q′, . . . to denote quantum variables of type Q. A program, denoted P, is simply a statement; see Figure 2. Program statements are either classical assignments, conditionals, sequences, loops, quantum assignments q<sup>Q</sup> ∗= U, or measurements x<sup>B</sup> = meas q<sup>Q</sup>. A quantum assignment consists in the application of a quantum unitary gate U of arity ar(U) to a sequence of qubits q̄ ≜ q<sub>1</sub>, . . . , q<sub>ar(U)</sub>. As we will see in the semantics section, a unitary matrix U will be associated with each quantum gate U. A measurement performs a single-qubit measurement of q in the computational basis: the outcome is a Boolean value and the quantum state evolves accordingly. For a given syntactic construct t, let B(t) (respectively N(t), Q(t)) be the set of Boolean (respectively number, qubit) variables in t.

$$\begin{array}{rcl}
\mathrm{NExp} \ni n, n_1, n_2 &::=& x^{N} \mid \mathsf{n} \in \mathbb{N} \mid n_1 + n_2 \mid n_1 - n_2 \mid n_1 \times n_2 \\
\mathrm{BExp} \ni b, b_1, b_2 &::=& x^{B} \mid \mathsf{tt} \mid \mathsf{ff} \mid n_1 = n_2 \mid n_1 < n_2 \mid \neg b \mid b_1 \wedge b_2 \mid b_1 \vee b_2 \\
\mathrm{Exp} \ni e, e_1, e_2 &::=& n \mid b \\
\mathrm{Stmt} \ni stm, stm_1, stm_2 &::=& \mathsf{skip} \mid x^{K} = e^{K} \mid stm_1; stm_2 \mid \mathsf{if}\ b^{B}\ \mathsf{then}\ stm_1\ \mathsf{else}\ stm_2 \\
&& \mid\ \mathsf{while}\ b^{B}\ \mathsf{do}\ stm \mid q^{Q}\ {*}{=}\ U \mid x^{B} = \mathsf{meas}\ q^{Q}
\end{array}$$

Fig. 2. Syntax of quantum programs.

Notice that the language encompasses initializing qubits in the basis states. In particular, we will use q<sup>Q</sup> = |0⟩ as syntactic sugar for x = meas q; if x then q ∗= X else skip, with X being the Pauli X gate and x some fresh variable of type B.

Example 1. Consider the program of Figure 3, adapted from [6], as a simple leading example. Let H be the unitary operator computing the Hadamard gate. This program simulates coin tossing by repeatedly measuring the qubit q, until the measurement outcome ff occurs. The probability of terminating within n steps depends on the initial state $\rho = \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)$ of the qubit q (a density matrix in C<sup>2×2</sup>, which implies α + δ = 1 and γ = β̄). Variable i is increased by one at each iteration, and hence, when the program terminates, i stores as final value the number of loop iterations performed. The overall probability of termination is 1. The mean value of variable i, that is, the expected number of loop iterations, depends on the program input, in particular on the initial quantum state. After termination, for an initial state $\rho = \left(\begin{smallmatrix} \alpha & \beta \\ \bar\beta & \delta \end{smallmatrix}\right)$, its expected value is given by

$$F(\rho) = p\_0 \times 1 + \sum\_{i=1}^{\infty} \frac{p\_1}{2^i} (i+1) = p\_0 + p\_1 + 2p\_1 = 1 + (\alpha - \beta - \bar{\beta} + \delta) = 2 - (\beta + \bar{\beta}),$$

where $p_0 = \frac{\alpha+\beta+\bar\beta+\delta}{2} = \frac{1+\beta+\bar\beta}{2}$ and p<sub>1</sub> = 1 − p<sub>0</sub> are the probabilities of measuring |0⟩ and |1⟩, respectively, on the first iteration of the loop.

$$\mathtt{Cntoss} \triangleq \mathtt{x}^{B} = \mathtt{tt};\ \mathtt{i}^{N} = 0;\ \underbrace{\mathtt{while\ x\ do}\ \{\ \mathtt{i} = \mathtt{i} + 1;\ \mathtt{q}^{Q}\ {*}{=}\ \mathtt{H};\ \mathtt{x} = \mathtt{meas\ q}\ \}}_{\triangleq\ stm} \qquad \text{with } H = \tfrac{1}{\sqrt 2}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}$$

Fig. 3. Quantum Coin tossing

For instance, for a qubit initialized in state $|\phi\rangle = \sqrt{1/3}\,|0\rangle + \sqrt{2/3}\,|1\rangle$, the corresponding density matrix is $\rho_{|\phi\rangle} = |\phi\rangle\langle\phi| = \left(\begin{smallmatrix} 1/3 & \sqrt{2}/3 \\ \sqrt{2}/3 & 2/3 \end{smallmatrix}\right)$ and hence the expected number of loop iterations is $F(\rho_{|\phi\rangle}) = 2 - 2\sqrt{2}/3$. It will simply be 2 in the case of an initialization in the computational basis, $|\phi\rangle = |0\rangle$ or $|\phi\rangle = |1\rangle$.
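This expected value can be cross-checked numerically; the following sketch (ours, using numpy; truncation at N terms is an approximation) simulates the first Hadamard-and-measure step on the density matrix and sums the series p<sub>0</sub> · 1 + Σ<sub>i≥1</sub> p<sub>1</sub>/2<sup>i</sup> · (i + 1):

```
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
P0 = np.array([[1, 0], [0, 0]])              # projector |0><0|

def expected_iterations(rho, N=60):
    rho1 = H @ rho @ H.conj().T              # state before the first measurement
    p0 = np.trace(P0 @ rho1).real
    p1 = 1 - p0
    # after the first measurement the qubit is in a basis state, so every
    # later iteration terminates with probability 1/2
    return p0 * 1 + sum(p1 / 2**i * (i + 1) for i in range(1, N))

phi = np.array([[np.sqrt(1/3)], [np.sqrt(2/3)]])
rho = phi @ phi.T
print(expected_iterations(rho), 2 - 2 * np.sqrt(2) / 3)  # both approx. 1.057
```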

Operational Semantics. Following [6], we model the dynamics of our language as a probabilistic abstract reduction system (see [9]), a transition system where reduction is defined as a relation over probability distributions.

Probabilistic abstract reduction systems. Given a subset K of R, let K<sup>+</sup> be the set of non-negative numbers in K, i.e., K<sup>+</sup> ≜ K ∩ {x | x ≥ 0}, and let K<sup>∞</sup> be defined by K<sup>∞</sup> ≜ K ∪ {∞}.

A discrete (sub)distribution δ over a set A is a function δ : A → [0, 1] with countable support supp(δ) ≜ {a ∈ A | δ(a) ≠ 0}, mapping an element a of A to a probability δ(a); for a distribution |δ| ≜ Σ<sub>a∈supp(δ)</sub> δ(a) = 1, while for a subdistribution |δ| ≤ 1. Any (sub)distribution δ can be written as {δ(a) : a}<sub>a∈supp(δ)</sub>. The set of subdistributions over A, denoted by D(A), is closed under denumerable convex combinations Σ<sub>i</sub> p<sub>i</sub> · δ<sub>i</sub> ≜ λa. Σ<sub>i</sub> p<sub>i</sub>δ<sub>i</sub>(a), with p<sub>i</sub> ∈ [0, 1] and Σ<sub>i</sub> p<sub>i</sub> ≤ 1. Slightly simplifying standard notation, given f : A → R<sup>+∞</sup> and a subdistribution δ ∈ D(A), we define E<sub>δ</sub>(f), the expectation of f on δ, by E<sub>δ</sub>(f) ≜ Σ<sub>a∈supp(δ)</sub> δ(a)f(a). Note that E<sub>δ</sub>(f) ∈ R<sup>+∞</sup> is always defined, since the images of f are non-negative reals.
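In code, this definition is a one-liner; the sketch below (ours) represents a subdistribution as a dict from elements to probabilities:

```
def expectation(delta, f):
    # E_delta(f): sum delta(a) * f(a) over the support of delta
    assert sum(delta.values()) <= 1 + 1e-9      # subdistribution: |delta| <= 1
    return sum(p * f(a) for a, p in delta.items() if p != 0)

print(expectation({0: 0.5, 1: 0.25, 2: 0.25}, lambda a: a))  # 0.75
```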

Bournez and Garnier [9] introduced the notion of Probabilistic Abstract Reduction System (PARS) as a means to study reduction systems that evolve probabilistically. A PARS → on A is a binary relation · → · ⊆ A × D(A). The intended meaning is that when a → δ, then a reduces to b ∈ supp(δ) with probability δ(b). Here, we focus on deterministic PARSs, i.e., PARSs → such that a → δ<sub>1</sub> and a → δ<sub>2</sub> implies δ<sub>1</sub> = δ<sub>2</sub>. An object a ∈ A is called terminal if there is no rule a → δ, which we write as a ̸→.

Every deterministic PARS → over A naturally lifts to a reduction relation −→→ over distributions such that δ −→→ ε if the reduct distribution ε is obtained from δ by replacing reducts in supp(δ) according to the PARS →. In fact, we define this lifting in terms of a ternary relation · ·−→→ · ⊆ D(A) × R<sup>+</sup> × D(A) on distributions, where in a step δ <sup>c</sup>−→→ ε the weight c signifies the probability that a reduction has occurred. This relation is defined by the following three rules.

$$\dfrac{a \not\to}{\{1 : a\} \overset{0}{\twoheadrightarrow} \{1 : a\}} \qquad\quad \dfrac{a \to \delta}{\{1 : a\} \overset{1}{\twoheadrightarrow} \delta} \qquad\quad \dfrac{\delta_{i} \overset{c_{i}}{\twoheadrightarrow} \epsilon_{i} \qquad \sum_{i} p_{i} \leq 1}{\sum_{i} p_{i} \cdot \delta_{i} \overset{\sum_{i} p_{i} c_{i}}{\twoheadrightarrow} \sum_{i} p_{i} \cdot \epsilon_{i}}$$

We may sometimes use the n-fold (n ≥ 0) composition of ·−→→, denoted ·−→→<sup>n</sup>, given by δ <sup>c</sup>−→→<sup>n</sup> ϵ if δ <sup>c<sub>1</sub></sup>−→→ · · · <sup>c<sub>n</sub></sup>−→→ ϵ and the weights satisfy c = Σ<sup>n</sup><sub>i=1</sub> c<sub>i</sub>. Notice that since → is deterministic, so is <sup>c</sup>−→→, in the sense that δ <sup>c<sub>1</sub></sup>−→→ ϵ<sub>1</sub> and δ <sup>c<sub>2</sub></sup>−→→ ϵ<sub>2</sub> implies c<sub>1</sub> = c<sub>2</sub> and ϵ<sub>1</sub> = ϵ<sub>2</sub>. Thus, in particular, for every a ∈ A there is precisely one (infinite) reduction

$$\{1:a\} = \delta_{0} \overset{c_{0}}{\twoheadrightarrow} \delta_{1} \overset{c_{1}}{\twoheadrightarrow} \delta_{2} \overset{c_{2}}{\twoheadrightarrow} \delta_{3} \overset{c_{3}}{\twoheadrightarrow} \cdots$$

For any b ∈ A, δ<sub>i</sub>(b) gives the probability that a reduces to b in i steps. Note that when b is terminal, this probability only increases along reductions (i.e., δ<sub>i</sub>(b) ≤ δ<sub>i+1</sub>(b) for all i). This justifies defining the terminal distribution of a as the distribution δ(b) ≜ lim<sub>i→∞</sub> δ<sub>i</sub>(b). Note that δ(b) gives the probability that a reaches b in an arbitrary (but finite) number of steps. Since the weights c<sub>i</sub> indicate the probability that a step has been performed from δ<sub>i</sub> to δ<sub>i+1</sub>, the infinite sum Σ<sup>∞</sup><sub>i=0</sub> c<sub>i</sub> ∈ R<sup>+∞</sup> gives the expected number of reduction steps carried out, i.e., the expected derivation length of a [5].

For a PARS →, we denote by term<sub>→</sub> : A → D(A) the function associating with each a ∈ A its terminal distribution. The expected derivation length function edl<sub>→</sub> : A → R<sup>+∞</sup> associates with each a ∈ A its expected derivation length. The PARS → is almost surely terminating [40] (a.s. terminating for short) if every a ∈ A reduces to a terminal object b ̸→ with probability 1, that is, if |term<sub>→</sub>(a)| = 1 for every a. It is positive almost surely terminating if the expected derivation length is always finite, that is, edl<sub>→</sub>(a) < ∞ for all a ∈ A. The sketch below illustrates the difference between the two notions.
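The following small interpreter (ours; step maps an object to None when terminal, or to a dict representing its one-step distribution) unfolds the lifting −→→ for n steps and accumulates the weights, i.e., a prefix of the expected derivation length Σ c<sub>i</sub>:

```
def unfold(step, a, n):
    dist, edl = {a: 1.0}, 0.0
    for _ in range(n):
        nxt = {}
        for b, p in dist.items():
            d = step(b)
            if d is None:                    # terminal objects are kept as-is
                nxt[b] = nxt.get(b, 0.0) + p
            else:
                edl += p                     # a step occurred with weight p
                for c, q in d.items():
                    nxt[c] = nxt.get(c, 0.0) + p * q
        dist = nxt
    return dist, edl

# The symmetric random walk on the naturals (terminal at 0) is a.s.
# terminating but not positively so: the edl prefix grows without bound.
walk = lambda k: None if k == 0 else {k - 1: 0.5, k + 1: 0.5}
print(unfold(walk, 1, 1000)[1])
```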

Apart from termination, we are also interested in questions related to functional correctness, such as (i) what is the probability that a reaches a terminal b, (ii) what is the probability that a reaches a terminal satisfying a predicate P, and more generally, (iii) which value does a function f : A → R<sup>+∞</sup> take, in expectation, when fully reducing an object a. In the literature [32], one tool to answer all of these is given by weakest pre-expectation transformers, the natural generalization of classical weakest pre-condition transformers to a quantitative, probabilistic setting. We adapt this notion to PARSs.

Definition 1 (Weakest pre-expectation). The weakest pre-expectation for a PARS → over A is given by the function

$$\begin{aligned} \mathsf{wp}\_{\rightarrow} &: (A \to \mathbb{R}^{+\infty}) \to (A \to \mathbb{R}^{+\infty}), \\ \mathsf{wp}\_{\rightarrow} &\stackrel{\scriptstyle \Delta}{=} \lambda f. \lambda a. \ \mathbb{E}\_{\mathsf{term}\_{\rightarrow}(a)}(f). \end{aligned}$$

For 1<sub>b</sub> the indicator function evaluating to 1 on argument b and to 0 otherwise, and by seeing a predicate P as a {0, 1}-valued function, wp<sub>→</sub> 1<sub>b</sub> a answers question (i), wp<sub>→</sub> P a answers (ii), and generally wp<sub>→</sub> f a answers question (iii). Note also that a PARS is a.s. terminating if wp<sub>→</sub> (λb. 1) a = 1 for each a ∈ A. On the other hand, positive a.s. termination cannot be expressed through an application of wp<sub>→</sub>.

Fig. 4. Operational semantics in terms of PARS.

Quantum programs as PARSs. We now endow quantum programs with an operational semantics defined in terms of a PARS. Given a totally ordered set of qubits Q = {q<sub>1</sub>, . . . , q<sub>n</sub>}, let H<sub>Q</sub> be the 2<sup>n</sup>-dimensional Hilbert space defined by H<sub>Q</sub> ≜ ⊗<sup>n</sup><sub>i=1</sub> H<sub>q<sub>i</sub></sub>, with H<sub>q</sub> = C<sup>2</sup> being the vector space with computational basis {|0⟩, |1⟩} and ⊗ being the tensor product. With ⟨k| we denote the transpose conjugate of |k⟩, for k ∈ {0, 1}. Let M(H<sub>Q</sub>) be the set of complex square matrices acting on the Hilbert space H<sub>Q</sub>, i.e., M(H<sub>Q</sub>) = C<sup>2<sup>n</sup>×2<sup>n</sup></sup>. Given M ∈ M(H<sub>Q</sub>), M<sup>†</sup> denotes the transpose conjugate of M, and I<sub>2<sup>n</sup></sub> denotes the identity matrix in M(H<sub>Q</sub>). We write I when the dimension is clear from the context.

Let D(H<sub>Q</sub>) ⊊ M(H<sub>Q</sub>) be the set of all density operators (or quantum states), i.e., positive semi-definite matrices of trace 1 on H<sub>Q</sub>. Density operators can be viewed as the mathematical representation of a (mixed) quantum state. A unitary operator U is a matrix in M(H<sub>Q</sub>) such that UU<sup>†</sup> = U<sup>†</sup>U = I. A superoperator Φ<sub>U</sub> : D(H<sub>Q</sub>) → D(H<sub>Q</sub>), an endomorphism over density operators, is attached to each unitary operator U and defined by Φ<sub>U</sub> ≜ λρ. UρU<sup>†</sup>. By definition, Φ<sub>U</sub> is a completely positive trace-preserving linear map. Indeed, tr(UρU<sup>†</sup>) = tr(ρ), by unitarity. Hence UρU<sup>†</sup> is a density operator in D(H<sub>Q</sub>) for each ρ ∈ D(H<sub>Q</sub>).

Regarding measurements, for each i, 1 ≤ i ≤ card(Q), we define M<sub>k,i</sub> ∈ M(H<sub>Q</sub>), with k ∈ {0, 1}, by M<sub>0,i</sub> ≜ I<sub>2<sup>i−1</sup></sub> ⊗ (|0⟩⟨0|) ⊗ I<sub>2<sup>n−i</sup></sub> and M<sub>1,i</sub> ≜ I − M<sub>0,i</sub>. The measurement of the qubit q<sub>i</sub> (in the computational basis) of a density matrix ρ ∈ D(H<sub>Q</sub>) produces the classical outcome k ∈ {0, 1} with probability tr(M<sub>k,i</sub>ρ). The (normalized) quantum state after the measurement is defined by

$$m\_{k,i}(\rho) \stackrel{\Delta}{=} \begin{cases} \frac{M\_{k,i}\rho M\_{k,i}^\dagger}{tr(M\_{k,i}\rho)}, & \text{if } tr(M\_{k,i}\rho) \neq 0, \\\frac{I}{2^n} & \text{otherwise.} \end{cases}$$

Note that for all ρ ∈ D(H<sub>Q</sub>), m<sub>k,i</sub>(ρ) ∈ D(H<sub>Q</sub>), as it holds that tr(m<sub>k,i</sub>(ρ)) = 1. Indeed, tr(M<sub>k,i</sub>ρM<sup>†</sup><sub>k,i</sub>) = tr(M<sup>2</sup><sub>k,i</sub>ρ) = tr(M<sub>k,i</sub>ρ), as M<sub>k,i</sub> is a projection. Hence m<sub>k,i</sub> is a map in D(H<sub>Q</sub>) → D(H<sub>Q</sub>).
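These operators translate directly into a few lines of numpy (a sketch of ours; M builds M<sub>k,i</sub> for n qubits and measure returns the outcome probability together with the normalized post-measurement state):

```
import numpy as np

def M(k, i, n):
    # measurement operator M_{k,i}: identity on all qubits except the i-th
    P = np.array([[1, 0], [0, 0]]) if k == 0 else np.array([[0, 0], [0, 1]])
    return np.kron(np.kron(np.eye(2**(i - 1)), P), np.eye(2**(n - i)))

def measure(rho, i, n, k):
    Mk = M(k, i, n)
    p = np.trace(Mk @ rho).real
    post = Mk @ rho @ Mk.conj().T / p if p > 1e-12 else np.eye(2**n) / 2**n
    return p, post

plus = np.full((2, 2), 0.5)     # density matrix of |+>
print(measure(plus, 1, 1, 0))   # probability 0.5, post-state |0><0|
```

Note that using the projector |1⟩⟨1| for k = 1 agrees with the definition M<sub>1,i</sub> = I − M<sub>0,i</sub>.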

We set ⟦B⟧ ≜ {0, 1} and ⟦N⟧ ≜ N. The classical state is modeled as a (well-typed) store s of domain dom(s) mapping each variable x of type K to a value in ⟦K⟧. With Store, we denote the set of all such stores. Let s[x<sup>K</sup> := k], with k ∈ ⟦K⟧, be the store obtained from s by updating the value assigned to x in the map s. Given a store s, let ⟦−⟧<sub>s</sub> : KExp → ⟦K⟧ be the map associating to each expression e of type K such that B(e) ∪ N(e) ⊆ dom(s) a value in ⟦K⟧, defined in the obvious way. For example, ⟦x⟧<sub>s</sub> ≜ s(x), ⟦n⟧<sub>s</sub> ≜ n, ⟦tt⟧<sub>s</sub> ≜ 1, ⟦n<sub>1</sub> − n<sub>2</sub>⟧<sub>s</sub> ≜ max(0, ⟦n<sub>1</sub>⟧<sub>s</sub> − ⟦n<sub>2</sub>⟧<sub>s</sub>), etc.

Let ↓ be a special symbol for termination. A configuration µ, for (extended) statement stm ∈ Stmt ∪ {↓}, store s ∈ Store, and quantum state ρ ∈ H<sub>Q</sub>, has the form (stm, s, ρ). Let Conf be the set of configurations. A configuration (stm, s, ρ) is well-formed with respect to the sets of variables B, V, and Q if B(stm) ⊆ B, N(stm) ⊆ V, Q(stm) ⊆ Q, dom(s) = B ∪ V, and ρ ∈ D(H<sub>Q</sub>). Throughout the paper, we only consider configurations that are well-formed with respect to the sets of variables of the program under consideration.

The operational semantics is described in Figure 4 as a PARS →<sub>q</sub> over objects in Conf, where terminal objects are precisely the configurations of the shape (↓, s, ρ). The (classical or quantum) state of a configuration can only be updated by the three rules (Exp), (Op), and (Meas). Rule (Exp) updates the classical store with the value of the evaluated expression. Rule (Op) updates the quantum state to a new quantum state Φ<sub>U<sub>q̄</sub></sub>(ρ) = U<sub>q̄</sub>ρU<sup>†</sup><sub>q̄</sub>, where U<sub>q̄</sub> is the unitary operator in M(H<sub>Q</sub>) obtained by extending the quantum gate U to the entire set of qubits Q. Rule (Meas) performs a measurement on qubit q<sub>i</sub>. This rule returns a distribution of configurations corresponding to the two possible outcomes, k = 0 and k = 1, with their respective probabilities tr(M<sub>k,i</sub>ρ) and, in each case, updates the classical store and the quantum state accordingly. In the particular case where tr(M<sub>k<sub>0</sub>,i</sub>ρ) = 0 for some k<sub>0</sub> ∈ {0, 1}, {tr(M<sub>k,i</sub>ρ) : (↓, s[x := k], m<sub>k,i</sub>(ρ))}<sub>k∈{0,1}</sub> = {1 : (↓, s[x := 1 − k<sub>0</sub>], m<sub>1−k<sub>0</sub>,i</sub>(ρ))}. Rule (Seq) governs the execution of a sequence of statements stm<sub>1</sub>; stm<sub>2</sub>, under the convention that ↓; stm ≜ stm for each statement stm. The rule accounts for potential probabilistic behavior when stm<sub>1</sub> performs a measurement and is otherwise standard. All the other rules are standard.

In a configuration µ = (stm, s, ρ), the pair σ ≜ (s, ρ) is called a state. Let St<sub>stm</sub> be the set of states σ, τ, . . . that are well-formed wrt statement stm. For simplicity, we will denote this set by St when stm is clear from the context. To ease the presentation, we sometimes write (stm, σ) for the configuration µ.

We will be interested in expectation-based reasoning on quantum programs. In what follows, we also call functions f : Conf → R <sup>+</sup><sup>∞</sup> expectations, for brevity.

Definition 2. For a statement stm and f : St → R<sup>+∞</sup>, we overload the notions of expected derivation length and weakest pre-expectation by:

$$\begin{array}{ll}
\mathsf{edl}_{stm} : \mathrm{St} \to \mathbb{R}^{+\infty} & \mathsf{qwp}_{stm} : (\mathrm{St} \to \mathbb{R}^{+\infty}) \to (\mathrm{St} \to \mathbb{R}^{+\infty})\\
\mathsf{edl}_{stm} \triangleq \lambda\sigma.\,\mathsf{edl}_{\to_{\mathsf{q}}}(stm, \sigma) & \mathsf{qwp}_{stm} \triangleq \lambda f.\lambda\sigma.\,\mathsf{wp}_{\to_{\mathsf{q}}}(f_{st})(stm, \sigma),
\end{array}$$

where f<sub>st</sub>(stm, τ) ≜ f(τ).

Example 2. Consider the program Cntoss given in Figure 3. In the setting of this program, Q = {q}, $M_{0,1} = \left(\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right)$ and $M_{1,1} = \left(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right)$. On an initial state σ = (s, ρ), the reduction starts deterministically, as in the classical setting, performing the initializations x = tt and i = 0. From there, evaluation reaches the loop while x do stm. At each loop iteration, the loop counter i is incremented, and the Hadamard gate is applied to the quantum variable q. The loop guard is obtained by measuring q.

To see how this is reflected in the semantics, let us first look at an iteration of the loop. If x was set to false, that is, x holds the value 0, by rule (Wh0) the loop terminates within one step:

$$\{1: (\mathtt{while\ x\ do\ }stm, [\mathtt{x} := 0, \mathtt{i} := i], \rho)\} \overset{1}{\twoheadrightarrow}_{\mathsf{q}} \{1: (\downarrow, [\mathtt{x} := 0, \mathtt{i} := i], \rho)\}.$$

On the other hand, when x was previously set to true, the loop executes its body. Precisely, we have:

$$\begin{aligned}
&\{1: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}:=1, \mathtt{i}:=i], \rho)\}\\
&\quad\overset{1}{\twoheadrightarrow}_{\mathsf{q}} \{1: (\mathtt{i} = \mathtt{i} + 1;\ \mathtt{q}\ {*}{=}\ \mathtt{H};\ \mathtt{x} = \mathtt{meas\ q};\ \mathtt{while\ x\ do\ }stm, [\mathtt{x}:=1, \mathtt{i}:=i], \rho)\} &(1)\\
&\quad\overset{1}{\twoheadrightarrow}_{\mathsf{q}} \{1: (\mathtt{q}\ {*}{=}\ \mathtt{H};\ \mathtt{x} = \mathtt{meas\ q};\ \mathtt{while\ x\ do\ }stm, [\mathtt{x}:=1, \mathtt{i}:=i+1], \rho)\} &(2)\\
&\quad\overset{1}{\twoheadrightarrow}_{\mathsf{q}} \{1: (\mathtt{x} = \mathtt{meas\ q};\ \mathtt{while\ x\ do\ }stm, [\mathtt{x}:=1, \mathtt{i}:=i+1], \Phi_{H}(\rho))\} &(3)\\
&\quad\overset{1}{\twoheadrightarrow}_{\mathsf{q}} \{p_{k}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}:=k, \mathtt{i}:=i+1], \rho_{k})\}_{k\in\{0,1\}}, &(4)
\end{aligned}$$

where in the last step the probability p<sub>k</sub> equals tr(M<sub>k,1</sub>Φ<sub>H</sub>(ρ)), while the normalized quantum state ρ<sub>k</sub> is given as m<sub>k,1</sub>(Φ<sub>H</sub>(ρ)). The above reduction is obtained by applying the rules of Figure 4: rule (Wh1) for reduction (1); rules (Exp) and (Seq) for reduction (2); rules (Op) and (Seq) for reduction (3); and finally rules (Meas) and (Seq) for reduction (4).

For an arbitrary initial quantum state $\rho = \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right) \in D(H_Q)$ (where α, β, γ, δ ∈ C and tr(ρ) = α + δ = 1, γ = β̄, etc.), it follows that

$$p_{0} = tr(M_{0,1}H\rho H^{\dagger}) = tr\left(\begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix} \frac{1}{2} \begin{pmatrix} \alpha+\beta+\gamma+\delta & \alpha-\beta+\gamma-\delta\\ \alpha+\beta-\gamma-\delta & \alpha-\beta-\gamma+\delta \end{pmatrix}\right) = \frac{1+\beta+\gamma}{2},$$


and that $p_1 = 1 - p_0 = \frac{1-(\beta+\gamma)}{2}$. Using $\rho_k = \frac{M_{k,1}H\rho H^{\dagger}M_{k,1}^{\dagger}}{tr(M_{k,1}H\rho H^{\dagger})} = \frac{(M_{k,1}H)\rho(M_{k,1}H)^{\dagger}}{p_k}$,

$$\rho_{0} = \frac{\begin{pmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2}\\ 0 & 0 \end{pmatrix}\begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt 2} & 0\\ \frac{1}{\sqrt 2} & 0 \end{pmatrix}}{p_{0}} = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}, \qquad \rho_{1} = \frac{\begin{pmatrix} 0 & 0\\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \end{pmatrix}\begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}\begin{pmatrix} 0 & \frac{1}{\sqrt 2}\\ 0 & -\frac{1}{\sqrt 2} \end{pmatrix}}{p_{1}} = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}.$$

Summarizing (1)–(4) we thus get:

$$\begin{aligned}
&\{1: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}:=1, \mathtt{i}:=i], \left(\begin{smallmatrix} \alpha & \beta\\ \gamma & \delta \end{smallmatrix}\right))\}\\
&\qquad \overset{4}{\twoheadrightarrow}{}^{4}_{\mathsf{q}}\ \{p_{0}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}:=0, \mathtt{i}:=i+1], \rho_{0}),\ p_{1}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}:=1, \mathtt{i}:=i+1], \rho_{1})\}.
\end{aligned}$$

Putting everything together, we have

$$\begin{aligned}
&(\mathtt{Cntoss}, s, \left(\begin{smallmatrix} \alpha & \beta\\ \gamma & \delta \end{smallmatrix}\right))
\overset{2}{\twoheadrightarrow}{}^{2}_{\mathsf{q}}\ \{1: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}1, \mathtt{i}{:=}0], \rho)\}\\
&\overset{4}{\twoheadrightarrow}{}^{4}_{\mathsf{q}}\ \{p_{0}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}0, \mathtt{i}{:=}1], \rho_{0}),\ p_{1}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}1, \mathtt{i}{:=}1], \rho_{1})\}\\
&\overset{p_{0}+4p_{1}}{\twoheadrightarrow}{}^{4}_{\mathsf{q}}\ \{p_{0}: \underline{(\downarrow, [\mathtt{x}{:=}0, \mathtt{i}{:=}1], \rho_{0})},\ \tfrac{p_{1}}{2}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}0, \mathtt{i}{:=}2], \rho_{0}),\ \tfrac{p_{1}}{2}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}1, \mathtt{i}{:=}2], \rho_{1})\}\\
&\overset{\frac{p_{1}}{2}+4\frac{p_{1}}{2}}{\twoheadrightarrow}{}^{4}_{\mathsf{q}}\ \{p_{0}: \underline{(\downarrow, [\mathtt{x}{:=}0, \mathtt{i}{:=}1], \rho_{0})},\ \tfrac{p_{1}}{2}: \underline{(\downarrow, [\mathtt{x}{:=}0, \mathtt{i}{:=}2], \rho_{0})},\ \tfrac{p_{1}}{4}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}0, \mathtt{i}{:=}3], \rho_{0}),\ \tfrac{p_{1}}{4}: (\mathtt{while\ x\ do\ }stm, [\mathtt{x}{:=}1, \mathtt{i}{:=}3], \rho_{1})\}\\
&\overset{\frac{p_{1}}{4}+4\frac{p_{1}}{4}}{\twoheadrightarrow}{}^{4}_{\mathsf{q}}\ \cdots
\end{aligned}$$

where terminal configurations are underlined. This reduction converges to the terminal distribution

$$\mathsf{term}_{\mathtt{Cntoss}}(s, \rho) = \{p_0 : (\downarrow, [\mathtt{x} := 0, \mathtt{i} := 1], \rho_0)\} + \{\tfrac{p_1}{2^i} : (\downarrow, [\mathtt{x} := 0, \mathtt{i} := i+1], \rho_0)\}_{i \geq 1},$$

with an expected derivation length of

$$\mathsf{ed}_{\mathtt{Cntoss}}(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) = 2 + 4 + (p_0 + 4p_1) + \sum_{i=1}^{\infty} \frac{5p_1}{2^i} = 7 + 8p_1 = 11 - 4(\beta + \gamma).$$

For the expectation $f(s, \rho) \triangleq s(\mathtt{i})$, measuring the iteration counter i, we have

$$\mathsf{qwp}_{\mathtt{Cntoss}}\ f\ (s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) = p_0 \times 1 + \sum_{i=1}^{\infty} \frac{p_1}{2^i}(i+1) = p_0 + p_1 + 2p_1 = 2 - (\beta + \gamma),$$

that is, the mean value held by i after execution is $2 - (\beta + \gamma)$. The termination probability is

$$\mathsf{qwp}_{\mathtt{Cntoss}}\ (\lambda\sigma.1)\ (s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) = p_0 \times 1 + \sum_{i=1}^{\infty} \frac{p_1}{2^i} \times 1 = p_0 + p_1 = 1,$$

i.e., the program is almost surely terminating.
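For the reader's convenience, the series manipulations behind the last three identities are standard geometric sums (our spelled-out steps, using $p_0 + p_1 = 1$ and $p_1 = \frac{1 - (\beta+\gamma)}{2}$):

$$\sum_{i=1}^{\infty} \frac{1}{2^i} = 1, \qquad \sum_{i=1}^{\infty} \frac{i}{2^i} = 2, \qquad \text{hence} \quad \sum_{i=1}^{\infty} \frac{p_1}{2^i}(i+1) = 3p_1, \quad p_0 + 3p_1 = 1 + 2p_1 = 2 - (\beta + \gamma),$$

and likewise $2 + 4 + (p_0 + 4p_1) + 5p_1 = 7 + 8p_1 = 11 - 4(\beta + \gamma)$, since $8p_1 = 4 - 4(\beta + \gamma)$.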

# 3 Weakest Pre-expectations and Arithmetical Hierarchy

In this section, we study the hardness of some natural quantitative problems for weakest pre-expectations and expected derivation length.

Computability-Aimed Restrictions. This subsection is devoted to putting restrictions on programs and on the considered notion of expectation that overcome the computability issues mentioned in the introduction.

Algebraic numbers. Towards this end, our solution is to target a subset of the complex numbers on which simple operations like equality are decidable. We consider the set $\overline{\mathbb{Q}}$ of algebraic numbers, i.e., complex numbers in $\mathbb{C}$ that are roots of a nonzero polynomial in $\mathbb{Q}[X]$. Let $\mathbb{A} \triangleq \overline{\mathbb{Q}} \cap \mathbb{R}$ be the real closed field of real algebraic numbers in $\mathbb{R}$. The following inclusions trivially hold: (i) $\mathbb{N} \subseteq \mathbb{Q} \subseteq \mathbb{A} \subseteq \mathbb{R} \subseteq \mathbb{C}$ and (ii) $\overline{\mathbb{Q}} \subseteq \mathbb{C}$. It was proved in [18, Proposition 2.2] that equality over $\overline{\mathbb{Q}}$ and inequality over $\mathbb{A}$ are decidable using Cohn's representation [12]. It is well-known that the product and sum over $\overline{\mathbb{Q}}$ are computable in polynomial time.

We now restrict the program semantics to matrices and density operators over algebraic numbers. Given a totally ordered set of qubits $Q = \{\mathtt{q}_1, \ldots, \mathtt{q}_n\}$, let $\tilde{\mathcal{H}}_Q$ be the Hausdorff pre-Hilbert space $\overline{\mathbb{Q}}^{2^n}$ (i.e., the completeness requirement on Hilbert spaces is withdrawn) of $n$ qubits defined by $\tilde{\mathcal{H}}_Q \triangleq \otimes_{i=1}^{n} \tilde{\mathcal{H}}_{\mathtt{q}_i}$, with $\tilde{\mathcal{H}}_{\mathtt{q}} \triangleq \overline{\mathbb{Q}}^2$ being the vector space with computational basis $\{|0\rangle, |1\rangle\}$ over the field $\overline{\mathbb{Q}}$. Let $\mathcal{M}(\tilde{\mathcal{H}}_Q)$ and $\mathcal{D}(\tilde{\mathcal{H}}_Q)$ be the set of matrices and density operators on $\tilde{\mathcal{H}}_Q$, respectively.

Clifford+T gates. For the program semantics to be defined on the space $\mathcal{D}(\tilde{\mathcal{H}}_Q)$, the considered quantum gates are now restricted to gates whose corresponding unitary operators are in $\mathcal{M}(\tilde{\mathcal{H}}_Q)$, i.e., have a matrix representation over the algebraic numbers. To this end, we consider a restriction to the Clifford+T gates I, X, Y, Z, H, S, CNOT, and T, whose unitary matrices are given below:

$$I \triangleq \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ X \triangleq \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\ Y \triangleq \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\ Z \triangleq \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\ H \triangleq \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

$$S \triangleq \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix},\ CNOT \triangleq \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},\ T \triangleq \begin{pmatrix} 1 & 0 \\ 0 & \frac{1+i}{\sqrt{2}} \end{pmatrix}.$$

The Clifford+T fragment is the set of unitary transformations generated by sequential (matrix multiplication) and parallel (Kronecker product) compositions of the gates H, S, CNOT, and T. This constitutes a reasonable restriction for unitary operators as it is known to be the simplest approximately universal fragment of quantum mechanics [1].
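The paper computes with algebraic numbers via Cohn's representation [12]; purely as an illustration of why exact arithmetic is available for this fragment in particular, note that every entry of the gates above lies in the ring $\mathbb{Q}[\omega]$ with $\omega = e^{i\pi/4}$ (so $i = \omega^2$ and $1/\sqrt{2} = (\omega - \omega^3)/2$), and this is preserved by matrix and Kronecker products. A minimal Haskell sketch of this ring (ours, not the authors' data structure):

```haskell
-- Elements a + b*w + c*w^2 + d*w^3 of Q[w], w = e^{i*pi/4}, reduced via w^4 = -1.
data QOmega = QOmega Rational Rational Rational Rational
  deriving (Eq, Show)

instance Num QOmega where
  QOmega a b c d + QOmega a' b' c' d' = QOmega (a + a') (b + b') (c + c') (d + d')
  negate (QOmega a b c d) = QOmega (-a) (-b) (-c) (-d)
  QOmega a b c d * QOmega a' b' c' d' =        -- polynomial product mod w^4 = -1
    QOmega (a*a' - b*d' - c*c' - d*b')
           (a*b' + b*a' - c*d' - d*c')
           (a*c' + b*b' + c*a' - d*d')
           (a*d' + b*c' + c*b' + d*a')
  fromInteger n = QOmega (fromInteger n) 0 0 0
  abs = error "unused"; signum = error "unused"

imagUnit, invSqrt2 :: QOmega
imagUnit = QOmega 0 0 1 0              -- w^2 = i
invSqrt2 = QOmega 0 (1/2) 0 (-1/2)     -- (w - w^3)/2 = 1/sqrt 2

main :: IO ()
main = print (invSqrt2 * invSqrt2, imagUnit * imagUnit)  -- (1/2, -1)
```

Equality is decidable coefficient-wise, and sums and products clearly take polynomial time, in line with the facts recalled above.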

A central observation is that the superoperator associated with a unitary operator of the Clifford+T fragment is an endomorphism over density operators in $\mathcal{D}(\tilde{\mathcal{H}}_Q)$.

Lemma 1. The Clifford+T fragment preserves $\mathcal{D}(\tilde{\mathcal{H}}_Q)$, i.e., for all qubits $Q$ and $\bar{\mathtt{q}} \subseteq Q$, and for each unitary operator $U$ of the Clifford+T fragment, $\Phi_U^{\bar{\mathtt{q}}} \in \mathcal{D}(\tilde{\mathcal{H}}_Q) \to \mathcal{D}(\tilde{\mathcal{H}}_Q)$.

Notice that, while a restriction to Clifford+T is reasonable in terms of quantum mechanics and universality, our result can be extended by adding any quantum gate preserving the above lemma. For example, the phase shift gate, defined by $P_\varphi \triangleq \left(\begin{smallmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{smallmatrix}\right)$, preserves $\mathcal{D}(\tilde{\mathcal{H}}_Q)$ whenever $\varphi = r\pi$, for any $r \in \mathbb{Q}$.

Let $\mathbf{Stmt}_{\mathrm{CT}}$ be the set of statements restricted to quantum gates computing Clifford+T unitary operators (hence a subset of $\mathbf{Stmt}$), $\mathbf{St}_{\mathrm{CT}}$ be the set of states whose quantum state is in $\mathcal{D}(\tilde{\mathcal{H}}_Q)$, and $\mathbf{Conf}_{\mathrm{CT}}$ be the set of well-formed configurations in $(\mathbf{Stmt}_{\mathrm{CT}} \cup \{\downarrow\}) \times \mathbf{St}_{\mathrm{CT}}$. Let $\mathbf{St}^{\mathtt{stm}}_{\mathrm{CT}}$ be the set of states in $\mathbf{St}_{\mathrm{CT}}$ that are well-formed wrt. statement stm. Once again, by abuse of notation, we will denote this set by $\mathbf{St}_{\mathrm{CT}}$ when stm is clear from the context.

A consequence of Lemma 1 is that $\mathbf{Conf}_{\mathrm{CT}}$ is closed under reduction, in the following sense. Let $\mathcal{D}^{\mathrm{fn}}_{\mathbb{A}^+}(A) \subseteq \mathcal{D}(A)$ be the set of finitely supported subdistributions $\delta$ with algebraic probabilities, i.e., $\delta(a) \in \mathbb{A}^+$ for all $a \in A$.

Lemma 2. The set $\mathcal{D}^{\mathrm{fn}}_{\mathbb{A}^+}(\mathbf{Conf}_{\mathrm{CT}})$ is stable under reduction; more precisely, if $\delta \in \mathcal{D}^{\mathrm{fn}}_{\mathbb{A}^+}(\mathbf{Conf}_{\mathrm{CT}})$ and $\delta \overset{c}{\twoheadrightarrow}_{\mathsf{q}} \varepsilon$, then $\varepsilon \in \mathcal{D}^{\mathrm{fn}}_{\mathbb{A}^+}(\mathbf{Conf}_{\mathrm{CT}})$ and $c \in \mathbb{A}^+$.

Computable expectations. We also restrict the expectation codomain to algebraic numbers. Hence the considered expectations will be functions in $\mathbf{St}_{\mathrm{CT}} \to \mathbb{A}^+$. On its own, this restriction is not sufficient for our concerns, as the set $\mathbf{St}_{\mathrm{CT}} \to \mathbb{A}^+$ is not countable. This implies that there exist expectations in $\mathbf{St}_{\mathrm{CT}} \to \mathbb{A}^+$ that are not computable functions. To resolve this issue, we restrict the space of expectations further to computable ones:

$$\mathbf{E}_{\mathrm{CT}} \triangleq \{ f \mid f : \mathbf{St}_{\mathrm{CT}} \to \mathbb{A}^+,\ f \text{ computable} \}.$$

An immediate consequence of Lemma 2 is that $\mathsf{term}_{\mathtt{stm}}(\sigma) \in \mathcal{D}(\mathbf{Conf}_{\mathrm{CT}})$ for any $\mathtt{stm} \in \mathbf{Stmt}_{\mathrm{CT}}$ and $\sigma \in \mathbf{St}_{\mathrm{CT}}$. In consequence, $\mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma$ is well-defined for all $f \in \mathbf{E}_{\mathrm{CT}}$. This justifies that in our treatment below, we restrict expectations to the class $\mathbf{E}_{\mathrm{CT}}$. However, keep in mind that despite Lemma 2, the subdistribution $\mathsf{term}_{\mathtt{stm}}(\sigma)$, obtained at the limit, does not fall within $\mathcal{D}^{\mathrm{fn}}_{\mathbb{A}^+}(\mathbf{Conf}_{\mathrm{CT}})$: in general it is neither finite, nor are its probabilities algebraic ($\mathbb{A}^+$ is not complete). In particular, $\mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma$ is in general a real number, rather than an algebraic one.

Quantitative Problems. We now formally define the quantitative problems that we study.

Testing problems. A natural quantitative problem related to weakest pre-expectations is to determine, for a given program stm, a given state σ, a given expectation f, and a given algebraic number a, whether the corresponding weakest pre-expectation $\mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma$ is smaller than or equal to $a$. In this setting, it makes sense to consider any possible relation in the set $\{<, \leq, =, \geq, >\} \subseteq \mathcal{P}(\mathbb{A} \times \mathbb{A})$, as one could be interested in finding precise values, or (strict) upper- or lower-bounds.

Definition 3. The testing problem sets $\mathrm{Test}_{\mathcal{R}} \subseteq \mathbf{Conf}_{\mathrm{CT}} \times \mathbf{E}_{\mathrm{CT}} \times \mathbb{A}^+$, for $\mathcal{R} \in \{<, \leq, =, \geq, >\}$, are defined by:

$$(\mathtt{stm}, \sigma, f, a) \in \mathrm{Test}_{\mathcal{R}} :\Longleftrightarrow (\mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma)\ \mathcal{R}\ a.$$

The consideration of both $\mathrm{Test}_{\leq}$ and $\mathrm{Test}_{>}$ may seem redundant, as $\mathrm{Test}_{>}$ can be viewed as the complement of $\mathrm{Test}_{\leq}$. However, it makes perfect sense to distinguish the two properties when considering the corresponding universal problems, as we do in a moment.

Finiteness problem. Another problem of interest consists in checking whether the weakest pre-expectation produces a finitary output.

Definition 4. The finiteness problem set $\mathrm{Test}_{\neq\infty} \subseteq \mathbf{Conf}_{\mathrm{CT}} \times \mathbf{E}_{\mathrm{CT}}$ is defined by:

$$(\mathtt{stm}, \sigma, f) \in \mathrm{Test}_{\neq\infty} :\Longleftrightarrow \mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma < \infty.$$

Termination problems. We also define two termination problems, for almost-sure termination and positive almost-sure termination:

Definition 5. The sets of (positive) almost-surely terminating configurations $\mathrm{Ast} \subseteq \mathbf{Conf}_{\mathrm{CT}}$ ($\mathrm{Past} \subseteq \mathbf{Conf}_{\mathrm{CT}}$) are defined by:

$$\begin{aligned} (\mathtt{stm}, \sigma) \in \mathrm{Ast} &:\Longleftrightarrow |\mathsf{term}_{\mathtt{stm}}(\sigma)| = 1 \\ (\mathtt{stm}, \sigma) \in \mathrm{Past} &:\Longleftrightarrow \mathsf{ed}_{\mathtt{stm}}(\sigma) < \infty. \end{aligned}$$

It is well-known that Past ⊊ Ast, cf. [9].

Universal problems. Another kind of natural problem arises if one tries to check some property for each possible program input (i.e., for each state σ). We can thus define universal versions of each of the sets described previously.

Definition 6. The sets of universal testing, finiteness, and (positive) almost-sure termination problems are defined by:

$$\begin{aligned} (\mathtt{stm}, f, g) \in \mathrm{UTest}_{\mathcal{R}} \subseteq \mathbf{Stmt}_{\mathrm{CT}} \times \mathbf{E}^2_{\mathrm{CT}} &:\Longleftrightarrow \forall \sigma \in \mathbf{St}_{\mathrm{CT}},\ (\mathtt{stm}, \sigma, f, g(\sigma)) \in \mathrm{Test}_{\mathcal{R}} \\ (\mathtt{stm}, f) \in \mathrm{UTest}_{\neq\infty} \subseteq \mathbf{Stmt}_{\mathrm{CT}} \times \mathbf{E}_{\mathrm{CT}} &:\Longleftrightarrow \forall \sigma \in \mathbf{St}_{\mathrm{CT}},\ (\mathtt{stm}, \sigma, f) \in \mathrm{Test}_{\neq\infty} \\ \mathtt{stm} \in \mathrm{UAst} \subseteq \mathbf{Stmt}_{\mathrm{CT}} &:\Longleftrightarrow \forall \sigma \in \mathbf{St}_{\mathrm{CT}},\ (\mathtt{stm}, \sigma) \in \mathrm{Ast} \\ \mathtt{stm} \in \mathrm{UPast} \subseteq \mathbf{Stmt}_{\mathrm{CT}} &:\Longleftrightarrow \forall \sigma \in \mathbf{St}_{\mathrm{CT}},\ (\mathtt{stm}, \sigma) \in \mathrm{Past} \end{aligned}$$

Example 3. We have $\mathtt{Cntoss} \in \mathrm{UAst}$ and $\mathtt{Cntoss} \in \mathrm{UPast}$ for the program Cntoss of Figure 3. Indeed, it was shown in Example 2 that Cntoss terminates with probability 1 and with a finite expected derivation length; this holds for any input of the domain. In the same example, we have proven $(\mathtt{Cntoss}, f) \in \mathrm{UTest}_{\neq\infty}$ for $f(s, \rho) = s(\mathtt{i})$. Indeed, we have shown the stronger property $(\mathtt{Cntoss}, f, g) \in \mathrm{UTest}_{=}$, where $g(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) = 2 - (\beta + \gamma)$.



Table 1. Completeness results for quantitative problems in the arithmetical hierarchy.

Completeness Results in the Arithmetical Hierarchy. In what follows, we place the introduced quantitative problems within the arithmetical hierarchy [34]. The arithmetical hierarchy is a means to classify and relate undecidable problems wrt. their inherent difficulty, measured in terms of the number of (unbounded) quantifier alternations needed to state the problem as a formula in first-order arithmetic, based on a decidable (recursive) predicate.

Reminder on the arithmetical hierarchy. Classes of the arithmetical hierarchy are defined inductively as follows:

$$\begin{aligned} \Pi^0\_0 = \Sigma^0\_0 \stackrel{\Delta}{=} \text{REC}, \quad \text{REC being the class of decidable problems (recursive sets)},\\ \Pi^0\_{n+1} \stackrel{\Delta}{=} \{\psi \mid \exists \phi \in \Sigma^0\_n, \ \forall \overline{x}. (\psi(\overline{x}) \iff \forall \overline{y}. \phi(\overline{x}, \overline{y}))\},\\ \Sigma^0\_{n+1} \stackrel{\Delta}{=} \{\psi \mid \exists \phi \in \Pi^0\_n, \ \forall \overline{x}. (\psi(\overline{x}) \iff \exists \overline{y}. \phi(\overline{x}, \overline{y}))\}. \end{aligned}$$

For each $n$, $\Pi^0_n$ is the complement of $\Sigma^0_n$ (i.e., $\Pi^0_n = \text{co-}\Sigma^0_n$, and vice versa), and it is a well-known result that $\Sigma^0_1$ and $\Pi^0_1$ correspond to the classes RE of recursively enumerable (i.e., semi-decidable) problems and co-RE of co-recursively enumerable (i.e., co-semi-decidable) problems, respectively. Given sets $A \subseteq X$ and $B \subseteq Y$, we write $A \leq_m B$ ($A$ is many-one reducible to $B$) if there exists a computable function $f : X \to Y$ such that $\forall x \in X,\ x \in A \iff f(x) \in B$. Given a class $\mathcal{C}$ of the arithmetical hierarchy and a set $A$, $A$ is $\mathcal{C}$-hard if $\forall B \in \mathcal{C},\ B \leq_m A$. A set $A$ is $\mathcal{C}$-complete if $A \in \mathcal{C}$ and $A$ is $\mathcal{C}$-hard. It is well-known that if a set $A$ is $\mathcal{C}$-complete then its complement, noted co-$A$, is (co-$\mathcal{C}$)-complete.
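As a standard reference point for the halting set $\mathcal{H}$ used below (see the footnote), membership of $\mathcal{H}$ in $\Sigma^0_1$ is witnessed by the one-quantifier form

$$(P, \sigma) \in \mathcal{H} \iff \exists n \in \mathbb{N}.\ \mathsf{halt}(P, \sigma, n),$$

where $\mathsf{halt}(P, \sigma, n)$, stating that $P$ halts on $\sigma$ within $n$ steps, is a decidable predicate; dually, the complement co-$\mathcal{H}$ lies in $\Pi^0_1$.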

Results. Table 1 associates the quantum decision problems with the corresponding classes in the arithmetical hierarchy for which we have proven them complete, that is, we have proven membership and hardness for the corresponding class. Some of the results may seem surprising. For instance, the testing problem $\mathrm{Test}_{>}$, i.e., deciding $\mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma > a$ within the Clifford+T fragment, turns out to be recursively enumerable. It is thus classified identically to the (classical) halting problem $\mathcal{H}$.§ Remarkably, through the restriction to the Clifford+T fragment, the corresponding problems are ranked within the arithmetical hierarchy identically to their non-quantum counterparts (see [37,24]). This observation holds for all problems apart from those marked with (‡), which, to the best of our knowledge, have not been studied in a classical/probabilistic setting. $\Pi^0_2$- and $\Pi^0_3$-completeness of the universal testing problems, given relations $>$ and $<$ respectively, has been conjectured by Kaminski in his PhD thesis [25] for probabilistic programs.

A crucial observation towards these results is that, restricting to the Clifford+T fragment, the weakest pre-expectation of a program can be approximated through computable transformers $\mathsf{qwp}^{\leq n}_{\mathtt{stm}} : \mathbf{E}_{\mathrm{CT}} \to \mathbf{E}_{\mathrm{CT}}$ that limit execution of stm to at most $n \in \mathbb{N}$ reduction steps. That is,

$$\mathsf{qwp}^{\leq n}_{\mathtt{stm}}\ f\ \sigma \triangleq \mathbb{E}_{\mathsf{term}^{\leq n}_{\mathtt{stm}}(\sigma)}(f),$$

for $\mathsf{term}^{\leq n}_{\mathtt{stm}}(\sigma)$ the distribution of terminal configurations obtained within $n$ reduction steps when evaluating $(\mathtt{stm}, \sigma)$. With regard to the above-mentioned $\mathrm{Test}_{>} \in \Sigma^0_1$, for instance, observe that:

$$\begin{aligned} (\mathtt{stm}, \sigma, f, a) \in \mathrm{Test}_{>} &\iff \mathsf{qwp}_{\mathtt{stm}}\ f\ \sigma > a \\ &\iff \lim_{n \to \infty} \mathsf{qwp}^{\leq n}_{\mathtt{stm}}\ f\ \sigma > a \\ &\iff \exists n \in \mathbb{N}, \exists \delta \in \mathbb{A}^+ \setminus \{0\},\ \mathsf{qwp}^{\leq n}_{\mathtt{stm}}\ f\ \sigma \geq a + \delta. \end{aligned}$$

Crucially, the predicate $\mathsf{qwp}^{\leq n}_{\mathtt{stm}}\ f\ \sigma \geq a + \delta$ becomes computable. In essence, this is a consequence of Lemma 2: the $n$-th step normal form distribution $\mathsf{term}^{\leq n}_{\mathtt{stm}}(\sigma)$ is finite and computable, and as $f$ is computable, so is $\mathsf{qwp}^{\leq n}_{\mathtt{stm}}\ f\ \sigma$. From here, the result follows as equality on $\mathbb{A}$ is decidable. The proof of this, as well as all completeness proofs listed in Table 1, can be found in the Appendix. The following constitutes our first main result.

Theorem 1. All completeness results in Table 1 hold.
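The $\Sigma^0_1$ membership argument above is, in essence, an unbounded search over a computable monotone approximation. A toy Haskell rendition of this scheme (ours; Rational stands in for $\mathbb{A}^+$ with its decidable order, and the approximation function abstracts $n \mapsto \mathsf{qwp}^{\leq n}_{\mathtt{stm}}\ f\ \sigma$):

```haskell
-- Accepts (returning the witnessing stage) iff some finite approximation
-- exceeds the threshold a; diverges otherwise, as a semi-decision procedure may.
semiDecideGt :: (Integer -> Rational) -> Rational -> Integer
semiDecideGt approx a = head [n | n <- [0 ..], approx n > a]

main :: IO ()
main = print (semiDecideGt (\n -> 1 - 1 / 2 ^ n) (9 / 10))  -- prints 4
```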

# 4 Quantum Expectation Transformers

In what follows, we are interested in delineating subclasses of testing problems that lead to decidability. To this end, we now define a notion of quantum expectation transformer as a means to compute symbolically the weakest pre-expectation of a program. We first introduce some preliminary notations in order to lighten the presentation.

Notations. For any expression $\mathtt{e}$, $\llbracket\mathtt{e}\rrbracket$ is a shorthand notation for the function $\lambda(s, \rho).\llbracket\mathtt{e}\rrbracket s \in \mathbf{St} \to \mathbb{R}^{+\infty}$. We will also use $f[\mathtt{x} := \mathtt{e}]$ for the expectation $\lambda(s, \rho).f(s[\mathtt{x} := \llbracket\mathtt{e}\rrbracket s], \rho)$. Similarly, for a given map $\chi : \mathcal{D}(\mathcal{H}_Q) \to \mathcal{D}(\mathcal{H}_Q)$,

<sup>§</sup> In our context, the halting set $\mathcal{H}$ can be defined as the class of classical programs and states $(P, \sigma)$ for which $P$ halts on $\sigma$.

$$\begin{aligned} \mathsf{qet}[\mathtt{skip}]\{f\} &\triangleq f \\ \mathsf{qet}[\mathtt{x = e}]\{f\} &\triangleq f[\mathtt{x} := \mathtt{e}] \\ \mathsf{qet}[\mathtt{stm_1;\ stm_2}]\{f\} &\triangleq \mathsf{qet}[\mathtt{stm_1}]\{\mathsf{qet}[\mathtt{stm_2}]\{f\}\} \\ \mathsf{qet}[\mathtt{if\ b\ then\ stm_1\ else\ stm_2}]\{f\} &\triangleq \mathsf{qet}[\mathtt{stm_1}]\{f\} +_{\llbracket\mathtt{b}\rrbracket} \mathsf{qet}[\mathtt{stm_2}]\{f\} \\ \mathsf{qet}[\mathtt{while\ b\ do\ stm}]\{f\} &\triangleq \mathrm{lfp}\ \lambda F.\ \mathsf{qet}[\mathtt{stm}]\{F\} +_{\llbracket\mathtt{b}\rrbracket} f \\ \mathsf{qet}[\mathtt{q \mathrel{*}= U}]\{f\} &\triangleq f[\Phi_U^{\bar{\mathtt{q}}}] \\ \mathsf{qet}[\mathtt{x = meas\ q_i}]\{f\} &\triangleq f[\mathtt{x} := 0;\ m_{0,i}] +_{p_{0,i}} f[\mathtt{x} := 1;\ m_{1,i}]. \end{aligned}$$

Fig. 5. Quantum expectation transformer qet[ · ]{·}

$f[\chi] \triangleq \lambda(s, \rho).f(s, \chi(\rho))$. We will also sometimes group such state modifications; for instance, $f[\mathtt{x} := \mathtt{e};\ \chi]$ stands for $(f[\mathtt{x} := \mathtt{e}])[\chi]$ and $f[\mathtt{x} := \mathtt{e}, \mathtt{y} := \mathtt{e}']$ stands for $(f[\mathtt{x} := \mathtt{e}])[\mathtt{y} := \mathtt{e}']$.

For $p \in \mathbf{St} \to [0, 1]$ and $f, g \in \mathbf{St} \to \mathbb{R}^{+\infty}$, $f +_p g$ denotes the function $\lambda\sigma.\ p(\sigma) \cdot f(\sigma) + (1 - p(\sigma)) \cdot g(\sigma) \in \mathbf{St} \to \mathbb{R}^{+\infty}$; similarly, we use $f \cdot g$ to denote $\lambda\sigma.\ f(\sigma) \cdot g(\sigma) \in \mathbf{St} \to \mathbb{R}^{+\infty}$. Thus, for instance, $f[\mathtt{x} := \mathtt{x}+1] +_{\llbracket\mathtt{x}=1\rrbracket} f$ behaves like $f$, except that x is first incremented when applied to states whose classical variable x equals 1. In correspondence to the normalization $m_{k,i}$ of quantum states, we define probabilities $p_{k,i} \triangleq \lambda\rho.\ tr(M_{k,i}\rho M_{k,i}^\dagger)$. We overload this function from $\mathcal{D}(\mathcal{H}_Q)$ to $\mathbf{St}$ s.t. $p_{k,i}(s, \rho) = p_{k,i}(\rho)$. In this way, $f[\mathtt{x} := 0;\ m_{0,i}] +_{p_{0,i}} f[\mathtt{x} := 1;\ m_{1,i}]$ computes precisely the expected value of $f$ on the distribution of states obtained by measuring the $i$-th qubit and assigning the outcome to classical variable x.

Finally, we denote by $\leq$ also the pointwise extension of the order from $\mathbb{R}^{+\infty}$ to functions, that is, $f \leq g$ holds if $\forall \sigma \in \mathbf{St},\ f(\sigma) \leq g(\sigma)$.

Definition 7 (Quantum expectation transformer). The quantum expectation transformer consists in a program semantics mapping expectations to expectations in continuation-passing style

$$\mathsf{qet}[\cdot]\{\cdot\} : \mathbf{Stmt} \to (\mathbf{St} \to \mathbb{R}^{+\infty}) \to (\mathbf{St} \to \mathbb{R}^{+\infty})$$

and is defined inductively on statements in Figure 5.

This transformer corresponds to the notion of expected value transformer of [6] on the Kegelspitze $S = (\mathbb{R}^{+\infty}, +_f)$, with $+_f$ being the forgetful addition. In the case of loops, the least fixed point lfp is defined with respect to the pointwise ordering on the function space $\mathbf{St} \to \mathbb{R}^{+\infty}$. Equipped with this ordering, this space forms an ω-CPO. As the quantum transformer can be shown to be ω-continuous, the fixed point is always defined, cf. [44].
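To make the fixed-point construction concrete, the following toy Haskell sketch (entirely ours) renders Figure 5 executable for a classical fragment: the quantum state is dropped, measurement is abstracted by a coin flip with a fixed bias (faithful to Cntoss only because there $p_0 = 1/2$ whenever $\beta = \gamma = 0$), and lfp is approximated from below by fuel-bounded Kleene iteration, in the spirit of $\mathsf{qwp}^{\leq n}$ from Section 3.

```haskell
import qualified Data.Map as M

type St   = M.Map String Integer
type Expc = St -> Double                 -- expectations St -> R^{+inf} (inf elided)

data Stmt = Skip
          | Assign String (St -> Integer)
          | Seq Stmt Stmt
          | While (St -> Bool) Stmt
          | Flip Double String           -- x = meas q, outcome 0 with probability p

-- qet with fuel n: loops iterate F |-> qet[stm]{F} +_{[[b]]} f from bottom upward
qet :: Int -> Stmt -> Expc -> Expc
qet _ Skip         f = f
qet _ (Assign x e) f = \s -> f (M.insert x (e s) s)
qet n (Seq s1 s2)  f = qet n s1 (qet n s2 f)
qet n (While b s)  f = go n
  where go 0 = const 0                   -- bottom of the Kleene chain
        go k = \st -> if b st then qet n s (go (k - 1)) st else f st
qet _ (Flip p x)   f = \s -> p * f (M.insert x 0 s) + (1 - p) * f (M.insert x 1 s)

-- Cntoss with its quantum coin specialized to bias 1/2
cntoss :: Stmt
cntoss = Assign "x" (const 1) `Seq` Assign "i" (const 0) `Seq`
         While (\s -> s M.! "x" == 1)
               (Assign "i" (\s -> s M.! "i" + 1) `Seq` Flip 0.5 "x")

main :: IO ()
main = print (qet 64 cntoss (\s -> fromIntegral (s M.! "i")) M.empty)
```

The output is approximately 2, matching $\mathsf{qwp}_{\mathtt{Cntoss}}\ f\ (s, \rho) = 2 - (\beta + \gamma)$ at $\beta = \gamma = 0$.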

Theorem 2 (Adequacy). The following holds:

$$\forall \mathtt{stm} \in \mathbf{Stmt},\ \forall f : \mathbf{St} \to \mathbb{R}^{+\infty},\ \mathsf{qwp}_{\mathtt{stm}}(f) = \mathsf{qet}[\mathtt{stm}]\{f\}.$$


Fig. 6. Universal laws derivable for the quantum expectation transformer.

Apart from continuity, the quantum expectation transformer satisfies several useful laws, see Figure 6. The (monotonicity) law permits us to reason modulo upper-bounds: actual expectations can always be substituted by upper-bounds. It is in fact an immediate consequence of the (continuity) law, which is defined for any ω-chain $(f_i)_i$. The (upper invariance) law constitutes a generalization of the notion of invariant stemming from Hoare calculus. It is used to find closed-form upper-bounds $g$ to expectations $f$ of loops. The pre-conditions state that $g$ should dominate $f$ on states where the loop would immediately exit, and otherwise should remain invariant under iteration. It is worth mentioning that this proof rule is not only sound, but also complete, in the sense that any upper-bound satisfies the two constraints. The following example illustrates the use of this rule on the running example.
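Instantiating this description (our paraphrase of the corresponding entry of Figure 6, anticipating its use in (5) below), the (upper invariance) law can be stated as the rule

$$\frac{\llbracket\neg\mathtt{b}\rrbracket \cdot f \leq g \qquad \llbracket\mathtt{b}\rrbracket \cdot \mathsf{qet}[\mathtt{stm}]\{g\} \leq g}{\mathsf{qet}[\mathtt{while}\ \mathtt{b}\ \mathtt{do}\ \mathtt{stm}]\{f\} \leq g}$$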

Example 4. Following Example 2, we over-approximate qet[ Cntoss ]{f}, for f(s, ρ) = s(i) the post-expectation measuring the classical variable i.

To this end, we show that a suitable function $g : \mathbf{St} \to \mathbb{R}^{+\infty}$ is an upper-invariant (Figure 6) of the while loop while x do stm, given the post-expectation $f : \mathbf{St} \to \mathbb{R}^{+\infty}$. Recall that the loop body stm comprises (i = i+1; q *= H; x = meas q). To fulfill the conditions of the (upper invariance) law, the following inequalities have to be met:

$$\llbracket\neg\mathtt{x}\rrbracket \cdot f \leq g \qquad \llbracket\mathtt{x}\rrbracket \cdot \mathsf{qet}[\mathtt{i = i+1;\ q \mathrel{*}= H;\ x = meas\ q}]\{g\} \leq g \tag{5}$$

By unfolding the definition, we see

$$\begin{aligned} & \mathsf{qet}[\mathtt{i = i+1;\ q \mathrel{*}= H;\ x = meas\ q}]\{g\} \\ &= \mathsf{qet}[\mathtt{i = i+1}]\{\mathsf{qet}[\mathtt{q \mathrel{*}= H}]\{\mathsf{qet}[\mathtt{x = meas\ q}]\{g\}\}\} \\ &= \mathsf{qet}[\mathtt{i = i+1}]\{\mathsf{qet}[\mathtt{q \mathrel{*}= H}]\{g[\mathtt{x} := 0;\ m_{0,1}] +_{p_{0,1}} g[\mathtt{x} := 1;\ m_{1,1}]\}\} \\ &= \mathsf{qet}[\mathtt{i = i+1}]\{g[\mathtt{x} := 0;\ m_{0,1};\ \Phi_H] +_{p_{0,1} \circ \Phi_H} g[\mathtt{x} := 1;\ m_{1,1};\ \Phi_H]\} \\ &= g[\mathtt{x} := 0;\ m_{0,1};\ \Phi_H;\ \mathtt{i} := \mathtt{i}+1] +_{p_{0,1} \circ \Phi_H} g[\mathtt{x} := 1;\ m_{1,1};\ \Phi_H;\ \mathtt{i} := \mathtt{i}+1] \\ &= \lambda(s, \rho).\ \sum_{k \in \{0,1\}} p_{k,1}(\Phi_H(\rho)) \cdot g(s[\mathtt{x} := k, \mathtt{i} := \mathtt{i}+1], m_{k,1}(\Phi_H(\rho))). \end{aligned}$$

By using the identities computed already in Example 2, we thus obtain

$$\mathsf{qet}[\mathtt{stm}]\{g\}(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) = \sum_{k \in \{0,1\}} p_k \cdot g(s[\mathtt{x} := k, \mathtt{i} := \mathtt{i}+1], \rho_k), \tag{6}$$


where, as in Example 2, $p_0 = \frac{1+\beta+\gamma}{2}$, $p_1 = \frac{1-(\beta+\gamma)}{2}$, $\rho_0 = \left(\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right)$ and $\rho_1 = \left(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right)$.

We claim that $g(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) \triangleq s(\mathtt{i}) + s(\mathtt{x}) \cdot (2 - (\beta + \gamma))$ is an upper-bound to the pre-expectation of the while loop wrt. the post-expectation $f$. To this end, we check (5). The first inequality is trivially satisfied. Concerning the second, notice that by definition,

$g(s[\mathtt{x} := 0, \mathtt{i} := \mathtt{i}+1], \left(\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right)) = s(\mathtt{i}) + 1$ and $g(s[\mathtt{x} := 1, \mathtt{i} := \mathtt{i}+1], \left(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right)) = s(\mathtt{i}) + 3$. By (6) we have

$$\begin{aligned} \mathsf{qet}[\mathtt{stm}]\{g\}(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) &= \tfrac{1+\beta+\gamma}{2}(s(\mathtt{i})+1) + \tfrac{1-(\beta+\gamma)}{2}(s(\mathtt{i})+3) \\ &= (s(\mathtt{i})+2) - (\beta+\gamma) = g(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)), \end{aligned}$$

from which the second constraint follows by case analysis on the value of x. Hence $\mathsf{qet}[\mathtt{while\ x\ do\ stm}]\{f\} \leq g$ and, by monotonicity (Figure 6),

$$\begin{aligned} \mathsf{qet}[\mathtt{Cntoss}]\{f\}(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) &\leq \mathsf{qet}[\mathtt{x = tt;\ i = 0}]\{g\}(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) \\ &= g([\mathtt{x} := 1, \mathtt{i} := 0], \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)) = 2 - (\beta + \gamma). \end{aligned}$$

Note that, in this case, the computed bound is exact.

One question of interest is to find $\mathsf{qet}[\cdot]\{\cdot\}$ of a given statement. We obtain the following completeness results as a corollary of Theorem 1 and Theorem 2 on the Clifford+T fragment.

Corollary 1. The following completeness results hold:

– $\{(\mathtt{stm}, f, g) \in \mathbf{Stmt}_{\mathrm{CT}} \times \mathbf{E}^2_{\mathrm{CT}} \mid \forall \sigma,\ \mathsf{qet}[\mathtt{stm}]\{f\}(\sigma) = g(\sigma)\}$ is $\Pi^0_2$-complete.
– $\{(\mathtt{stm}, f, g) \in \mathbf{Stmt}_{\mathrm{CT}} \times \mathbf{E}^2_{\mathrm{CT}} \mid \forall \sigma,\ \mathsf{qet}[\mathtt{stm}]\{f\}(\sigma) \leq g(\sigma)\}$ is $\Pi^0_1$-complete.

The same kind of result can be straightforwardly obtained for each of the quantitative problems defined in the previous section. All the corresponding sets are undecidable: they are at best (co-)semi-decidable, as illustrated by Table 1. This motivates restricting the problem a bit further, to find a class of functions for which the quantitative problems for $\mathsf{qwp}_{\mathtt{stm}}\ f$ can be decided.

# 5 Decidability of qet Inference over a Real Closed Field

Corollary 1 illustrates that relaxing the problem of finding the quantum expectation transformer of a given statement to upper-bounds is not sufficient to make it decidable. The undecidability of finding the quantum expectation transformer of a given program is due to two other issues: (Issue 1) the computation of a fixpoint for $\mathsf{qet}[\cdot]\{\cdot\}$ in the case of while loops; (Issue 2) the check of inequalities over functions in $\mathbf{E}_{\mathrm{CT}}$, whose first-order theory is not decidable. This section is devoted to overcoming these two issues, by finding an expressive fragment on which the inference of an upper-bound of the quantum expectation transformer becomes decidable.

$$\begin{aligned} \mathsf{qinf}[\mathtt{skip}]\{F\} &\triangleq F \\ \mathsf{qinf}[\mathtt{x = e}]\{F\} &\triangleq F[\mathtt{x} := \mathtt{e}] \\ \mathsf{qinf}[\mathtt{stm_1;\ stm_2}]\{F\} &\triangleq \mathsf{qinf}[\mathtt{stm_1}]\{\mathsf{qinf}[\mathtt{stm_2}]\{F\}\} \\ \mathsf{qinf}[\mathtt{if}^{\ell}\ \mathtt{b\ then\ stm_1\ else\ stm_2}]\{F\} &\triangleq X_{\ell}, \text{ with side-cond. } \begin{cases} \mathtt{b} \vdash \mathsf{qinf}[\mathtt{stm_1}]\{F\} \leq X_{\ell} \\ \neg\mathtt{b} \vdash \mathsf{qinf}[\mathtt{stm_2}]\{F\} \leq X_{\ell} \end{cases} \\ \mathsf{qinf}[\mathtt{while}^{\ell}\ \mathtt{b\ do\ stm}]\{F\} &\triangleq X_{\ell}, \text{ with side-cond. } \begin{cases} \mathtt{b} \vdash \mathsf{qinf}[\mathtt{stm}]\{X_{\ell}\} \leq X_{\ell} \\ \neg\mathtt{b} \vdash F \leq X_{\ell} \end{cases} \\ \mathsf{qinf}[\mathtt{q \mathrel{*}= U}]\{F\} &\triangleq F[\phi_U^{\bar{\mathtt{q}}}] \\ \mathsf{qinf}[\mathtt{x = meas}^{\ell}\ \mathtt{q_i}]\{F\} &\triangleq X_{\ell}, \text{ with side-cond. } \begin{cases} p_{0,i} = 0 \vdash F[\mathtt{x} := 1;\ m_{1,i}] \leq X_{\ell} \\ p_{1,i} = 0 \vdash F[\mathtt{x} := 0;\ m_{0,i}] \leq X_{\ell} \\ p_{0,i} \neq 0 \neq p_{1,i} \vdash F[\mathtt{x} := 0;\ m_{0,i}] +_{p_{0,i}} F[\mathtt{x} := 1;\ m_{1,i}] \leq X_{\ell} \end{cases} \end{aligned}$$

Fig. 7. Term representations of qinf[ · ]{·} and their corresponding side-conditions.

Symbolic Inference. As a first step towards automated inference, we define a symbolic variant of the quantum expectation transformer in Figure 7. In the case of conditionals, loops, and measurements, we use fresh variables for expectations; side-conditions guarantee that these variables indeed denote (upper-bounds to) the corresponding expectations. This means that the symbolic version yields correct results only when the expectations assigned to these variables satisfy all the side-conditions. By solving the generated constraints, viz., by finding an interpretation of the ascribed variables that satisfies the imposed side-conditions, we effectively arrive at an inference procedure overcoming Issue 1.

To formalize this approach, we associate a unique label ℓ with each loop, conditional, and measurement occurring in the considered program. Notationally, we write $\mathtt{while}^{\ell}\ \mathtt{b\ do\ stm}$ / $\mathtt{if}^{\ell}\ \mathtt{b\ then\ stm_1\ else\ stm_2}$ / $\mathtt{meas}^{\ell}\ \mathtt{q}$. Such labels permit us to associate a unique expectation variable $X_{\ell}$ with each of these constructs. Given a set of such expectation variables $\mathbf{EVar}$, the set of terms $\mathbf{ETerm}$, upon which the symbolic quantum expectation transformer operates, is defined according to the following grammar:

$$\mathbf{ETerm} \ni F, G ::= X \mid F[\mathtt{x} := \mathtt{e}] \mid F[\chi] \mid F +_p G,$$

where $X$ stands for an arbitrary expectation variable in $\mathbf{EVar}$. As stressed above, $X$ will be used to denote certain expectations wrt. loops, conditionals, and measurements. We have already introduced the notations $F[\mathtt{x} := \mathtt{e}]$ and $F[\chi]$ to represent updates to the classical and quantum state, respectively. Here, $\chi$ will always denote a finite composition of superoperators $\phi_U$ and measurements $m_{k,i}$. By ensuring that the normalization of quantum states $m_{k,i}(\rho)$ is never considered in the degenerate case of a zero-probability measurement $p_{k,i}(\rho)$, it will thereby always be possible to write $\chi$ as $\lambda\rho.\ \frac{M\rho M^\dagger}{tr(N\rho N^\dagger)}$, for some $M, N \in \mathcal{M}(\tilde{\mathcal{H}}_Q)$ in the Clifford+T fragment. Finally, following the same reasoning, in the barycentric sum $F +_p G$ the probability $p$ is a function of the quantum state, and will always be of the general form $\lambda\rho.\ \frac{tr(M\rho M^\dagger)}{tr(N\rho N^\dagger)}$, for some $M, N \in \mathcal{M}(\tilde{\mathcal{H}}_Q)$ in the Clifford+T fragment. As before, we may group updates such as in $F[\mathtt{x} := \mathtt{e};\ \chi]$.

The symbolic variant of the expectation transformer can now be defined as

$$\mathsf{qinf}[\cdot]\{\cdot\} : \mathbf{Stmt} \to \mathbf{ETerm} \to \mathbf{ETerm},$$

generating also a set of side-conditions of the shape $\Gamma \vdash F \leq G$, with the intended meaning that $G$ bounds $F$ on all input states that satisfy the predicate $\Gamma$. The full definition of $\mathsf{qinf}$ is given in Figure 7. As already hinted, the side-conditions ensure that the introduced variables $X_{\ell}$ indeed yield an upper-bound on the corresponding expectation, in the case of conditionals by case analysis, and in the case of loops via an application of the upper-invariant law from Figure 6. In the case of measurements, $m_{k,i}$ and $p_{k,i}$ are defined exactly as before. Here, we single out the two cases where the probability of a measurement, either $p_{0,i}(\rho) = tr(M_{0,i}\rho) = tr(M_{0,i}\rho M_{0,i}^\dagger)$ or $p_{1,i}(\rho) = 1 - p_{0,i}(\rho)$, is zero. This way, we avoid the case analysis underlying the definition of $m_{k,i}$ and may, wlog., assume that it is indeed of the form $\lambda\rho.\ \frac{M_{k,i}\rho M_{k,i}^\dagger}{tr(M_{k,i}\rho M_{k,i}^\dagger)}$, with non-zero trace $tr(M_{k,i}\rho M_{k,i}^\dagger)$.
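To make the constraint generation tangible, here is a compact Haskell sketch of Figure 7 (entirely ours; it assumes the mtl package, omits conditionals for brevity, and keeps guards, classical expressions, and superoperators as uninterpreted strings):

```haskell
import Control.Monad.Writer (Writer, runWriter, tell)

data ETerm = X String                  -- expectation variable X_l
           | Sub ETerm String String   -- F[x := e]
           | Chi ETerm String          -- F[chi]
           | PSum ETerm String ETerm   -- F +_p G
  deriving Show

data Stmt = Skip
          | Assign String String
          | Seq Stmt Stmt
          | While String String Stmt   -- while^l b do stm
          | Unitary String String      -- q *= U
          | Meas String String Int     -- x = meas^l q_i

type Con = (String, ETerm, ETerm)      -- (Gamma, F, G) encoding Gamma |- F <= G

qinf :: Stmt -> ETerm -> Writer [Con] ETerm
qinf Skip          f = pure f
qinf (Assign x e)  f = pure (Sub f x e)
qinf (Seq s1 s2)   f = qinf s2 f >>= qinf s1
qinf (While l b s) f = do g <- qinf s (X l)
                          tell [(b, g, X l), ("not " ++ b, f, X l)]
                          pure (X l)
qinf (Unitary q u) f = pure (Chi f ("phi_" ++ u ++ "^" ++ q))
qinf (Meas l x i)  f = do
  let br k = Chi (Sub f x (show k)) ("m_" ++ show k ++ "," ++ show i)
  tell [ ("p_1," ++ show i ++ " = 0", br 0, X l)
       , ("p_0," ++ show i ++ " = 0", br 1, X l)
       , ("p_0," ++ show i ++ " /= 0 /= p_1," ++ show i
         , PSum (br 0) ("p_0," ++ show i) (br 1), X l) ]
  pure (X l)

main :: IO ()
main = print . runWriter $
  qinf (Assign "x" "1" `Seq` Assign "i" "0" `Seq`
        While "w" "x" (Assign "i" "i+1" `Seq` Unitary "q" "H" `Seq` Meas "m" "x" 1))
       (X "f")
```

On the labeled Cntoss program, this returns (up to the grouping of substitutions) the term $X_{\mathtt{w}}[\mathtt{x} := 1;\ \mathtt{i} := 0]$ together with the five constraints of Example 5 below.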

Example 5. In correspondence to Example 4, let us consider the application of the inference procedure on the program Cntoss, wrt. the post-expectation $f(s, \rho) = s(\mathtt{i})$. We label the loop and the measurement with w and m, respectively.

Let X denote the post-expectation f. Unfolding the definition, we see

$$\begin{aligned} \mathsf{qinf}[\mathtt{Cntoss}]\{X\} &= \mathsf{qinf}[\mathtt{x = tt;\ i = 0;\ while^w\ x\ do\ stm}]\{X\} \\ &= X_{\mathtt{w}}[\mathtt{x} := 1;\ \mathtt{i} := 0], \end{aligned}$$

generating the side-conditions $\mathtt{x} \vdash X_{\mathtt{m}}[\Phi_H;\ \mathtt{i} := \mathtt{i}+1] \leq X_{\mathtt{w}}$ and $\neg\mathtt{x} \vdash X \leq X_{\mathtt{w}}$. The left-hand side of the first constraint is obtained from

$$\begin{split} \mathsf{qinf}[\mathsf{stm}]\{X\_{\mathsf{w}}\} &= \mathsf{qinf}[\mathsf{i} = \mathsf{i} + 1] \{\mathsf{qinf}[\mathsf{q} \ast = H] \{\mathsf{qinf}[\mathsf{meas}^{\mathsf{m}} \neq \mathsf{q}] \{X\_{\mathsf{w}}\}\} \} \\ &= X\_{\mathsf{m}}[\mathsf{\Phi}\_{H}\mathsf{;} \ \mathsf{i} := \mathsf{i} + 1] .\end{split}$$

Note that this expansion generates further constraints, this time on $X_{\mathtt{m}}$ representing the measurement. Specifically, it yields the following constraints:

$$\begin{aligned} p_{1-k,1} = 0 &\vdash X_{\mathtt{w}}[\mathtt{x} := k;\ m_{k,1}] \leq X_{\mathtt{m}}, \qquad (\text{for } k \in \{0, 1\}), \\ p_{0,1} \neq 0 \neq p_{1,1} &\vdash X_{\mathtt{w}}[\mathtt{x} := 0;\ m_{0,1}] +_{p_{0,1}} X_{\mathtt{w}}[\mathtt{x} := 1;\ m_{1,1}] \leq X_{\mathtt{m}}. \end{aligned}$$

Using the analysis from Example 4, we interpret $X_{\mathtt{w}}$ and $X_{\mathtt{m}}$ as:

$$\begin{aligned} \alpha(X_{\mathtt{w}}) &\triangleq \lambda(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)).\ s(\mathtt{i}) + s(\mathtt{x})(2 - (\beta + \gamma)), \\ \alpha(X_{\mathtt{m}}) &\triangleq \lambda(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)).\ s(\mathtt{i}) + 2 - 2\alpha. \end{aligned}$$

Furthermore, we interpret the input variable $X$ as $f$, i.e., $\alpha(X) \triangleq \lambda(s, \rho).\ s(\mathtt{i})$. Notice how $\alpha(X_{\mathtt{w}})$ just corresponds to the upper-invariant $g$ derived in Example 4. Using this assignment, it is now standard to check that it is a solution to the five constraints. For instance, considering states $\sigma = (\{\mathtt{i} := n, \mathtt{x} := x\}, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right))$, the last constraint amounts to the implication

$$\alpha \neq 0 \neq \delta \Rightarrow n +_{\alpha} (n+2) \leq n + 2 - 2\alpha,$$

which trivially holds. Finally, recall that $\mathsf{qinf}[\mathtt{Cntoss}]\{X\} = X_{\mathtt{w}}[\mathtt{x} := 1;\ \mathtt{i} := 0]$. This term is interpreted as $\lambda(s, \left(\begin{smallmatrix} \alpha & \beta \\ \gamma & \delta \end{smallmatrix}\right)).\ 2 - (\beta + \gamma)$, yielding the optimal bound computed in Example 4.

Example 6. Re-consider the program RUS depicted in Figure 1. Here, we are interested in an upper-bound on the number of T-gates, counted by the program variable i. As before, we label the loop and the measurement with w and m, respectively. Let

$$\mathtt{stm} = \overbrace{\mathtt{q_2 = |0\rangle;\ \cdots}}^{\mathtt{stm}_0};\ \mathtt{x = meas^m\ q_2}$$

be the body of the while loop (see Figure 1). We proceed with the analysis backwards. By the rules of Figure 7, it holds that $\mathsf{qinf}[\mathtt{stm_0}]\{F\} = F[\Phi;\ \mathtt{i} := \mathtt{i}+2]$ for any $F$, where $\Phi$ gives the quantum state updates within stm₀. Unfolding definitions, we have $\mathsf{qinf}[\mathtt{RUS}]\{X\} = X_{\mathtt{w}}[\mathtt{x} := 0;\ \mathtt{i} := 1]$ with side-conditions $\mathtt{x} \vdash X_{\mathtt{m}}[\Phi;\ \mathtt{i} := \mathtt{i}+2] \leq X_{\mathtt{w}}$ and $\neg\mathtt{x} \vdash X \leq X_{\mathtt{w}}$, since, by the above observation,

$$\mathsf{qinf}[\mathtt{stm}]\{X_{\mathtt{w}}\} = \mathsf{qinf}[\mathtt{stm_0}]\{\mathsf{qinf}[\mathtt{x = meas^m\ q_2}]\{X_{\mathtt{w}}\}\} = X_{\mathtt{m}}[\Phi;\ \mathtt{i} := \mathtt{i}+2],$$

subject to the following additional constraints stemming from measurements:

$$\begin{aligned} p_{1-k,2} = 0 &\vdash X_{\mathtt{w}}[\mathtt{x} := k;\ m_{k,2}] \leq X_{\mathtt{m}}, \qquad (\text{for } k \in \{0, 1\}), \\ p_{0,2} \neq 0 \neq p_{1,2} &\vdash X_{\mathtt{w}}[\mathtt{x} := 0;\ m_{0,2}] +_{p_{0,2}} X_{\mathtt{w}}[\mathtt{x} := 1;\ m_{1,2}] \leq X_{\mathtt{m}}. \end{aligned}$$

Taking $\alpha(X) \triangleq \lambda(s, \rho).\ s(\mathtt{i})$ and solving the constraints yields a constant upper bound of $8/3$ on the expected number of T-gates used by the program. This is due to the fact that the probability of the internal measurement is always $3/4$. Note that this bound is tight.

The transformer $\mathsf{qinf}$ can of course be linked to $\mathsf{qet}$ only when the variables $X_{\ell}$ are interpreted in a way that meets the side-conditions generated by $\mathsf{qinf}$. To spell this out formally, let $\alpha : \mathbf{EVar} \to \mathbf{E}_{\mathrm{CT}}$ be an assignment of expectations to variables in $\mathbf{EVar}$, and let $\llbracket F \rrbracket_{\alpha} : \mathbf{E}_{\mathrm{CT}}$ denote the interpretation of $F \in \mathbf{ETerm}$ under $\alpha$, defined in the natural way, e.g., $\llbracket X_{\ell} \rrbracket_{\alpha} = \alpha(X_{\ell})$, $\llbracket F[\chi] \rrbracket_{\alpha} = \llbracket F \rrbracket_{\alpha}[\chi]$, etc.

We say that a constraint $\Gamma \vdash F \leq G$ is valid under $\alpha$ if $\llbracket F \rrbracket_{\alpha}(\sigma) \leq \llbracket G \rrbracket_{\alpha}(\sigma)$ holds for all states $\sigma \in \mathbf{St}_{\mathrm{CT}}$ with $\Gamma(\sigma)$. An assignment $\alpha$ is a solution to a set of constraints $C$ if it makes every constraint in $C$ valid. Finally, we say $\alpha$ is a solution to $\mathsf{qinf}[\mathtt{stm}]\{f\}$ if it is a solution to the set of constraints generated by $\mathsf{qinf}[\mathtt{stm}]\{f\}$. We have the following correspondence:

Theorem 3. For any $\alpha \in \mathbf{EVar} \to \mathbf{E}_{\mathrm{CT}}$, if $\alpha$ is a solution to $\mathsf{qinf}[\mathtt{stm}]\{F\} = G$, then it holds that $\mathsf{qet}[\mathtt{stm}]\{\llbracket F \rrbracket_{\alpha}\} \leq \llbracket G \rrbracket_{\alpha}$.

It is worth mentioning that the above procedure could equally have been defined over the full space $\mathbf{St} \to \mathbb{R}^{+\infty}$ of expectations. In that case, the symbolic approach is also complete, in the sense that if $\mathsf{qet}[\mathtt{stm}]\{f\} = g$ then $\mathsf{qinf}[\mathtt{stm}]\{X\} = G$ for some $G$ such that the side-conditions have a solution $\alpha$ with $\alpha(X) = f$ and $\llbracket G \rrbracket_{\alpha} = g$. As our main focus is on decidability, however, we have made the choice to restrict ourselves to the Clifford+T setting.

Restriction to Polynomials over the Real Closed Field $\mathbb{A}$. We now turn our attention to constraint solving, addressing the remaining Issue 2 by restricting the domain of expectations to polynomials over algebraic numbers. To be more precise, we consider the following problem.

Definition 8. Let $E \subseteq \mathbf{E}_{\mathrm{CT}}$ be a class of expectations. The inference problem $\mathrm{Qinfer}(E) \subseteq \mathbf{Stmt}_{\mathrm{CT}} \times E \times (\mathbf{EVar} \to E)$ is given by

$$(\mathtt{stm}, f, \alpha) \in \mathrm{Qinfer}(E) :\Longleftrightarrow \alpha[X := f] \text{ is a solution to } \mathsf{qinf}[\mathtt{stm}]\{X\}.$$

In the above definition, $(\mathtt{stm}, f, \alpha) \in \mathrm{Qinfer}(E)$ is satisfied if the statement stm has solution $\alpha[X := f]$ wrt. the expectation $f$. Hence it can be seen as checking whether $f$ is a post-expectation for stm. In particular, any solution $\alpha[X := f]$ constitutes an upper bound on the weakest pre-expectation of $f$ (see Theorem 3). We will now see that $\mathrm{Qinfer}(E)$ is decidable, for $E$ the set of (real algebraic) polynomial expectations of (arbitrary but fixed) degree $d$. For states $\mathbf{St}_{\mathrm{CT}}$ over $n$ classical variables $\mathtt{y}_1, \ldots, \mathtt{y}_n$ and $m$ qubits, let $\mathbb{A}^d[\mathbf{St}_{\mathrm{CT}}]$ denote the class of polynomial expectations of the form

$$\lambda(\{\mathtt{y}_i := Y_i\}_{1 \leq i \leq n}, (A_{j,k} + \mathbf{i}B_{j,k})_{1 \leq j,k \leq 2^m}).\ P(Y_1, \ldots, Y_n, A_{1,1}, \ldots, A_{2^m,2^m}, B_{1,1}, \ldots, B_{2^m,2^m}), \tag{7}$$

where the variables $Y_i$ refer to the classical state, and the variables $A_{j,k}$ and $B_{j,k}$ refer to the real part and imaginary part, respectively, of each algebraic coefficient of the quantum state. Further, $P \in \mathbb{A}[Y_1, \ldots, Y_n, A_{1,1}, \ldots, A_{2^m,2^m}, B_{1,1}, \ldots, B_{2^m,2^m}]$ is a multivariate polynomial with coefficients in $\mathbb{A}$. The index $d$ refers to the (total) degree of the underlying polynomial $P$. For instance,

$$\lambda(\{\mathtt{x} := X,\ \mathtt{i} := I\}, \begin{pmatrix} A_{1,1} + \mathbf{i}B_{1,1} & A_{1,2} + \mathbf{i}B_{1,2} \\ A_{2,1} + \mathbf{i}B_{2,1} & A_{2,2} + \mathbf{i}B_{2,2} \end{pmatrix}).\ I + X(2 - (A_{1,2} + A_{2,1})) \in \mathbb{A}^2[\mathbf{St}_{\mathrm{CT}}].$$

One important remark here is that we allow for possibly negative polynomials, whereas expectations only output positive real algebraic numbers. Consequently, some side-conditions are put on the admissible coefficients $A_{j,k}$ and $B_{j,k}$ of the input density matrix to preserve this condition (the matrix is positive, has trace 1, and is Hermitian). For example, $\sum_{i=1}^{2^m} A_{i,i} = 1$ and $\sum_{i=1}^{2^m} B_{i,i} = 0$ (trace is 1), and $\forall i, k,\ A_{i,k} = A_{k,i}$ and $B_{i,k} = -B_{k,i}$ (self-adjointness). One can easily check that the expectations defined in Example 5 are in $\mathbb{A}^d[\mathbf{St}_{\mathrm{CT}}]$, for $d \geq 2$.


The restriction to polynomials is made on purpose, as quantifier elimination is decidable in the theory of real closed fields, a well-known result due to Tarski and Seidenberg. Recall that the theory of real closed fields is the first-order theory in which the primitive operations are multiplication, addition, the order relation $\leq$, and the constants 0 and 1. Consequently, the only numbers that can be defined are the real algebraic numbers. Specifically, we will make use of the following result, quantifying the complexity of the quantifier elimination decision procedure as a function exponential in the number of variables, and double-exponential in the number of quantifier alternations.

Proposition 1 ([21, Theorem 6]). Let $A$ be an integral ring over a real closed field $R$. Let $\psi = Q_1\vec{x}_1. Q_2\vec{x}_2. \cdots Q_l\vec{x}_l.\ \phi$ be a formula in prenex normal form, where $\forall k,\ Q_k \in \{\forall, \exists\}$, $Q_k \neq Q_{k+1}$, and $\phi$ is a quantifier-free formula over $i$ variables and $j$ atomic propositions of the shape $P \geq 0$, each $P$ being a polynomial of degree at most $d$ with coefficients in $A$. There exists an algorithm computing a quantifier-free formula equivalent to $\psi$ in time $O(|\psi|) \cdot (jd)^{i^{O(l)}}$.

As $\mathbb{A}$ constitutes both an integral ring and a real closed field, the above theorem is in particular applicable taking $A = R \triangleq \mathbb{A}$. In the particular case where $\psi$ is a closed formula, the resulting quantifier-free formula is simply a Boolean combination of inequalities over constants from $\mathbb{A}$. Since we already observed that these can be decided in polynomial time, the above proposition thus implies that the validity of $\psi$ is decidable within the given time bound.

By restricting the assignment $\alpha$ to polynomial expectations, it becomes decidable to check that $\alpha$ is a solution to a given constraint set $C$. Indeed, under such a polynomial assignment $\alpha$, a constraint $\Gamma \vdash F \leq G$ becomes expressible as a formula in the theory of the real closed field $\mathbb{A}$. By letting $\alpha$ range over polynomial expectations with undetermined coefficients, we arrive in this way at the main decidability result of this section.

Theorem 4. For any degree $d \in \mathbb{N}$, $d \geq 1$, the problem $\mathrm{Qinfer}(\mathbb{A}^d[\mathbf{St}_{\mathrm{CT}}])$ is decidable in time $2^{2^{d^{O(n)}}}$, where $n$ is the size of the considered program.

Practical Algorithm. Theorem 4 establishes a computable algorithm for the inference of upper bounds on weakest pre-expectations, capturing quantitative properties of any given mixed classical-quantum program. Nevertheless, the complexity of this algorithm, double-exponential in the program size, is forbiddingly high. In order to turn this procedure into a practical algorithm, we have to tame this inherent complexity. For this, significant further restrictions on the class of bounding functions are necessary. We propose to proceed as follows. (1) Bounding functions: in (7) we restricted the class of expectations to polynomials, which in turn yield a bound on the weakest pre-expectation. Based on an analysis of concrete examples considered in the literature (e.g., [30,6]), this can be tightened further to degree-2 polynomials. (2) Approximate solutions: Theorem 4 rests upon (the decidability of) quantifier elimination. Thus the constraints $C$ induced through the symbolic inference of $\mathsf{qinf}[\mathtt{stm}]\{X\} = G$ ($G, X \in \mathbf{ETerm}$) are solved exactly. Over-approximation, however, suffices if we are only interested in soundness of the inference mechanism.

The restriction of the class of bounding functions is in essence a question of applicability of the automation, taking into account particular use-cases. With respect to approximate solutions, we observe that the actual constraints $C$ considered have at most one quantifier alternation and admit a quantifier prenex of the form $\exists^*\forall^*$, that is, a sequence of existential quantifiers followed by a sequence of universal quantifiers. Roughly speaking, the existential quantifiers refer to the inference of coefficients in the bounding polynomials, while the universal quantifiers refer to program variables. It is well-known that universal quantification in optimization problems can be turned into existential quantification, via Farkas' lemma or generalizations thereof, cf. [38,19] (see, e.g., [7,29] for instances of this approach for the inference of expected program costs).
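Schematically (our rendering, intended only to make the quantifier structure explicit), a constraint $\Gamma \vdash F \leq G$ under a polynomial assignment with unknown coefficients $\vec{c}$ takes the shape

$$\exists \vec{c}.\ \forall \vec{Y}, \vec{A}, \vec{B}.\ \big(\mathrm{wf}(\vec{A}, \vec{B}) \wedge \Gamma(\vec{Y}, \vec{A}, \vec{B})\big) \Rightarrow P^F_{\vec{c}}(\vec{Y}, \vec{A}, \vec{B}) \leq P^G_{\vec{c}}(\vec{Y}, \vec{A}, \vec{B}),$$

an $\exists^*\forall^*$ sentence over $\mathbb{A}$, where wf collects the admissibility conditions on density matrices discussed after (7).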

Summarizing, the inference mechanism detailed in Section 5 can be over-approximated to generate purely existential constraints. The latter can be effectively solved via SMT. We expect that (full) automation of the inference mechanism can capitalize on these ideas. Working out the details, and in particular the implementation of an effective prototype, is subject to future work.

# 6 Conclusion and Future Work

We have studied the complexity and inference of techniques for obtaining quantitative program properties. One particular property of interest is the cost of quantum programs, that is, average time, average number of gates, mean value of a variable, etc. We showed that these problems are undecidable in general by placing them in the arithmetical hierarchy, and saw that inference becomes decidable on a restricted fragment: quantum gates in Clifford+T and a function space with a decidable theory (polynomials of bounded degree over a real closed field). Further, we sketched how the latter can be transformed into an efficient synthesis method.

Many open questions remain. The studied notion of expectation transformer describes local properties of the quantum state, while it would be interesting to extend this technique to the global state so as to study a mixed state in a quantum-only setting (without classical variables and stores). Another question of interest is to what extent a characterization of the quantum class zbqp, the class of problems computed by quantum programs in polynomial expected runtime, could be obtained using this tool.

Acknowledgments. This work is supported by the HORIZON 2020 project NEASQC and by the Inria associate team TC(Pro)<sup>3</sup> . It is also supported by the Plan France 2030 through the PEPR integrated project EPiQ ANR-22-PETQ-0007 and the HQI initiative ANR-22-PNCQ-0002, the ANR PRC project PPS ANR-19-CE48-0014, as well as FWF Project AUTOSARD P 36623.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Reconciling Partial and Local Invertibility

Anders Ågren Thuné<sup>1,2</sup>, Kazutaka Matsuda<sup>2(B)</sup>, and Meng Wang<sup>3</sup>

<sup>1</sup> KTH Royal Institute of Technology, 100 44 Stockholm, Sweden. athune@kth.se

<sup>2</sup> Tohoku University, Aramaki Aza-aoba 6-3-09, Aoba-ku, Sendai 980-8579, Japan.

kztk@tohoku.ac.jp

<sup>3</sup> University of Bristol, Bristol BS8 1TH, United Kingdom. meng.wang@bristol.ac.uk

Abstract. Invertible programming languages specify transformations to be run in two directions, such as compression/decompression or encryption/decryption. Two key concepts in invertible programming languages are partial invertibility and local invertibility. Partial invertibility lets invertible code be parameterized by the results of non-invertible code, whereas local invertibility requires all code to be invertible. The former allows for more flexible programming, while the latter has connections to domains such as low-energy computing and quantum computing. We find that existing approaches lack a satisfying treatment of partial invertibility, leaving the connection to local invertibility unclear.

In this paper, we identify four core constructs for partially invertible programming, and show how to give them a locally invertible interpretation. We show the expressiveness of the constructs by designing the functional invertible language Kalpis, and show how to give them a locally invertible semantics using the novel arrow combinator language rrArr; the key idea is viewing partial invertibility as an invertible effect. By formalizing the two systems and giving Kalpis semantics by translation to rrArr, we reconcile partial and local invertibility, solving an open problem in the field. All formal developments are mechanized in Agda.

Keywords: Reversible computation · Arrows · Partial invertibility · Domain-specific languages.

# 1 Introduction

An invertible computation can be run in two ways: forward in the conventional way, or backward to recover an input given the output. Such processes appear frequently and prominently in a variety of contexts, enabling the shape of information to be adapted to different purposes, while preserving the essential content. For instance, (lossless) compression shrinks the size of a piece of information to facilitate efficient storage, encryption transforms it to be inaccessible to third parties, and serialization reshapes it to enable storage or transmission. The property of invertibility is crucial, as it guarantees that the data can always be refit to its original purpose.

For example, consider the function autokey below, which computes a variant of the Autokey cipher (see e.g., [50]). The cipher takes a primer character k, and interprets it as an integer (e.g., 'A' ↦ 0, 'B' ↦ 1, ..., 'Z' ↦ 25) determining a shift to apply to the first element of the input. Each consecutive character in the input is similarly shifted by the amount given by its predecessor. For instance, autokey 'F' "HELLO" = "CXHAD", as 'F' represents a (cyclic left) shift of 5 characters, mapping 'H' to 'C', and 'H' a shift of 7 characters, mapping 'E' to 'X', and so on.

```
autokey :: Char → [Char] → [Char]
autokey k [ ] = [ ]
autokey k (h : t) =
  shift (chrToInt k) h : autokey h t

autokey′ :: Char → [Char] → [Char]
autokey′ k [ ] = [ ]
autokey′ k (h′ : t′) =
  let h = shift (−(chrToInt k)) h′
  in h : autokey′ h t′
```
The corresponding decryption function autokey′ is given second, and shifts backward to restore the original input. We assume shift : Int → Char → Char performing the cyclic shift is previously defined. This is a simple example, but it serves as a toy model of more advanced encryption schemes, and it has a few interesting features which we highlight momentarily.
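One possible realization of these helpers (ours; the paper leaves them abstract) is:

```haskell
import Data.Char (chr, ord)

chrToInt :: Char -> Int
chrToInt c = ord c - ord 'A'                           -- 'A' -> 0, ..., 'Z' -> 25

shift :: Int -> Char -> Char                           -- cyclic left shift by n
shift n c = chr (ord 'A' + (chrToInt c - n) `mod` 26)
```

With these definitions, autokey 'F' "HELLO" indeed evaluates to "CXHAD", and shift (−n) undoes shift n, as exploited by autokey′.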

In traditional unidirectional languages, each direction of an invertible algorithm has to be specified separately in this way, and there is no easy way of ensuring that the two programs really constitute each other's inverses. Furthermore, there is a maintenance concern: when one direction is updated, the other has to be updated accordingly. An alternative, more scalable approach is to let a single program denote both directions at the same time; intuitively, the inverse is derived by "reading the original code right-to-left". Invertible programming languages implement this approach, letting each program be executed in either of two directions, which are guaranteed to form a pair of inverse functions. Some examples of invertible languages include Janus [35,53], R [17], Inv [43], Π [10,26], RFun [54], Theseus [27], CoreFun [25] and Sparcl [39,40].

These languages traditionally require each individual step of computation to be invertible, which can be ensured, e.g., by providing a set of invertible combinators as basic building blocks, or by imposing various syntactic restrictions. This form of local invertibility has several benefits, in addition to being a simple foundation for building programming languages. For example, it was observed early on that discarding information fundamentally results in heat dissipation, meaning that a machine executing only invertible instructions could in principle operate at lower energy levels than a conventional computer [32]. Moreover, locally invertible languages serve as a foundation when considering other domains with similar requirements, such as quantum computing, where computations are composed of individually invertible quantum gates along with irreversible measurements [22,48]. Despite these benefits, the local flavor of invertibility severely limits the flexibility of the programmer. In particular, our example function autokey is not actually invertible up front! The case autokey k [ ] = [ ] discards the value of k, which means we cannot simply read the definition right-to-left. Of

course, the primer k is not intended to be treated as part of the invertible input to autokey, but rather as a parameter determining the bijection between input and output strings. However, this cannot be naturally expressed in a language adhering strictly to the (locally) invertible paradigm, where the parameter would need to be preserved in the result.

The property of becoming invertible when some parameters are fixed is known as partial invertibility [39,40,44,47], and many previous languages offer some form of support for partially invertible definitions. However, the level of support varies from more limited (e.g., [25,27,35]) to more complete (e.g., [39,40]), and the previous work largely lacks a systematic treatment. The case of autokey is especially tricky, since its invertible input h flows to the unidirectional parameter k in the recursive call. To our knowledge, only Sparcl [39,40] handles cases like this in a systematic way, but it does so through an advanced language foundation quite different from that of traditional invertible languages, and its connection to the locally invertible paradigm is not well understood. Thus, it is an open question whether it is possible to support fully expressive partial invertibility while maintaining a compositional locally invertible interpretation.

It is theoretically known that any (partially) invertible computation can be simulated in a locally invertible system [8]; however, this simulation gives poor control over the invertible behavior and is inefficient in both time and space. There has been research on inversion of arbitrary programs (e.g., [41,44,49]), and on logic languages with no fixed direction of execution, like Prolog and Curry, which use (lazy) generate-and-test to find inputs corresponding to a given output [4]. Yet, these approaches lack the guarantee of invertibility, which is the main motivation of an invertible language.

#### 1.1 Contributions and Organization

In this paper, we identify a core set of constructs for partially invertible programming, and explain them in terms of a locally invertible semantics. These constructs are sufficient to allow expressive partially-invertible and higher-order computation, thus solving an open problem in the invertible programming literature. The constructs include (1) partially invertible branching, (2) pinning invertible inputs, (3) partially invertible composition, and (4) abstraction and application of invertible computations.

We demonstrate the above findings by designing and formalizing two systems based on these constructs, Kalpis<sup>4</sup> and rrArr. Kalpis is a typed functional programming language accommodating expressive partially-invertible and higher-order computation, and rrArr is an arrow combinator language intended to capture the essence of partially invertible programs. Kalpis is given semantics via rrArr, which captures partial invertibility as an effect on top of 'pure' invertible computations, intuitively adjoining a parameter to an invertible function, analogously to the reader monad in unidirectional computation. By interpreting terms of Kalpis as parameterized bijections, we are able to give a

<sup>4</sup> The name stands for "Kalpis—an Arrow-based Locally and Partially Invertible System".

translation into rrArr combinators, giving a compositional embedding into a locally invertible setting. Thus, we present a simple and rigorous take on partial invertibility which bridges the gap between previous work in the field.

The core constructs for partial invertibility that we present are not new per se, and the features of Kalpis largely coincide with those of Sparcl [39,40]. However, the goal of this paper is not to present Kalpis as such, but rather to describe partial invertibility from first principles and give a simpler semantics which is compatible with local invertibility. There are key technical differences between the two languages, and the fact that they are still similar should be taken as a sign that we have achieved our goal without a significant loss of expressiveness.

In summary, our main contributions are:


Section 6 discusses the results in relation to previous work, and Section 7 concludes.

# 2 Constructs for Partially Invertible Programming

In this section, we introduce a set of core constructs for partially invertible programming and explain their intuitive idea using programming examples in our partially invertible language Kalpis, which we introduce formally in Section 3. The constructs include (1) partially invertible branching, (2) pinning invertible inputs, (3) partially invertible composition, and (4) abstraction and application of invertible computations. We explain them each in turn, and show how they can be understood as operations on parameterized bijections, which we exploit in later sections to embed them into a locally invertible setting.

These constructs act as a form of glue, allowing invertible and unidirectional computations to be run in tandem. Thus, we also assume some traditional invertible constructs taken from the existing literature, like invertible pattern matching, which we briefly explain where necessary.

<sup>5</sup> https://git.sr.ht/~aathn/kalpis-agda

<sup>6</sup> https://git.sr.ht/~aathn/kalpis

#### 2.1 Partially Invertible Branching

As a first example, we define partially invertible addition. In particular, the function x ↦ x + n has inverse x ↦ x − n for any n ∈ ℕ. Kalpis supports recursive type definitions, and we can define the naturals as follows.

```
data Nat = Z | S Nat
```

Now, addition is implemented naturally by the following function add, taking an n to produce the corresponding bijection.

```
sig add : Nat → Nat ↔ Nat
def• add n x =
  case n of
    Z   → x
    S n → S (add n ⋄ x)
```
The language uses a functional syntax, and features elements typical of invertible programming: a bijection type A ↔ B, bijection definition def•, and bijection application f ⋄ x. The functional types associate to the right, so the type of

```
add : Nat → Nat ↔ Nat
```
indicates a partially invertible function taking a Nat to produce a bijection Nat ↔ Nat. The case form showcases our first core construct, partially invertible branching. If n is zero, x is returned unchanged, and otherwise S is applied to the result of a recursive computation. The resulting function appends n copies of S to x in the forward direction, or peels them off in the backward direction.

What is interesting is that case results in a loss of information: without prior knowledge of n, it is impossible to determine which branch to choose when executing backwards. This corresponds to the fact that one cannot uniquely determine n and x given y = n + x. However, when n is fixed beforehand, we can refer to its value regardless of whether we are executing forwards or backwards, which is what motivates the case construct. For example, we get the following results when applying add to some example inputs, where the primitive operator (·)† : (A ↔ B) → (B ↔ A) lets us compute the inverse.
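The intended behaviour can be mimicked by a small Haskell sketch that models a (non-total) bijection as a pair of partial functions; the names Bij, fwd, bwd, inv, and addB below are ours for illustration, not Kalpis primitives.

```haskell
-- A minimal model of (non-total) bijections as pairs of partial functions.
-- All names here (Bij, fwd, bwd, inv, addB) are illustrative only.
data Nat = Z | S Nat deriving (Show, Eq)

data Bij a b = Bij { fwd :: a -> Maybe b, bwd :: b -> Maybe a }

-- The dagger operator swaps the two directions.
inv :: Bij a b -> Bij b a
inv (Bij f g) = Bij g f

-- 'addB n' is the bijection x <-> x + n, defined by recursion on the
-- parameter n, mirroring 'add' above.
addB :: Nat -> Bij Nat Nat
addB Z     = Bij Just Just
addB (S n) = Bij (fmap S . fwd (addB n))
                 (\y -> case y of
                          S y' -> bwd (addB n) y'
                          Z    -> Nothing)

-- fwd (addB (S Z)) (S Z)   == Just (S (S Z))
-- fwd (inv (addB (S Z))) Z == Nothing   (no S to peel off)
```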


As the type Nat ↔ Nat requires, the argument x in the definition of add must be treated linearly, i.e., it must be used exactly once in any successful evaluation (see e.g., [51]) in order to ensure invertibility. For instance, changing the first

case above to Z → Z gives an error, as x is unused in the case body. Indeed, if x is never used, there is no way to recover its value in the backward direction. While allowing more than one use does not directly prevent invertibility, it requires implicit copying of values, which may induce unintended runtime failures in the backward execution. Similarly, we cannot branch on x using case for the reasons mentioned above; instead, an invertible case• form is available, explained later.

Note that add is not a total function: e.g., the application (add (S Z))† ⋄ Z will try to peel an S when there is none, resulting in a runtime error.<sup>7</sup> The guarantee given by Kalpis is that whenever evaluating a bijection f on argument v gives v′ in the forward direction, then evaluating f on v′ gives v in the backward direction, and vice versa (this is made formal in Section 3).

Mathematically, add represents a parameterized bijection, a family of (partial) one-to-one mappings fₙ : ℕ → ℕ (such that fₙ(x) = x + n). This view will underpin our explanation of partially invertible computations in later sections, and each of the core constructs in this section can also be understood from this viewpoint. Seen from this perspective, the case construct allows definitions of the form

$$f_n(x) = \begin{cases} g_n(x) & \text{if } n = 0 \\ h_n(x) & \text{otherwise,} \end{cases}$$

where g and h are also parameterized bijections.

#### 2.2 Pinning Invertible Inputs

As a second example, we consider a program fib computing pairs of Fibonacci numbers (defined by the equations F₀ = F₁ = 1 and Fₙ₊₁ = Fₙ + Fₙ₋₁ for n > 0), a classic in the invertible programming literature (e.g., [18, 53]). We can compute fib n by case distinction on n; if n = 0, we return (F₀, F₁), and otherwise we recursively obtain fib (n−1) = (Fₙ₋₁, Fₙ), with which we compute the next pair (Fₙ, Fₙ + Fₙ₋₁).

However, if we try to implement this algorithm invertibly using the function add above, we encounter an issue: we cannot make the call add Fₙ ⋄ Fₙ₋₁, as add does not treat its first argument invertibly. Since Fₙ comes from the invertible input n, we need an operation that is properly invertible in both inputs. To this end, we can define an invertible addition add′ such that add′ ⋄ (x, y) = (x, x + y). By preserving a copy of x in the output, the same x can be used to recover y by subtraction in the inverse direction. Indeed, add′ ⋄ (Fₙ, Fₙ₋₁) gives just the result we need. In Kalpis, add′ can be derived from add automatically using our second core construct, pin.

```
sig add′ : (Nat, Nat) ↔ (Nat, Nat)
def• add′ (x, y) = pin add ⋄ (x, y)
```

Here, the operator pin : (c → a ↔ b) → (c, a) ↔ (c, b) lifts a partially invertible function to operate on invertible data; we refer to this as pinning

<sup>7</sup> The loss of totality is unavoidable in order to achieve r-Turing completeness [5], i.e., the ability to define all computable bijections.

the invertible input x, allowing it to be used in a unidirectional position. This construct (inherited from Sparcl [39, 40]) is crucial in practical programming, as it lets unidirectional computations depend on invertible data in a controlled manner. With add′ defined, fib can be written as follows.

```
sig fib : Nat ↔ (Nat, Nat)
def• fib n =
  case• n of
    Z   → (S Z, S Z)                with is11
    S n → let• (y, x) = fib ⋄ n in
          add′ ⋄ (x, y)             with not ◦ is11

sig is11 : (Nat, Nat) → Bool
def is11 p =
  case p of
    (S Z, S Z) → True
    _          → False
```
This example is defined by invertible pattern matching (case•), a construct inherited from previous languages like Janus [35, 53] and Ψ-Lisp [7]. When branching on the input to a bijection (as opposed to a fixed parameter), postconditions marked by the keyword with ensure that the execution can determine which branch to take in the backward direction. Each postcondition is a boolean function that must return True for any result of its branch and False for any result of the branches below it (this is checked at runtime following the symmetric first-match policy [54]). The backward evaluation tests each condition in turn, selecting the first branch whose condition is true. Here, is11 is used to distinguish the base case where the output is (S Z, S Z).

The inverse behavior of fib computes n given a pair (Fₙ, Fₙ₊₁). Specifically, by computing Fₙ₊₁ − Fₙ, we obtain Fₙ₋₁, and repeating the process until we reach the start of the sequence lets us deduce the index of the initial pair. Kalpis runs fib as below.


Again, fib is non-total: running it backwards on a pair not constituting two consecutive Fibonacci numbers will cause the computation to fail.

Viewed as an operation on parameterized bijections, pin lets part of an invertible input be shifted to the parameter position if a copy is returned in the end. Formally, we have $\mathsf{pin}(f)_n(x, y) = (x, f_{(n,x)}(y))$; in our example, $f_{(n,x)}$ corresponds to addition by x, ignoring a trivial n representing variables captured in the pin form.
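In the illustrative Haskell model from Section 2.1, pin admits a one-line definition that makes the copying explicit (pinB and addB' are our names):

```haskell
-- A sketch of pin over the Bij model above: the pinned component is copied
-- to the output, so the inverse can recover it.
pinB :: (c -> Bij a b) -> Bij (c, a) (c, b)
pinB f = Bij (\(c, a) -> fmap ((,) c) (fwd (f c) a))
             (\(c, b) -> fmap ((,) c) (bwd (f c) b))

-- add' from the paper: add' ⋄ (x, y) = (x, x + y), inverted by subtraction.
addB' :: Bij (Nat, Nat) (Nat, Nat)
addB' = pinB addB
```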

#### 2.3 Partially Invertible Composition

We now return to the example from the introduction, autokey. It can be defined in Kalpis as follows:

```
sig autokey : Char → [Char] ↔ [Char]
def• autokey k xs =
  case• xs of
    []      → []
    (h : t) → let• (h, r) = pin autokey ⋄ (h, t) in
              (shift (chrToInt k) ⋄ h) : r
```

The structure is very similar to the unidirectional version in Section 1, but uses the invertible branching and pinning constructs explained previously. We assume primitives chrToInt : Char → Int and shift : Int → Char ↔ Char for computing and performing the cyclical shifts, respectively. We omit the with-conditions of the invertible match by convention, as the syntactically distinct branch bodies can act as patterns to guide backward branching.

This example features our third core construct, partially invertible composition. This simply refers to the fact that we can modify the parameter of a bijection unidirectionally, as in shift (chrToInt k) ⋄ h. In this case, the (irreversible) function chrToInt is applied to k inside the (invertible) call to shift. In other words, the parameter part of an invertible computation is allowed to depend freely on unidirectional computations, greatly enhancing flexibility when programming. We call this composition because, from the perspective of parameterized bijections, it corresponds to composing a parameterized bijection f with an (arbitrary) function g on the parameter part, i.e., $(f \circ g)_n(x) = f_{g(n)}(x)$. In our example, f corresponds to shift and g to chrToInt.
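In the Haskell Bij model, this construct is simply function composition on the parameter; the primitives shiftB and chrToInt mentioned in the comment are assumed, not defined here:

```haskell
-- A parameterized bijection is a function from parameters to bijections.
type PBij c a b = c -> Bij a b

-- Partially invertible composition: precompose an ordinary function on the
-- parameter, (f ∘ g)_n = f_{g(n)}.
pcomp :: (c -> d) -> PBij d a b -> PBij c a b
pcomp g f = f . g

-- E.g., assuming shiftB :: Int -> Bij Char Char and chrToInt :: Char -> Int,
-- the body of autokey would use 'pcomp chrToInt shiftB' applied to the key.
```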

The example also further highlights the utility of pin. As noted in the introduction, autokey is tricky to express since each character in the invertible output depends unidirectionally on the preceding character in the corresponding input. Similar patterns also appear in more advanced examples; for instance, consider an adaptive compression method where each character in the input must be treated invertibly, and yet also be used as part of the (unidirectional) compression table. pin enables this sort of dependency in a safe way, letting us use h in the recursive call to autokey and returning a copy to use in the output.

Again, Kalpis lets us execute autokey in either direction, and guarantees that the two are inverses.


#### 2.4 Abstraction and Application of Invertible Computations

Our final core construct of partially invertible programming is the ability to abstract and apply invertible computations. Although the examples we have seen so far have defined (partially) invertible computations using the def• keyword in a style close to traditional invertible languages, Kalpis actually features bijections as first-class values and supports proper higher-order programming. Bijections can be constructed with an invertible λ-form λ•x.e analogous to that typical for ordinary functions, and the form def• f x₁ x₂ … xₙ = e is simply syntactic sugar for f = λx₁.λx₂.….λ•xₙ.e. To our knowledge, only Sparcl [39, 40] shares this feature, with most invertible languages being limited to first-order computation.

For example, we are able to define multiple variants of the typical map function for lists in Kalpis.
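Following the conventions of the earlier examples (with-conditions omitted, as for autokey), these can be given roughly as:

```
sig map : (a → b) → [a] → [b]
def map f xs =
  case xs of
    []      → []
    (h : t) → f h : map f t

sig mapBij : (a ↔ b) → [a] ↔ [b]
def• mapBij f xs =
  case• xs of
    []      → []
    (h : t) → (f ⋄ h) : (mapBij f ⋄ t)
```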


Here, map is defined as usual, mapping a function over each element of a list, while mapBij makes use of the language's invertible constructs, taking a bijection argument to produce a bijection on lists. For example, using mapBij, the Caesar cipher (which shifts each character in the input a fixed number of steps) can be defined with the first one-liner below.

```
sig caesar : Char → [Char] ↔ [Char]
def caesar k = mapBij (shift k)

sig vig : [Char] → [Char] ↔ [Char]
def vig ks = apBij (map shift ks)
```

The second function, vig (from Vigenère), takes a list of keys, shifting each character in the input using the corresponding key. The definition relies on apBij : [a ↔ b] → [a] ↔ [b] to apply a list of bijections pointwise to a list of inputs (assuming the two have equal lengths). The latter example demonstrates that bijections can even occur inside data structures such as lists.
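A possible rendering of apBij in the Haskell Bij model (apBijB is our name; a length mismatch is a runtime failure):

```haskell
import Control.Monad (zipWithM)

-- Apply a list of bijections pointwise to a list of inputs, in either
-- direction, failing when the lengths disagree.
apBijB :: [Bij a b] -> Bij [a] [b]
apBijB fs = Bij (go fwd) (go bwd)
  where go dir xs
          | length fs == length xs = zipWithM dir fs xs
          | otherwise              = Nothing
```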

Some restrictions must be observed when dealing with higher-order computation in Kalpis. The language distinguishes between unidirectional and invertible terms, and carefully controls the interaction between the two. The restrictions mean that the invertible fragment of the language is essentially first-order; a formal account is given in Section 3.

Viewed from the perspective of parameterized bijections, abstraction corresponds to forming the function n ↦ fₙ, witnessing that each choice of parameter n induces a bijection fₙ which can be treated as a standalone value. On the other hand, application of a bijection α corresponds to forming the parameterized bijection $\mathit{app}_\alpha(x) = \alpha(x)$, where the parameter determining the bijection is α itself.

This concludes Section 2; for more programming examples in Kalpis, we refer to the prototype implementation,<sup>8</sup> which contains a number of nontrivial programs, including implementations of Huffman coding and sliding-window compression.

<sup>8</sup> https://git.sr.ht/~aathn/kalpis

# 3 The Kalpis Core System

In this section, we formally define the Kalpis core system and state the essential metatheoretic properties. A salient feature of the system is the clear separation between unidirectional and invertible terms: we have two main syntactic categories, two typing relations, and three evaluation relations (one for unidirectional terms, and one in each direction for invertible terms). The unidirectional terms are a conservative extension of a standard simply-typed call-by-value λ-calculus, and the invertible terms add support for (partially) invertible computation.

After introducing the syntax and reviewing some examples, Sections 3.4 and 3.5 give a formal semantics which suggests an interpretation of Kalpis terms as parameterized bijections. This view is made precise in Sections 4 and 5, which define a translation from Kalpis into the arrow language rrArr, enabling a locally invertible interpretation.

#### 3.1 Syntax

The syntax of Kalpis core is given below, where u denotes unidirectional terms, r denotes invertible terms, and p denotes patterns. The vector notation $\overline{t}$ denotes an ordered sequence of elements $t_i$, whose length we will refer to by $|\overline{t}|$.

$$\begin{array}{l}
u ::= x \mid \lambda x.u \mid u_1\ u_2 \mid \lambda^\bullet x.r \mid u_1 \diamond u_2 \mid \mathsf{C}\ \overline{u} \mid \mathsf{case}\ u_0\ \mathsf{of}\ \{\overline{p \to u}\} \\
r ::= x \mid u \diamond r \mid u^\dagger \diamond r \mid \mathsf{pin}\ u \diamond r \mid \mathsf{C}\ \overline{r} \mid \mathsf{case}\ u\ \mathsf{of}\ \{\overline{p \to r}\} \mid \mathsf{case}^\bullet\ r_0\ \mathsf{of}\ \{\overline{p \to r\ \mathsf{with}\ u}\} \\
p ::= \mathsf{C}\ \overline{x}
\end{array}$$

The syntax of unidirectional terms includes the standard cases for variables, abstraction and application, along with data constructors and pattern matching. In addition, there is the invertible abstraction λ•x.r and application u₁ ⋄ u₂ explained in the previous section. Note that while the body r is an invertible term, the abstraction itself is unidirectional.

The syntax of invertible terms resembles a first-order functional language, but with a couple of key additions. We have bijection application u ⋄ r, where the bijection is unidirectional whereas the argument is invertible. We also have fully applied versions of the (·)† and pin operators explained in the previous section (this is without loss of generality, as e.g., the higher-order version of pin can be recovered as λf.λ•x. pin f ⋄ x). Partially invertible branching is represented by the case form, whose scrutinee u is unidirectional. The case• form deconstructs an invertible term, and has a with-condition for invertible branching, following Janus [35, 53] and Ψ-Lisp [7]. The core constructs of the previous section are all featured explicitly in the syntax, except for partially invertible composition, which is implicitly performed whenever a unidirectional term u occurs in an invertible context.

#### 3.2 Types

Next, we define the types of Kalpis core.

$$A, B ::= \mathsf{T}\ \overline{B} \mid A \to B \mid A \leftrightarrow B \mid X$$

The types include constructors $\mathsf{T}\ \overline{B}$, functions A → B, bijections A ↔ B, and type variables X. The types are conventional with the exception of invertible computations A ↔ B; this simplicity is a design feature of Kalpis. With each type constructor T we associate an arity k and a set of constructors C with signatures $\mathsf{C} : A_1 \to A_2 \to \cdots \to A_n \to \mathsf{T}\ \overline{B}$, where $|\overline{B}| = k$. We will assume the type constructors include at least the unit 1, products ⊗, and sums ⊕ with constructors

$$() : 1 \qquad (-,-) : A \to B \to A \otimes B \qquad \mathsf{InL} : A \to A \oplus B \qquad \mathsf{InR} : B \to A \oplus B$$

for any A, B. We use Bool as a shorthand for 1 ⊕ 1, and True, False as shorthands for InL (), InR (), respectively.

Types can be (mutually) recursive via constructors; for example, the type Nat has constructors Z : Nat and S : Nat → Nat. In general, for any fixed A, the recursive type µX.A can be represented with a nullary type constructor $\mathsf{Rec}_A$, with constructor

$$\mathsf{Roll} : A[\mathsf{Rec}_A/X] \to \mathsf{Rec}_A.$$

For instance, $\mathsf{Rec}_{1 \oplus X}$ has constructor $\mathsf{Roll} : 1 \oplus \mathsf{Rec}_{1 \oplus X} \to \mathsf{Rec}_{1 \oplus X}$, making it isomorphic to Nat. Technically, we consider a variable X implicitly bound in the annotation to Rec, and assume all other types are closed.
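This is the familiar fixed point of a functor; a Haskell rendering of the same idea (Rec, NatR, zeroR, and sucR are our names):

```haskell
-- Recursive types via a single Roll constructor, mirroring Rec_A with the
-- type variable X implicitly bound.
newtype Rec f = Roll (f (Rec f))

type NatR = Rec (Either ())   -- Rec_{1 ⊕ X}, isomorphic to Nat

zeroR :: NatR
zeroR = Roll (Left ())

sucR :: NatR -> NatR
sucR n = Roll (Right n)
```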

#### 3.3 Correspondence to the Surface Language

The correspondence between the core syntax and the examples of Section 2 should be clear. For instance, the examples of addition and Fibonacci number calculation can be written as follows:

$$\begin{array}{ll}
\mathit{add} \triangleq \mathit{fix}\ (\lambda \mathit{add}'.\lambda n.\lambda^\bullet m. & \mathit{fib} \triangleq \mathit{fixBij}\ (\lambda \mathit{fib}'.\lambda^\bullet n. \\
\quad \mathsf{case}\ n\ \mathsf{of} & \quad \mathsf{case}^\bullet\ n\ \mathsf{of} \\
\quad\quad \mathsf{Z} \to m & \quad\quad \mathsf{Z} \to (\mathsf{S\ Z}, \mathsf{S\ Z})\ \mathsf{with}\ \mathit{is11} \\
\quad\quad \mathsf{S}\ n' \to \mathsf{S}\ (\mathit{add}'\ n' \diamond m)) & \quad\quad \mathsf{S}\ n' \to (\mathsf{case}^\bullet\ \mathit{fib}' \diamond n'\ \mathsf{of} \\
& \quad\quad\quad\quad (x, y) \to \mathsf{pin}\ \mathit{add} \diamond (y, x)\ \mathsf{with}\ \lambda\_.\mathsf{True}) \\
& \quad\quad \mathsf{with}\ \mathsf{not} \circ \mathit{is11})
\end{array}$$

Here, add is a unidirectional term defined using a fixpoint operator fix, and the structure is similar to the version presented in Section 2.1. The function fib is similarly defined, but uses the fixpoint operator fixBij instead of fix, which works for bijections instead of functions. We omit the definition of is11 : Nat ⊗ Nat → Bool in the interest of space. The term fixBij (and analogously fix) is defined as below, making use of the language's recursive types.

$$\mathit{fixBij} \triangleq \lambda f.\ (\lambda g.\ g\ (\mathsf{Roll}\ g))\ (\lambda x.\lambda^\bullet a.\ f\ ((\mathsf{case}\ x\ \mathsf{of}\ \mathsf{Roll}\ y \to y)\ x) \diamond a).$$
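This is the classical self-application fixed point, with Roll hiding the negative occurrence of the recursive type; the same trick can be sketched in Haskell (Self and fixR are our names):

```haskell
-- A recursive type whose unrolling is a function into 'a', enabling
-- self-application in a typed setting.
newtype Self a = Self (Self a -> a)

fixR :: (a -> a) -> a
fixR f = (\g -> g (Self g)) (\x -> f ((\(Self y) -> y) x x))
```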

The type system we define in the next section will assign these terms the following types, as expected.

$$\begin{array}{llll}
\mathit{add} &: \mathsf{Nat} \to \mathsf{Nat} \leftrightarrow \mathsf{Nat} & \mathit{fix} &: ((A \to B) \to A \to B) \to A \to B \\
\mathit{fib} &: \mathsf{Nat} \leftrightarrow \mathsf{Nat} \otimes \mathsf{Nat} & \mathit{fixBij} &: ((A \leftrightarrow B) \to A \leftrightarrow B) \to A \leftrightarrow B
\end{array}$$

#### 3.4 Type System

Figure 1 shows the typing rules for unidirectional (Γ ⊢ u : A) and invertible (Γ; Θ ⊢ r : A) terms. The latter relation uses two contexts Γ and Θ; intuitively, Γ contains variables for unidirectional data, which may be discarded or duplicated freely, whereas Θ contains variables for data that must be treated in an invertible way. This use of a dual context system [13] is inspired by previous work such as CoreFun [25] and Sparcl [39, 40]. Formally, we define the typing contexts as Γ, Θ ::= ε | Γ, x : A, and assume names x are unique within a context. We let Γ₁, Γ₂ denote the concatenation of two contexts.

The rules for Γ ⊢ u : A are mostly straightforward. T-Abs• pushes the parameter x of λ•x.r into Θ instead of Γ to ensure that the variable is used in an invertible way in r, and T-Run gives a rule for bijection application analogous to T-App. In the Case rules, we implicitly require that patterns are disjoint and exhaustive.

In the rules for Γ; Θ ⊢ r : A, the variables in the Θ environments must be used exactly once to ensure invertibility. Hence, we need to separate Θ into, e.g., Θ = Θ₁ ⊎ Θ₂ for typing subterms, where ⊎ is used analogously to a linear type system (see, e.g., [9]). The rules follow the intuition that r denotes a bijection between Θ and A parameterized by Γ. This highlights the difference between the pattern matching rules, T-UCase and T-RCase: the bound variables Γᵢ in the former are parameters for the bijection that rᵢ defines, while in the latter, the variables Θᵢ are part of the inputs of rᵢ, so that case• performs a composition of two invertible computations.

As stated in Section 2.4, there are some restrictions on how unidirectional and invertible terms can interact. Note that the unidirectional subterms occurring in the invertible typing rules are only typed using Γ, and not Θ. For instance, since the left-hand side in rule T-RApp is unidirectional, it cannot depend directly on invertible variables, ruling out terms like λ•x.(x ⋄ True). This is a natural restriction, as we cannot generally deduce which function was used to produce some given result. Conversely, there is no rule for directly accessing Γ from the invertible typing relation; instead, unidirectional data can only affect the computation through rules like T-UCase and T-RApp. Both λ-forms are unidirectional, meaning they can neither capture invertible variables nor be returned from an invertible computation. In this sense, the invertible fragment of the language is first-order.

We note that there are no particular restrictions on unidirectional terms, and the approach presented could be used to augment any standard functional language with invertible computations λ•x.r and u₁ ⋄ u₂. The prototype implementation further adds let-polymorphism as an orthogonal extension.

#### 3.5 Operational Semantics

We first define the set of values as below.

$$v ::= \mathsf{C}\ \overline{v} \mid \langle \lambda x.u, \gamma \rangle \mid \langle \lambda^\bullet x.r, \gamma \rangle$$


Fig. 1. The type system of Kalpis core: typing rules for unidirectional terms Γ ⊢ u : A, patterns Γ ⊢ p : A, and invertible terms Γ; Θ ⊢ r : A. Here $\overline{A} \to B$ means $A_1 \to \cdots \to A_{|\overline{A}|} \to B$.

Here, γ is a value environment, i.e., a mapping from variables to their values. Formally, we define γ, θ ::= ∅ | γ, x ↦ v, with γ and θ corresponding to Γ and Θ. We use the disjoint union θ₁ ⊎ θ₂ to concatenate two environments θ₁ and θ₂, which is defined only when dom(θ₁) and dom(θ₂) are disjoint. The values include constructors and two closure forms ⟨λx.u, γ⟩ and ⟨λ•x.r, γ⟩, corresponding to unidirectional and invertible computations. We type the values in analogy with the terms, with the rules for closures as follows:

$$\frac{\gamma:\Gamma \quad \Gamma, x:A \vdash u:B}{\langle \lambda x.u, \gamma \rangle:A \to B} \quad \frac{\gamma:\Gamma \quad \Gamma; x:A \vdash r:B}{\langle \lambda^\bullet x.r, \gamma \rangle:A \leftrightarrow B}$$

Here, we write γ : Γ to mean that dom(γ) = dom(Γ) and γ(x) : Γ(x) for all x ∈ dom(Γ). For p a pattern, we write pγ to denote the value obtained by applying the substitution γ to p's variables. In addition, we use the shorthand $[i = j] \triangleq \mathsf{True}$ if i = j, and $\mathsf{False}$ otherwise.

We now present in Figure 2 the operational semantics of Kalpis core, which consists of three evaluation relations: unidirectional, forward, and backward. The unidirectional evaluation relation γ ⊢ u ⇓ v reads that under γ, term u evaluates to value v, as usual. In contrast, the forward and backward evaluation relations define a bijection. The former relation γ; θ ⊢ r ⇒ v reads that under γ the forward evaluation of r maps θ to v, and the latter relation γ; v ⊢ r ⇐ θ reads that under γ the backward evaluation of r maps v to θ. As one can see, γ serves as the parameter for this bijection, which defines a one-to-one correspondence between θ and v. Due to space limitations, we omit the rules for backward evaluation, as they are completely symmetric to forward evaluation. That is, for each rule of the forward evaluation, the corresponding backward rule is obtained by swapping each occurrence of γ; θ ⊢ r ⇒ v with γ; v ⊢ r ⇐ θ, and vice versa. Crucially, the evaluation relations are mutually dependent, and when a unidirectional term is embedded in an invertible computation, the unidirectional evaluation will be invoked to evaluate the term in the same way regardless of whether executing forwards or backwards.

We encourage the reader to study the rules for the partially invertible case and invertible case• especially. The former branches based on a unidirectional term, which is evaluated first regardless of the direction of execution. The latter branches based on an invertible term, which is evaluated first in the forward direction but last in the backward direction. In the backward direction, the with-conditions u are instead evaluated first; the condition [i = j] for j ≤ i encodes the branch selection and the runtime check of postconditions mentioned previously.

There is a subtlety in the backward evaluation rule for constructors $\mathsf{C}\ \overline{r}$, where the same C occurs both in the term $\mathsf{C}\ \overline{r}$ and the input $\mathsf{C}\ \overline{v}$, meaning that evaluation fails if the value does not match the constructor C. This corresponds to, e.g., the term (λ•x. S x)† ⋄ Z failing as it tries to subtract one from zero.

#### 3.6 Metatheory

In this section, we briefly state the essential properties of the core system. The propositions in this section have been formalized mechanically, by implementing and reasoning about a definitional interpreter [46] in Agda. The implementation follows the presentation of the paper closely, but uses intrinsically-typed terms and nameless variables, and relies on the sized delay monad [1, 11].

**Theorem 1 (Subject reduction).**


Proof. Directly from the existence and type of the definitional interpreter in Agda.

**Theorem 2 (Invertibility).** If Γ; Θ ⊢ r : A, γ : Γ, θ : Θ and v : A, then γ; θ ⊢ r ⇒ v if and only if γ; v ⊢ r ⇐ θ.

Proof. By simultaneous induction on the term r and the step count of evaluation; simple induction on the term r is not enough as the language has general recursion. The proof is otherwise straightforward, since the evaluation relations are completely symmetric.


Fig. 2. The operational semantics of Kalpis core. Rules for the backward evaluation are omitted in the interest of space, but can be derived as explained in the text.

Remark on Progress. We have chosen to give the semantics in a big-step style in this paper. This choice was made both because the invertibility property is more natural to state about a big-step semantics, which relates input to output directly, and to make the step to a denotational semantics smaller—as mentioned, the evaluation relations suggest an interpretation of invertible terms as parameterized bijections.

Thus, the progress property typically proven for a small-step semantics, meaning that evaluation never gets "stuck" given a valid input (see, e.g., [45]), cannot be stated directly in our setting. However, we get a similar guarantee from the implementation in Agda, whose type checker asserts that no uncontrolled run-time errors are possible. Indeed, the only errors that can occur during evaluation are those caused by imprecise with-conditions or mismatched constructors.

# 4 Arrows for Partial and Local Invertibility

While the core system of Kalpis presented in the previous section is simple and illuminating, it only offers an operational understanding of the language. Furthermore, it depends on a unidirectional evaluation, which does not fit in a locally invertible setting. We want to get at the essence of partially invertible programming, and show that partial and local invertibility can be reconciled, which is the focus of this section.

#### Syntax

$$\begin{array}{l}
A, B ::= 1 \mid A \oplus B \mid A \otimes B \mid \mu X.A \\
\tau ::= A \leftrightharpoons B \mid A \rightsquigarrow B \mid C \cdot A \leftrightsquigarrow B \\
\mu ::= \mathsf{arr}_\mathsf{u}\ c \mid \mu_1 \gg_\mathsf{u} \mu_2 \mid \mathsf{first}_\mathsf{u}\ \mu \mid \mathsf{left}_\mathsf{u}\ \mu \mid \mathsf{clone} \mid \mathsf{run}\ \alpha \\
\alpha ::= \mathsf{arr}_\mathsf{r}\ c \mid \alpha_1 \gg_\mathsf{r} \alpha_2 \mid \mathsf{first}_\mathsf{r}\ \alpha \mid \mathsf{left}_\mathsf{r}\ \alpha \mid \alpha^\dagger \mid \mathsf{case!}\ \alpha_1\ \alpha_2 \mid \mathsf{pin}\ \alpha \mid \mu \gg_! \alpha
\end{array}$$


Fig. 3. The syntax and types of rrArr: A and B denote base types, τ denotes combinator types, c denotes bijections, µ denotes unidirectional arrow combinators and α denotes invertible arrow combinators.

In what follows, we define rrArr, a low-level language based on arrow combinators, intended to capture the essence of partially invertible computation. The operations of rrArr directly correspond to the core constructs of Section 2, and have an immediate interpretation in terms of abstract functions and parameterized bijections. What is more, we show that they have an alternative, compositional and locally invertible interpretation using an idea similar to the reader monad in unidirectional computation (based on the irreversibility effect [26] and the reversible reader [23]). This property is not obvious for Kalpis, not to mention earlier work such as Sparcl [39, 40].

We begin by explaining the syntax and semantics of a first-order fragment of rrArr, before proceeding to give its locally invertible interpretation. We then extend this fragment to match the full expressiveness of Kalpis in Section 4.5 with operations for higher-order computation. In Section 5, we top it all off by giving a formal translation from Kalpis core to rrArr.

#### 4.1 Syntax and Type System of rrArr

Figure 3 shows the syntax and type system of rrArr (where base bijections c of type A ⇋ B are kept abstract). The language involves unidirectional (µ) and invertible (α) terms, similarly to Kalpis. Both kinds of terms form arrows over bijections, through the combinators arr, ≫, and first.

The former arrow, denoted by µ : A ⇝ B, intuitively represents an ordinary function; arrᵤ c extracts the forward semantics of a bijection c, µ₁ ≫ᵤ µ₂ composes two functions µ₁ and µ₂, and firstᵤ µ simply applies µ to the first component of the input. The unidirectional arrows also feature leftᵤ, the sum counterpart of first, and allow copying data through clone.

The latter arrow, denoted by α : C · A ↭ B, represents bijections between A and B parameterized by C; arrᵣ c constructs a parameterized bijection that behaves as the bijection c ignoring any parameter, α₁ ≫ᵣ α₂ composes the two bijections obtained by passing the parameter to both α₁ and α₂, and firstᵣ α applies the bijection determined by α to the first component of the input. These arrows also support leftᵣ, and form an inverse arrow [23] through a dagger operator α†, which undoes α and its effect.

What is special in rrArr is the communication between the two arrows through case!, pin, ≫!, and run, where the former three directly correspond to the core constructs of Section 2. The term case! α<sup>1</sup> α<sup>2</sup> performs partially invertible branching, running α<sup>1</sup> or α<sup>2</sup> depending on the value of its parameter. The term pin α corresponds to the pinning construct; in rrArr, this operation moves part of the input (D) into the parameter (C ⊗ D) of α. The term µ ≫! α represents partially invertible composition of the function µ with the parameterized bijection α. Finally, the operator run allows converting a parameterized bijection C · A ↭ B to a function C ⊗ A ⇝ B by extracting its forward semantics. This can be seen as a special case of applying invertible computations (in a unidirectional context); the treatment of abstraction and application supporting higher-order computation is left for Section 4.5, as it requires a slight extension.

It is worth noting that invertible arrows are inherently allowed to ignore their parameter (through arrᵣ), a fact that can be used to derive the crucial erasure operation in unidirectional arrows. In particular, supposing id : A ⇋ A, we get the term run (arrᵣ id) : C ⊗ 1 ⇝ 1, which ignores any input C to return ().

#### 4.2 Semantics of rrArr

We now formalize the intuitive interpretation through the semantics presented in Figure 4. We define a base set of values containing unit, pairs, and tagged values, which we type in the conventional way. Recursively typed values roll w are only manipulated by the base invertible combinators c.

$$w ::= () \mid (w_1, w_2) \mid \mathbf{inl}\ w \mid \mathbf{inr}\ w \mid \mathbf{roll}\ w$$

The semantics of rrArr again takes the form of three relations: one for unidirectional arrows and two for invertible arrows. The first (µ w₁ ↦ w₂) reads that µ maps w₁ to w₂, confirming the intuition that unidirectional arrows represent functions. The second (α w; w₁ ↦ w₂) and third (α w; w₁ ↤ w₂) read that given parameter w, α maps w₁ to w₂ under the forward (resp. backward) evaluation, confirming the intuition that our invertible arrows correspond to parameterized bijections. The rules closely follow the informal descriptions presented in the previous section. We assume a base invertible semantics for combinators c of the form c w₁ ↦ w₂, invoked by the rules concerning arr for each arrow.


Fig. 4. The semantics of rrArr. As before, the backward evaluation rules are symmetrically obtained from the forward rules.

The semantics satisfies the desired properties of subject reduction and invertibility, although we refer to our mechanized formalization for the details.<sup>9</sup>

#### 4.3 Locally Invertible Interpretation

Recall that our goal is to define a locally invertible interpretation, whereas the straightforward semantics of Section 4.2 depended on a unidirectional evaluation. In this section, we give an alternative interpretation of rrArr, utilizing the reversible reader (RReader) [23] to interpret the invertible arrow combinators.

$$\llbracket C \cdot A \leftrightsquigarrow B \rrbracket = \mathsf{RReader}\ C\ A\ B$$

Here, RReader C A B consists of the bijections of type C ⊗ A ⇋ C ⊗ B that keep the C part unchanged. This arrow was originally introduced with the intention of modelling a bijection with some "static" input C [23]. Regarding ⇝, we use the irreversibility effect [26], which leverages the fact that every unidirectional computation can be simulated by a locally invertible computation yielding "garbage" [8], as:

$$\llbracket A \rightsquigarrow B \rrbracket = \exists G.\ A \leftrightharpoons G \otimes B$$

Combining these two effects is a novel point of rrArr; in particular, we contribute the core constructs case!, ≫!, pin, and run, which enable communication between the two. Locally invertible interpretations of the primitives in

<sup>9</sup> https://git.sr.ht/~aathn/kalpis-agda


Fig. 5. The invertible primitives of Πᵒ [26]. Note that we replace the looping construct trace with the derived inl for simplicity (Section 4.5 recovers the expressiveness of this combinator).

each system have been given in existing work. Here, we extend these results with the operations novel to rrArr, to show that the two systems together give a locally invertible model of partially invertible computations.

As our target invertible language, we use Πᵒ [26], whose combinators c constitute a minimal set of (non-total) invertible operations. The combinators support sequential composition (c₁ ⨾ c₂), parallel composition (c₁ ⊗ c₂ and c₁ ⊕ c₂), and importantly, a local inversion operator (c†) such that (c₁ ⨾ c₂)† = c₂† ⨾ c₁†. Figure 5 shows a summary of the primitives; their behavior should be obvious from the types (see the Agda formalization for details).
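For intuition, sequential and parallel composition can be sketched in the Bij model from Section 2 (seqB and parB are our names); the local inversion law then holds by construction, since inv (seqB c1 c2) behaves as seqB (inv c2) (inv c1).

```haskell
import Control.Monad ((>=>))

-- Sequential composition: chain forward maps one way, backward maps the other.
seqB :: Bij a b -> Bij b c -> Bij a c
seqB (Bij f1 g1) (Bij f2 g2) = Bij (f1 >=> f2) (g2 >=> g1)

-- Parallel (product) composition, invertible componentwise.
parB :: Bij a c -> Bij b d -> Bij (a, b) (c, d)
parB (Bij f1 g1) (Bij f2 g2) =
  Bij (\(a, b) -> (,) <$> f1 a <*> f2 b)
      (\(c, d) -> (,) <$> g1 c <*> g2 d)
```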

We now proceed to give another interpretation of the core constructs of rrArr.

Partially invertible branching. Given α₁ and α₂ with ⟦α₁⟧ : C ⊗ A ⇋ C ⊗ B and ⟦α₂⟧ : D ⊗ A ⇋ D ⊗ B, we must construct

$$\llbracket \mathsf{case!}\ \alpha_1\ \alpha_2 \rrbracket : (C \oplus D) \otimes A \leftrightharpoons (C \oplus D) \otimes B.$$

Using distr, we can convert (C ⊕ D) ⊗ A to C ⊗ A ⊕ D ⊗ A, after which ⟦α₁⟧ and ⟦α₂⟧ can be run in parallel. Factoring out the B, we get the required transformation.

$$\llbracket \mathsf{case!}\ \alpha_1\ \alpha_2 \rrbracket = \mathit{distr} ⨾ (\llbracket\alpha_1\rrbracket \oplus \llbracket\alpha_2\rrbracket) ⨾ \mathit{distr}^\dagger$$
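The same construction can be spelled out in the Bij model, where a reversible reader is a bijection on pairs that preserves its first component (RReader and caseBang are our names):

```haskell
-- A parameterized bijection C · A <~> B as a Bij on pairs preserving C.
type RReader c a b = Bij (c, a) (c, b)

-- case!: distribute the sum parameter, run the matching branch, factor back.
caseBang :: RReader c a b -> RReader d a b -> RReader (Either c d) a b
caseBang l r = Bij f g
  where f (Left  c, a) = (\(c', b) -> (Left  c', b)) <$> fwd l (c, a)
        f (Right d, a) = (\(d', b) -> (Right d', b)) <$> fwd r (d, a)
        g (Left  c, b) = (\(c', a) -> (Left  c', a)) <$> bwd l (c, b)
        g (Right d, b) = (\(d', a) -> (Right d', a)) <$> bwd r (d, b)
```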

Pinning. Given α with ⟦α⟧ : (C ⊗ D) ⊗ A ⇋ (C ⊗ D) ⊗ B, we must produce

$$\llbracket \mathsf{pin}\ \alpha \rrbracket : C \otimes (D \otimes A) \leftrightharpoons C \otimes (D \otimes B).$$

As the reversible reader arrow ⟦α⟧ already returns the context C unchanged, we only need to shuffle the inputs and outputs appropriately.

$$\llbracket \mathsf{pin}\ \alpha \rrbracket = \mathit{assocl}_\times ⨾ \llbracket\alpha\rrbracket ⨾ \mathit{assocl}_\times^\dagger$$
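In the Bij model, this is pure reassociation (pinR is our name):

```haskell
-- pin: move D from the parameter position into the invertible input/output.
pinR :: RReader (c, d) a b -> RReader c (d, a) (d, b)
pinR alpha =
  Bij (\(c, (d, a)) -> (\((c', d'), b) -> (c', (d', b))) <$> fwd alpha ((c, d), a))
      (\(c, (d, b)) -> (\((c', d'), a) -> (c', (d', a))) <$> bwd alpha ((c, d), b))
```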

Partially invertible composition. Given µ and α with ⟦µ⟧ : C ⇋ G ⊗ D and ⟦α⟧ : D ⊗ A ⇋ D ⊗ B, we must construct

$$\llbracket \mu \gg_! \alpha \rrbracket : C \otimes A \leftrightharpoons C \otimes B.$$

The basic idea is to run ⟦µ⟧ to produce a D-typed value to run ⟦α⟧ on; however, this brings with it unwanted garbage. Fortunately, since ⟦α⟧ is a reversible reader arrow, it is guaranteed to preserve the D-component, meaning that after running it we have the same D- and G-values available to us as before. These can be turned back into the original C value by running ⟦µ⟧ backwards, giving the transformation required.

$$\llbracket \mu \gg_! \alpha \rrbracket = (\llbracket\mu\rrbracket \otimes id) ⨾ \mathit{assocl}_\times^\dagger ⨾ (id \otimes \llbracket\alpha\rrbracket) ⨾ \mathit{assocl}_\times ⨾ (\llbracket\mu\rrbracket^\dagger \otimes id)$$

Note that this is precisely the construction underlying the reversible updates [5] of imperative reversible languages, and that ⟦α⟧ preserving the context is crucial for the construction to succeed.
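A sketch of the construction in the Bij model, with µ's locally invertible simulation itself modeled as a Bij producing garbage g (compBang is our name):

```haskell
-- Compute the parameter (plus garbage) with mu, run alpha, then undo mu.
-- Since alpha preserves the d component, 'bwd mu (g, d')' restores c.
compBang :: Bij c (g, d) -> RReader d a b -> RReader c a b
compBang mu alpha = Bij f b
  where f (c, a)  = do (g, d)   <- fwd mu c
                       (d', b') <- fwd alpha (d, a)
                       c'       <- bwd mu (g, d')
                       pure (c', b')
        b (c, b') = do (g, d)   <- fwd mu c
                       (d', a)  <- bwd alpha (d, b')
                       c'       <- bwd mu (g, d')
                       pure (c', a)
```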

Running invertible computations. Given α with ⟦α⟧ : C ⊗ A ⇋ C ⊗ B, we must produce

$$\llbracket \mathsf{run}\ \alpha \rrbracket : C \otimes A \leftrightharpoons G \otimes B,$$

for some G. Clearly it suffices to take ⟦α⟧ with G = C, and we are done.

#### 4.4 Correctness

We now state the desired correctness properties of our locally invertible interpretation, which show that it is equivalent to the direct semantics of Figure 4 and that ⟦α⟧ is indeed a reversible reader arrow (i.e., it preserves the context C).

**Theorem 3 (rrArr ⇢ Πᵒ Soundness).**

– µ w₁ ↦ w₂ implies ⟦µ⟧ w₁ ↦ (g, w₂) for some g.
– α w; w₁ ↦ w₂ implies ⟦α⟧ (w, w₁) ↦ (w, w₂). ⊓⊔

**Theorem 4 (rrArr ⇢ Πᵒ Completeness).**


The theorems do not refer to the backward evaluation directly, utilizing the invertibility of both rrArr and Πᵒ.

#### 4.5 Higher-order Computation

The previous sections laid out the fundamental ideas for representing partial invertibility in a locally invertible setting. However, with rrArr being first-order, it does not yet suffice for interpreting Kalpis in a simple way. In this section, we extend the language with four new combinators enabling proper higher-order computation, shown in Figure 6. The combinators curry and app are the standard currying and evaluation maps, creating and applying functions


Fig. 6. Combinators for higher-order computation in rrArr.

A → B. Their invertible counterparts curry• and app• provide the final core construct from Section 2: abstraction and application of invertible computations. They operate over parameterized bijections, abstracting the parameter to get a bijection value A ↔ B. The values are extended accordingly with two new closure forms ⟨µ, w⟩ : A → B and ⟨α, w⟩ : A ↔ B, where µ : C ⊗ A ⇝ B, α : C · A ↭ B, and w : C, representing staged unidirectional and invertible computations, respectively.

Having higher-order computation in the invertible setting has been challenging [2, 12, 39, 40]. Borrowing the idea from [39, 40], we address the issue by leveraging the fact that function and bijection values only enter invertible computations as parameters of parameterized bijections; hence, we only need a limited form of higher-orderness. We extend Πᵒ with two additional primitive operations:

$$\begin{array}{l}
\mathit{curry}_\leftrightharpoons : (C \otimes A \leftrightharpoons C \otimes B) \to (C \leftrightharpoons C \otimes (A \leftrightarrow B)) \\
\mathit{app}_\leftrightharpoons : (A \leftrightarrow B) \otimes A \leftrightharpoons (A \leftrightarrow B) \otimes B
\end{array}$$

The former takes a combinator with an auxiliary piece of "state" C, and abstracts it into a bijection given a value of C. The latter applies a bijection, and saves it to enable reversing the operation later. To represent the values of type A ↔ B in Πᵒ, we introduce a third form of closure ⟨f, w⟩, where f : C ⊗ A ⇋ C ⊗ B and w : C. Then, the semantics of curry⇋ and app⇋ are as follows:

$$\frac{\mathit{clos} = \langle f, w \rangle}{(\mathit{curry}_\leftrightharpoons f)\ w \mapsto (w, \mathit{clos})} \qquad \frac{f\ (w, a) \mapsto (w', b)}{\mathit{app}_\leftrightharpoons\ (\langle f, w \rangle, a) \mapsto (\langle f, w' \rangle, b)}$$

As before, the inverse semantics is symmetric; e.g., (curry⇋ f)† (w, clos) ↦ w if clos = ⟨f, w⟩. The (non-total) invertibility of curry⇋ is trivial, as its inverse fails unless its input matches the corresponding output; it is essentially a unidirectional function embedded in the invertible world. Since observational equality of closure values is undecidable, the equality check must rely on some other, intensional (e.g., syntactic) equality. Practically, this means that the combinator can only be used to create a closure and then subsequently undo that very same closure. However, this does not pose an issue for the translation from rrArr, where closures will only result from uses of curry and curry•, both of which are unidirectional arrows (⇝). These unidirectional arrows will only be executed backwards as part of partially invertible compositions (≫!), which ensures that the input is the same as the corresponding output.
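The one-shot behaviour can be sketched in the Bij model, with intensional equality approximated by comparing the stored state (Clos and curryB are our names, assuming Eq c; comparing the wrapped combinator itself is deliberately not attempted):

```haskell
-- A closure pairs a combinator with its current state.
data Clos c a b = Clos (Bij (c, a) (c, b)) c

-- curry⇋: create a closure; the inverse succeeds only when the closure's
-- stored state matches the paired output, mirroring the runtime check.
curryB :: Eq c => Bij (c, a) (c, b) -> Bij c (c, Clos c a b)
curryB f = Bij (\w -> Just (w, Clos f w))
               (\(w, Clos _ w') -> if w == w' then Just w else Nothing)
```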

Now, we can interpret ⟦app⟧ = app⇋, ⟦app•⟧ = app⇋, and

⟦curry•⟧ is simply curry⇋ applied to the interpreted body, while ⟦curry⟧ is obtained from curry⇋ by packaging ⟦µ⟧ : C ⊗ A ⇋ G ⊗ B together with a one-shot state of type C ⊕ G (we refer to the Agda formalization for the precise combinator).

The latter construction curries ⟦µ⟧ : C ⊗ A ⇋ G ⊗ B given w : C by creating a one-shot closure ⟨f, inl w⟩ which turns into ⟨f, inr g⟩ for g : G when first applied, and fails on a second application.

The theorems of Section 4.4 extend without difficulty to the higher-order combinators, although the statement is somewhat more intricate due to the differing set of closure values between rrArr and Πᵒ. We refer to the mechanized formalization in Agda for details.

# 5 Interpreting Kalpis with Arrows

Theorem 1 (Section 3.6) suggests that a unidirectional term-in-context Γ ⊢ u : A can be seen as a function from Γ to A, and that an invertible term-in-context Γ; Θ ⊢ r : A can be seen as a bijection between Θ and A parameterized by Γ. Then, it is natural that they be related with the two arrows (− ⇝ −) and (− · − ↭ −) of rrArr, respectively. In this section, we give a formal account of this relation by translating terms of Kalpis into rrArr, giving by extension a compositional locally invertible interpretation of Kalpis.

We first define some operations on typing contexts. We define Γ× as

$$(x_1 : A_1, \ldots, x_n : A_n)^\times = (((1 \otimes A_1) \otimes A_2) \otimes \cdots) \otimes A_n.$$

It is straightforward to define an operator $\mathit{lookup}_x : \Gamma^\times \rightsquigarrow A$ provided that Γ(x) = A. We also use a combinator $\mathit{split}_{\Theta_1,\Theta_2} : (\Theta_1 \uplus \Theta_2)^\times \leftrightharpoons \Theta_1^\times \otimes \Theta_2^\times$ for splitting the linear environments. Then, we give two type-directed transformations: Γ ⊢ u : A ⇢ µ, which transforms u to µ of type Γ× ⇝ A, and Γ; Θ ⊢ r : A ⇢ α, which transforms r to α of type Γ× · Θ× ↭ A. For the purposes of the translation, we consider a fixed set of type constructors $\mathsf{T}\ \overline{B} ::= 1 \mid A \otimes B \mid A \oplus B \mid \mathsf{Rec}_A$, identifying µX.A with $\mathsf{Rec}_A$.

Without loss of generality, we drop unnecessary with-conditions, so that a case•-expression with one branch needs no with-clause, and one with two branches needs only one clause. Due to space limitations, we present only the most representative cases here, and point the interested reader to the mechanized formalization in Agda.<sup>10</sup>

<sup>10</sup> https://git.sr.ht/~aathn/kalpis-agda

Case T-UCase (A ⊕ B).

$$\frac{\begin{array}{c}
\Gamma \vdash u : A \oplus B \dashrightarrow \mu \\
\Gamma, x : A;\ \Theta \vdash r_1 : C \dashrightarrow \alpha_1 \qquad \Gamma, y : B;\ \Theta \vdash r_2 : C \dashrightarrow \alpha_2
\end{array}}{\begin{array}{c}
\Gamma;\ \Theta \vdash \mathsf{case}\ u\ \mathsf{of}\ \{\mathsf{InL}\ x \to r_1;\ \mathsf{InR}\ y \to r_2\} : C \dashrightarrow \\
(\mathsf{clone} \gg_\mathsf{u} \mathsf{first}_\mathsf{u}\ \mu \gg_\mathsf{u} \mathsf{arr}_\mathsf{u}\ (\mathit{swap}_\times ⨾ \mathit{distl})) \gg_! \mathsf{case!}\ \alpha_1\ \alpha_2
\end{array}}$$

We can duplicate Γ× using clone and use one copy to construct A ⊕ B with µ. Using distl : A ⊗ (B ⊕ C) ⇋ A ⊗ B ⊕ A ⊗ C, which is easily derived, we distribute the second copy of Γ over the sum. Then, the required combinator can be constructed through a combination of partially invertible composition (≫!) and branching (case!), where we have case! α₁ α₂ : (Γ× ⊗ A ⊕ Γ× ⊗ B) · Θ× ↭ C.

Case T-RCase (A ⊕ B).

$$\frac{\begin{array}{c}
\Gamma;\ \Theta_1 \vdash r_1 : A \oplus B \dashrightarrow \alpha_1 \qquad \Gamma;\ \Theta_2, x : A \vdash r_2 : C \dashrightarrow \alpha_2 \\
\Gamma;\ \Theta_2, y : B \vdash r_3 : C \dashrightarrow \alpha_3 \qquad \Gamma \vdash u : C \to \mathsf{Bool} \dashrightarrow \mu
\end{array}}{\begin{array}{c}
\Gamma;\ \Theta_1 \uplus \Theta_2 \vdash \mathsf{case}^\bullet\ r_1\ \mathsf{of}\ \{\mathsf{InL}\ x \to r_2;\ \mathsf{InR}\ y \to r_3\ \mathsf{with}\ u\} : C \dashrightarrow \\
\mathsf{arr}_\mathsf{r}\ \mathit{split}_{\Theta_1,\Theta_2} \gg_\mathsf{r} \mathsf{first}_\mathsf{r}\ \alpha_1 \gg_\mathsf{r} \mathsf{arr}_\mathsf{r}\ (\mathit{swap}_\times ⨾ \mathit{distl}) \gg_\mathsf{r} \mathit{case}\ \alpha_2\ \alpha_3\ (\mathit{mkCond}\ \mu)
\end{array}}$$

The idea is similar to T-UCase, but we now operate in the invertible world, so we split (Θ₁ ⊎ Θ₂)× instead of duplicating Γ, and compose using ≫ᵣ instead of ≫!. The combinator $\mathit{case}\ \alpha_1\ \alpha_2\ \alpha_3 \triangleq \mathsf{left}_\mathsf{r}\ \alpha_1 \gg_\mathsf{r} \mathsf{right}_\mathsf{r}\ \alpha_2 \gg_\mathsf{r} \alpha_3^\dagger$ with type

$$\mathit{case} : (D \cdot A \leftrightsquigarrow C) \to (D \cdot B \leftrightsquigarrow C) \to (D \cdot C \leftrightsquigarrow C \oplus C) \to D \cdot (A \oplus B) \leftrightsquigarrow C,$$

provides an invertible branching operator analogous to case!, with a postcondition for merging the branches. We convert µ : Γ× ⇝ (C → Bool) to an arrow mkCond µ : Γ× · C ↭ C ⊕ C through the mkCond operator, which can be defined using pin, case!, and app in tandem.

Cases T-Abs•, T-RApp.

$$\frac{\Gamma;\ x : A \vdash r : B \dashrightarrow \alpha}{\Gamma \vdash \lambda^\bullet x.r : A \leftrightarrow B \dashrightarrow \mathsf{curry}^\bullet\ (\mathsf{arr}_\mathsf{r}\ \mathit{unitel}_\times^\dagger \gg_\mathsf{r} \alpha)} \qquad \frac{\Gamma \vdash u : A \leftrightarrow B \dashrightarrow \mu \qquad \Gamma;\ \Theta \vdash r : A \dashrightarrow \alpha}{\Gamma;\ \Theta \vdash u \diamond r : B \dashrightarrow \alpha \gg_\mathsf{r} (\mu \gg_! \mathsf{app}^\bullet)}$$

For T-Abs•, we get α : Γ× · 1 ⊗ A ↭ B, which we curry• after handling the unit. For T-RApp, α transforms Θ× to A, letting µ be applied through a partially invertible composition (≫!) with app•.

Case T-Pin.

$$\frac{\Gamma \vdash u : C \to A \leftrightarrow B \dashrightarrow \mu \qquad \Gamma;\ \Theta \vdash r : C \otimes A \dashrightarrow \alpha}{\Gamma;\ \Theta \vdash \mathsf{pin}\ u \diamond r : C \otimes B \dashrightarrow \alpha \gg_\mathsf{r} \mathsf{pin}\ ((\mathsf{first}_\mathsf{u}\ \mu \gg_\mathsf{u} \mathsf{app}) \gg_! \mathsf{app}^\bullet)}$$

We have α producing C ⊗ A, and with parameter Γ× ⊗ C, we can apply µ to produce B. Thus, we must shift C from the output into the parameter, and pin achieves just that.

Correctness. Finally, we show the correctness of the translation with respect to the semantics of Sections 3.5 and 4.2. Before we state correctness, we must first define a translation of the values, since they differ between Kalpis and rrArr.

$$\begin{array}{l}
\llbracket () \rrbracket = (), \qquad \llbracket (v_1, v_2) \rrbracket = (\llbracket v_1 \rrbracket, \llbracket v_2 \rrbracket), \\
\llbracket \mathsf{InL}\ v \rrbracket = \mathbf{inl}\ \llbracket v \rrbracket, \qquad \llbracket \mathsf{InR}\ v \rrbracket = \mathbf{inr}\ \llbracket v \rrbracket, \qquad \llbracket \mathsf{Roll}\ v \rrbracket = \mathbf{roll}\ \llbracket v \rrbracket, \\
\llbracket \langle \lambda x.u, \gamma \rangle \rrbracket = \langle \llbracket u \rrbracket, \llbracket \gamma \rrbracket \rangle, \qquad \llbracket \langle \lambda^\bullet x.r, \gamma \rangle \rrbracket = \langle \mathsf{arr}_\mathsf{r}\ \mathit{unitel}_\times^\dagger \gg_\mathsf{r} \llbracket r \rrbracket, \llbracket \gamma \rrbracket \rangle
\end{array}$$

The base values are translated trivially, whereas the closures are translated according to the type-directed translation given above (cf. Case T-Abs•). We also define a translation of value environments γ in the obvious way.

Then, we can state the correctness of the translation as below.

**Theorem 5 (Kalpis ⇢ rrArr Soundness).**

– Γ ⊢ *u* : *A* ⇢ *µ* and *γ* ⊢ *u* ⇓ *v* implies *µ* ⟦*γ*⟧ ↦ ⟦*v*⟧;
– Γ; Θ ⊢ *r* : *A* ⇢ *α* and *γ*; *θ* ⊢ *r* ⇒ *v* implies *α* ⟦*γ*⟧; ⟦*θ*⟧ ↦ ⟦*v*⟧. ⊓⊔

This theorem does not refer to the backward evaluation directly, utilizing the invertibility of both Kalpis and rrArr. The completeness part, on the other hand, does need a separate statement for the backward direction, since there is no a priori guarantee that the output *w* is of the form ⟦*θ*⟧.

# Theorem 6 (Kalpis ⇢ rrArr Completeness).


We refer to the Agda code in the supplementary material for the proofs.

# 6 Related Work

Kalpis and rrArr are not the first to support partial invertibility. In the imperative setting, languages such as Janus [35, 53], Frank's R [17], and R-While [19] support a limited form of partial invertibility via reversible update operators [6]. An example of a reversible update statement is x += e, whose effect can be reverted by the corresponding inverse statement x -= e. Both statements use the same e, which need not be invertible (e.g., x += yz is reverted by x -= yz, and vice versa). In the functional setting, Theseus [27] allows a bijection to take additional parameters, but only provided that they are available at compile time. RFun version 2,<sup>11</sup> an extension of the original RFun [54], and CoreFun [25] allow more flexibility via so-called ancilla parameters, which are translated to auxiliary inputs and outputs of the invertible computation. Their approach is similar to Kalpis's but more restrictive, since they lack support for the pin operator and higher-order computation. Jeopardy [31] is a recent invertible language where even irreversible functions can be inverted in certain contexts, depending on implicitly available information. However, it is still work in progress, and it leans closer to program inversion methods than to the lightweight type-based approach we employ.

<sup>11</sup> https://github.com/kirkedal/rfun-interp
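To make the reversible-update idea concrete, here is a minimal Python sketch (our own illustration, not code from any of the cited languages; the function names are hypothetical): the right-hand side y·z need not be invertible, because the updated variable x is never read by it.

```python
# Reversible update: x += y*z is undone by x -= y*z, even though
# the expression y*z itself is not invertible.
def step_forward(x, y, z):
    return x + y * z      # x += y*z

def step_backward(x, y, z):
    return x - y * z      # x -= y*z, the exact inverse statement

x, y, z = 10, 3, 4
assert step_backward(step_forward(x, y, z), y, z) == x
```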

Sparcl [39, 40] is, to our knowledge, the most flexible system that supports partial invertibility, realized through a more advanced language foundation. Instead of bijections A ↔ B, Sparcl features invertible data marked by the type A•, which implicitly corresponds to some bijection S ↔ A. This idea of invertible data is inherited from the HOBiT language [38], which represents lens combinators [15, 16] as higher-order functions to achieve applicative-style higher-order bidirectional programming [36, 37]. The type system of Sparcl ensures that a closed linear function between invertible data !(A• ⊸ B•) is isomorphic to a (non-total) bijection between A and B, so that partial invertibility can be represented as a function that takes both unidirectional and invertible data, C → A• ⊸ B•. This representation affords more flexibility than Kalpis does: invertible data is allowed to be captured in abstractions, and can even appear in subcomponents of datatypes (e.g., Int ⊗ (Int•) and Int ⊕ (Int•) are both valid types). However, this flexibility comes at the cost of complexity, requiring a semantics that interleaves partial evaluation and invertible computation, which makes a locally invertible interpretation difficult. We remark that the holed residuals ⟨x.E⟩ featured in Sparcl's core system bear a strong resemblance to the bijections λ•x.r in Kalpis.

Our combinator language rrArr can be seen as an extension of MLΠ, an arrow metalanguage on top of the invertible language Π that treats information creation and loss (non-totality and irreversibility) as an effect [26]. By combining their work with the reversible reader arrow [23], we are able to give erasing (weakening) as a derived operation defined via the operator run (as demonstrated in Section 4). Further research on the nontrivial interaction between the arrows, such as an equational characterization and a denotational model, is left for future work. While the previous work is able to treat non-totality as part of an effect, we assume some non-total operations in the underlying invertible system due to the inclusion of recursive and functional types.

The design of Kalpis is inspired by the arrow calculus of Lindley, Wadler, and Yallop [33], which is a metalanguage for the conventional representation of arrows [24], analogous to the monad metalanguage [42]. In a sense, Kalpis can be seen as a counterpart of the arrow calculus for rrArr. For example, the treatment of λ•x.r is actually inherited from the arrow calculus, where arrows cannot be nested in general [34], unless the underlying arrow supports application to form a monad [24]. To the best of our knowledge, a monad-based programming system for invertible/reversible computation does not exist, though there are some closely related results, including monads for nondeterministic computation (such as [14]) and a monadic programming framework for bidirectional transformations [20, 52]. However, these existing approaches lack the guarantee of bijectivity—a key motivation for using invertible languages.

The importance of partial invertibility has been recognized in the neighboring literature on program inversion—program transformations that derive a program for *f*<sup>−1</sup> from a given program for *f*. Partial inversion [44, 47] essentially applies a binding-time analysis [21, 28] to an input program, where the static data can be treated as unidirectional inputs. The technique has been further extended to treat results of inverses as unidirectional [3, 29, 30]. This treatment is similar to the role of pin in Kalpis and Sparcl [39, 40] in that it converts invertible data into "static" parameters. Some approaches to program inversion are more liberal: semi-inversion [41] essentially converts a program into a logic program, where there is no clear boundary between unidirectional and invertible data, and the PINS system [49] can take, in addition to an original program, a control structure of an inverse program to effectively synthesize inverses that need not mirror the control structures of the original. The main limitation of program inversion is that, as a program transformation, it may fail, often for reasons that are not obvious to programmers.

# 7 Conclusion

We have presented a set of four core constructs for partially invertible programming, demonstrated their expressiveness through examples, and shown that they can be given a locally invertible interpretation, thus solving an open problem in the field. The four constructs are (1) partially invertible branching, (2) pinning invertible inputs, (3) partially invertible composition, and (4) abstraction and application of invertible computations. We designed the partially invertible language Kalpis on top of these constructs and formalized its syntax, type system and operational semantics. We then presented rrArr, a low-level arrow language with primitives directly corresponding to the constructs, and gave it a locally invertible interpretation based on two effects—the irreversibility effect [26] and the reversible reader [23]. Finally, we presented a type-directed translation from Kalpis to rrArr, showing how to support expressive partial invertibility on top of a locally invertible foundation. Proofs of all theorems stated in the paper are formalized in the accompanying Agda code.<sup>12</sup>

Acknowledgments. We thank Eijiro Sumii, Oleg Kiselyov, and other Sumii-Matsuda Lab members for useful feedback on a preliminary version of this research. This work is partially supported by JSPS KAKENHI Grant Numbers JP19K11892, JP20H04161, and JP22H03562, EPSRC Grant EXHIBIT: Expressive High-Level Languages for Bidirectional Transformations (EP/T008911/1), and Royal Society Grant Bidirectional Compiler for Software Evolution (IES\R3\170104). This work was also partially supported by a scholarship awarded to the first author by the Marianne and Marcus Wallenberg Foundation (SJF application BA21-0019).

Data Availability. The accompanying artifact that contains the prototype implementation of Kalpis and the Agda formalization mentioned in this paper is available from https://doi.org/10.5281/zenodo.10511566.

<sup>12</sup> https://git.sr.ht/~aathn/kalpis-agda

# References



54. Yokoyama, T., Axelsen, H.B., Glück, R.: Towards a reversible functional language. In: Vos, A.D., Wille, R. (eds.) RC. Lecture Notes in Computer Science, vol. 7165, pp. 14–29. Springer (2011). https://doi.org/10.1007/978-3-642-29517-1_2

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Efficient Matching with Memoization for Regexes with Look-around and Atomic Grouping**<sup>⋆</sup>

Hiroya Fujinami1,2(B) and Ichiro Hasuo1,2

<sup>1</sup> National Institute of Informatics, Tokyo, Japan makenowjust@nii.ac.jp

<sup>2</sup> SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan

**Abstract.** *Regular expression (regex) matching* is fundamental in many applications, especially in web services. However, matching by *backtracking*—preferred by most real-world implementations for its practical performance and backward compatibility—can suffer from so-called *catastrophic backtracking*, which makes the number of backtracking steps super-linear and leads to the well-known ReDoS vulnerability. Inspired by a recent algorithm by Davis et al. that runs in linear time for (non-extended) regexes, we study efficient backtracking matching for regexes with two common extensions, namely *look-around* and *atomic grouping*. We present linear-time backtracking matching algorithms for these extended regexes. Their efficiency relies on *memoization*, much like the one by Davis et al.; we also strive for smaller memoization tables by carefully trimming their range. Our experiments—we used some real-world regexes with the aforementioned extensions—confirm the performance advantage of our algorithms.

**Keywords:** regular expression · look-around · atomic grouping · pattern matching · ReDoS · memoization

# **1 Introduction**

**Regex Matching** *Regular expressions* (*regexes*) are a fundamental formalism for various pattern-matching tasks. Many regex matching implementations, however, suffer from occasional super-linear growth of their execution time. Such excessive execution time can be exploited for DoS attacks—this is a vulnerability called *regex denial of service* (*ReDoS*). ReDoS is recognized as a significant security concern in many real-world systems, especially web services such as Stack Overflow and Cloudflare (see §2.4 for more details).

**Need for Efcient Backtracking Regex Matching** The principal cause of ReDoS is *catastrophic backtracking*, that is, the explosion of recursion in a backtracking-based matching algorithm.

*<sup>⋆</sup>* The authors are supported by CREST ZT-IoT Project (No. JPMJCR21M3), ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), and ASPIRE Grant No. JPMJAP2301, JST.

In regex matching, in general, a regex *r* is converted into a non-deterministic finite automaton (NFA) A, and the latter is executed on an input string *w*. The non-determinism of A can be resolved in either a *depth-first* or a *breadth-first* manner. The former is called *backtracking regex matching*, and the latter is the *on-the-fly DFA construction*.

Catastrophic backtracking and ReDoS are phenomena unique to the former (i.e., backtracking)—as is well known, the time complexity of the on-the-fly DFA construction is linear (i.e., *O*(|*w*|)). Indeed, many modern regex implementations are based on the on-the-fly DFA construction, including RE2<sup>3</sup>, Go's regexp<sup>4</sup>, and Rust's regex<sup>5</sup>.

It is practically essential, however, to make backtracking regex matching more efficient. A principal reason is *consistency*. Most existing regex matching implementations use backtracking, and they return only one matching position out of many (see §2.3). While it is possible to replace them with on-the-fly DFA matching, it is non-trivial to ensure consistency, that is, that the chosen matching position is the same as that of the original backtracking implementation. .NET's regex implementation has a linear-time backend using a derivative-based approach that is compatible with its backtracking backend; still, it does not support look-around and atomic grouping [28]. Once the returned matching position changes, it can unexpectedly affect the behavior of all the systems (e.g., web services) that use regex matching.

Another reason for improving backtracking regex matching is its *extensibility*. There are many widely used extensions of regexes—such as the ones we study, namely look-around and atomic grouping—and they are supported by few on-the-fly DFA matching implementations.

**Existing Work: Linear-time Backtracking Matching with Memoization** *Memoization* is a well-known technique for speeding up recursive computations. The recent work [10] shows that memoization can be applied to backtracking regex matching with consistency in mind. Specifically, the work [10] presents a backtracking matching algorithm that runs in *O*(|*w*|) time—thus it is theoretically guaranteed to avoid catastrophic backtracking—for regexes without extensions. (They also mention application to extended regexes in [10], but we found issues in their discussion—see Remark 2.)

**Our Contribution: Linear-time Backtracking Matching for Some Extended Regexes** In this paper, we present a linear-time backtracking matching algorithm for regexes with *look-around* and *atomic grouping*, two real-world extensions of regexes. It uses memoization in order to achieve a linear-time complexity. We also prove that it is consistent (i.e., it chooses the same matching position as the original algorithm without memoization).

The technical key to our algorithm is the design of suitable memoization tables. We follow the general idea in [10] of using memoization for backtracking matching, but our examination of its issues with extended regexes (Remark 2) shows that the range—i.e., the set of possible entries—of memoization tables should be suitably extended. Specifically, the range in [10] is {**false**}, recording only matching failures; it is extended in our algorithm to {Failure(*j*) | *j* ∈ {0*, . . . , ν*(A)}} ∪ {Success}. Here, *ν*(A) is the maximum nesting depth of atomic grouping for the (extended) NFA A, defined in §5.

<sup>3</sup> https://github.com/google/re2

<sup>4</sup> https://pkg.go.dev/regexp

<sup>5</sup> https://docs.rs/regex/latest/regex/

Our development is rigorous and systematic, based on the notion of NFA whose labels can themselves be NFAs. This extended notion of NFA is suggested in [10, Section IX.B]; in this paper, we formalize it and build its theory.

We experimentally evaluate our algorithm; the experimental results confirm its performance advantages. Additionally, we survey the usage status of look-around and atomic grouping—the two regex extensions of our interest—in real-world regexes and demonstrate their wide usage (§6).

**Technical Contributions** We summarize our technical contributions.


**Organization** We provide some preliminaries in §2, such as the regex extensions of our interest. Our formalization of NFAs with sub-automata is also presented there. In §3, we discuss the work [10] that is closest to ours. We present our matching algorithm for regexes with look-around in §4 and the one for regexes with atomic grouping in §5. Then, we discuss our implementation and experimental evaluation in §6. We conclude in §7.

Some additional proofs and other materials are deferred to the appendices in the extended version [15].

**Related Work** Many related works are discussed elsewhere in this paper, in suitable contexts. Here, we discuss the remaining ones.

There are many theoretical studies on look-around and atomic grouping. The work [27] is a theoretical study of look-ahead operators; it shows how to convert them to finite automata. Another conversion, based on derivatives, is introduced in [26]. The work [3] conducts a fine-grained analysis of the size of DFAs obtained by converting regexes with look-ahead, improving the bounds given in [26, 27]. The work [5] discusses the relation between look-ahead operators and back-references in regexes. A recent study [22] presents a linear-time matching algorithm for regexes with look-around; it uses a memoization-like construct for efficiency. However, compatibility with backtracking is not a concern there, unlike in the current work. On atomic grouping, a conversion to finite automata is proposed in [4], where atomic grouping is simulated by look-ahead.

Another common regex extension is *back-reference*. We do not deal with this extension because 1) it is known to be non-regular (i.e., the language class defined by back-references is beyond regular), and 2) its matching problem is known to be NP-complete [1] (thus the search for linear-time matching is doomed). There are other extensions (absent operators, conditional branching, etc.), but they are used less often (cf. §6).

ReDoS countermeasures are an active scientific topic. Besides efficient matching, there are two directions: *ReDoS detection* and *ReDoS repair*. ReDoS detection is the problem of determining whether a given regex can cause catastrophic backtracking. This can be done by finding specific structures in the transition diagram of an automaton [2, 18, 29, 34, 36, 37]. Besides, dynamic analysis, such as fuzzing [31], and combinations of static and dynamic analyses [19] have been studied. ReDoS repair is the problem of modifying a given regex so that it does not cause ReDoS. Known solutions include exploring ReDoS-free regexes using SMT solvers [6, 21] and rule-based rewriting of vulnerable regexes [20]. These detection and repair measures are computationally demanding, and their real-world deployment is limited.

There are other implementation-level studies on speeding up regex matching, such as Just-in-Time (JIT) compilation [17] and FPGA [32]. However, these studies are not intended to prevent catastrophic backtracking.

# **2 Preliminaries**

We introduce preliminaries for this paper. Firstly, we present some basic concepts such as regexes, NFAs, conversion from regexes to NFAs, and backtracking matching. We then discuss *catastrophic backtracking* and the *ReDoS vulnerability* that it can cause. Finally, we introduce *look-around* and *atomic grouping* as practical regex extensions and *NFAs with sub-automata* for these extensions.

We fix a finite set *Σ* as an alphabet throughout this paper. We call sequences of elements of *Σ* *strings*. The *empty string* is denoted by *ε*. For a string *w* = *σ*<sub>0</sub>*σ*<sub>1</sub> *. . . σ*<sub>*n*−1</sub>, the *length* of *w*, denoted by |*w*|, is defined as |*w*| = *n*. We also write *w*[*i*] = *σ*<sub>*i*</sub> for *i* ∈ {0*, . . . , n* − 1}.

We use partial functions for memoization. For two sets *A* and *B*, a partial function *G* from *A* to *B*, denoted by *G*: *A ⇀ B*, is defined as a function *G*: *A* → *B* ∪ {⊥}. Here ⊥ is the element for "undefined," and it is assumed that ⊥ ∉ *B*.

Let *G*: *A ⇀ B* be a partial function, *a* ∈ *A*, and *b* ∈ *B*. We let *G*(*a*) ← *b* denote an updated partial function: it carries *a* to *b*, and any other *x* ∈ *A* to *G*(*x*) (it is undefined if *G*(*x*) is initially undefined).
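As an illustration, a partial function and the update *G*(*a*) ← *b* can be modelled with a plain dictionary; the following Python sketch is our own (with None standing for ⊥) and mirrors the definition above.

```python
# A partial function G : A ⇀ B as a dict; a missing key plays the role of ⊥.
def lookup(G, a):
    return G.get(a)          # returns None, i.e., ⊥, when undefined

def update(G, a, b):
    """G(a) ← b: carries a to b and any other x to G(x)."""
    G2 = dict(G)
    G2[a] = b
    return G2

G = {}                        # undefined everywhere
G1 = update(G, ("q0", 3), "Failure")
assert lookup(G1, ("q0", 3)) == "Failure" and lookup(G1, ("q1", 0)) is None
```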

#### **2.1 Regexes**

*Regular expressions* (*regexes*) are defined by the following abstract grammar.

$$\begin{array}{lll}
r ::= & \sigma \quad \text{(a character, where } \sigma \in \Sigma\text{)} & \mid\; \varepsilon \quad \text{(the empty string)}\\
\mid & r \,|\, r \quad \text{(an alternation)} & \mid\; r \cdot r \quad \text{(a concatenation)}\\
\mid & r^* \quad \text{(a repetition)}
\end{array}$$

Fig. 1: a conversion from regexes to NFAs

The concatenation operator · may be omitted when there is no ambiguity. The precedence of operators, from highest to lowest, is: repetition, concatenation, alternation. For example, *ab*<sup>∗</sup>|*c* means (*a* · (*b*<sup>∗</sup>))|*c*.

For a regex *r*, the *size of r*, denoted by |*r*|, is defined as follows: |*σ*| = |*ε*| = 1, |(*r*<sub>1</sub>|*r*<sub>2</sub>)| = |*r*<sub>1</sub> · *r*<sub>2</sub>| = |*r*<sub>1</sub>| + |*r*<sub>2</sub>| + 1, and |*r*<sup>∗</sup>| = |*r*| + 1.
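For concreteness, the size function can be transcribed directly; the following Python sketch assumes a tuple-encoded regex AST (our own encoding, reused in later sketches, not the paper's notation).

```python
def size(r):
    """|r| as defined above; r is ('chr', σ), ('eps',), ('alt', r1, r2),
    ('cat', r1, r2), or ('rep', r1)."""
    kind = r[0]
    if kind in ('chr', 'eps'):
        return 1                               # |σ| = |ε| = 1
    if kind in ('alt', 'cat'):
        return size(r[1]) + size(r[2]) + 1     # |r1| + |r2| + 1
    return size(r[1]) + 1                      # |r*| = |r| + 1

# |ab*|c| = 6: a (1) + b (1) + * (1) + · (1) + c (1) + | (1)
assert size(('alt', ('cat', ('chr', 'a'), ('rep', ('chr', 'b'))), ('chr', 'c'))) == 6
```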

#### **2.2 NFAs**

A *non-deterministic finite state automaton* (*NFA*) is a quadruple (*Q, q*<sub>0</sub>*, F, T*), where *Q* is a finite set of states, *q*<sub>0</sub> ∈ *Q* is an initial state, *F* ⊆ *Q* is a set of accepting states, and *T* is a transition function. For each *q* ∈ *Q* \ *F*, *T*(*q*) is one of the following: *T*(*q*) = Eps(*q*′), *T*(*q*) = Branch(*q*′*, q*′′), or *T*(*q*) = Char(*σ, q*′), where *q*′*, q*′′ ∈ *Q* and *σ* ∈ *Σ*.

The above definition of a transition function *T* is tailored to our purpose of backtracking. Compared to the common definition *δ* : *Q* × ({*ε*} ∪ *Σ*) → 2<sup>*Q*</sup>, it expresses general branching as combinations of certain *elementary branchings*, namely one transition by *ε*, two transitions by *ε*, and one transition by a certain character *σ* ∈ *Σ*. This makes the description of backtracking matching easier. Note, in particular, that the successors *q*′*, q*′′ in the branching Branch(*q*′*, q*′′) are ordered. Here, *q*′ and *q*′′ are called the *first* and *second successors*, respectively. This definition of transition functions is similar to the op-codes of many real-world regex-matching implementations (cf. [8]).

We present a conversion from regexes to NFAs (see Figure 1); it is similar to the Thompson–McNaughton–Yamada construction [23, 35]. For a regex *r*, A(*r*) denotes the *NFA* A *converted from r*. In the figure, labels on arrows show kinds of transitions. In a Branch transition, the top arrow points to the first successor, and the bottom points to the second successor. Rectangles indicate that the conversion is applied to sub-expressions inductively. Because each case of this construction introduces at most two new states, for a regex *r* and the NFA A(*r*) = (*Q, q*<sub>0</sub>*, F, T*), we have |*Q*| = *O*(|*r*|).
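The construction can be sketched in Python as follows (a rough transcription under the tuple encoding above, not the paper's exact figure); each case allocates at most two new states, which is why |*Q*| = *O*(|*r*|).

```python
def compile_(r, T, accept):
    """Entry state of an automaton for the AST r whose single exit is `accept`.
    Transitions: ('Char', σ, q), ('Eps', q), ('Branch', q1, q2)."""
    def new(t):
        T.append(t)
        return len(T) - 1
    kind = r[0]
    if kind == 'chr':
        return new(('Char', r[1], accept))
    if kind == 'eps':
        return new(('Eps', accept))
    if kind == 'cat':                      # r1 then r2
        return compile_(r[1], T, compile_(r[2], T, accept))
    if kind == 'alt':                      # ordered choice: r1 first
        return new(('Branch', compile_(r[1], T, accept),
                              compile_(r[2], T, accept)))
    if kind == 'rep':                      # r1*: the looping branch comes first
        b = new(('Branch', None, accept))
        T[b] = ('Branch', compile_(r[1], T, b), accept)
        return b

T = [None]                                 # state 0 is the accepting state
q0 = compile_(('cat', ('rep', ('alt', ('chr', 'a'), ('chr', 'a'))),
               ('chr', 'b')), T, 0)        # the NFA for (a|a)*b, cf. Figure 2
```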


We collectively call Eps and Branch transitions *ε-transitions*. Later in this paper, consecutive *ε*-transitions may be shown as a single transition in a figure. When a state returns to itself by *ε*-transitions, such a sequence of *ε*-transitions is called an *ε-loop*. *ε*-loops are problematic because they cause infinite loops in matching.

An *ε*-loop can be detected during matching by recording a position on an input string when a state is visited. When an *ε*-loop is detected, several solutions exist to deal with it (see, e.g., [30]), such as treating an *ε*-loop as a failure (e.g., JavaScript and RE2) or treating it as a success but escaping it (e.g., Perl). These solutions can be easily adapted to our algorithms; therefore, for the simplicity of presentation, we introduce the following assumption.

*Assumption 1 (no ε-loops).* NFAs do not contain *ε*-loops.

#### **2.3 Backtracking Matching**

We present a basic backtracking matching algorithm for NFAs in Algorithm 1. It serves as a basis for optimization by memoization, both in [10] and in the current work.

The function Match<sub>A,*w*</sub> is recursively called in this algorithm, and it terminates thanks to Asm. 1. It takes two parameters: an NFA A and an input string *w*. It also takes two arguments: the current state *q* ∈ *Q* and the current position *i* ∈ {0*, . . . ,* |*w*|} on *w*. For an NFA A = (*Q, q*<sub>0</sub>*, F, T*), Match<sub>A,*w*</sub>(*q*<sub>0</sub>*, i*) returns SuccessAt(*i*′) with the matching position *i*′ ∈ {0*, . . . ,* |*w*|} if the matching with A succeeds from *i* to *i*′ on *w*, or returns Failure if the matching fails.

The Match function implements *partial matching*: given the position *i* ∈ {0*, . . . ,* |*w*|} of interest, one obtains, by running Match<sub>A,*w*</sub>(*q*<sub>0</sub>*, i*), one "matching position" *i*′ (if it exists) such that *w*[*i*] *w*[*i*+1] *. . . w*[*i*′] is accepted by A. Note the difference from *total matching*: given A and *w*, the latter returns **true** if (the whole) *w* is accepted by A and **false** otherwise. The practical relevance of partial matching is clear, as we can use it for text search and replacement.

Fig. 2: the NFA A((*a*|*a*)<sup>∗</sup>*b*)

Lines 5 to 8 in Algorithm 1 perform matching for Branch transitions. Here, the algorithm first tries matching from the first successor *q*′, and if that fails, it tries matching from the second successor *q*′′ at the same position. This behavior is called *backtracking*.
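As a concrete reference point, here is a minimal Python sketch of the matcher just described (our transcription of the prose, not the paper's listing of Algorithm 1; the hand-coded example automaton uses state names of our own choosing).

```python
def match(T, F, w, q, i):
    """Backtracking partial matching: returns a matching position i' or None.
    Terminates under Asm. 1 (no ε-loops)."""
    if q in F:
        return i                                   # SuccessAt(i)
    t = T[q]
    if t[0] == 'Eps':
        return match(T, F, w, t[1], i)
    if t[0] == 'Branch':                           # first successor, then backtrack
        r = match(T, F, w, t[1], i)
        return r if r is not None else match(T, F, w, t[2], i)
    # ('Char', σ, q'):
    if i < len(w) and w[i] == t[1]:
        return match(T, F, w, t[2], i + 1)
    return None                                    # Failure

# (a|a)*b, hand-coded with states 0..5 (5 accepting):
T = {0: ('Branch', 1, 4), 1: ('Branch', 2, 3), 2: ('Char', 'a', 0),
     3: ('Char', 'a', 0), 4: ('Char', 'b', 5)}
print(match(T, {5}, "aab", 0, 0))                  # -> 3
```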

We define the *regex partial matching* problem using the function Match.

*Problem 1 (regex partial matching).*

**Input**: a regex *r*, an input string *w*, and a starting position *i* ∈ {0*, . . . ,* |*w*|}
**Output**: returns Match<sub>A(*r*),*w*</sub>(*q*<sub>0</sub>*, i*), where A(*r*) = (*Q, q*<sub>0</sub>*, F, T*).

*Remark 1.* One can say that the problem formulation is a bit strange. It requires, as output, a specific matching position chosen by a specific algorithm Match, while a usual formulation would require an arbitrary matching position. We take this formulation since we aim to show that our optimization by memoization not only solves partial matching but also is *consistent* with an existing backtracking matching algorithm, in the sense discussed in §1. We formulate consistency as correctness with respect to Prob. 1, that is, preserving the solution chosen by the specific algorithm Match. We also note that the algorithm Match mirrors many existing implementations of regex matching (cf. §2.2).

#### **2.4 Catastrophic Backtracking and ReDoS**

In the execution of the Match function (Algorithm 1), depending on an NFA A and an input string *w*, the number of recursive calls for the Match function may increase explosively, resulting in a very long matching time, as we will see in Example 1. This explosive increase in matching time is called *catastrophic backtracking*.

*Example 1 (catastrophic backtracking).* Consider the NFA A = A((*a*|*a*)<sup>∗</sup>*b*) = (*Q, q*<sub>0</sub>*, F, T*) shown in Figure 2, and let *w* = "*a*<sup>*n*</sup>*c*" (the string repeating *a* *n* times and ending with *c*) be an input string. Match<sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) invokes recursive calls *O*(2<sup>*n*</sup>) times until returning Failure. The reason for this explosion of recursive calls is that the matching tries all combinations of the *q*<sub>2</sub>-to-*q*<sub>3</sub> and *q*<sub>4</sub>-to-*q*<sub>5</sub> transitions for each *a* in *w*.
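The blow-up is easy to reproduce in any backtracking engine; for instance, Python's re module (a backtracking implementation) exhibits it on this very regex. A small timing demo (timings vary by machine; n is kept modest so it finishes):

```python
import re
import time

for n in (10, 18, 22):
    t0 = time.perf_counter()
    re.match(r'(a|a)*b', 'a' * n + 'c')   # always fails, after ~2^n backtracks
    print(n, f"{time.perf_counter() - t0:.3f}s")
```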

*Regex denial of service* (*ReDoS*) is a security vulnerability caused by catastrophic backtracking. In ReDoS, catastrophic backtracking causes a huge load on servers, making them unable to respond in a timely manner. There are cases of service outages due to ReDoS at Stack Overflow in 2016 [12] and at Cloudflare in 2019 [16]. Additionally, a 2018 study [33] reported that over 300 web services have potential ReDoS vulnerabilities. Thus, ReDoS is a widespread problem in the real world, and there is a great need for countermeasures.

According to a 2019 study [25], only 38% of developers are aware of ReDoS. This study also found that many developers find it difficult not only to read regexes but also to find and validate regexes matching their intent. It is mentioned in [25] that developers use Internet resources such as Stack Overflow to find regexes. In recent years, it has also become common to use generative AIs such as ChatGPT for this purpose. However, when the authors asked ChatGPT "Please suggest 10 regexes for validating email addresses",<sup>6</sup> 2 of the 10 suggested regexes would cause ReDoS (see Table 1). Developers may unknowingly use such vulnerable regexes. For this reason, it is important to develop ReDoS countermeasures that do not require developers' awareness.

Matching speed-up is a way to avoid ReDoS by ensuring that matching is linear in the length of an input string, freeing developers from worrying about ReDoS. A popular method for matching speed-up is using breadth-first search for non-deterministic transitions instead of backtracking (depth-first search); this is called the *on-the-fly DFA construction* [7, 28]. However, since look-around and atomic grouping are extensions based on backtracking (see §2.6), it is not obvious that they can be supported by the on-the-fly DFA construction.

*Memoization* is another approach to ensuring linear-time backtracking matching; we pursue it in this paper.

#### **2.5 Regex Extensions: Look-around and Atomic Grouping**

Many real-world regexes come with various extensions for enhanced expressivity [13]. In this paper, we are interested in two classes of extensions, namely *look-around* and *atomic grouping*.

**Look-around** Look-around is a regex extension that allows constraints on strings around a certain position. It is also called *zero-width assertion* (e.g., in [10]) because it does not consume any characters. Look-around consists of four types: *positive* or *negative*, and *look-ahead* or *look-behind*.

Positive look-ahead is typically represented by the syntax (?=*r*); its matching succeeds when, reading ahead from the current position of the input string, the matching of the inner regex *r* succeeds. Note that the position for the overall matching does not change by the inner matching of *r*. For example, the regex /(?=bc)/ matches the string "abc" from position 1 (i.e., after the first character a) without consuming any characters.

<sup>6</sup> We used ChatGPT 3.5 (September 25, 2023 version).

<sup>7</sup> The second and third regexes are the same; they are the actual output of ChatGPT.

Table 1: the regexes given by ChatGPT for the question "Please suggest 10 regexes for validating email addresses".<sup>7</sup>

The matching of a negative look-ahead (?!*r*) succeeds when the inner regex *r* is *not* matched.

Positive or negative *look-behind*, denoted by (?<=*r*) or (?<!*r*) respectively, is similar to the above, with the difference that the inner matching of *r* is performed *backward*, i.e., from right to left. For example, the regex /(?<=ab)/ matches the string "abc" from position 2 (i.e., before the last character c) without consuming any characters.

A typical use of look-around is to put a look-behind before (or a look-ahead after) a regex *r*. This is useful when one wants to perform a search or replacement of *r* for only those occurrences that are in a certain context. For example, the regex /(?<=<p>)[^<]\*(?=<\/p>)/ matches only contents of the HTML <p> tag. As another example, common assertions such as \A (this matches the beginning of a string) and \z (this matches the end) can be expressed using look-around, namely \A = (?<!.) and \z = (?!.).
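These behaviors can be checked directly in any implementation supporting look-around, e.g., Python's re module (the sample strings below are our own):

```python
import re

m = re.search(r'(?=bc)', 'abc')      # zero-width: consumes no characters
print(m.start(), m.end())            # -> 1 1
m = re.search(r'(?<=ab)', 'abc')
print(m.start(), m.end())            # -> 2 2

# Context-restricted extraction: only the contents of a <p> tag.
print(re.search(r'(?<=<p>)[^<]*(?=</p>)', '<p>hi</p><q>no</q>').group())  # -> hi
```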

**Atomic Grouping** Atomic grouping is a regex extension that controls backtracking behaviors. It is designed to manually avoid problems caused by backtracking, such as catastrophic backtracking (§2.4).

Atomic grouping is represented by the syntax (?>*r*); once the matching of the inner regex *r* succeeds, the remaining branches in potential backtracking for *r* are discarded. For example, the regex /(a|ab)c/ matches the string "abc", but the regex /(?>(a|ab))c/ using atomic grouping does *not* match it. This is because, once a in the atomic grouping matches the first character a of "abc", the remaining branch ab (in a|ab) is discarded, and one is left with the regex c and the string "bc".

Atomic grouping is often used for the purpose of preventing catastrophic backtracking. In that case, it is used in combination with the repetition syntax, e.g., (?>(*r*\*)) (often abbreviated as *r*\*+) and (?>(*r*+)) (abbreviated as *r*++). These abbreviations are called *possessive quantifiers*. The former (namely (?>(*r*\*))) is intuitively understood as (?>(*ε*|*r*|*rr*|*. . .*)), with the difference that longer matching is preferred (this is because the Eps loop is the first successor in Figure 1e). Once a longer match is found, the remaining branches (i.e., those for shorter matches) get discarded, thus preventing catastrophic backtracking.
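The discarding behavior is observable directly; for instance, Python's re module supports atomic groups and possessive quantifiers as of Python 3.11 (on earlier versions, the third-party regex module can be used instead):

```python
import re   # requires Python 3.11+ for (?>...)

print(bool(re.fullmatch(r'(a|ab)c', 'abc')))     # True: backtracks into 'ab'
print(bool(re.fullmatch(r'(?>a|ab)c', 'abc')))   # False: branch 'ab' discarded
print(bool(re.fullmatch(r'a*ab', 'aaab')))       # True
print(bool(re.fullmatch(r'(?>a*)ab', 'aaab')))   # False: a* kept every 'a'
```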

One might wonder if our (linear-time and thus ReDoS-free) matching algorithm should support atomic grouping—the principal use of atomic grouping is to suppress backtracking and avoid ReDoS. We do need to support it since, as we discussed in §1, ours is meant to be a drop-in replacement for matching implementations that are currently used.

**Our Target Extended Regexes** Our target class, namely *regexes with look-around and atomic grouping*, is defined by extending the grammar of §2.1 with the operators introduced above:

$$r ::= \cdots \mid (\texttt{?=}r) \mid (\texttt{?!}r) \mid (\texttt{?<=}r) \mid (\texttt{?<!}r) \mid (\texttt{?>}r)$$
For brevity, we sometimes refer to regexes with look-around and atomic grouping as *(la, at)-regexes*. We also refer to regexes with look-around as *la-regexes* and regexes with atomic grouping as *at-regexes*.

For a (la, at)-regex *r*, the size of *r*, denoted by |*r*|, is defined in the same way as for plain regexes, except that |(?=*r*)| = |(?>*r*)| = |*r*| + 1.

Look-around is known to be *regular*: la-regexes can be converted to DFAs, and the language family of la-regexes coincides with the regular languages. This fact is mentioned in [3, 26, 27]. Atomic grouping is also known to be regular in the same sense [4]. However, look-ahead and atomic grouping can make the number of states of the corresponding DFA grow exponentially [3, 4, 26, 27].

In what follows, for simplicity, we only discuss positive look-ahead in discussions of look-around. Adaptation to other look-around operators, such as negative look-behind, is straightforward.

#### **2.6 NFAs with Sub-automata**

We introduce *NFAs with sub-automata* for backtracking matching algorithms for (la, at)-regexes. This extended notion of NFAs is suggested in [10, Section IX.B], but ours seems to be the first formal exposition.

Roughly speaking, an NFA with sub-automata is an NFA whose transitions can be labeled not only with a character *σ* ∈ *Σ*, as in usual NFAs, but also with another NFA with sub-automata. See Figure 3, where transitions from *q*<sub>0</sub> to *q*<sub>1</sub> are labeled with the NFA with sub-automata obtained by converting *r*. We annotate these transitions further with a label (pla for positive look-ahead, at for atomic grouping, etc.) that indicates which operator they arise from. Note that NFAs with sub-automata can be nested: transitions in the sub-automaton for *r* in Figure 3 can themselves be labeled with NFAs with sub-automata.

Our precise definition is as follows. There, *P* is the set that collects all states that occur in an NFA with sub-automata A, i.e., in 1) the top-level NFA, 2) its label NFAs, 3) their label NFAs, and so on.

**Definition 1 (NFAs with sub-automata).** *An* NFA with sub-automata A *is a quintuple* A = (*P, Q, q*<sub>0</sub>*, F, T*)*, where P is a finite set of states and Q* ⊆ *P is a set of so-called* top-level states*. We require that the quadruple* (*Q, q*<sub>0</sub>*, F, T*) *is an NFA, except that the value T*(*q*) *of the transition function T is either 1)* Eps(*q*′)*,* Branch(*q*′*, q*′′)*, or* Char(*σ, q*′) *(as in usual NFAs, §2.2), or 2)* Sub(*k,* A′*, q*′)*, where* A′ *is an NFA with sub-automata, q*′ *is a successor state, and k is a* kind label *with k* ∈ {pla*,* nla*,* plb*,* nlb*,* at}*.*

*We further impose the following requirements. Firstly, we require all NFAs with sub-automata in* A *to have disjoint state spaces: for any distinct top-level states q, q*′′ ∈ *Q in* A*, if T*(*q*) = Sub(*k,* A′*, q*′) *and T*(*q*′′) = Sub(*k*′*,* A′′*, q*′′′)*, then we must have P*′ ∩ *P*′′ = ∅*, Q* ∩ *P*′ = ∅ *and Q* ∩ *P*′′ = ∅*, where* A′ = (*P*′*, . . .*) *and* A′′ = (*P*′′*, . . .*)*. Secondly, we require that the set P in* A = (*P, . . .*) *is the (disjoint) union of all states that occur within* A*, that is,* $P = Q \cup \bigcup_{q \in Q,\; T(q) = \mathsf{Sub}(k, \mathcal{A}', q'),\; \mathcal{A}' = (P', \ldots)} P'$*.*

The kind label *k* in Sub(*k,* A′*, q*′′) indicates how the sub-automaton A′ should be used (cf. Algorithm 2). If every kind label occurring in A (including its sub-automata) is either pla, nla, plb, or nlb, then A is called a *la-NFA*. Similarly, if every kind label is at, A is called an *at-NFA*. Following this convention, general NFAs with sub-automata are called *(la, at)-NFAs*.

Note that the definition is recursive. Non-well-founded nesting is nonetheless prohibited by the finiteness of *P*. By the definition, if *P* = *Q*, then A does not contain any transitions labeled with sub-automata.
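One possible Python encoding of Def. 1 (our own choice of representation, not the paper's notation) makes the recursion explicit; the disjointness requirements are assumed rather than enforced here.

```python
from dataclasses import dataclass

@dataclass
class Sub:
    kind: str          # one of 'pla', 'nla', 'plb', 'nlb', 'at'
    inner: 'NFA'       # the sub-automaton A'
    succ: int          # the successor state q'

@dataclass
class NFA:
    q0: int
    F: frozenset
    T: dict            # state -> ('Eps', q) | ('Branch', q1, q2)
                       #        | ('Char', σ, q) | Sub(kind, inner, succ)

def all_states(a: NFA) -> set:
    """The set P of Def. 1: top-level states plus those of all sub-automata."""
    P = set(a.T) | set(a.F) | {a.q0}
    for t in a.T.values():
        if isinstance(t, Sub):
            P |= all_states(t.inner)   # nesting is finite, so this terminates
    return P
```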

In addition to Eps and Branch transitions, we refer to Sub transitions with a label *k* ∈ {pla*,* nla*,* plb*,* nlb} as *ε*-transitions too. We also assume the following, similarly to Asm. 1.

*Assumption 2.* (la, at)-NFAs do not contain *ε*-loops.


Fig. 3: a conversion from (la, at)-regexes to (la, at)-NFAs. For negative look-ahead, we use the corresponding kind label nla. For positive and negative look-behind, besides using the kind labels plb and nlb, we suitably reverse the sub-automaton for *r*.


For (la, at)-regexes, their conversion to (la, at)-NFAs is described by the constructions in Figure 3—using transitions labeled with sub-automata—in addition to the conversion for regexes in §2.2. Note that we have |*P*| = *O*(|*r*|) in these constructions.

The backtracking matching algorithm in Algorithm 1 can be naturally extended to (la, at)-NFAs; it is shown in Algorithm 2. The clauses for positive look-ahead (Lines 12 to 16) and atomic grouping (Lines 17 to 21) are similar to each other, conducting matching for sub-automata. Note that their difference is in the "return position" (*i* in Line 15; *i*′ in Line 20).

The clauses for other look-around operators are similar to the ones for positive look-ahead. For look-behind, we can suitably use an additional parameter *d* ∈ {−1*,* +1} indicating the matching direction.
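To make the difference in return positions concrete, here is a minimal Python sketch of the Sub clauses (our transcription of the prose, not Algorithm 2 itself; only pla and at are shown, the tuple encoding of the earlier sketches is reused, and a sub-automaton is given as a triple (T2, F2, q02)).

```python
def match_ext(T, F, w, q, i):
    """Backtracking matching with pla/at sub-automata; returns i' or None."""
    if q in F:
        return i
    t = T[q]
    if t[0] == 'Eps':
        return match_ext(T, F, w, t[1], i)
    if t[0] == 'Branch':
        r = match_ext(T, F, w, t[1], i)
        return r if r is not None else match_ext(T, F, w, t[2], i)
    if t[0] == 'Char':
        return (match_ext(T, F, w, t[2], i + 1)
                if i < len(w) and w[i] == t[1] else None)
    # ('Sub', kind, (T2, F2, q02), succ):
    _, kind, (T2, F2, q02), succ = t
    i2 = match_ext(T2, F2, w, q02, i)
    if i2 is None:
        return None
    if kind == 'pla':                  # zero-width: resume at the same i
        return match_ext(T, F, w, succ, i)
    if kind == 'at':                   # commit: resume at i2, never re-enter
        return match_ext(T, F, w, succ, i2)
```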

Using the extended backtracking matching algorithm (Algorithm 2), we define the *partial matching problem for (la, at)-regexes* in the same way as for regexes without extensions (Prob. 1).

*Problem 2 ((la, at)-regex partial matching).*

**Input**: a (la, at)-regex *r*, an input string *w*, and a starting position *i*
**Output**: returns Match-(la*,* at)<sub>A(*r*),*w*</sub>(*q*<sub>0</sub>*, i*), where A(*r*) = (*P, Q, q*<sub>0</sub>*, F, T*).

# **3 Previous Works on Regex Matching with Memoization**

This section introduces an existing work [10] on regex matching with memoization, paving the way for our algorithms for (la, at)-regexes in Sections 4 and 5.

*Memoization* is a programming technique that makes recursive computations more efficient by 1) recording arguments of a function and the corresponding return values, and 2) reusing them when the function is called again with recorded arguments.

As we described in §2.3, regex matching is conducted by backtracking matching. It is implemented by recursive functions (see Algorithms 1 and 2); thus, it is a natural idea to apply memoization. Since Java 14, Java's regex implementation has indeed used memoization for optimization. However, this optimization is not enough to completely prevent ReDoS; see, e.g., [24].

The work that inspires the current work the most is [10], whose main novelty is linear-time backtracking regex matching (much like the current work). Its contributions are as follows.


We will mainly discuss item 1 above; it serves as a basis for our algorithms in Sections 4 and 5. The technique in item 2 is potentially very relevant: we expect that it can be combined with the current work; doing so is future work. The content of item 2 is reviewed in [15, Appendix A] for the record.

*Remark 2.* On the above item 4, the work [10] claims that the time complexity of their algorithm is linear also for REWZWA (*O*(|*w*|) for an input string *w*). However, we believe that this claim comes with the following problems.

**–** The description of an algorithm for REWZWA in [10] is abstract and leaves room for interpretation. The description is to "preserve the memo functions of the sub-automata throughout the simulation of the top-level M-NFA, remembering the results from sub-simulations that begin at different indices *i* of *w*" [10, Section IX-B]. For example, it is not explicit what the "results" are—they can mean (complete) matching results or mere success/failure.

**Algorithm 3** a total matching algorithm with memoization for NFAs without *ε*-transitions [10, Listing 2].



Our contribution includes a correct memoization algorithm for look-around (REWZWA) that resolves the above problems.

#### **3.1 Linear-time Backtracking Matching with Memoization**

We describe the first main contribution of the work [10] (item 1 in the above list), namely a backtracking algorithm that achieves a linear-time complexity thanks to memoization. The algorithm [10, Listing 2] is presented in Algorithm 3.

In this algorithm DavisSL<sup>*M*</sup><sub>A,*w*</sub>, an NFA A is a quadruple (*Q, q*<sub>0</sub>*, F, δ*), where *δ* : *Q* × *Σ* → 2<sup>*Q*</sup> is a nondeterministic transition function. An additional parameter *M* : *Q* × N *⇀* {**false**} is a memoization table, which is mathematically a mutable partial function. This algorithm implements total matching (cf. §2.3). It is notable that the memoization table records only matching *failures*: a matching *success* does not have to be recorded, since it immediately propagates to the success of the whole problem.
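A compact Python sketch of this scheme (our rendition, for ε-free NFAs with a set-valued δ, as in Algorithm 3; the state names in the example are our own):

```python
def davis_sl(delta, F, w, q, i, M):
    """Total matching with failure-only memoization; M maps (q, i) to False."""
    if i == len(w):
        return q in F
    if (q, i) in M:
        return M[(q, i)]                # only failures are ever stored
    for q2 in delta.get((q, w[i]), ()):
        if davis_sl(delta, F, w, q2, i + 1, M):
            return True                 # a success propagates immediately
    M[(q, i)] = False                   # record the failure (Algorithm 3's timing)
    return False

# (aa|aa)*b without ε-transitions, in the spirit of Figure 4:
delta = {(0, 'a'): {1, 2}, (1, 'a'): {0}, (2, 'a'): {0}, (0, 'b'): {3}}
print(davis_sl(delta, {3}, 'aa' * 10 + 'c', 0, 0, {}))   # False, in O(n) calls
```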

Fig. 4: the NFA A((*aa*|*aa*)<sup>∗</sup>*b*), after removing *ε*-transitions


**Algorithm 4** the algorithm implemented in the prototype [11] (cf. Algorithm 3 and Remark 3)

```
1:     function DavisSLImpl^M_{A,w}(q, i)
2:         (Q, q0, F, δ) = A
3:         if i = |w| then return whether q ∈ F
4:         if M(q, i) ≠ ⊥ then return M(q, i)
           M(q, i) ← false        ▷ M(q, i) is speculatively set to false
5:         for q′ ∈ δ(q, w[i]) do
               . . .
7:         M(q, i) ← false        ▷ moved up
               . . .
```

This algorithm achieves linear-time matching; it thus prevents ReDoS. A full proof of linear-time complexity is found in [10, Appendix C], but its essence is the following (note the critical role of memoization here).


*Example 2 (matching with memoization for NFAs without ε-transitions).* Let us consider the regex (*aa*|*aa*)<sup>∗</sup>*b* and the corresponding NFA A((*aa*|*aa*)<sup>∗</sup>*b*) defined in §2.2. For the purpose of applying Algorithm 3, we manually remove its *ε*-transitions, leading to the NFA in Figure 4. Let *w* = "*a*<sup>2*n*</sup>*c*" be an input string. Match<sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) (without memoization) invokes recursive calls *O*(2<sup>*n*</sup>) times for the same reason as in Example 1, but DavisSL<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) (with memoization, where *M*<sub>0</sub> is the initial memoization table) invokes recursive calls *O*(*n*) times, because *M*(*q*<sub>0</sub>*, i*) for each position *i* ∈ {0*,* 2*, . . . ,* 2*n*} has been recorded after the first visit.

*Remark 3.* Following the discussion in Remark 2, here we describe the gap between Algorithm 3—the algorithm described in the paper [10]—and its prototype implementation [11]. The latter is shown in Algorithm 4.

The precise difference between the two algorithms is that Line 7 in Algorithm 3 is moved up to just before the for-loop in Algorithm 4. It is not hard to see that this modification does not affect the correctness of the algorithm: if the pair (*q, i*) is visited again in the future, it means that the current matching from (*q, i*) did not succeed and backtracking occurred. Note that, in case the current matching is successful, the function call returns **true**, so the memoization content *M*(*q, i*) does not matter.

However, the above argument is true only when there is no look-around. (A detailed discussion is in Example 3.) This point seems to be missed in the implementation [11].

**Algorithm 5** a partial matching algorithm with memoization. An adaptation of Algorithm 3 from [10], and a basis for our algorithms (Algorithms 6 and 7)

```
 1: function Memo^M_{A,w}(q, i)
        Parameters: an NFA A, an input string w, and
                    a memoization table M : Q × N ⇀ {Failure}
        Input:      a current state q, and a current position i
        Output:     returns SuccessAt(i′) if the matching succeeds, or
                    returns Failure if the matching fails
 2:     (Q, q0, F, T) = A
 3:     if M(q, i) ≠ ⊥ then return M(q, i)
 4:     result ← ⊥
 5:     if q ∈ F then result ← SuccessAt(i)
 6:     else if T(q) = Eps(q′) then result ← Memo^M_{A,w}(q′, i)
 7:     else if T(q) = Branch(q′, q′′) then
 8:         result ← Memo^M_{A,w}(q′, i)
 9:         if result = Failure then result ← Memo^M_{A,w}(q′′, i)
10:     else if T(q) = Char(σ, q′) then
11:         if i < |w| and w[i] = σ then result ← Memo^M_{A,w}(q′, i + 1)
12:         else result ← Failure
13:     if result = Failure then M(q, i) ← Failure
14:     return result               ▷ result ≠ ⊥, as one can easily see
```
#### **3.2 Matching with Memoization Adapted to the Current Formalism**

In Algorithm 5, we present an adaptation of Algorithm 3 to our formalism, especially our definition of NFA (§2.2), which offers fine-grained handling of non-determinism. Algorithm 5 has also been adapted to solve *partial* matching (it returns a matching position *i*′) rather than total matching as in Algorithm 3 (cf. §2.3). Algorithm 5 serves as a basis for our extensions to look-around and atomic grouping in Sections 4 and 5.

The adaptation is straightforward: Line 5 ensures that the algorithm solves partial matching; the rest is a natural adaptation of the for-loop of Algorithm 3 to our definition of NFA (§2.2). The algorithm terminates thanks to Asm. 1. We note that the type of memoization tables does not have to be changed compared to Algorithm 3.

Algorithm 5 exhibits the same desired properties as Algorithm 3, namely correctness (with respect to Prob. 1) and linear-time complexity. We formally state these properties for the record; here, *M*<sub>0</sub> : *Q* × N *⇀* {Failure} is the initial memoization table (all of its entries are ⊥).

**Theorem 1 (linear-time complexity of Algorithm 5).** *For an NFA* A = (*Q, q*<sub>0</sub>*, F, T*)*, an input string w, and a position i* ∈ {0*, . . . ,* |*w*|}*,* Memo<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*, i*) *terminates with O*(|*w*|) *recursive calls.*

**Theorem 2 (correctness of Algorithm 5).** *For an NFA* A = (*Q, q*<sub>0</sub>*, F, T*)*, an input string w, and a position i* ∈ {0*, . . . ,* |*w*|}*,* Match<sub>A,*w*</sub>(*q*<sub>0</sub>*, i*) = Memo<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*, i*)*.*

The proofs can be found in [15, Appendix B.1]. Here is their outline.

Fig. 6: the at-NFA A(*a*<sup>∗</sup>(?>*a*<sup>∗</sup>)*ab*)

We first introduce the notion of run for Match and Memo; it records *recursive calls* of the function itself, as well as *invocations* of the memoization table, together with their return values.

For linear time complexity (Thm. 1), we show that 1) a recursive call with the same argument (*q, i*) appears at most once in a run, and that 2) the number of invocations of the memoization table with the same key (*q, i*) is bounded by the (graph-theoretic) in-degree. Linear-time complexity then follows easily.

For correctness (Thm. 2), we introduce a conversion from runs of Memo to runs of Match. By showing that 1) the result is indeed a valid run of Match and 2) the conversion preserves return values, we show the coincidence of the return values of the two algorithms, i.e., correctness.

# **4 Memoization for Regexes with Look-around**

We describe our first main technical contribution, namely a backtracking matching algorithm for la-NFAs with memoization (Algorithm 6). We prove that it is correct (Thm. 4) and that its time complexity is linear (*O*(|*w*|), Thm. 3).

The key ingredient of our algorithm is the type of memoization tables, where their range is extended from {Failure} to {Failure*,* Success}. We motivate this extension through two problematic algorithms, MemoExit-la and MemoEnter-la. MemoExit-la is obtained by naively extending Algorithm 5 (Memo) with the processing of sub-automaton transitions with pla (positive look-ahead) done in Algorithm 2 (Lines 12 to 16); MemoEnter-la is similar to MemoExit-la, but it writes to the memoization table at the same timing as Algorithm 4 (DavisSLImpl). In particular, their memoization tables only record **false**.

The example below shows the problems with the two naive algorithms: MemoExit-la is not linear, and MemoEnter-la is not correct.

*Example 3.* Consider the la-NFA A = A(((?=*a*<sup>∗</sup>)*a*)<sup>∗</sup>) = (*P, Q, q*<sub>0</sub>*, F, T*) shown in Figure 5, and let *w* = "*a*<sup>*n*</sup>" be an input string.

MemoExit-la<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) invokes recursive calls *O*(|*w*|<sup>2</sup>) times—in the same way as Match-(la*,* at)—because there are no matching failures in A′ that contribute to memoization.

We also see that MemoEnter-la is not correct: Match-(la*,* at)<sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) returns SuccessAt(*n*), but MemoEnter-la<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) returns SuccessAt(1), because *M*(*q*<sub>5</sub>*,* 1) = **false** is recorded during the first loop and interpreted as a matching failure.

In Example 3, a natural solution to the non-linearity issue with MemoExit-la is to enrich memoization so that it also records previous successes of look-around. Furthermore, since matching positions do not matter in look-around, the type of memoization tables should be *M* : *P* × N *⇀* {Failure*,* Success}.
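A minimal Python sketch of this idea (ours, not Algorithm 6 itself): inside a look-ahead body only success/failure matters, so both outcomes can be memoized for every visited (state, position) pair, using the tuple encoding of the earlier sketches.

```python
def sub_match(T2, F2, w, q, i, M):
    """Does the look-ahead body match from (q, i)? Memoizes both outcomes.
    (Asm. 2: no ε-loops, so the recursion terminates.)"""
    if (q, i) in M:
        return M[(q, i)]
    if q in F2:
        M[(q, i)] = True
        return True
    t = T2[q]
    if t[0] == 'Eps':
        r = sub_match(T2, F2, w, t[1], i, M)
    elif t[0] == 'Branch':
        r = (sub_match(T2, F2, w, t[1], i, M)
             or sub_match(T2, F2, w, t[2], i, M))
    else:  # ('Char', σ, q')
        r = (i < len(w) and w[i] == t[1]
             and sub_match(T2, F2, w, t[2], i + 1, M))
    M[(q, i)] = bool(r)          # Success entries are recorded too
    return bool(r)
```

With both outcomes recorded, each (state, position) pair inside a look-ahead body is explored at most once, which is exactly what restores linearity in Example 3.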

*Remark 4.* The work [10, Section IX-B] proposes an adaptation of their memoization algorithm to REWZWA. Its description in [10, Section IX-B] (to "preserve the memo functions*. . .*"; see Remark 2) consists of the following two points:


The naive algorithm MemoExit-la we discussed above implements the first point. We can further add the second point (which is essentially "memoization for sub-automaton matching") to MemoExit-la.

However, we find that this is not enough to ensure linear-time complexity. The problem is that the "memoization for sub-automaton matching" is used too infrequently. For example, in Example 3, the start positions of sub-automaton matching are different each time; thus, the memoized results are never used.

Our algorithm (Algorithm 6) resolves this problem by letting the memoization tables (for sub-automaton matching) record results not only for *starting* positions but also for non-starting positions.

We also note that there is a gap between the algorithm in the paper [10] and its prototype implementation [11]; see Remark 3. The latter is linear time but not always correct. For example, in Example 3, the correct result is SuccessAt(*n*), but the prototype [11] returns SuccessAt(1), similarly to MemoEnter-la.

Algorithm 6 is the matching algorithm for la-NFAs that we propose. It adopts the above extended type of *M*. In Line 18, Success is recorded in the memoization table when the matching succeeds. This function can return one of SuccessAt(*i*′), Failure, and Success. We first prove the following lemma to see that the algorithm indeed solves the partial matching problem (Prob. 2).

**Lemma 1.** *For a la-NFA* A = (*P, Q, q*<sub>0</sub>*, F, T*)*, an input string w, and a position i* ∈ {0*, . . . ,* |*w*|}*,* Memo-la<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*, i*) *returns either* SuccessAt(*i*′) *for some i*′ ∈ {0*, . . . ,* |*w*|} *or* Failure *(it never returns* Success*).*

*Proof.* When we obtain Success as a return value, it must be via an entry *M*(*q, i*) of the memoization table. However, due to Asm. 2, when *M*(*q, i*) is set to Success for a state *q* of the top-level automaton of A, the matching is already finished and returns SuccessAt(*i*′). ⊓⊔


**Algorithm 6** our partial matching algorithm with memoization for la-NFAs

As a consequence of the lemma, we can further shrink the memoization tables in Algorithm 6 by not recording Success for *M*(*q, i*) where *q* is a state of the top-level automaton.

Algorithm 6 exhibits the desired properties, namely correctness (with respect to Prob. 2) and linear-time complexity.

**Theorem 3 (linear-time complexity of Algorithm 6).** *For a la-NFA* A = (*P, Q, q*<sub>0</sub>*, F, T*)*, an input string w, and a position i* ∈ {0*, . . . ,* |*w*|}*,* Memo-la<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*, i*) *terminates with O*(|*w*|) *recursive calls.*

**Theorem 4 (correctness of Algorithm 6).** *For a la-NFA* A = (*P, Q, q*<sub>0</sub>*, F, T*)*, an input string w, and a position i* ∈ {0*, . . . ,* |*w*|}*,* Match-(la*,* at)<sub>A,*w*</sub>(*q*<sub>0</sub>*, i*) = Memo-la<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*, i*)*.*

Thm. 3 and 4 can be shown similarly to Thm. 1 and 2; see [15, Appendix B.2]. The in-degree for sub-automata requires some additional care.

# **5 Memoization for Regexes with Atomic Grouping**

We describe our second main technical contribution, namely a backtracking matching algorithm for at-NFAs with memoization (Algorithm 7). We prove that it is correct (Thm. 6) and that its time complexity is linear (*O*(|*w*|), Thm. 5).

The key ingredient of our algorithm is the type of memoization tables, where their range is extended from {Failure} to {Failure(*j*) | *j* ∈ {0*, . . . , ν*(A<sub>0</sub>)}}; the latter records a *depth j* of atomic grouping in order to distinguish failures of different depths. We motivate this extension through two problematic algorithms

MemoExit-at and MemoEnter-at. Much like in §4, MemoExit-at naively extends Algorithm 5 (Memo) by adding the processing of sub-automaton transitions with at done in Algorithm 2 (Lines 17 to 21), and MemoEnter-at is similar to MemoExit-at, but it writes to the memoization table at the same timing as Algorithm 4 (DavisSLImpl).

Firstly, we observe that MemoExit-at is not linear for a reason similar to Example 3. (A concrete example is given by Example 4.) Therefore, we turn to the other candidate, namely MemoEnter-at.

We find, however, that MemoEnter-at is also problematic: it is not correct.

*Example 4.* Consider the at-NFA A = A(*a*<sup>∗</sup>(?>*a*<sup>∗</sup>)*ab*) = (*P, Q, q*<sub>0</sub>*, F, T*) shown in Figure 6, and let *w* = "*a*<sup>*n*</sup>*b*" be an input string. Match-(la*,* at)<sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) returns Failure—the atomic grouping (?>*a*<sup>∗</sup>) consumes all *a*'s in *w* and no *a* is left for the final *ab* pattern—but MemoEnter-at<sup>*M*<sub>0</sub></sup><sub>A,*w*</sub>(*q*<sub>0</sub>*,* 0) returns SuccessAt(*n* + 1). Thus MemoEnter-at is wrong.

For both algorithms, the state *q*<sub>7</sub> in the at transition is first reached at position *i* = *n*, and then backtracking is conducted, leading to the state *q*<sub>7</sub> again at *i* = *n* − 1. The execution of MemoEnter-at proceeds as follows.


The last example shows the challenge we are facing, namely the need to *distinguish failures of different depths*. Specifically, in the previous example, the memoized value *M*(*q*<sub>7</sub>*, n*) = **false** comes from the failure of matching for the ambient A; still, it is used to control backtracking in the sub-automaton A′. This is problematic in an atomic grouping where, roughly speaking, backtracking in an ambient automaton should not cause backtracking in a sub-automaton. Atomic groupings can be nested, so we must track at which depth a failure has happened.

**Definition 2 (nesting depth of atomic grouping).** *For an at-NFA* A = (*P, Q, q*<sub>0</sub>*, F, T*) *and a state q* ∈ *P, the* nesting depth of atomic grouping for *q, denoted by ν*<sub>A</sub>(*q*)*, is*

$$\nu\_{\mathcal{A}}(q) = \begin{cases} 0 & \text{if } q \in Q \\ 1 + \nu\_{\mathcal{A'}}(q) & \text{where } \mathcal{A'} = (P', Q', q\_0', F', T') \\ & \text{s.t. } T(q') = \mathsf{Sub}(\mathfrak{at}, \mathcal{A'}, q'') \text{ and } q \in P'. \end{cases}$$


*We also define the* maximum nesting depth of atomic grouping *for $\mathcal{A}$, denoted by $\nu(\mathcal{A})$, as $\nu(\mathcal{A}) = \max_{q \in P} \nu_{\mathcal{A}}(q)$.* For instance, for the at-NFA $\mathcal{A}(a^*(\mathtt{?>}a^*)ab)$ of Example 4, the states of the sub-automaton for $(\mathtt{?>}a^*)$ have nesting depth 1 and all other states have depth 0, so $\nu(\mathcal{A}) = 1$.

**Algorithm 7** our partial matching algorithm with memoization for at-NFAs

Algorithm 7 is our algorithm for at-NFAs; the type of its memoization tables is $M : P \times \mathbb{N} \rightharpoonup \{\mathsf{Failure}(j) \mid j \in \{0, \dots, \nu(\mathcal{A})\}\}$. Some remarks are in order.

Note first that the algorithm takes, as its parameters, the whole at-NFA $\mathcal{A}_0$ and its sub-automaton $\mathcal{A}$ as the algorithm's current scope. The top-level call is made with $\mathcal{A}_0 = \mathcal{A}$ (cf. Thm. 5 and 6); when an at transition is encountered, the scope moves to the corresponding sub-automaton ($\mathcal{A}'$ in Line 17).

In Line 9, the **if** condition checks whether the nesting depth of the Failure entry equals the depth of the current NFA, and backtracking is performed if and only if it does. This check is crucial for avoiding the error in Example 4. The remaining cases, for Eps, Branch, and Char, are similar to Algorithm 5.

The case for Sub (Lines 15–23) requires some explanation. It is an adaptation of Lines 17–21 of Algorithm 5 with memoization. The apparent complication comes from the set $K$ in $\mathsf{SuccessAt}(i', K)$. The set $K$ is a set of *keys* for a memoization table $M$, that is, pairs $(q, i)$ of a state and a position. The role of $K$ is to collect the keys of $M$ for which, once a failure happens, the entry $\mathsf{Failure}(j)$ has to be recorded (this is done in a batch manner in Line 22; see the sketch below). More specifically, once a failure happens in an outer automaton (i.e., at a smaller depth $j$), it has to be recorded as $\mathsf{Failure}(j)$ for inner automata (at greater depths). The set $K$ collects those keys for which this has to be done, starting from inner automata ($\mathcal{A}'$, Line 18) and going to outer ones ($\mathcal{A}$, Lines 19–20).
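The batch recording in Line 22 admits a direct rendering; the following is a hypothetical sketch (names ours, types as in the sketch above), not the paper's pseudocode:

```scala
import scala.collection.mutable

object BatchRecord {
  type Key = (Int, Int)                  // a key (q, i) of the table M
  final case class FailureAt(depth: Int) // the entry Failure(j)

  // Once a failure at depth j is detected, record Failure(j) for every
  // key (q, i) collected in the set K, in one batch.
  def recordBatch(memo: mutable.HashMap[Key, FailureAt],
                  keys: Set[Key], depth: Int): Unit =
    for (key <- keys) memo.update(key, FailureAt(depth))
}
```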

A closer inspection reveals that Line 20 is vacuous in Algorithm 7; however, it is needed when we combine the algorithm with look-around at the end of this section.

Algorithm 7 exhibits the desired properties, namely correctness (with respect to Prob. 2) and linear-time complexity. In Thm. 6, $f$ is a function that converts results of Algorithm 7 to results of Algorithm 2; it is defined by $f(\mathsf{Failure}(j)) = \mathsf{Failure}$ and $f(\mathsf{SuccessAt}(i', K)) = \mathsf{SuccessAt}(i')$.
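The conversion $f$ is a direct case analysis; transliterated into Scala (the result-type encodings are our assumption):

```scala
object ResultConversion {
  sealed trait MemoResult // results of Algorithm 7
  final case class MemoFailure(depth: Int) extends MemoResult
  final case class MemoSuccessAt(pos: Int, keys: Set[(Int, Int)]) extends MemoResult

  sealed trait MatchResult // results of Algorithm 2
  case object MatchFailure extends MatchResult
  final case class MatchSuccessAt(pos: Int) extends MatchResult

  // f(Failure(j)) = Failure ;  f(SuccessAt(i', K)) = SuccessAt(i')
  def f(r: MemoResult): MatchResult = r match {
    case MemoFailure(_)      => MatchFailure
    case MemoSuccessAt(i, _) => MatchSuccessAt(i)
  }
}
```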

**Theorem 5 (linear-time complexity of Algorithm 7).** *For an at-NFA $\mathcal{A} = (P, Q, q_0, F, T)$, an input string $w$, and a position $i \in \{0, \dots, |w|\}$, $\mathrm{Memo\text{-}at}^{M_0}_{\mathcal{A},\mathcal{A},w}(q_0, i)$ terminates with $O(|w|)$ recursive calls.*

**Theorem 6 (correctness of Algorithm 7).** *For an at-NFA $\mathcal{A} = (P, Q, q_0, F, T)$, an input string $w$, and a position $i \in \{0, \dots, |w|\}$, $\mathrm{Match\text{-}(la,at)}(q_0, i) = f(\mathrm{Memo\text{-}at}^{M_0}_{\mathcal{A},\mathcal{A},w}(q_0, i))$.*

Thm. 5 and 6 are proved similarly to Thm. 1 and 2; see [15, Appendix B.3]. The following points require some extra care.

Firstly, for linear-time complexity (Thm. 5), there is another recursive call (Line 19) before the return value of a recursive call (Line 17) is memoized (Line 22). If the second recursive call (Line 19) eventually leads to (the same call as) the first call (Line 17) (let us call this event (∗)), then this can nullify the effect of memoization. We prove, as a lemma, that (∗) never happens.

Secondly, for correctness (Thm. 6), our conversion of runs should replace an invocation of the memoization table—if it returns a failure with a shallower depth—with not only the corresponding run (as before) but also the run of the second recursive call (Line 19). See [15, Appendix B.3] for details.

**Combination with Look-around** It is also possible to combine Algorithm 6 (for look-around) with Algorithm 7 (for atomic grouping). In this case, the type of memoization tables becomes $M : P \times \mathbb{N} \rightharpoonup \{\mathsf{Failure}(j) \mid j \in \{0, \dots, \nu(\mathcal{A})\}\} \cup \{\mathsf{Success}\}$, and nesting depths of atomic grouping are reset by look-around operators. A complete algorithm can be found in [15, Appendix C]; it also exhibits the desired properties.

# **6 Experiments and Evaluation**

**Implementation** We implemented the algorithm proposed in this paper for evaluation. We call our implementation memo-regex. It is written in 1368 lines of Scala.

Table 2: our benchmark regexes and input strings


memo-regex supports both look-around (i.e., look-ahead and look-behind) and atomic grouping. We implemented a regex parser ourselves. Backtracking is implemented by managing a stack manually, rather than using a recursive function, to prevent stack overflow. In this case, the memoization keys are pushed onto the stack, and recording these keys in the memoization table is done during backtracking (see the sketch below). We used the mutable HashMap from the Scala standard library as the data structure for memoization tables.
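The stack discipline can be pictured as follows; this is a schematic sketch of the idea (not memo-regex's actual code), in which each choice point carries the memoization key of the state that created it:

```scala
import scala.collection.mutable

object StackBacktracking {
  type State = Int
  type Position = Int
  final case class FailureAt(depth: Int)
  // A choice point: the alternative to try next, plus the memoization key
  // that was pushed along with it.
  final case class ChoicePoint(altState: State, altPos: Position,
                               memoKey: (State, Position))

  // Pop the next alternative; the popped key is recorded in the memoization
  // table during backtracking, before the alternative branch is explored.
  def backtrack(stack: mutable.Stack[ChoicePoint],
                memo: mutable.HashMap[(State, Position), FailureAt],
                depth: Int): Option[(State, Position)] =
    if (stack.isEmpty) None
    else {
      val cp = stack.pop()
      memo.update(cp.memoKey, FailureAt(depth))
      Some((cp.altState, cp.altPos))
    }
}
```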

memo-regex also supports capturing sub-matches. However, this feature cannot be used within atomic grouping and positive look-around, because sub-matching information is lost under memoization.

The code of memo-regex, as well as all experiment scripts, is available [14].

**Efficiency of Our Algorithm** We conducted experiments to assess the performance of memo-regex, in particular in comparison with other existing implementations.

As target regexes, we looked for regexes with look-around and/or atomic grouping among the real-world regexes posted on regexlib.com. We then identified—by manual inspection—four regexes $r_1, \dots, r_4$ that are subject to potential catastrophic backtracking. These regexes are shown in Table 2. We then crafted input strings $w_1, \dots, w_4$, respectively, so that they cause catastrophic backtracking. Specifically, $r_1$ contains positive look-ahead and negative look-ahead; the positive look-ahead is used for restricting the length of input strings. The regexes $r_2$ and $r_3$ are themselves a positive look-ahead and a look-behind, respectively; both include negative look-ahead, too. The regex $r_4$ includes atomic grouping and negative look-ahead.

For these regexes, we measured matching time using memo-regex on OpenJDK 21.0.1. We compared it with the following implementations: Node.js 20.5.0, Ruby 3.1.4, and PCRE2 10.42 (used by PHP 8.3.1, with or without JIT). All of these implementations use backtracking; Ruby and PCRE2 restrict the regexes allowed inside look-behind, and Node.js does not support atomic grouping. Each experiment was performed 10 times and the average was taken. Furthermore, for memo-regex, we measured the size of its memoization table by its memory usage, using jamm.<sup>8</sup> The experiments were conducted on a MacBook Pro 2021 (Apple M1 Pro, RAM: 32 GB).

Fig. 8: matching time for $r_2$, $r_3$ and $r_4$

We show the results in Figures 7 and 8. Note that the values of $n$ differ depending on whether the matching time complexity is $O(n^2)$ or $O(2^n)$. Results for some implementations are absent for $r_3$ and $r_4$ because of the syntactic restrictions discussed above.

In Figures 7 and 8, we observe clear performance advantages for memo-regex. In particular, its linear-time complexity and linear memory consumption (memoization-table size) are experimentally confirmed.

**Real-world Usage of Look-around and Atomic Grouping** We additionally surveyed the use of the regex extensions of our interest, in order to confirm their practical relevance.

We used a regex dataset collected by a 2019 survey [9]. This dataset contains 537,806 regexes collected from the source code of real-world products.

<sup>8</sup> https://github.com/jbellis/jamm

We tallied the usage of each regex extension by parsing the regexes in the dataset with the parser in memo-regex. 8,679 regexes could not be parsed or compiled: 4,360 regexes because of back-references, 4,134 because of unsupported syntax (Unicode character classes, conditional branching, etc.), and the other 184 because they were too large or semantically invalid. We used the remaining 529,127 regexes for tallying.

The results are shown in Table 3. Note that 1) the numbers for look-ahead and look-behind do not include simple zero-width assertions such as ^ (line-begin) or $ (line-end), and 2) the number for atomic grouping includes possessive quantifiers such as \*+ and ++.

In Table 3, we observe that 17,167 regexes (3.2%) in the dataset use at least one of the extensions we studied in this paper. While the ratio is not very large, the absolute number (17,167 regexes) is significant; this implies that there are a number of applications (such as web services) that rely on these regex extensions. We thereby confirm the practical relevance of these regex extensions.

Table 3: regex extension usage

# **7 Conclusions and Future Work**

In this paper, we proposed a backtracking algorithm with memoization for regexes with look-around and atomic grouping. It is the first linear-time backtracking matching algorithm for such regexes. It also fixes problems of the memoization matching algorithm in [10] for look-ahead. We implemented the algorithm; our experimental evaluation confirms its performance advantage.

One direction of future work is to support more extensions. Our implementation does not support a widely used regex extension, namely back-references. In recent work [10], back-references were supported by additionally recording captured positions in memoization tables. We expect that a similar idea is applicable to our algorithm.

Combination with selective memoization (used in [10]; see [15, Appendix A]) is another direction. We believe it is possible, but it will require a more detailed discussion of how to handle sub-automata in the selective memoization scheme.

# **Acknowledgments**

Thanks are due to Konstantinos Mamouras for pointing to [22] after the dissemination of the preprint version of this paper.

# **Data-Availability Statement**

The data that support the findings of this study are openly available in Zenodo at 10.5281/zenodo.10458317, reference number [14].

# **References**


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Verification**

# **A Denotational Approach to Release/Acquire Concurrency**

Yotam Dvir<sup>1</sup>(B), Ohad Kammar<sup>2</sup>, and Ori Lahav<sup>1</sup>

<sup>1</sup> Tel Aviv University, Tel Aviv, Israel
yotamdvir@mail.tau.ac.il, orilahav@tau.ac.il
<sup>2</sup> University of Edinburgh, Edinburgh, UK
ohad.kammar@ed.ac.uk

**Abstract.** We present a compositional denotational semantics for a functional language with first-class parallel composition and shared-memory operations whose operational semantics follows the Release/Acquire weak memory model (RA). The semantics is formulated in Moggi's monadic approach, and is based on Brookes-style traces. To do so we adapt Brookes's traces to Kang et al.'s view-based machine for RA, and supplement Brookes's mumble and stutter closure operations with additional operations specific to RA. The latter provide a more nuanced understanding of traces that uncouples them from operational interrupted executions. We show that our denotational semantics is adequate and use it to validate various program transformations of interest. This is the first work to put weak memory models on the same footing as many other programming effects in Moggi's standard monadic approach.

**Keywords:** Weak memory models · Release/Acquire · Shared state · Shared memory · Concurrency · Denotational semantics · Monads · Program refinement · Program equivalence · Compiler optimizations

# **1 Introduction**

Denotational semantics defines the meaning of programs *compositionally*: the meaning of a program term is a function of the meanings assigned to its immediate syntactic constituents. This key feature makes denotational semantics instrumental in understanding the meaning of a piece of code independently of the context under which the code will run. This style of semantics contrasts with standard operational semantics, which only executes closed/whole programs. A basic requirement of such a denotation function ⟦−⟧ is that it be *adequate* w.r.t. a given operational semantics: plugging program terms *M* and *N* with equal denotations—i.e. ⟦*M*⟧ = ⟦*N*⟧—into some program context Ξ[−] that closes over their variables results in observationally indistinguishable closed programs in the given operational semantics. Moreover, assuming that denotations have a defined order (≤), a "directed" version of adequacy ensures that ⟦*M*⟧ ≤ ⟦*N*⟧ implies that all behaviors exhibited by Ξ[*M*] under the operational semantics are also exhibited by Ξ[*N*].

For shared-memory concurrent programming, Brookes's seminal work [13] defined a denotational semantics, where the denotation ⟦*M*⟧ is a set of totally ordered traces of *M* closed under certain operations, called stutter and mumble. Traces consist of sequences of memory snapshots that *M* guarantees to provide while relying on its environment to make other memory snapshots. Brookes [12] used the insights behind this semantics to develop a semantic model for separation logic, and Turon and Wand [46] used them to design a separation logic for refinement. Additionally, Xu et al. [48] used traces as a foundation for the Rely/Guarantee approach to verification of concurrent programs, and Liang et al. [34, 35] used a trace-based program logic for refinement.

A *memory model* determines which outcomes are possible from the execution of a program. Brookes established the adequacy of the trace-based denotational semantics w.r.t. the operational semantics of the strongest model, known as *sequential consistency* (SC), where every memory access happens instantaneously and immediately affects all concurrent threads. However, SC is too strong to model real-world shared memory, whether of modern hardware, such as x86-TSO [40, 44] and ARM, or of programming languages such as C/C++ and Java [4, 37]. These runtimes follow *weak memory models* that allow performant implementations, but admit more behaviors than SC.

Do weak memory models admit adequate Brookes-style denotational semantics? This question has been answered affirmatively once, by Jagadeesan et al. [25], who closely followed Brookes to define denotational semantics for x86-TSO. Other weak memory models, in particular models of *programming languages* and *non-multi-copy-atomic* models (where writes can be observed by different threads in different orders), have so far been out of reach of Brookes's totally ordered traces, and were only captured by much more sophisticated models based on *partial orders* [15, 19, 24, 26, 28, 41].

In this paper we target the Release/Acquire memory model (RA, for short). This model, obtained by restricting the C/C++11 memory model to Release/Acquire atomics, is a well-studied fundamental memory model weaker than x86-TSO, which, roughly speaking, ensures "causal consistency" together with "per-location-SC" and "RMW (read-modify-write) atomicity" [29, 30]. These assurances make RA sufficiently strong for implementing common synchronization idioms. RA allows more performant implementations than SC: in particular, it allows the reordering of a write followed by a read from a different location, which is commonly performed by hardware, and it is non-multi-copy-atomic, thus allowing less centralized architectures like POWER [45].

Our first contribution is a Brookes-style denotational semantics for RA. As Brookes's traces are totally ordered, this result may seem counterintuitive. The standard semantics for RA is a declarative (a.k.a. axiomatic) memory model, in the form of acyclicity consistency constraints over partially ordered candidate execution graphs. Since these graphs are not totally ordered, one might expect Brookes's traces to be insufficient. Nevertheless, our first key observation is that an *operational* presentation of RA as an interleaving semantics of a weak memory system lends itself to Brookes-style semantics. To this end, we develop a notion of traces compatible with Kang et al.'s "view-based" machine [27], an operational semantics that is equivalent to RA's declarative formulation. Our main technical result is the (directed) adequacy of the proposed Brookes-style semantics w.r.t. that operational semantics of RA.

A main challenge in developing a denotational semantics lies in making it sufficiently abstract. While *full* abstraction is often out of reach, as a yardstick we want our semantics to be able to justify various compiler transformations/optimizations that are known to be sound under RA [47]. Indeed, an immediate practical application of a denotational semantics is the ability to provide *local* formal justifications of program transformations, such as those performed by optimizing compilers. In this setting, showing that an optimization *N* ↠ *M* is valid amounts to showing that replacing *N* by *M* anywhere in a larger program does not introduce new behaviors, which follows from ⟦*M*⟧ ≤ ⟦*N*⟧ given a directionally adequate denotation function ⟦−⟧.

To support various compiler transformations, we close our denotations under certain operations, including analogs of Brookes's stutter and mumble, but also several RA-specific operations that allow us to relate programs which would naively correspond to rather different sets of traces. Given these closure operations, our semantics validates standard program transformations, including structural transformations, algebraic laws of parallel programming, and all known thread-local RA-valid compiler optimizations. Thus, the denotational semantics is instrumental in formally establishing the validity of transformations under RA, which is a non-trivial task [19, 47].

Our second contribution is to connect the core semantics of parallel programming languages exhibiting weak behaviors to the more standard semantic account of programming languages with effects. Brookes presented his semantics for a simple imperative WHILE language, but Benton et al. and Dvir et al. [6, 20] later recast it atop Moggi's monad-based approach [38], which uses a functional, higher-order core language. In this approach the core language is modularly extended with effect constructs to denote program effects. In particular, we define parallel composition as a first-class operator. This is in contrast to most research on weak memory models, which employs imperative languages and assumes a single top-level parallel composition.

A denotational semantics given in this monadic style comes ready-made with a rich semantic toolkit for program denotation [7], transformations [5, 8–10, 23], reasoning [2, 36], etc. We challenge and reuse this diverse toolkit throughout the development. We follow a standard approach and develop specialized logical relations to establish the compositionality property of our proposed semantics; its soundness, which allows one to use the denotational semantics to show that certain outcomes are impossible under RA; and its adequacy. This development puts weak memory models, which often require bespoke and highly specialized presentations, on a similar footing to many other programming effects.

*Outline.* In §2 we lay the groundwork for the rest of the paper by introducing the programming language that we use (§2.1), the main ideas that underpin Brookes's trace-based denotational semantics (§2.2), and the operational RA model (§2.3). In §3 we present the core aspects of our denotational semantics. First, we discuss our extension of RA's operational semantics with first-class parallelism, which enables denotations to be defined for concurrent composition (§3.1). We then present RA traces (§3.2) and use them to define the denotations of key program constructs (§3.3). Next, we show how the restriction of traces within denotations (§3.4) and the addition of closure operations (§3.5) make our denotational semantics more abstract. The denotational semantics extends to the entire programming language in the standard way using Moggi's monad-based approach (§3.6). With the denotational semantics in place, we present our main results in §4. Finally, we conclude and discuss related work in §5. More details are available in the extended version of this paper [21].

# **2 Preliminaries**

We first introduce the language and its operational semantics under the Sequential Consistency (SC) memory model (§2.1). We then outline Brookes's denotational semantics for SC (§2.2). Finally, we introduce Kang et al.'s operational presentation of Release/Acquire (RA) (§2.3).

#### **2.1 Language and Operational Semantics**

The programming language we use is an extension of a functional language with shared-state constructs. Program terms *M* and *N* can be composed sequentially, explicitly as *M* **;** *N* or implicitly by left-to-right evaluation in the pairing construct ⟨*M*, *N*⟩. They can be composed in parallel as *M* ∥ *N*. We assume preemptive scheduling, thus imposing no restrictions on the interleaving of execution steps between parallel threads. To introduce the memory-access constructs, we present the well-known *message passing* litmus test, adapted to the functional setting:

$$(\mathbf{x} := 1 \; ; \mathbf{y} := 1) \parallel \langle \mathbf{y} ?, \mathbf{x} ? \rangle \tag{\text{MP}}$$

Here, x and y refer to distinct shared memory locations. Assignment *ℓ* **:=** *v* stores the value *v* at location *ℓ* in memory, and dereference *ℓ***?** loads a value from *ℓ*. The language also includes atomic read-modify-write (RMW) constructs. For example, assuming integer storable values, FAA(*ℓ*, *v*) (Fetch-And-Add) atomically adds *v* to the value stored in *ℓ*. In contrast, interleaving is permitted between the dereferencing, adding, and storing in *ℓ* **:=** (*ℓ***?** + *v*). The underlying *memory model* dictates the behavior of the memory-access constructs more specifically.

In the functional setting, execution results in a returned value: *ℓ* **:=** *v* returns the unit value ⟨⟩, i.e. the empty tuple; *ℓ***?**, and the RMW constructs such as FAA(*ℓ*, *v*), return the loaded value; *M* **;** *N* returns what *N* returns; and ⟨*M*, *N*⟩, as well as *M* ∥ *N*, return the pair consisting of the return value of *M* and the return value of *N*. We assume left-to-right execution of pairs, so in the (MP) example ⟨y**?**, x**?**⟩ steps to ⟨*v*, x**?**⟩ for a value *v* that can be loaded from y, and ⟨*v*, x**?**⟩ steps to ⟨*v*, *w*⟩ for a value *w* that can be loaded from x. In between, the left side of the parallel composition (∥) can take steps.

We can use intermediate results in subsequent computations via let binding: let *a* = *M* in *N* binds the result of *M* to *a* in *N*. Thus, we execute *M* first, and substitute the resulting value *V* for *a* in *N* before executing *N*[*a* ↦ *V*]. Similarly, we deconstruct pairs by matching: match *M* with ⟨*a*, *b*⟩. *N* binds the components of the pair that *M* returns to *a* and *b*, respectively, in *N*. The first and second projections fst and snd, as well as the operation swap that swaps the pair constituents, are defined using match in the standard way, shown below.
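For concreteness, the standard definitions read as follows (in the notation introduced above):

$$\mathsf{fst}\ M \triangleq \mathbf{match}\ M\ \mathbf{with}\ \langle a, b\rangle.\ a \qquad \mathsf{snd}\ M \triangleq \mathbf{match}\ M\ \mathbf{with}\ \langle a, b\rangle.\ b \qquad \mathsf{swap}\ M \triangleq \mathbf{match}\ M\ \mathbf{with}\ \langle a, b\rangle.\ \langle b, a\rangle$$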

*Sequential consistency.* In the strongest memory model, Sequential Consistency (SC), every value stored is immediately made available to every thread, and every dereference must load the latest stored value. Thus the underlying memory model uses maps from locations to values as the memory state that evolves during program execution. Given an initial state, the behavior of a program in SC depends only on the choice of interleaving of steps. Though any such map can serve as an initial state, litmus tests are traditionally designed with the memory that sets all values to 0 in mind. In (MP), the order of the two stores and the two loads ensures that executions under SC may return ⟨⟨⟩, ⟨0, 0⟩⟩, ⟨⟨⟩, ⟨0, 1⟩⟩, and ⟨⟨⟩, ⟨1, 1⟩⟩, *but not* ⟨⟨⟩, ⟨1, 0⟩⟩.

*Observations.* An *observable behavior* of a whole program is a value it may evaluate to from given initial memory values. While programs may internally interact with and observe the memory, we do not consider it feasible to observe the memory directly.

#### **2.2 Overview of Brookes's Trace-based Semantics**

Observable behavior as defined for whole programs is too crude to study program *terms* that can interact with the program context within which they run. Indeed, compare *M*₁ defined as x **:=** 1 **;** y **:=** 1 **;** y**?** with *M*₂ defined as x **:=** 1 **;** y **:=** x**? ;** y**?**. Under SC, the difference between them as whole programs is unobservable: starting from any initial state, both return 1. Now consider them within the program context − ∥ x **:=** 2, that is, compare *M*₁ ∥ x **:=** 2 with *M*₂ ∥ x **:=** 2. In the first, *M*₁ still always returns 1; but in the second, *M*₂ can also return 2, by interleaving the store of 2 in x immediately after the store of 1 in x. Thus, if ⟦*M*⟧, i.e. *M*'s denotation, were to simply map initial states to possible results according to executions of *M*, we could not define ⟦*M* ∥ *N*⟧ in terms of ⟦*M*⟧ and ⟦*N*⟧ alone, because we would have ⟦*M*₁⟧ = ⟦*M*₂⟧ but also ⟦*M*₁ ∥ x **:=** 2⟧ ≠ ⟦*M*₂ ∥ x **:=** 2⟧. We conclude that ⟦*M*⟧ must contain more information about *M* than an "input-output" relation; it must account for interference by the environment.

*Adequacy in SC.* A prominent approach to defining compositional semantics for concurrent programs is due to Brookes [13], who defined a denotational semantics for SC by taking ⟦*M*⟧ to be a set of traces of *M*, closed under certain rewrite rules that we detail below. Brookes established a (directional) adequacy theorem: if ⟦*M*⟧ ⊇ ⟦*N*⟧ then the transformation *M* ↠ *N* is valid under SC. The latter means that, under the SC-based operational semantics, *M* can be replaced by *N* within a program without introducing new observable behaviors. Thus, adequacy formally grounds the intuition that the denotational semantics soundly captures the behavior of program terms.

As a particular practical benefit, the formal and informal simulation arguments used to justify transformations in operational semantics can be replaced by cleaner and simpler proofs based on the denotational semantics. For example, a simple argument shows that ⟦x **:=** *v* **;** x **:=** *w*⟧ ⊇ ⟦x **:=** *w*⟧ holds in Brookes's semantics. Thanks to adequacy, this justifies Write-Write Elimination (WW-Elim) x **:=** *v* **;** x **:=** *w* ↠ x **:=** *w* in SC.

*Traces in SC.* In Brookes's semantics, a program term is denoted by a set of traces, each trace consisting of a sequence of transitions. Each transition is of the form ⟨*µ*, *ρ*⟩, where *µ* and *ρ* are memories, i.e. maps from locations to values. A transition describes a program term's execution relying on a memory state *µ* in order to guarantee the memory state *ρ*.

For example, <sup>J</sup><sup>x</sup> **:=** *<sup>w</sup>*<sup>K</sup> includes all traces of the form *<sup>h</sup>ρ, ρ* [<sup>x</sup> := *<sup>w</sup>*]*i*, where *ρ* [x := *w*] is equal to *ρ* except for mapping x to *w*. The defnition is compositional: the traces in <sup>J</sup><sup>x</sup> **:=** *<sup>v</sup>* **;** <sup>x</sup> **:=** *<sup>w</sup>*<sup>K</sup> are obtained from sequential compositions of traces from <sup>J</sup><sup>x</sup> **:=** *<sup>v</sup>*<sup>K</sup> with traces from <sup>J</sup><sup>x</sup> **:=** *<sup>w</sup>*K, obtaining all traces of the form *hµ, µ* [x := *v*]*i hρ, ρ* [x := *w*]*i*. Such a trace relies on *µ* in order to guarantee *µ* [x := *v*], and then relies on *ρ* in order to guarantee *ρ* [x := *w*]. Allowing *ρ 6*= *µ* [x := *v*] refects the possibility of environment interference between the two store instructions. Indeed, when denoting parallel composition <sup>J</sup>*<sup>M</sup> <sup>∥</sup> <sup>N</sup>*<sup>K</sup> we include all traces obtained by interleaving transitions from a trace from <sup>J</sup>*M*<sup>K</sup> with transitions from a trace from <sup>J</sup>*N*K. By sequencing and interleaving, one subterm's guarantee can fulfll the requirement which another subterm relies on. They may also relegate reliances and guarantees to their mutual context.

In the functional setting, executions not only modify the state but also return values. In this setting, traces are pairs, written *ξ* ∴ *r*, where *ξ* is the sequence of transitions and *r* represents the final value that the program term guarantees to return [6]. For example, the semantics of dereference ⟦x**?**⟧ includes all traces of the form ⟨*µ*, *µ*⟩ ∴ *µ*(x). Indeed, the execution of x**?** does not change the memory and returns the value loaded from x. In the semantics of assignment ⟦x **:=** *v*⟧, instead of ⟨*µ*, *µ*[x := *v*]⟩ we have ⟨*µ*, *µ*[x := *v*]⟩ ∴ ⟨⟩.

*Rewrite rules in SC.* Were denotations in Brookes's semantics defined to include *only* the traces explicitly mentioned above, the semantics would not be abstract enough to justify (WW-Elim), which eliminates redundant writes. Indeed, we only saw traces with two transitions in ⟦x **:=** *v* **;** x **:=** *w*⟧, but traces with one in ⟦x **:=** *w*⟧. The semantics would still be adequate, but it would lack abstraction. This is where Brookes's second main idea comes into play, making the denotations more abstract by closing them under two operations that rewrite traces:

$$\xi\eta \therefore r \xrightarrow{\mathsf{St}} \xi \langle \mu, \mu \rangle \eta \therefore r \qquad\qquad \xi \langle \mu, \rho \rangle \langle \rho, \theta \rangle \eta \therefore r \xrightarrow{\mathsf{Mu}} \xi \langle \mu, \theta \rangle \eta \therefore r$$

With stutter (St), a term can always guarantee exactly the memory it relies on, taking no steps in between. With mumble (Mu), a term can always omit a guarantee to the environment, and rely on its own omitted guarantee instead of relying on the environment.

Denotations in Brookes's semantics are defined to be sets of traces *closed* under the rewrite rules: applying a rewrite to a trace in the set results in a trace that is also in the set. For example, ⟦x **:=** *w*⟧ is the least closed set with all traces of the form ⟨*ρ*, *ρ*[x := *w*]⟩ ∴ ⟨⟩, and ⟦x **:=** *v* **;** x **:=** *w*⟧ is the least closed set with all sequential compositions of traces from ⟦x **:=** *v*⟧ with traces from ⟦x **:=** *w*⟧.

Closure under these rules makes the traces in ⟦*M*⟧ correspond precisely to *interrupted executions* of *M*, which are executions of *M* in which the memory can arbitrarily change between steps of execution. Each transition ⟨*µ*, *ρ*⟩ in a trace in ⟦*M*⟧ corresponds to multiple execution steps of *M* that transition *µ* into *ρ*, and each gap between transitions accounts for possible environment interruption. The rewrite rules maintain this correspondence: stutter corresponds to taking 0 steps, and mumble corresponds to taking *n* + *m* steps instead of taking *n* steps and then *m* steps when the environment did not change the memory in between. Brookes's adequacy proof is based on this precise correspondence. In particular, the single-pair traces in ⟦*M*⟧ correspond to the (uninterrupted) executions, the "input-output" relation, of *M*.
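To make the two rewrites concrete, here is a small executable model of SC traces, ours rather than the paper's (we use Scala for all such sketches; memories, transitions, and traces are encoded directly from the definitions above):

```scala
object BrookesTraces {
  type Memory = Map[String, Int]

  // A trace: a sequence of <mu, rho> transitions plus the returned value r.
  final case class Trace[R](transitions: List[(Memory, Memory)], result: R)

  // stutter: insert an idle transition <mu, mu> at index k.
  def stutter[R](t: Trace[R], k: Int, mu: Memory): Trace[R] =
    t.copy(transitions =
      t.transitions.take(k) ::: (mu, mu) :: t.transitions.drop(k))

  // mumble: fuse adjacent transitions <mu, rho><rho, theta> into <mu, theta>,
  // provided the intermediate memories agree.
  def mumble[R](t: Trace[R], k: Int): Option[Trace[R]] =
    t.transitions.splitAt(k) match {
      case (xi, (mu, rho1) :: (rho2, theta) :: eta) if rho1 == rho2 =>
        Some(t.copy(transitions = xi ::: (mu, theta) :: eta))
      case _ => None
    }
}
```

For instance, applying `mumble` at index 0 to the two-transition trace ⟨*µ*, *µ*[x := *v*]⟩⟨*µ*[x := *v*], *µ*[x := *w*]⟩ ∴ ⟨⟩ yields the single-transition trace ⟨*µ*, *µ*[x := *w*]⟩ ∴ ⟨⟩, which is exactly the step used in the (WW-Elim) argument at the end of this subsection.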

*Abstraction in SC.* Brookes's semantics is *fully abstract*, meaning that the converse of adequacy also holds: if *N* ↠ *M* is valid under SC, then ⟦*N*⟧ ⊇ ⟦*M*⟧. However, Brookes's proof relies on an artificial program construct, await, that permits waiting for a specified memory snapshot and then stepping (atomically) to a second specified memory snapshot. Thus, in realistic languages where this construct is unavailable, Brookes's full-abstraction proof does not apply.

Nevertheless, even without full abstraction, one can still provide evidence that an adequate semantics is abstract by ensuring that it supports known transformations. As an example, we show directly that ⟦x **:=** *v* **;** x **:=** *w*⟧ ⊇ ⟦x **:=** *w*⟧ holds in Brookes's semantics. Since ⟦x **:=** *v* **;** x **:=** *w*⟧ is closed, it suffices to show that it contains ⟨*µ*, *µ*[x := *w*]⟩ ∴ ⟨⟩ for every memory *µ*. For a memory *µ*, we have ⟨*µ*, *µ*[x := *v*]⟩⟨*ρ*, *ρ*[x := *w*]⟩ ∴ ⟨⟩ ∈ ⟦x **:=** *v* **;** x **:=** *w*⟧ for every memory *ρ*, in particular when *ρ* = *µ*[x := *v*]. Since *ρ*[x := *w*] = *µ*[x := *v*][x := *w*] = *µ*[x := *w*], we have ⟨*µ*, *µ*[x := *v*]⟩⟨*µ*[x := *v*], *µ*[x := *w*]⟩ ∴ ⟨⟩ ∈ ⟦x **:=** *v* **;** x **:=** *w*⟧. After applying mumble, we have ⟨*µ*, *µ*[x := *w*]⟩ ∴ ⟨⟩ ∈ ⟦x **:=** *v* **;** x **:=** *w*⟧.

#### **2.3 Overview of Release/Acquire Operational Semantics**

Memory accesses in RA are more subtle than in SC. To address this we adopt Kang et al.'s "view-based" machine [27], an operational presentation of RA proven equivalent to the original declarative formulation of RA [e.g. 30]. In this model, rather than the memory holding only the latest value written to each location, the memory accumulates a set of memory-update messages for each location. Each thread maintains its own *view*, which captures which messages the thread can observe, and is used to constrain the messages that the thread may read and write. The messages in the memory carry views as well, which are inherited from the thread that wrote the message, and passed to any thread that reads the message. Thus views indirectly maintain a causal relationship between messages in memory throughout the evolution of the system.

**Fig. 1.** Illustrations of a memory (top) and a trace (bottom), in the setting of two memory locations, x and y. **Top:** A memory holding six messages. The timelines are purposefully misaligned and not to scale, to emphasize that timestamps for different locations are incomparable and that only the order between them is relevant. The graph structure that the views impose is illustrated by arrows pointing between messages. Messages that are not dovetailed are set apart; e.g. ν₃ dovetails with ν₂, which does not dovetail with ν₁. **Bottom:** A trace with two transitions: *α* ⟨*µ*₁, *ρ*₁⟩⟨*µ*₂, *ρ*₂⟩ *ω* ∴ 5. The memory illustrated on top is ρ₂. Messages and edges that are not part of a previous memory are highlighted. The local messages are ν₂ and ν₃; the rest are environment messages.

More concretely, causality is enforced by timestamping messages, thus placing them on their location's *timeline*. To capture the atomicity of RMWs, each message occupies a half-open segment (*q*, *t*] on its location's timeline, where *t* is the message's timestamp. It *dovetails* with a message at the same location with timestamp *q*. An RMW "modifies" a message by dovetailing with it.

A view *κ* associates a timestamp *κ*(*ℓ*) to each location *ℓ*, obscuring the portion of *ℓ*'s timeline before *κ*(*ℓ*). The view *points to* a message at *ℓ* with timestamp *κ*(*ℓ*). A view *ω* *dominates* a view *α*, written *α* ≤ *ω*, if *α*(*ℓ*) ≤ *ω*(*ℓ*) for every *ℓ*.

Messages point to messages via the view they carry, and must point to themselves. So when specifying a message, the value its view takes at its own location may be omitted. For example, assuming two locations, x and y, we denote by x:1@(.5, 1.7]⟪y@3.5⟫ the message at location x that carries the value 1, occupies the segment (.5, 1.7] on x's timeline, and carries the view *κ* such that *κ*(x) = 1.7 and *κ*(y) = 3.5. An example memory is depicted at the top of Figure 1.
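The message and view structure can be summarized in a small data-model sketch (our own encoding; the names and the choice of `BigDecimal` timestamps are assumptions, not the paper's):

```scala
object RAMemoryModel {
  type Location = String
  type Timestamp = BigDecimal

  // A view maps locations to timestamps, obscuring everything earlier.
  final case class View(of: Map[Location, Timestamp]) {
    // this dominates alpha iff alpha(l) <= this(l) for every l
    def dominates(alpha: View): Boolean =
      alpha.of.forall { case (l, t) => of.getOrElse(l, BigDecimal(0)) >= t }
  }

  // A message occupies the half-open segment (from, to] on its location's
  // timeline and carries a view that must point to the message itself.
  final case class Message(loc: Location, value: Int,
                           from: Timestamp, to: Timestamp, view: View) {
    require(view.of.get(loc).contains(to), "a message points to itself")
  }

  // For example, x:1@(.5, 1.7]⟪y@3.5⟫ is encoded as:
  //   Message("x", 1, BigDecimal(0.5), BigDecimal(1.7),
  //           View(Map("x" -> BigDecimal(1.7), "y" -> BigDecimal(3.5))))

  // m2 dovetails with m1 when they share a location and m2's segment
  // starts exactly where m1's ends.
  def dovetails(m2: Message, m1: Message): Boolean =
    m2.loc == m1.loc && m2.from == m1.to
}
```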

When a thread writes to *ℓ*, it must increase the timestamp its view associates with *ℓ* and use its new view as the message's view. The message's segment must not overlap with any other segment on *ℓ*'s timeline. In particular, only one message can ever dovetail with a given message. A thread can only read from revealed messages, and when it reads, its view increases as needed to dominate the view of the loaded message. This may obscure messages at other locations.

**Fig. 2.** Depictions of a step during an execution of a litmus test, with the view of the right thread changing from *σ* to *σ*′. The value each message carries is in its bottom-right corner. Views are illustrated implicitly in the graph structure that they impose. Obscured messages are faded. **Left:** As the right thread in (MP) loads 1 from y, it inherits the view of ϵ₁, obscuring ν₀. **Right:** The right thread in (SB) loads 0 from x. Storing ϵ₁ did not obscure ν₀.

Revisiting the (MP) litmus test: starting with a memory with a single message holding 0 at each location, and with all views pointing to the timestamps of these messages, suppose the right thread loaded 1 from y, as depicted on the left side of Figure 2. Such a message can only be available if the left thread stored it. Before storing 1 to y, the left thread stored 1 to x, obscuring the initial x message. The right thread inherits this limitation through the causal relationship, so it will not be able to load 0 from x. Therefore, RA forbids the outcome ⟨⟨⟩, ⟨1, 0⟩⟩.

In contrast, consider the litmus test known as *store buffering*:

$$(\mathbf{x} := \mathbf{1} ; \mathbf{y} ?) \parallel (\mathbf{y} := \mathbf{1} ; \mathbf{x} ?) \tag{SB}$$

By considering the possible interleavings, one can check that no execution in SC returns ⟨0, 0⟩. However, in RA some do. Indeed, even if the left thread stores to x before the right thread loads from x, the right thread's view allows it to load 0, as depicted on the right side of Figure 2.

We can recover the SC behavior by interspersing fences between sequenced memory accesses, which we model with FAA(z, 0) to a fresh location z. Thus, compare (SB) to the *store buffering with fences* litmus test:

$$\left(\mathbf{x} := 1; \text{FAA}\left(\mathbf{z}, 0\right); \mathbf{y}?\right) \parallel \left(\mathbf{y} := 1; \text{FAA}\left(\mathbf{z}, 0\right); \mathbf{x}?\right) \tag{\text{SB} + \text{F}}$$

Both of the FAA(z, 0) instructions store messages that must dovetail with the message that they load from, and in doing so also inherit its view. They cannot both dovetail with the same message, because their segments cannot intersect. Thus, one of them—say, the one on the right—will have to dovetail with the other. In this scenario, the view of the message that the left thread stores at z points to the message it previously stored at x. When the right thread loads the message from z it inherits this view, obscuring the initial message at x. Therefore, when it later loads from x, it must load what the left thread stored. Thus, as in SC, no execution in RA returns ⟨0, 0⟩.

# **3 Denotational Semantics for Release/Acquire**

We start this section by explaining how we support first-class concurrent composition (∥) in the operational semantics of Release/Acquire (§3.1). In the rest of the section we present the core of our denotational semantics. First, we present our notion of a trace, adapted to RA, along with four basic rewrite rules under which our denotations are closed (§3.2). Next, we define the denotations of the key program constructs (§3.3). We then present further aspects of the denotational semantics that make it more abstract: restrictions that traces in denotations must uphold (§3.4), and three more rewrite rules under which denotations are closed (§3.5). For completeness, we show how to give denotations to the whole language in the standard way, using Moggi's approach (§3.6).

#### **3.1 First-class Concurrent Composition**

Kang et al.'s presentation assumes top-level parallelism, a common practice in studies of weak memory models. This comes at the cost of uniformity and compositionality; in particular, the denotation ⟦*M* ∥ *N*⟧ cannot be defined. We resolve this by extending Kang et al.'s operational semantics to support first-class parallelism, organizing thread views in an evolving *view-tree*, a binary tree with view-labelled leaves, rather than in a fixed flat mapping. Thus, the *states* that accompany executing terms consist of a memory and a view-tree. In discourse, we do not distinguish between a view-leaf and its label.

An *initial state* consists of a memory with a single message at each location, and a view that points to these messages' timestamps. The example below shows how threads inherit their parent's view upon activation and combine their views as they synchronize.

Example. In the following, ⇝ is the execution-step relation, ⇝* is its reflexive-transitive closure, $\mu_0$ is an initial memory, $\dot{\kappa}$ is the $\kappa$-labelled view-leaf, $T \frown R$ is the view-tree that consists of a node connected to the view-trees $T$ and $R$, and $\omega$ is the least view that dominates both $\omega_1$ and $\omega_2$:

$$\begin{aligned} \langle \mu_0, \dot{\alpha} \rangle,\; M \mathbin{;} (N_1 \parallel N_2) &\leadsto^* \langle \mu_1, \dot{\alpha}' \rangle,\; N_1 \parallel N_2 \leadsto \langle \mu_1, \dot{\alpha}' \frown \dot{\alpha}' \rangle,\; N_1 \parallel N_2 \\ &\leadsto^* \langle \rho, \dot{\omega}_1 \frown \dot{\omega}_2 \rangle,\; V_1 \parallel V_2 \leadsto \langle \rho, \dot{\omega} \rangle,\; \langle V_1, V_2 \rangle \end{aligned}$$

First, *M* runs until it returns a value, which is discarded by the sequencing construct. Next, the parallel composition *N*<sup>1</sup> *∥ N*<sup>2</sup> activates. The threads then interleave executions, each with its associated side of the view-tree. Finally, once both threads return a value, they synchronize.

Handling parallel composition as a first-class construct allows us to decompose Write-Read Reordering (WR-Reord) (x **:=** *v*) **;** y**?** ↠ fst ⟨y**?**, (x **:=** *v*)⟩, a crucial reordering of memory accesses valid under RA but not under SC, into a combination of Write-Read Deordering (WR-Deord) ⟨(x **:=** *v*), y**?**⟩ ↠ (x **:=** *v*) ∥ y**?** together with structural transformations and laws of parallel programming:

$$\begin{aligned} (\mathbf{x} := v)\;;\;\mathbf{y}? \;&\xrightarrow{\text{Structural}}\; \mathbf{snd}\,\langle (\mathbf{x} := v), \mathbf{y}? \rangle \;\xrightarrow{\text{(WR-Deord)}}\; \mathbf{snd}\,((\mathbf{x} := v) \parallel \mathbf{y}?) \\ &\xrightarrow{\text{Par.\ Program.\ Symmetry}}\; \mathbf{snd}\,(\mathbf{swap}\,(\mathbf{y}? \parallel (\mathbf{x} := v))) \;\xrightarrow{\text{Structural}}\; \mathbf{fst}\,(\mathbf{y}? \parallel (\mathbf{x} := v)) \;\xrightarrow{\text{Par.\ Program.\ Squeezing}}\; \mathbf{fst}\,\langle \mathbf{y}?, (\mathbf{x} := v) \rangle \end{aligned}$$

This provides a separation of concerns: the components of this decomposition are supported by our semantics using independent arguments. It also sheds light on the interesting part, as all of these transformations are valid under SC except for (WR-Deord).

#### **3.2 Traces for Release/Acquire**

Adapting Brookes's SC-traces, our RA-traces also include a sequence of transitions *ξ*, each transition being a pair of RA memories, and a return value *r*. Intuitively, these play a similar role here, formally grounded in analogs of the stutter and mumble rewrite rules. Since the operational semantics only adds messages and never modifies them, we require that every memory snapshot in the sequence *ξ* be contained in the subsequent one, whether within or across transitions. A message added within a transition is a *local message*; otherwise it is an *environment message*. We call the first memory in *ξ*'s first transition its *opening memory*, and the second memory in *ξ*'s last transition its *closing memory*.

In addition, RA-traces include an initial view *α*, declaring which messages are relied upon to be revealed in *ξ*'s opening memory, and a final view *ω*, declaring which messages are guaranteed to be revealed in *ξ*'s closing memory. We ground these intuitions formally in the rewind and forward rewrite rules below.

We write the trace as *α* *ξ* *ω* ∴ *r*. See the illustration at the bottom of Figure 1.

*Stutter & Mumble.* We define the stutter (St) and mumble (Mu) rewrite rules:

$$\alpha\; \xi\eta\; \omega \therefore r \;\xrightarrow{\mathsf{St}}\; \alpha\; \xi \langle \mu, \mu \rangle \eta\; \omega \therefore r \qquad\qquad \alpha\; \xi \langle \mu, \rho \rangle \langle \rho, \theta \rangle \eta\; \omega \therefore r \;\xrightarrow{\mathsf{Mu}}\; \alpha\; \xi \langle \mu, \theta \rangle \eta\; \omega \therefore r$$

As in Brookes's semantics, their role is to make the semantics more abstract by divorcing the length of the sequence from the individual steps taken in the operational semantics, while maintaining the transitions' Rely/Guarantee character.

*Rewind & Forward.* The rewind (Rw) rewrite rule establishes that the term relies only on certain messages being *revealed*, not on messages being obscured. The rewind rule modifies the initial view, making it point to earlier messages on the timelines; thus, relied-upon messages remain available after the rewrite. Similarly, the forward (Fw) rewrite rule establishes that the term only guarantees that certain messages are revealed. The forward rule modifies the final view, making it point to later messages on the timelines; thus, any message guaranteed to be available was already guaranteed beforehand. The rules are schematically depicted in Figure 3.


**Fig. 3.** Schematic depictions of the rewind and forward rewrite rules, focusing on a single location, where the initial/final view points to *ν* before the rewrite and to *ϵ* after. The messages *ν* and *ϵ* may coincide, dovetail, or be separated. **Left:** The initial view *α* is "rewound" to *α*′. **Right:** The final view *ω* is "forwarded" to *ω*′.

#### **3.3 Introducing Denotations for RA**

We present denotations of key constructs of the programming language. When we refer to the notion of a *closed set* below, we mean a set that is closed under certain rewrite rules, such as stutter, mumble, rewind, and forward from §3.2.

*Pure.* A pure (i.e. effect-free) computation guarantees a returned value, and otherwise can only guarantee what it relies on. For example, we define ⟦2 + 3⟧ as the least closed set with all traces of the form *κ* ⟨*µ*, *µ*⟩ *κ* ∴ 5.

*Sequence.* In denoting sequential composition we must make sure that the first component does not obscure any message that the second component relies on. Thus, we define ⟦⟨*M*, *N*⟩⟧ as the least closed set with all traces of the form *α* *ξη* *ω* ∴ ⟨*r*, *s*⟩, where there exists a view *κ* such that *α* *ξ* *κ* ∴ *r* ∈ ⟦*M*⟧ and *κ* *η* *ω* ∴ *s* ∈ ⟦*N*⟧. The *existence* of the revealed messages is implicit: *ξ*'s closing memory must be contained in the memory that follows it, which is *η*'s opening memory. The definition of ⟦*M* **;** *N*⟧ is the same, except that the first component of the returned pair is discarded; that is, with traces of the form *α* *ξη* *ω* ∴ *s*.

*Parallel.* Threads composed in parallel rely on the same preceding sequential environment and guarantee to the same succeeding sequential environment. Thus, we define ⟦*M*₁ ∥ *M*₂⟧ as the least closed set with all traces of the form *α* *ξ* *ω* ∴ ⟨*r*₁, *r*₂⟩, where there exist sequences *ξ*₁ and *ξ*₂ such that *ξ* is obtained by interleaving their transitions, and *α* *ξᵢ* *ω* ∴ *rᵢ* ∈ ⟦*Mᵢ*⟧ (for *i* ∈ {1, 2}).

*Dereference.* We define ⟦*ℓ***?**⟧ to be the least closed set with all traces of the form *α* ⟨*µ*, *µ*⟩ *ω* ∴ *v*, where *ℓ*:*v*@(*q*, *α*(*ℓ*)]⟪*κ*⟫ ∈ *µ* for some timestamp *q* and view *κ*, and both *α* ≤ *ω* and *κ* ≤ *ω*.

*Assignment.* We define ⟦*ℓ* **:=** *v*⟧ as the least closed set with all traces of the form *α* ⟨*µ*, *ρ*⟩ *ω* ∴ ⟨⟩, where *ρ* is obtained by adding the message *ℓ*:*v*@(*q*, *ω*(*ℓ*)]⟪*ω*⟫ to *µ* for some timestamp *q*, and *α* ≤ *ω*.

*Read-modify-write.* The definition of ⟦FAA(*ℓ*, *w*)⟧ combines the two above, along with a dovetailing requirement. Specifically, it is the least closed set with all traces of the form *α* ⟨*µ*, *ρ*⟩ *ω* ∴ *v*, where *ℓ*:*v*@(*q*, *α*(*ℓ*)]⟪*κ*⟫ ∈ *µ* for some timestamp *q* and view *κ*, both *α* ≤ *ω* and *κ* ≤ *ω*, and *ρ* is obtained by adding the message *ℓ*:(*v*+*w*)@(*α*(*ℓ*), *ω*(*ℓ*)]⟪*ω*⟫ to *µ*. The semantics of other RMWs is defined similarly.

Example. We show that ⟦*ℓ* **:=** *v* **;** *v*⟧ ⊆ ⟦*ℓ* **:=** *v* **;** *ℓ***?**⟧. When sequencing two traces, the final view of the first must match the initial view of the second, so traces in ⟦*ℓ* **:=** *v* **;** *v*⟧ have the form *α* ⟨*µ*, *ρ*⟩⟨*θ*, *θ*⟩ *ω* ∴ *v*, where *ρ* is obtained by adding the message *ℓ*:*v*@(*q*, *ω*(*ℓ*)]⟪*ω*⟫ to *µ* for some timestamp *q*, and *α* ≤ *ω*. Since *ω* points to this added message, and since *ρ* ⊆ *θ* as memories along a trace's sequence, *ω* ⟨*θ*, *θ*⟩ *ω* ∴ *v* ∈ ⟦*ℓ***?**⟧. By sequencing, *α* ⟨*µ*, *ρ*⟩⟨*θ*, *θ*⟩ *ω* ∴ *v* ∈ ⟦*ℓ* **:=** *v* **;** *ℓ***?**⟧.

#### **3.4 Correspondence to the Operational Semantics**

Traces in denotations, if unconstrained, may represent behaviors that include operationally unreachable states. Forbidding such redundant traces eliminates a source of differentiation between denotations, thus increasing their abstraction.

*Reachable states.* Consider the transformation x**? ;** y**?** ↠ y**?**, a consequence of the RA-valid Irrelevant Read Elimination (R-Elim) x**? ;** ⟨⟩ ↠ ⟨⟩ and structural equivalences. Consider the state *S* that consists of the memory at the top of Figure 1 and the view that points to ν₃ and ϵ₂. The only step x**? ;** y**?** can take from the state *S* is to load ν₃, inheriting the view that ν₃ carries, which changes the thread's view to point to ϵ₃. Only ϵ₃ is available in the following step, which means the term returns 3. In contrast, starting from *S*, the term y**?** can load from ϵ₂ to return 7. This analysis does not invalidate the transformation, because the state *S* is unreachable by an execution starting from an initial state, and should therefore be ignored when determining observable behaviors.

*Internalizing invariants.* Just as we ignore unreachable states in the operational semantics, we discard "unreachable" traces to refine our denotational semantics. We consider a state to be valid if it adheres to the following invariants.

*Scattering: segments in memory never overlap.*

*Pointing: views always point to messages.*


Memory snapshots in traces are required to obey each of the invariants above. The initial and final views must point to and dominate the opening and closing memory, respectively. This means that there must be a message to load that allows the initial and final views to be equal, and we obtain ⟦x**? ;** ⟨⟩⟧ ⊇ ⟦⟨⟩⟧.

We also uphold requirements that correspond to the relation between the states across a possibly-interrupted series of steps in the operational semantics:

*Accumulating: the memory after contains the memory before.* We require that every memory snapshot contains the one before it.

**Fig. 4.** Two variations on the memory illustrated in Figure 1. **Top:** This can function as a memory snapshot in a trace. It demonstrates that the views of messages along a timeline do not have to be ordered: ϵ₂ appears earlier than ϵ₃ on y's timeline but points to a later message on x's timeline. **Bottom:** This cannot function as a memory snapshot in a trace, because it contains an ascending path. Intuitively, no thread could have written ϵ₂, because the view that ϵ₂ carries indicates that the thread would have already "known" about ν₃ and therefore, following the causality chain, about ϵ₃ as well. Thus, the thread would have been forbidden from picking ϵ₂'s timestamp.

*Delimiting: if the view-trees before and after are leaves, then the view after dominates the view before, and the view of any written message dominates the view before and is dominated by the view after.* We impose the analogous requirement on the initial and final views, and on the local messages.

The trace in Figure 1 adheres to the invariants and relationships we have listed.

*Concrete operational correspondence.* We call the rewrite rules that were defined in §3.2 *concrete* because they maintain a certain concrete interpretation of traces. To see this, consider the operational semantics for RA augmented with an additional kind of step, which any term can take. The only change along such a step is that a view in the view-tree inherits the view of a message that is available to it. This addition does not change the observable behaviors of whole programs, and it maintains the above invariants.

Each trace in the denotations of §3.3, if closed only under the concrete rewrite rules, corresponds to an interrupted execution in the augmented operational semantics. The correspondence is similar to that of Brookes's semantics in terms of the sequence of transitions and the return value. The initial and final views determine the views at the beginning and the end of the interrupted execution.

The introduction of the rewrite rules in §3.5 will mean that traces do not have such a clear operational interpretation. The key to our proof of adequacy is to partially recover this operational correspondence in terms of the overall observable behaviors (§4).

**Fig. 5.** Schematic depiction of the tighten rewrite rule, focusing on a particular memory snapshot within the trace, in the setting of *k*+1 locations. The message *ν* is "tightened" to *ν*′, such that for each *i* it points to *β*ᵢ instead of *ϵ*ᵢ. This includes the case that *β*ᵢ and *ϵ*ᵢ are the same message at some locations.

#### **3.5 Abstract Rewrite Rules**

Transitions in RA traces consist of sets of messages, which record much more information about the operational execution than the mappings from locations to values we had in SC. This makes the trace-based semantics too concrete. We resolve this concreteness issue by introducing three *abstract* rewrite rules that obfuscate information about local messages, blurring the distinctions that denotations can make and thus making them more abstract.

*Tighten.* Recall the transformation (WR-Deord) that we wish to support. Let *τ*₁ ∈ ⟦x **:=** *v*⟧ and *τ*₂ ∈ ⟦y**?**⟧, such that they compose sequentially to form a trace from ⟦⟨(x **:=** *v*), y**?**⟩⟧. Then *τ*₁'s final view *κ* must equal *τ*₂'s initial view. The view *κ* dominates the view *σ* of the local message *ν*₁ stored by *τ*₁, and *κ* cannot obscure the message *ν*₂ from which *τ*₂ loaded its value. Thus, *σ* cannot obscure *ν*₂. In contrast, consider *τ*₁ and *τ*₂ that compose in parallel to form a trace from ⟦(x **:=** *v*) ∥ y**?**⟧. Here, the view of the local message may very well obscure the loaded message. Indeed, the final view of *τ*₁ may dominate the initial view of *τ*₂.

To resolve this, observe that the purpose of recording a view in a message is to encumber the message's loaders. Under this perspective, the view of a local message guarantees to the environment that loading the local message will keep certain messages revealed. Making the view larger therefore only weakens the guarantee. Thus, we introduce the tighten (Ti) rewrite rule, which makes the view of a local message larger. The rule is depicted in Figure 5, and Figure 6 provides a concrete example. Using tighten, we can show that ⟦⟨(x **:=** *v*), y**?**⟩⟧ ⊇ ⟦(x **:=** *v*) ∥ y**?**⟧.

*Absorb.* Recall the transformation (WW-Elim) that we wish to support. To show it, we aim to replicate, as far as we can, the reasoning we used to show ⟦x **:=** *v* **;** x **:=** *w*⟧ ⊇ ⟦x **:=** *w*⟧ in Brookes's semantics. Recall that, to use mumble, we made the memories match across the two transitions of ⟦x **:=** *v* **;** x **:=** *w*⟧. Doing so here, we end up with two local messages, whereas traces from ⟦x **:=** *w*⟧ only have a single local message. Roughly speaking, the equality concerning SC memories *µ*[x := *v*][x := *w*] = *µ*[x := *w*] does not transfer to RA, where memory, by accumulating messages, is more concrete. We resolve this by adding the absorb (Ab) rewrite rule, which replaces two dovetailed local messages with one that carries the second message's value. The rule is depicted in Figure 7, and Figure 8 provides a specific example.

**Fig. 6.** A possible result of rewriting the trace from Figure 1 using tighten. Since *ν*₂ is local in the trace from Figure 1, tighten can advance its view to point to *ϵ*₃ instead of *ϵ*₁. The same replacement is applied throughout the trace's sequence, not just in the closing memory.

*Dilute.* There is another known family of transformations that are valid under RA memory, yet cannot be justified with the rules presented so far. These introduce non-modifying atomic updates, such as Read to FAA (R-FAA): *ℓ***?** ↠ FAA(*ℓ*, 0).

Running within some context, FAA(*ℓ*, 0) reads a message *ν*, to which it dovetails another message *ϵ* with the same value. It is possible that some *β* dovetails with *ϵ* later in the execution. In the same context, we can simulate this behavior with *ℓ***?** instead, by having the context provide *ν*′ instead of *ν*, with the difference that *ν*′ takes up the same segment that *ν* and *ϵ* took up combined. If there is a *β* as mentioned, it can now dovetail with *ν*′ to the same effect. In this scenario, *ν* is an environment message, but we must also account for the case that it is local, to allow for composition, as in *ℓ* **:=** *v* **;** *ℓ***?** ↠ *ℓ* **:=** *v* **;** FAA(*ℓ*, 0).

We internalize the idea behind this argument as the dilute (Di) rewrite rule, in which a message is replaced by two messages that together occupy the same segment, the second being a local message that cannot appear before the first in the trace and must carry the same value. With dilute, ⟦*ℓ***?**⟧ ⊇ ⟦FAA(*ℓ*, 0)⟧. The rule is depicted in Figure 7, and Figure 9 provides a specific example.

#### **3.6 Monadic Presentation**

One of the contributions of this work is to bridge research on weak-memory models with Moggi's monad-based approach [38] to denotational semantics. In this approach, one starts by defining a monad, which has three components. The first associates with every set *X*, which we think of as representing returned values, a set *T X* representing computations that return values from *X*. In our case, *T X* consists of countable sets of traces closed under the rewrite rules.

**Fig. 7.** Schematic depictions of the absorb (left) and dilute (right) rewrite rules, focusing on the segment of the dovetailed messages together with all pointers into and out of them, within a particular memory snapshot. The *circular* cloud represents the subset of the memory that the messages in focus point to, showing that they all have the same view. The *elliptical* clouds represent views (including the initial and final views, as well as views of other messages) that point to each of the dovetailing messages. **Left:** The message *ν* is "absorbed" into the message *ϵ* to become *ϵ*′. No view may point to *ν*. **Right:** The message *ν*′ "dilutes" into *ν* and *ϵ*. While *ϵ* must be a local message, *ν* and *ν*′ can appear anywhere in the trace's sequence, as long as they appear in the same places in the sequence and *ϵ* does not appear before them. The views that point to *ν*′ before diluting can point either to *ν* or to *ϵ* after diluting.

Denotations are then defined according to their *typing judgments*. For example, *a*, *b* : Loc ⊢ ⟨*a*, *b***?**⟩ : (Loc × Val) means that, in a context where the free variables *a* and *b* are locations, the term ⟨*a*, *b***?**⟩ returns a location-value pair. Given a function *γ* that maps *a* and *b* to locations, ⟦⟨*a*, *b***?**⟩⟧ *γ* ∈ *T* (Loc × Val). For *Γ* ⊢ *M* : *A* and *Γ* ⊢ *N* : *A*, we generalize containment ⟦*N*⟧ ⊇ ⟦*M*⟧ pointwise: if *γ* maps the variables in *Γ* appropriately by their type, then ⟦*N*⟧ *γ* ⊇ ⟦*M*⟧ *γ*. This degenerates when *Γ* is empty, i.e. when *M* and *N* are *closed terms*.

The second monad component is a function return : *X* → *T X* that maps values to pure computations that return that value. The third sequences computations, such that the latter depends on the value returned by the former: (⟫=) : (*T X*) × (*X* → *T Y*) → *T Y*. These components, implicitly indexed by the sets involved, must satisfy axioms that formalize the stated intuition: return *r* ⟫= *f* = *f*(*r*), *P* ⟫= return = *P*, and (*P* ⟫= *f*) ⟫= *g* = *P* ⟫= *λr.* (*f*(*r*) ⟫= *g*).

In our case, we define return *r* as the least closed set containing all traces of the form *κ* ⟨*µ*, *µ*⟩ *κ* ∴ *r*; and *P* ⟫= *f* as the least closed set containing all traces of the form *α ξη ω* ∴ *s*, where *α ξ κ* ∴ *r* ∈ *P* and *κ η ω* ∴ *s* ∈ *f*(*r*) for some *κ*.
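As a toy illustration of these two components, here is a Haskell sketch that is ours, not the paper's construction: views and memory snapshots are drawn from small finite enumerations, denotations are plain lists of traces, and closure under the rewrite rules is omitted.

```
type View = Int                 -- opaque view token
type Mem  = Int                 -- opaque memory-snapshot token

-- alpha xi omega ∴ r: initial view, transitions, final view, result
data Trace a = Trace View [(Mem, Mem)] View a
  deriving Show

views :: [View]
views = [0 .. 2]

mems :: [Mem]
mems = [0 .. 2]

-- return r: all stuttering traces  kappa <mu, mu> kappa ∴ r
ret :: a -> [Trace a]
ret r = [ Trace k [(m, m)] k r | k <- views, m <- mems ]

-- P >>= f: glue a trace of P to a trace of f r at a matching view kappa
bind :: [Trace a] -> (a -> [Trace b]) -> [Trace b]
bind p f =
  [ Trace a (xi ++ eta) omega s
  | Trace a xi kappa r   <- p
  , Trace k  eta omega s <- f r
  , k == kappa ]
```

On such finite enumerations, the monad axioms can be tested directly, up to the set of traces each side produces.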

*Denotations.* This approach comes ready-made with denotations for standard language constructs. For example, ⟦⟨*M*, *N*⟩⟧ *γ* ≔ ⟦*M*⟧ *γ* ⟫= *λr.* (⟦*N*⟧ *γ* ⟫= *λs.* return ⟨*r*, *s*⟩). Similarly, ⟦match *M* with ⟨*a*, *b*⟩. *N*⟧ *γ* ≔ ⟦*M*⟧ *γ* ⟫= *λ*⟨*r*, *s*⟩. ⟦*N*⟧ *γ*[*a* ↦ *r*][*b* ↦ *s*], where *γ*[*a* ↦ *r*] is obtained from *γ* by mapping *a* to *r*. Pure computations use the return function, e.g. ⟦*v*⟧ = return *v*.
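Reusing the toy ret and bind from the sketch above, the pair denotation reads as follows (with environments elided, since the sketch has no variables):

```
-- ⟦⟨M, N⟩⟧ = ⟦M⟧ >>= λr. ⟦N⟧ >>= λs. return ⟨r, s⟩
denotePair :: [Trace a] -> [Trace b] -> [Trace (a, b)]
denotePair m n = m `bind` \r -> n `bind` \s -> ret (r, s)
```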

Program effects can be modularly introduced in this approach, such as memory access, where ⟦*ℓ* **:=** *v*⟧ ∈ *T*{⟨⟩} and ⟦*ℓ***?**⟧, ⟦FAA(*ℓ*, *v*)⟧ ∈ *T* Val; and parallel composition, a function (*|||*) : *T X* × *T Y* → *T*(*X* × *Y*) with which ⟦*M* ∥ *N*⟧ *γ* ≔ ⟦*M*⟧ *γ* *|||* ⟦*N*⟧ *γ*. The definition remains the same: we obtain traces in *P* *|||* *Q* by interleaving transitions and pairing the returned values of traces with matching views, one from *P* and one from *Q*.

**Fig. 8.** A possible result of rewriting the trace from Figure 6 using absorb. The dovetailed messages *ν*₂ and *ν*₃ are local in the trace from Figure 1, added within the same transition, so by rewriting with absorb they can be replaced by *ν*′₃, obtained by stretching *ν*₃'s segment to cover *ν*₂'s segment.
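A crude sketch of (*|||*) over the toy traces above: the real definition keeps the two components' views in a view-tree, whereas this simplification (ours, not the paper's) forces the components to share their initial and final views.

```
-- All interleavings of two transition sequences.
interleave :: [t] -> [t] -> [[t]]
interleave xs []         = [xs]
interleave [] ys         = [ys]
interleave (x:xs) (y:ys) =
  map (x:) (interleave xs (y:ys)) ++ map (y:) (interleave (x:xs) ys)

-- P ||| Q: interleave transitions and pair the returned values.
par :: [Trace a] -> [Trace b] -> [Trace (a, b)]
par p q =
  [ Trace a zeta omega (r, s)
  | Trace a  xi  omega  r <- p
  , Trace a' eta omega' s <- q
  , a == a', omega == omega'
  , zeta <- interleave xi eta ]
```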

Adhering to left-to-right evaluation both operationally and denotationally, *M* **:=** *N* is equivalent to match ⟨*M*, *N*⟩ with ⟨*a*, *b*⟩. *a* **:=** *b*. In traces of assignment, the added local message is free to dovetail with a previous message, unlike in RMW traces, where it must. Therefore, we have ⟦*ℓ* **:=** (*ℓ***?** + *v*)⟧ ⊇ ⟦FAA(*ℓ*, *v*)⟧.

*Structural reasoning.* Among the general results and proof techniques this approach supplies are *structural equivalences*. These are denotational equations that hold due to the properties of the core calculus, and are preserved by modular expansions with program effects. For instance, if *K* is effect-free, then ⟦if *K* then (*M* **;** *N*) else (*M* **;** *N*′)⟧ = ⟦*M* **;** if *K* then *N* else *N*′⟧. Equivalences such as this one may otherwise require challenging ad-hoc proofs [e.g. 24, 26].

More generally, structural reasoning composes to derive further equivalences. For example, from ⟦⟨⟩⟧ = ⟦*ℓ***? ;** ⟨⟩⟧ and structural equivalences, namely "left neutrality" ⟦*K*⟧ = ⟦⟨⟩ **;** *K*⟧ and "associativity" ⟦(*M* **;** *N*) **;** *K*⟧ = ⟦*M* **;** (*N* **;** *K*)⟧:

$$⟦K⟧ = ⟦⟨⟩ \,;\, K⟧ = ⟦(ℓ?\,;\,⟨⟩)\,;\, K⟧ = ⟦ℓ?\,;\,(⟨⟩\,;\, K)⟧ = ⟦ℓ?\,;\, K⟧ \tag{$\star$}$$

Structural reasoning generalizes to program transformations. For example, (⟫=) is monotonic, so we can also derive:

$$⟦⟨⟩⟧ = ⟦ℓ?\,;\,⟨⟩⟧ = ⟦ℓ?⟧ ⟫\!\!= λv.\,⟦⟨⟩⟧ \;⊇\; ⟦\mathrm{FAA}(ℓ, 0)⟧ ⟫\!\!= λv.\,⟦⟨⟩⟧ = ⟦\mathrm{FAA}(ℓ, 0)\,;\,⟨⟩⟧$$

Since (*|||*) is also monotonic, we can use this to show that ⟦(SB)⟧ ⊇ ⟦(SB+F)⟧.

**Fig. 9.** A possible result of rewriting the trace from Figure 1 using dilute. The message *ϵ*₁ from Figure 1 was replaced with *ϵ*′₁, with the same value 1. The local message *β*, which takes up the rest of the space left behind by *ϵ*₁, always appears with *ϵ*′₁, dovetailing with it and carrying the same value. The message *ϵ*₂, which used to dovetail with *ϵ*₁, now dovetails with *β*.

*Higher order.* An important aspect of a programming language is its facilitation of abstraction. Higher-order programming is a flexible instance of this, in which programmable functions can take functions as input and return functions as output. Moggi's approach supports this feature out of the box, in a way that does not complicate the rest of the semantics: the first-order fragment of the semantics need not change to include it.

Every value returned by an execution has a semantic representation, which we use as the returned value in traces. Semantic and syntactic values are identified in the first-order fragment, but different syntactic functions may have the same semantics, so the identification does not extend to higher order.

We classify a term as a *program* if it is *closed* (every variable occurrence is bound) and of *ground type* (all functions are applied to arguments). This definition is in line with the expectation that a program should return a concrete result that the end-user can consume. Thus, we only consider observable behaviors of programs. Transformations only need to be valid when applied within programs. Programs degenerate to closed terms in the first-order fragment.

# **4 Main Results**

We present the main results that we have proven about our denotational semantics. Moggi's semantic toolkit features ubiquitously in their proofs.

*Compositionality.* In its most basic form, this key feature of denotational semantics means that a program term's denotation is defined using *the denotations* of its immediate subterms. We have used this in (⋆). In our case, denotations are sets in which each element represents a possible behavior of the term, so we are interested in establishing a directional generalization of compositionality:


**Lemma 1.** *If ⟦M⟧ ⊆ ⟦N⟧ then ⟦Ξ[M]⟧ ⊆ ⟦Ξ[N]⟧ for any program context Ξ[−].*

Compositionality is a consequence of the semantics' monadic design using monotonic operators, and is not substantially different from previous work [e.g. 20].

*Observability correspondence.* The abstract rewrite rules break the direct correspondence between traces and interrupted executions. For example, in our analysis of (WW-Elim), by using absorb, we ended up with a trace in which only one message is added even though the program term adds two messages.

Still, some connection must remain to obtain a proof of adequacy. In particular, we would like traces to correspond to observable behavior of programs. In one direction, an even stronger property holds, known as soundness:

**Lemma 2.** *For every execution of a program M in the operational semantics of RA, there exists α ⟨µ, ρ⟩ ω ∴ r ∈ ⟦M⟧ that matches the execution: ⟨α, µ⟩ is the initial state, ⟨ω, ρ⟩ is the final state, and r matches the value returned.*

To prove soundness, we take a trace whose transitions correspond to the memory-accessing execution steps, and then use mumble to obtain a single transition.

Ignoring the final state, the correspondence holds in the other direction too:

**Lemma 3.** *For every program M and α ⟨µ, ρ⟩ ω ∴ r ∈ ⟦M⟧ there is an observable behavior of M with initial state ⟨α, µ⟩ and return value matching r.*

The lack of correspondence with the final state is an artifact of the concreteness-abstraction divergence between the operational and denotational semantics. Due to this divergence, it is significantly more challenging to establish this direction of the correspondence than in previous work.

*Overcoming the concreteness-abstraction hurdle.* The most technically challenging step in proving Lemma 3 is to prove that the application of the abstract rewrite rules can be deferred to the end. We define the *basic denotation* of a term *M* as its denotation would be were it defined using only the concrete rewrite rules, and we denote the closure of the basic denotation under the abstract rewrite rules by ⟦*M*⟧†. We claim:

**Lemma 4.** *If M is a program, then ⟦M⟧† = ⟦M⟧.*

Thus, to obtain all of the traces of the regular denotational construction, in which all of the rewrite rules are applied throughout, it is enough to close only under the concrete rewrite rules as the denotation of a program is built up from its subterms, applying the abstract rewrite rules only once, at the top level.

The intuition that guides the inductive proof of Lemma 4 is that the abstract rewrite rules can be percolated out. To get the main idea across while keeping the discussion self-contained, we focus on the case ⟦*M*₁ ∥ *M*₂⟧† ⊇ ⟦*M*₁ ∥ *M*₂⟧.

Let *<sup>π</sup> <sup>∈</sup>* <sup>J</sup>*M*<sup>1</sup> *<sup>∥</sup> <sup>M</sup>*<sup>2</sup>K. By defnition, *<sup>π</sup>* is obtained by frst composing some *<sup>τ</sup>*<sup>1</sup> *<sup>∈</sup>* <sup>J</sup>*M*<sup>1</sup><sup>K</sup> in parallel with some *<sup>τ</sup>*<sup>2</sup> *<sup>∈</sup>* <sup>J</sup>*M*<sup>2</sup>K, i.e. interleaving transitions and pairing return values, and then rewriting the resulting trace *τ* with concrete and abstract rules. By the inductive hypothesis, <sup>J</sup>*Mi*<sup>K</sup> *† <sup>⊇</sup>* <sup>J</sup>*Mi*K. So *<sup>τ</sup><sup>i</sup> <sup>∈</sup>* <sup>J</sup>*Mi*<sup>K</sup> *†* , meaning that *τ<sup>i</sup>* is the result of rewriting some *τ ′ <sup>i</sup> <sup>∈</sup>* <sup>J</sup>*Mi*<sup>K</sup> with abstract rules.

To warm up, we first address the case where *τ*′₁ −Ab→ *τ*₁ and *τ*′₂ = *τ*₂. We might hope, naively, that we can compose *τ*′₁ with *τ*′₂ to obtain some *τ*′ ∈ ⟦*M*₁ ∥ *M*₂⟧ such that *τ*′ −Ab→ *τ*, and thus *τ*′ rewrites to *π*. However, they do not compose, because *τ*′₁ has two local messages, while *τ*′₂ has only the one environment message that matches the result of "absorbing" the two messages. Rather, *τ*′₁ can compose with a trace *τ̄*₂ which is equal to *τ*′₂ except for having the required two environment messages instead of the combined one.

We formalize this by introducing a dual auxiliary rewrite rule x̄ for each abstract rule x. For example, the dual of absorb is expel, which splits up an environment message dually to how absorb combines local messages. The auxiliary rewrite rules keep us within the basic denotations:

**Lemma 5.** *If τ ∈ ⟦M⟧ and τ −z→ π for some auxiliary rule z, then π ∈ ⟦M⟧.*

Then we apply *τ*′₂ −x̄→ *τ̄*₂ ∈ ⟦*M*₂⟧, and obtain the required *τ*′ by composing *τ*′₁ in parallel with *τ̄*₂. This process of applying the dual rewrite in order to percolate an abstract rewrite out works for sequential composition too. We summarize:

**Lemma 6.** *If π′ −x→ π for some abstract rule x, and π composes in parallel with ϱ to obtain τ, then there exist ϱ′ with ϱ −x̄→ ϱ′ and τ′ with τ′ −x→ τ, such that π′ composes in parallel with ϱ′ to obtain τ′. Similarly for sequential composition.*

In the case where more abstract rewrite rules are needed to obtain *τ*₁ from *τ*′₁, we can repeat the process. Yet two problems remain.

The first problem is that *π* is obtained from *τ*′ ∈ ⟦*M*₁ ∥ *M*₂⟧ by both concrete and abstract rewrites, starting with the abstract rewrites that we have "peeled off" *τ*₁. To show that *π* ∈ ⟦*M*₁ ∥ *M*₂⟧†, we need the concrete rewrites to come before the abstract rewrites.

The second problem appears once we remove our simplifying assumption that *τ*′₂ = *τ*₂. In the general case, we obtain *τ̄*₂ from *τ*′₂ using abstract rewrites followed by auxiliary rewrites. If we could replace this sequence of rewrites with one in which the abstract rewrites follow the auxiliary rewrites, then *τ*′₂ could be rewritten with auxiliary rules to some *τ̄*′₂ ∈ ⟦*M*₂⟧ by using Lemma 5, which in turn could be rewritten with abstract rewrites to *τ̄*₂ ∈ ⟦*M*₂⟧†. This would allow the proof to continue by repeating the process on the other side.

Both problems are solved by commuting the abstract rewrites outwards:

**Lemma 7.** *For any rewrite sequence starting with τ and ending with π, there exists one in which all of the abstract rewrites appear last.*

Thus, we can do as we planned and repeat the process on the other side, "peeling off" the abstract rewrites from *τ̄*₂ to obtain *τ̄*′₂ ∈ ⟦*M*₂⟧, rewriting *τ*′₁ with the dual auxiliary rules in lockstep, resulting in some *τ̄*′₁ ∈ ⟦*M*₁⟧ by Lemma 5. By Lemma 6, these compose in parallel to some *τ̄* ∈ ⟦*M*₁ ∥ *M*₂⟧ that rewrites with concrete and abstract rules to *τ*, and thus to *π*. By Lemma 7, we can rewrite *τ̄* with concrete rules to some *τ̄*′ ∈ ⟦*M*₁ ∥ *M*₂⟧ first, and with abstract rules afterwards, obtaining *π* ∈ ⟦*M*₁ ∥ *M*₂⟧†.

Having established Lemma 4, the rest is relatively straightforward. First, traces in basic denotations correspond to interrupted executions, and in particular, an analog of Lemma 3 holds for basic denotations:

**Lemma 8.** *For every program M and trace α ⟨µ, ρ⟩ ω ∴ r in the basic denotation of M, there is an observable behavior of M with initial state ⟨α, µ⟩ and return value matching r.*

Next, it is clear from their definition that the abstract rules do not change the number of transitions. Thus, thanks to Lemma 4, the single-transition traces in ⟦*M*⟧ are the result of rewriting single-transition traces in the basic denotation of *M* by abstract rules, which correspond to observable behaviors of *M* by Lemma 8.

Lemma 3 follows from the fact that the abstract rules preserve the correspondence between traces and observable behaviors of programs. For example, due to absorb, the denotation of a program that adds two messages contains a trace in which only one message is added; yet the initial view, the opening memory, and the returned value are maintained. The tighten rule similarly preserves these. In both cases, the execution exhibiting the behavior can remain unchanged. The dilute rule may replace an initial message's timestamp with a smaller one, in which case the execution exhibiting the behavior needs to use the new timestamp accordingly, but otherwise remains the same.

*Adequacy.* The central result is (directional) adequacy, stating that denotational approximation corresponds to refinement of observable behaviors:

**Theorem 9.** *If ⟦M⟧ ⊆ ⟦N⟧, then for all program contexts Ξ[−], every observable behavior of Ξ[M] is an observable behavior of Ξ[N].*

In particular, ⟦*M*⟧ ⊆ ⟦*N*⟧ implies that *N* ↠ *M* is valid under RA, because the effect of applying it is unobservable.

Adequacy follows immediately from the above results. Indeed, using soundness, an observable behavior of Ξ[*M*] corresponds to a single-transition trace *τ* ∈ ⟦Ξ[*M*]⟧; by the assumption and compositionality, *τ* ∈ ⟦Ξ[*N*]⟧; and using the other direction, *τ* corresponds to an observable behavior of Ξ[*N*].

*Higher-order subtleties.* When applying the above results in the presence of higher order, one must pay attention to the *program* assumption. Indeed, suppose ⟦*M*⟧ ⊇ ⟦*M*′⟧. Compositionality does not entail that ⟦*λa. M*⟧ ⊇ ⟦*λa. M*′⟧. Indeed, a function *λa. M* is a value, i.e. it does not execute, and in particular it does not perform any effects, regardless of *M*. Accordingly, ⟦*λa. M*⟧ consists of closures of traces of the form *κ* ⟨*µ*, *µ*⟩ *κ* ∴ *f*, where *f* is a function that returns sets of traces obtained from ⟦*M*⟧. The fact that ⟦*M*⟧ ⊇ ⟦*M*′⟧ is not helpful, because traces in ⟦*λa. M*′⟧ have returned values *f*′ that differ from those of traces in ⟦*λa. M*⟧.

Directional compositionality is still useful in the presence of abstractions. For example, if *M* is a program that returns a location, then from ⟦*a* **:=** *v* **;** *a* **:=** *w*⟧ ⊇ ⟦*a* **:=** *w*⟧ it follows that ⟦(*λa. a* **:=** *v* **;** *a* **:=** *w*) *M*⟧ ⊇ ⟦(*λa. a* **:=** *w*) *M*⟧.


**Fig. 10.** A selective list of supported non-structural transformations. Along with Symmetry, the denotational semantics supports all symmetric-monoidal laws with the binary operator (∥) and the unit ⟨⟩. Similar transformations, replacing FAA with other RMWs, are supported too. The abstract rewrite rules used to validate a transformation are mentioned, where applicable.

To deal with the need to prove properties "pointwise" that abstractions bring about, such as containment of denotations in the proof of directional compositionality, we use logical relations. Moggi's toolkit provides a standard way to define these, thereby lifting properties to their higher-order counterparts.

*Transformations exhibiting abstraction.* To the best of our knowledge, all transformations *N* ↠ *M* proven to be valid under RA in the existing literature are supported by our denotational semantics, i.e. ⟦*N*⟧ ⊇ ⟦*M*⟧. Structural transformations are supported by virtue of using Moggi's standard semantics. Our semantics also validates "algebraic laws of parallel programming", such as sequencing *M* ∥ *N* ↠ ⟨*M*, *N*⟩ and its generalization that Hoare and van Staden [22] recognized, (*M*₁ **;** *M*₂) ∥ (*N*₁ **;** *N*₂) ↠ (*M*₁ ∥ *N*₁) **;** (*M*₂ ∥ *N*₂), which in the functional setting can take the more expressive form in which the values returned are passed on to the following computation. See Figure 10 for a partial list.
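For instance, the Hoare and van Staden law can be written down with the toy combinators from the sketches in §3.6. This is only a shape-level illustration: validity of the transformation concerns the real denotations, closed under the rewrite rules, which the toy combinators do not model.

```
-- Sequencing that discards the first result, standing in for (;).
seqT :: [Trace a] -> [Trace b] -> [Trace b]
seqT p q = p `bind` const q

lawLHS, lawRHS :: [Trace a] -> [Trace b] -> [Trace c] -> [Trace d]
               -> [Trace (b, d)]
lawLHS m1 m2 n1 n2 = par (seqT m1 m2) (seqT n1 n2)  -- (M1 ; M2) ∥ (N1 ; N2)
lawRHS m1 m2 n1 n2 = seqT (par m1 n1) (par m2 n2)   -- (M1 ∥ N1) ; (M2 ∥ N2)
```

Supporting the transformation then amounts to containment of every behavior of the right-hand side in the left-hand side.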

Hence we claim that our adequate denotational semantics is sufficiently abstract. This supports the case that Moggi's semantic toolkit can successfully scale to handle the intricacies of RA concurrency by adapting Brookes's traces.

# **5 Related Work and Concluding Remarks**

Our work follows the approach of Brookes [13] and its extension to higher-order functions using monads by Benton et al. [6]. Brookes developed a denotational semantics for shared-memory concurrency under standard sequential consistency [33], and established full abstraction w.r.t. a language that has a global atomic await instruction that locks the entire memory. The concepts behind this approach have been used in multiple related developments, e.g. [12, 34, 35, 46]. We hope that our work, which targets RA, will pave the way for similar continuations.

Jagadeesan et al. [25] adapted Brookes's semantics to the x86-TSO memory model [40]. They showed that for x86-TSO it suffices to include the final store buffer at the end of the trace and to add two simple closure rules that emulate non-deterministic propagation of writes from store buffers to memory and that identify observably equivalent store buffers. The x86-TSO model, however, is much closer to sequential consistency than RA, which we study in this paper. In particular, unlike RA, x86-TSO is "multi-copy-atomic" (writes by one thread are made globally visible to *all* other threads at the same time) and successful RMW operations are immediately globally visible. Additionally, the parallel composition construct in Jagadeesan et al. [25] is rather strong: threads are forked and joined only when the store buffers are empty. Being non-multi-copy-atomic, RA requires a more delicate notion of traces and closure rules, but it has more natural meta-theoretic properties, which one would expect from a programming-language concurrency model: sequencing, a.k.a. thread-inlining, is unsound under x86-TSO [see 25, 31] but sound under RA (see Figure 10).

Burckhardt et al. [14] developed a denotational semantics for hardware weak memory models (including x86-TSO) following an alternative approach. They represent sequential code blocks by sequences of the operations that the code performs, and close them under certain rewrite rules (reorderings and eliminations) that characterize the memory model. This approach does not validate important optimizations, such as Read-Read Elimination. Moreover, unlike x86-TSO, RA cannot be characterized by rewrite operations on SC traces [31].

Dodds et al. [19] developed a fully abstract denotational semantics for RA, extended with fences and non-atomic accesses. Their semantics is based on RA's *declarative* (a.k.a. axiomatic) formulation as acyclicity criteria on execution graphs. Roughly speaking, their denotation of code blocks (which they assume to be sequential) quantifies over all possible context execution graphs and calculates, for each context, the "happens-before" relation between context actions that is induced by the block. They further use a finite approximation of these histories to automatically validate refinement in a model checker. While we target RA as well, there are two crucial differences between our work and Dodds et al. [19]. First, we employ Brookes-style totally ordered traces and use an interleaving-based operational presentation of RA. Second, and more importantly, we strive for a compositional semantics in which denotations of compound programs are defined as functions of the denotations of their constituents, which is not the case for Dodds et al. [19]. Their model can nonetheless validate transformations by checking them locally, without access to the full program.

Others present non-compositional techniques and tools for checking refinement under weak memory models between whole-thread sequential programs, applicable in any concurrent context. Poetzl and Kroening [43] considered the SC-for-DRF model, using locks to avoid races. Their approach matches source to target by checking that they perform the same state transitions from lock to subsequent unlock operations and that the source does not allow more data races. Morisset et al. [39] and Chakraborty and Vafeiadis [16] addressed this problem for the C/C++11 model, of which RA is a central fragment, by implementing matching algorithms between source and target that validate that all transformations between them have been independently proven safe under C/C++11.

Cho et al. [18] introduced a specialized semantics for *sequential* programs that can be used to justify compiler optimizations under weak-memory concurrency. They showed that behavior refinement under their sequential semantics implies refinement under any (sequential or parallel) context in the Promising Semantics 2.1 [17]. Their work focuses on optimizations of race-free accesses, similar to C11's "non-atomics" [4, 32]. It cannot be used to establish the soundness of the program transformations that we study in this paper. Adding non-atomics to our model is an important direction for future work.

Denotational approaches have also been developed for models much weaker than RA [15, 24, 26, 28, 41] that allow the infamous Read-Write Reorder and thus, for a high-level programming language, require addressing the challenge of detecting semantic dependencies between instructions [3]. These approaches are based on summarizing multiple partial orders between actions that may arise when a given program is executed under some context. In contrast, we use totally ordered traces, relating to RA's interleaving operational semantics. In particular, Kavanagh and Brookes [28] use partial orders; Castellan and Paviotti et al. [15, 41] use event structures; and Jagadeesan et al. and Jeffrey et al. [24, 26] employ "Pomsets with Preconditions", which trades compositionality for supporting non-multi-copy-atomicity, as in RA. These approaches do not validate certain access eliminations, nor Irrelevant Load Introduction, which our model validates.

An exciting aspect of our work is the connection it makes between memory models and Moggi's monadic approach. For SC, Abadi and Plotkin and Dvir et al. [1, 20] have made an even stronger connection via algebraic theories [42]. These allow one to modularly combine shared-memory concurrency with other computational effects. Birkedal et al. [11] develop semantics for a type-and-effect system for SC memory, which they use to enhance compiler optimizations based on assumptions about the context that come from the type system. We hope the current work can serve as a basis for extending such accounts to weaker models.

**Acknowledgments.** Supported by the Israel Science Foundation (grant number 814/22) and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 851811); and by a Royal Society University Research Fellowship and Enhancement Award.


# **References**


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Intel PMDK Transactions: Specification, Validation and Concurrency<sup>⋆</sup>

Azalea Raad<sup>1</sup>, Ori Lahav<sup>2</sup>, John Wickerson<sup>1</sup>, Piotr Balcer<sup>3</sup> and Brijesh Dongol<sup>4(B)</sup>

<sup>1</sup> Imperial College London, London, UK
<sup>2</sup> Tel Aviv University, Tel Aviv, Israel
<sup>3</sup> Intel, Gdansk, Poland
<sup>4</sup> University of Surrey, Guildford, UK

b.dongol@surrey.ac.uk

Abstract. Software Transactional Memory (STM) is an extensively studied paradigm that provides an easy-to-use mechanism for thread safety and concurrency control. With the recent advent of byte-addressable persistent memory, a natural question to ask is whether STM systems can be adapted to support failure atomicity. In this paper, we answer this question by showing how STM can be easily integrated with Intel's Persistent Memory Development Kit (PMDK) transactional library (which we refer to as txPMDK) to obtain STM systems that are both concurrent and persistent. We demonstrate this approach using known STM systems, TML and NOrec, which when combined with txPMDK result in persistent STM systems, referred to as PMDK-TML and PMDK-NORec, respectively. However, it turns out that existing correctness criteria are insufficient for specifying the behaviour of txPMDK and our concurrent extensions. We therefore develop a new correctness criterion, dynamic durable opacity, that extends the previously defined notion of durable opacity with dynamic memory allocation. We provide a model of txPMDK, then show that this model satisfies dynamic durable opacity. Moreover, dynamic durable opacity supports concurrent transactions, thus we also use it to show correctness of both PMDK-TML and PMDK-NORec.

# 1 Introduction

Persistent memory technologies (aka non-volatile memory, NVM), such as Memory-Semantic SSD [53] and XL-FLASH [13], combine the durability of hard drives with the fast and fine-grained accesses of DRAM, with the potential to radically change how we build fault-tolerant systems. However, NVM also raises fundamental questions about semantics and the applicability of standard programming models.

<sup>⋆</sup> Raad is funded by a UKRI fellowship MR/V024299/1, EPSRC grant EP/X037029/1 and VeTSS. Lahav is supported by the Israel Science Foundation (grant 814/22) and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 851811). Wickerson is funded by EPSRC grant EP/R006865/1. Dongol is funded by EPSRC grants EP/Y036425/1, EP/X037142/1, EP/X015149/1, EP/V038915/1, EP/R025134/2 and VeTSS.

```
1 struct loc {
2 pmem::obj::p<int> value;
3 pmem::obj::persistent_ptr<loc> next; };
4
5 struct root { pmem::obj::persistent_ptr<loc> head = nullptr; };
6
7 void post_crash(...) {
8 auto pop = pmem::obj::pool<root>::open("file",...);
9 auto root = pop.root();
10 pmem::obj::transaction::run(pop, [&]{
11 auto xvalue = root->head->value;
12 }); }
13
14 int main(...) {
15 auto pop = pmem::obj::pool<root>::open("file",...);
16 auto root = pop.root();
17 pmem::obj::transaction::run(pop, [&]{
18 auto x = pmem::obj::make_persistent<loc>();
19 x->value = 42;
20 x->next = nullptr;
21 root->head = x;
22 }); }
```
Fig. 1: C++ snippet for allocating in persistent memory using txPMDK [54]

Among the most widely used collections of libraries for persistent programming is Intel's Persistent Memory Development Kit (PMDK), which was first released in 2015 [30]. One important component of PMDK is its transactional library, which we refer to as txPMDK, and which supports generic failure-atomic programming. A programmer can use txPMDK to protect against full system crashes by starting a transaction, performing transactional reads and writes, then committing the transaction. If a crash occurs during a transaction, but before the commit, then upon recovery, any writes performed by the transaction will be rolled back. If a crash occurs during the commit, the transaction will either be rolled back or be committed successfully, depending on how much of the commit operation has been executed. If a crash occurs after committing, the effect of the transaction is guaranteed to persist.

Most software transactional memory (STM) algorithms leave memory allocation implicit, since they are generally safe under standard allocation techniques (e.g. malloc). Memory that is allocated as part of a transaction can be deallocated if the transaction is aborted. However, in the context of persistency, memory allocation is more subtle since transactions may be interrupted by a crash.

For example, consider the program in Fig. 1. Persistent memory is allocated, accessed and maintained via memory pools [54] (files that are memory-mapped into the process address space) of a certain type (e.g. of type loc in Fig. 1). Due to address space layout randomization (ASLR) in most operating systems, the location of the pool can differ between executions and across crashes. As such, every pool has a root object from which all other objects in the pool can be found. That is, to avoid memory leaks, all objects in the pool must be reachable from the root. An application locates the root object using a pool object pointer (POP) that is to be created with every program invocation (e.g. line 15). After locating the pool root (line 16), we use a txPMDK transaction (lines 17–22) to allocate a persistent loc object x (line 18) with value 42 (line 19) and add it to the pool (line 21).

Consider the scenario where the execution of this transaction crashes. After recovery from the crash, we then execute post\_crash (line 7). As before, we open the pool (line 8) and locate its root (line 9). We then use a txPMDK transaction to read from the loc object allocated and added at the pool head prior to the crash (line 11). There are then three cases to consider: the crash may have occurred (1) before the transaction started the commit process, (2) after the transaction successfully committed, or (3) while the transaction was in the process of committing.

In case (1), the execution of the two transactions can be depicted as follows, where the PBegin events capture commencing the transactions (lines 17 and 10), PAlloc(x) denotes the persistent allocation of x (line 18), PWrite(x->value,42) captures writing to x (line 19), and PRead(root->head):x denotes reading root->head and returning the location x (first part of line 11). As the first transaction never reached the commit stage, its effects (i.e. allocating x and writing to it) should be invisible (i.e. rolled back), and thus the read of the second transaction effectively reads from unallocated memory, leading to an error such as a segmentation fault.

In case (2), the execution of the transactions is as follows, where the PCommit events capture the end (successful commit) of the transactions (lines 22 and 12); the effects of the first transaction fully persist upon successful commit, and thus the read in the second transaction does not fault.

[Execution diagram: the first transaction performs PBegin, PAlloc(x), PWrite(x->value,42) and PCommit before the crash; after recovery, the second transaction performs PBegin, PRead(root->head):x, PRead(x->value):42 and PCommit.]

Finally, in case (3), either of the two behaviours depicted above is possible (i.e. the second transaction may either cause a segmentation fault or read from x).

Efficient and correct memory allocation in a persistent-memory setting is challenging ([54, Chapter 16] and [55]). In addition to the ASLR issue mentioned above, the allocator must guarantee failure atomicity of heap operations on several internal data structures managed by PMDK. Therefore, PMDK provides its own allocator, designed specifically to work with txPMDK.

We identify two key drawbacks of txPMDK as follows. In this paper, we take steps towards addressing both of these drawbacks.

A) Lack of concurrency support. Unlike existing STM systems in the persistent setting [39,32] that provide both failure atomicity (ensuring that a transaction either commits fully or not at all in case of a crash) and isolation (as defined by the ACID properties, ensuring that the effects of incomplete transactions are invisible to concurrently executing transactions), txPMDK only provides failure atomicity and does not offer isolation in concurrent settings. In particular, naïvely implemented applications with racy PMDK transactions lead to memory inconsistencies. This is against the spirit of STM: the primary function of STM systems is providing a concurrency-control mechanism that ensures isolation. The current txPMDK implementation provides two solutions: threads either execute concurrent transactions over disjoint parts of the memory [54, Chapter 7], or use user-defined fine-grained locks within a transaction to ensure memory isolation [54, Chapter 14]. However, both solutions are sub-optimal: the former enforces serial execution when transactions operate over the same part of the memory, and the latter expects too much of the user.

B) Lack of a suitable correctness criterion. There is no formal specification describing the desired behaviour of txPMDK, and hence no rigorous description or correctness proof of its implementation. This undermines the utility of txPMDK in safety-critical settings and makes it impossible to develop formally verified applications that use txPMDK. Indeed, there is currently no correctness criterion for STM systems that provide dynamic memory allocation (a large category that includes all realistic implementations).

# 1.1 Concurrency for txPMDK

Integrating concurrency with PMDK transactions is an important end goal for PMDK developers. The existing approach requires integration of locks with txPMDK, which introduces overhead for programmers. Our paper shows that STM and PMDK can be easily combined, improving programmability. Many other works have aimed to develop failure-atomic and concurrent transactions (e.g. OneFile [52] and Romulus [16]), but none uses off-the-shelf, commercially available libraries. Moreover, these other works have not addressed correctness with the level of rigour that our paper does. In other work, the popular key-value stores Memcached and Redis have been ported to use PMDK [36,37]; our work paves the way for concurrent versions of these applications to be developed. Another example is the work of Chajed et al. [11], who provide a simulation-based technique for verifying refinement of durable filesystems, where concurrency is handled by durable transactions.

We tackle the first drawback (A) mentioned above by developing, specifying, and validating two thread-safe versions of txPMDK.

Contribution A: Making txPMDK thread-safe. We combine txPMDK with two off-the-shelf (thread-safe) STM systems, TML [17] and NOrec [18], to obtain two new implementations, PMDK-TML and PMDK-NORec, that support concurrent failure-atomic transactions with dynamic memory allocation. In particular, we reuse the existing concurrency-control mechanisms provided by these STM systems to ensure atomicity of write-backs, thus obtaining memory isolation even in a multi-threaded setting. We show that it is possible to integrate these mechanisms with txPMDK to additionally achieve failure atomicity. Our approach is modular, with a clear separation of concerns between the isolation required due to concurrency and the atomicity required due to the possibility of system crashes. This shows that concurrency and failure atomicity are two orthogonal concerns, highlighting a pathway towards a mix-and-match approach to combining (concurrent) STM and failure-atomic transactions. Finally, in order to provide the same interface as PMDK, we extend both TML and NOrec with an explicit operation for memory allocation.

Fig. 2: The contributions of this paper and their relationships to prior work

# 1.2 Specification and Validation

To tackle drawback (B) above, we make four contributions. Together, they provide the first formal (and rigorous) specification of txPMDK and validation of its implementation.

Contribution B1: A model of txPMDK. We provide a formal specification of txPMDK as an abstract transition system. Our formal specification models almost all key components of txPMDK (including its redo and undo logs, as well as the interaction of these components with system crashes), with the exception of memory deallocation within txPMDK transactions.

Contribution B2: A correctness criterion for transactions with dynamic allocation. Although the literature includes several correctness criteria for transactional memory (TM), none can adequately capture txPMDK, in that they do not account for dynamic memory allocation. We develop a new correctness condition, dynamic durable opacity (denoted ddOpacity), by extending durable opacity [6] to account for dynamic allocation. ddOpacity supports not only sequential transactions such as txPMDK, but also concurrent ones. To demonstrate the suitability of ddOpacity for concurrent and persistent (durable) transactions, we later validate our two concurrent txPMDK implementations (PMDK-NORec and PMDK-TML) against ddOpacity.

Contribution B3: An operational characterisation of our correctness criterion. Our aim is to show that txPMDK conforms to ddOpacity, or more precisely, that our model of txPMDK refines our model of ddOpacity. To demonstrate this, we use a new intermediate model called ddTMS. While ddOpacity is defined declaratively, ddTMS is defined operationally, which makes it conceptually closer to our model of the txPMDK implementation. We prove that ddTMS is a sound model of ddOpacity (i.e. every trace of ddTMS satisfies ddOpacity).

Contribution B4: Validation of txPMDK, PMDK-TML and PMDK-NORec in FDR4. We mechanise our implementations (txPMDK, PMDK-TML and PMDK-NORec) and specification (ddTMS) using the CSP modelling language. We use the FDR4 model checker [26] to show that the implementations are refinements of ddTMS over both the persistent SC (PSC) [31] and persistent TSO (Px86sim) [50] memory models. For Px86sim, we use an equivalent formulation called PTSOsyn, developed by Khyzha and Lahav [31]. The proof itself is fully automatic, requiring no user input beyond the encodings of the models themselves. Additionally, we develop a sequential lower bound (ddTMS-Seq), derived from ddTMS, and show that this lower bound refines txPMDK (and hence that txPMDK is not vacuously strong). Our approach is based on an earlier technique for proving durable opacity [23], but incorporates much more sophisticated examples and memory models.

Outline. Fig. 2 gives an overview of the different components that we have developed in this paper and their relationships to each other and to prior work. We structure our paper by presenting the components of Fig. 2 roughly from the bottom up. In §2, we present the abstract txPMDK model, and in §3 we describe its integration with STM to provide concurrency support via PMDK-TML and PMDK-NORec. In §4 we present ddOpacity, in §5 we present ddTMS, and in §6 we describe our FDR4 encodings and bounded proofs of refinement.

Additional Material. We provide our FDR4 development as supplementary material [47]. The proofs of all theorems are given in an extended version [46].

# 2 Intel PMDK transactions

We describe the abstract interface txPMDK provides to clients (§2.1), our assumptions about the memory model over which txPMDK is run (§2.2) and the operations of txPMDK (§2.3). We present our PMDK abstraction in §2.3.

# 2.1 PMDK Interface

PMDK provides an extensive suite of libraries for simplifying persistent programming. The PMDK transactional library (txPMDK) has been designed to support failure atomicity by providing operations for tracking memory locations that are to be made persistent, as well as for allocating and accessing (reading and writing) persistent memory within an atomic block.

In Fig. 3 we present an example client code that uses txPMDK. The code (due to [54, p. 131]) implements the push operation for a persistent linked-list queue.

```
1 struct queue_node {
2 pmem::obj::p<int> value;
3 pmem::obj::persistent_ptr<queue_node> next; };
4
5 struct queue { private:
6 pmem::obj::persistent_ptr<queue_node> head = nullptr;
7 pmem::obj::persistent_ptr<queue_node> tail = nullptr; };
8
9 void push(pmem::obj::pool_base &pmem_op, int value) {
10 pmem::obj::transaction::run(pmem_op, [&]{
11 auto node = pmem::obj::make_persistent<queue_node>();
12 node->value = value;
13 node->next = nullptr;
14 if (head == nullptr) {
15 head = tail = node;
16 } else {
17 tail->next = node;
18 tail = node; }
19 }); }
```
Fig. 3: C++ persistent push operation using txPMDK ([54, p. 131])

The implementation wraps a typical (non-persistent) push operation within a transaction using a C++ lambda [&] expression (line 10). The transaction is invoked using transaction::run, which operates over the memory pool pmem_op. The node structure (lines 2 and 3), the queue structure (lines 6 and 7), and any new node declaration (line 11) are to be tracked by a PMDK transaction. Additionally, the push operation takes as input the persistent memory object pool, pmem_op, which is the memory pool on which the transaction is to be executed. This argument is needed because the application memory may map files from different file systems. On line 11 we use make_persistent to perform a transactional allocation on persistent memory that is linked to the object pool pmem_op (see [54] for details). The remainder of the operation (lines 12–18) corresponds to an implementation of a standard push operation with (transactional) reads and writes on the indicated locations. At line 19, the C++ lambda and the transaction are closed, signalling that the transaction should be committed.

If the system crashes while push is executing, but before line 19 is executed, then upon recovery, the entire push operation will be rolled back so that the effect of the incomplete operation is not observed, and the queue remains a valid linked list. After line 19, the corresponding transaction executes a commit operation. If the system crashes during commit, then depending on how much of the commit operation has been executed, the push operation will either be rolled back or committed successfully. Note that roll-back in all cases ensures that the allocation at line 11 is undone.

# 2.2 Memory Models

We consider the execution of our implementations over two different memory models: PSC and PTSOsyn [31]. Both models include a flush x instruction to persist the contents of the given location x to memory. PTSOsyn aims for fidelity to the Intel x86 architecture. In a race-free setting (as is the case for single-threaded txPMDK transactions) it is sound to use the simpler PSC model, though we conduct all of our experiments in both models.

PSC is a simple model that considers persistency effects and their interaction with sequential consistency. Writes are propagated directly to per-location persistence buffers, and are subsequently flushed to non-volatile memory, either due to a system action or the execution of a flush instruction. A read from x first attempts to fetch its value from the persistence buffer and, if this fails, fetches its value from non-volatile memory.

Under Intel x86, the memory models are further complicated by the interaction between total store ordering (TSO) effects [40] and persistency. Due to the abstract nature of our models (see Fig. 4), it is sufficient for us to focus on the simpler Px86sim model [50], since we do not use any of the advanced features [48,49,50]. We introduce a further simplification via PTSOsyn, which is observationally equivalent to Px86sim [31]. Unlike Px86sim, which uses a single (global) persistence buffer, PTSOsyn uses per-location buffers, simplifying the resulting FDR4 models (§6).

In PTSOsyn, writes are propagated from the store buffer in FIFO order to a per-location FIFO persistence buffer. Writes in the persistence buffer are later persisted to non-volatile memory. A read from location x first attempts to fetch the latest write to x from the store buffer. If this fails (i.e. no write to x exists in the store buffer), it attempts to fetch the latest write from the persistence buffer of x, and if this fails, it fetches the value of x from non-volatile memory.
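The lookup order for reads is the crux of PTSOsyn; the following C++ toy (our rendering, not the formal model) spells it out for a single thread.

```
// A toy, single-threaded rendering (ours, not the formal model) of the
// PTSOsyn lookup order for a read of location x: the thread's store buffer
// first, then x's persistence buffer, then non-volatile memory.
#include <deque>
#include <unordered_map>
#include <utility>

using Loc = int;
using Val = int;

struct PTsoSynToy {
  std::deque<std::pair<Loc, Val>> storeBuffer;             // FIFO, newest at back
  std::unordered_map<Loc, std::deque<Val>> persistBuffer;  // per-location FIFO
  std::unordered_map<Loc, Val> nvm;                        // non-volatile memory

  Val read(Loc x) {
    // 1. the latest write to x in the store buffer, if any
    for (auto it = storeBuffer.rbegin(); it != storeBuffer.rend(); ++it)
      if (it->first == x) return it->second;
    // 2. otherwise, the latest write in x's persistence buffer
    auto pb = persistBuffer.find(x);
    if (pb != persistBuffer.end() && !pb->second.empty())
      return pb->second.back();
    // 3. otherwise, the value persisted in NVM
    return nvm[x];
  }
};
```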

# 2.3 PMDK Implementation

We present the pseudo-code of our txPMDK abstraction in Fig. 4. We model all features of txPMDK (including its redo and undo logs, as well as its recovery mechanism in case of a crash) except memory deallocation within a txPMDK transaction. We use mem to model the memory, mapping each location (in loc) to a value-metadata pair. We model a value (in val) as an integer, and metadata as a boolean indicating whether the location is allocated. As we see below, the list of free (unallocated) locations, freeList, is calculated during recovery using the metadata.

Each PMDK transaction maintains redo logs and an undo log. The redo logs record the locations allocated by the transaction so that, if a crash occurs while committing, the allocated locations can be reallocated, allowing the transaction to commit upon recovery. Specifically, txPMDK uses two distinct redo logs: tRedo and pRedo. Both are associated with fields undoValid (which is unset when the log is invalidated), checksum (used to indicate whether the log is valid) and allocs (which contains the set of locations allocated by the transaction). Note that txPMDK explicitly sets and unsets undoValid, whereas checksum is calculated (e.g. at line 36) and may be invalidated by crashes corrupting a partially completed write. The undo log records the original (overwritten) value of each location written to by the transaction, and is consulted if the transaction is to be rolled back. We model it as a map from locations to values (of type int).
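For readers following the C++ implementation, the log state modelled in Fig. 4 can be rendered as plain C++ structs roughly as follows (our sketch; field names follow the figure).

```
// The per-transaction log state of Fig. 4, rendered as C++ structs (our
// sketch; field names follow the figure).
#include <map>
#include <set>

struct RedoLog {
  bool undoValid = true;  // unset once the undo log is invalidated
  int checksum = -1;      // -1 = invalid; recomputed at commit (line 36)
  std::set<int> allocs;   // locations allocated by the transaction
};

struct TxLogs {
  RedoLog tRedo;            // transient redo log
  RedoLog pRedo;            // persistent redo log
  std::map<int, int> undo;  // loc -> original (overwritten) value
  bool undoValid = true;    // validity flag for the undo log itself
};
```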


```
 1 // Each location is persistent; there is no explicitly volatile memory.
 2 mem : loc -> {
 3   val : int;           // the contents of this location
 4   metadata : bool; }   // false = not allocated, true = allocated
 5 freeList : loc list    // transient list of free locations
 6
 7 // Redo logs -- tRedo is transient; pRedo is persistent.
 8 tRedo_t, pRedo_t : {undoValid : bool; checksum : int; allocs : loc set;}
 9 undo_t : loc -> int    // undo log recording the original val of each loc
10 undoValid_t : bool     // validity flag for the undo log, initially true
```

```
11 PBegin_t ≜
12   tRedo_t := (true, -1, {})
13   pRedo_t := (true, -1, {})
14   undo_t := {}
15   undoValid_t := true
16
17 PAlloc_t ≜
18   x_t := freeList.take
19   tRedo_t.allocs :=
20     tRedo_t.allocs ∪ {x_t}
21   return x_t
22
23 PRead_t(x) ≜
24   return mem[x].val
25
26 PWrite_t(x, v) ≜
27   if x ∉ dom(undo_t) then
28     w_t := mem[x].val
29     undo_t := undo_t ∪ {x ↦ w_t}
30     flush undo_t
31   mem[x].val := v
32
33 PCommit_t ≜
34   persist_writes_t
35   tRedo_t.undoValid := false
36   tRedo_t.checksum := calc_checksum(tRedo_t)
37   pRedo_t := tRedo_t
38   flush pRedo_t
39   apply_pRedo_t
40   pRedo_t.checksum := -1
41   flush pRedo_t.checksum

42 apply_pRedo_t ≜
43   foreach x ∈ pRedo_t.allocs:
44     mem[x].metadata := true
45     flush mem[x].metadata
46   if ¬pRedo_t.undoValid then
47     undoValid_t := false
48     flush undoValid_t
49
50 persist_writes_t ≜
51   foreach x ∈ dom(undo_t): flush x
52
53 roll_back_t ≜
54   foreach (x ↦ v) ∈ undo_t:
55     mem[x].val := v
56   persist_writes_t
57
58 PAbort_t ≜
59   roll_back_t
60   undoValid_t := false
61   flush undoValid_t
62   foreach x ∈ tRedo_t.allocs:
63     freeList.add(x)
64
65 PRecovery_t ≜
66   if calc_checksum(pRedo_t)
67        = pRedo_t.checksum
68   then apply_pRedo_t
69   if undoValid_t then
70     roll_back_t
71   foreach x ∈ dom(mem):
72     if ¬mem[x].metadata then
73       freeList.add(x)
```
Fig. 4: PMDK global variables and pseudo-code

A separate variable undoValid_t (distinct from the undoValid fields in tRedo and pRedo) is used to determine whether the undo log itself is valid.

Each component in Fig. 4 has both a volatile and a persistent copy, although some components, e.g. tRedo and freeList, are transient, i.e. their persistent versions are never used. Likewise, the persistent redo log, pRedo, is only used in a persistent fashion and its volatile copy is never used.

We now describe the operations in Fig. 4. We assume the operations are executed by a transaction with id t. This id plays no role in the sequential setting in which txPMDK is used; however, in our concurrent extensions (§3) the transaction id is critical.

PBegin. The begin operation simply sets all local variables to their initial values.

PAlloc. Allocation chooses and removes a free location, say x, from the free list, adds x to the transient redo log (line 20) and returns x. Removing x from freeList ensures it is not allocated twice, while the transient redo log is used together with the persistent redo log to ensure allocated locations are properly reallocated upon a system crash.

When the transaction commits, the transient redo log is copied to the persistent one (line 37), and the effect of the persistent log is applied at line 39 via apply\_pRedo. (Note that apply\_pRedo is also called by PRecovery at line 68.) The behaviour of this call depends on how much of the in-flight transaction was executed before the crash leading to the recovery. If a crash occurred after the transaction executed line 37 and the corresponding write persisted (either due to a system flush or the execution of line 38), then executing apply\_pRedo via PRecovery has the same effect as executing line 39, i.e. the effect of the redo log will be applied. This (persistently) sets the metadata field of each location in the redo log to indicate that it is allocated (lines 43–45), and then invalidates the undo log (lines 46–48) so that the transaction is not rolled back.

PRead. A read from x simply returns its in-memory value (line 24). Note that location x may not be allocated; txPMDK delegates the responsibility of checking whether it is allocated to the client.

PWrite. A write to x first checks (line 27) whether the current transaction has already written to x (via a previously executed PWrite). If not, it reads the in-memory value of x (line 28) and records it in the undo log (line 29). The updated undo log is then made persistent (line 30). Once the current value of x is backed up in the undo log (either by the current write or by a previous write to x), the value of x in memory is updated to the new value v (line 31). As with reads, location x may not have been allocated; txPMDK delegates this check to the client.

PCommit. The main idea behind the commit operation is to ensure that all writes are persisted, and that the persistent redo and undo logs are cleared in the correct order, as follows. (1) At line 34, all writes performed by the transaction are persisted. (2) Next, the transient redo log is invalidated (line 35) and the checksum for the log is calculated (line 36). This updated transient log is then copied to the persistent redo log (line 37), which is then made persistent (line 38). Note that after executing line 38, we can be assured that the transaction has committed; if a crash occurs after this point, the recovery will redo and persist the allocation, and the undo log will be cleared. (3) The operation then calls apply\_pRedo at line 39, which makes the allocation persistent and clears the undo log. (4) Finally, at line 40, the pRedo checksum is invalidated, since apply\_pRedo has already been executed. If a crash occurs after line 40 has been executed, then the recovery checks at lines 67 and 69 will fail, i.e. recovery will only recalculate the free list.

PAbort. A PMDK transaction is aborted by a PRead/PWrite that attempts to access (read/write) an unallocated location. When a transaction is aborted, all of its observable effects must be rolled back. First, the memory effects are rolled back (line 59), then the undo log is invalidated (line 60) and made persistent (line 61), preventing the undo log from being replayed in case a crash occurs. Finally, all of the locations allocated by the executing transaction are freed (lines 62–63).

Note that if a crash occurs during an abort, the effect of the abort will be replayed. PRecovery reconstructs the free list at lines 71–73, which effectively replays the loop at lines 62–63 of PAbort. Additionally, if a crash occurs before the write at line 60 has persisted, then the effect of undoing the operation will be explicitly replayed by the roll-back executed by PRecovery, since undoValid still holds. If the crash occurs after the write at line 60 has persisted, then no roll-back is necessary.

PRecovery. The recovery operation is executed immediately after a crash, and before any other operation. Recovery proceeds in three phases. (1) The checksum of the persistent redo log is recalculated (line 67) and, if it matches the stored checksum (pRedo.checksum), the apply\_pRedo operation is executed. As discussed, apply\_pRedo sets and persists the metadata of each location in the redo log, and then invalidates the undo log. (2) If the undo log is still valid (i.e. apply\_pRedo did not invalidate it in step (1)), the transaction is rolled back; otherwise, no roll-back is performed. (3) The free list is reconstructed by inserting into freeList each location whose metadata is set to false (lines 71–73).

Correctness and Thread Safety. As discussed in §2.1, txPMDK is designed to be failure-atomic. This means that correctness criteria such as opacity [27,2] and TMS1/TMS2 [20] (restricted to sequential transactions) are inadequate, since they do not accommodate crashes and recovery. This points to conditions such as durable opacity [6], which extends opacity with a persistency model. However, durable opacity (restricted to sequential transactions) is also insufficient, since it does not define the correctness of allocations and assumes totally ordered histories. In §4 we develop a generalisation of durable opacity, called dynamic durable opacity (ddOpacity), that addresses both of these issues. As with durable opacity, ddOpacity defines correctness for concurrent transactions. We develop concurrent extensions of PMDK transactions in §3, which we show to be correct against (i.e. refinements of) ddOpacity.

As discussed, PMDK transactions are not thread-safe; e.g. concurrent calls to PRead and PWrite on the same location create a data race, causing PRead to return an undefined value (see the example in §1). We discuss techniques for mitigating such races in §3. Nevertheless, some PMDK transactional operations are naturally thread-safe. In particular, PAlloc is designed to be thread-safe via a built-in arena mechanism: the memory pool is split into disjoint arenas, with each thread allocating from its own arena. Moreover, each thread uses a per-arena lock to publish allocated memory to the shared pool [55].
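The following C++ toy (our illustration of the arena idea, not PMDK's actual allocator) shows why per-thread arenas avoid allocation races: threads only synchronise when publishing to the shared pool.

```
// A toy of the arena idea (ours, not PMDK's actual allocator): each thread
// allocates from its own disjoint arena, so PAlloc calls never race; a
// per-arena lock is taken only to publish allocations to the shared pool.
#include <mutex>
#include <vector>

struct Arena {
  std::vector<int> freeLocs;  // free locations owned by this arena
  std::mutex publishLock;     // guards publication to the shared pool
};

struct ArenaPool {
  std::vector<Arena> arenas;  // one arena per thread, disjoint location sets
  explicit ArenaPool(int nThreads) : arenas(nThreads) {}

  // Thread-local allocation: no contention with other threads' arenas.
  int alloc(int tid) {
    Arena &a = arenas[tid];
    if (a.freeLocs.empty()) return -1;  // arena exhausted (refill elided)
    int loc = a.freeLocs.back();
    a.freeLocs.pop_back();
    return loc;
  }

  // Publishing allocated memory to the shared pool takes the arena's lock.
  void publish(int tid /*, allocation metadata */) {
    std::lock_guard<std::mutex> g(arenas[tid].publishLock);
    // ... record the allocation in shared metadata ...
  }
};
```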


```
Init: glb = 0

 1 TxBegin_t ≜
 2   do loc_t := glb
 3   until even(loc_t)
 4   PBegin_t
 5
 6 TxAlloc_t ≜
 7   return PAlloc_t
 8
 9 TxWrite_t(x, v) ≜
10   if even(loc_t) then
11     if ¬cas(glb, loc_t, loc_t+1)
12     then PAbort_t; return abort
13     else loc_t++
14   PWrite_t(x, v)

15 TxRead_t(x) ≜
16   v_t := PRead_t(x)
17   if even(loc_t) then
18     if glb = loc_t then
19       return v_t
20     else PAbort_t; return abort
21   else return v_t
22
23 TxCommit_t ≜
24   PCommit_t
25   if odd(loc_t) then
26     glb := loc_t + 1
27
28 Recovery ≜
29   foreach t ∈ TXId:
30     PRecovery_t
31   glb := 0
```
Fig. 5: Pseudo-code for PMDK-TML with our additions made w.r.t. TML highlighted red

# 3 Making PMDK Transactions Concurrent

We develop two algorithms that combine two existing STM systems with PMDK. The first algorithm (Fig. 5) is based on TML [17], which uses pessimistic concurrency control via an eager write-back scheme: writing transactions effectively take a lock and perform their writes in place. The second algorithm (Fig. 6) is based on NOrec [18], which utilises optimistic concurrency control via a lazy write-back scheme: transactional writes are collected in a local write set and written back when the transaction commits.

It turns out that PMDK can be incorporated within both algorithms straightforwardly. This is a strength of our approach and points towards a generic technique for extending existing STM systems with failure atomicity. Given the challenges of persistent allocation, we reuse PMDK's allocation mechanisms to provide an explicit allocation mechanism in both our extensions [54].

PMDK-TML. We present the pseudo-code for PMDK-TML (combining TML and txPMDK) in Fig. 5, where we highlight the calls to txPMDK operations. These calls are the only changes we have made to the TML algorithm. TML is based on a single global counter, glb, whose value is read and stored in a local variable loc_t when transaction t begins (TxBegin). There is an in-flight writing transaction if glb is odd. TML is designed for read-heavy workloads, and thus allows multiple concurrent read-only transactions. A writing transaction causes all other concurrent transactions to abort.

PMDK-TML is a modular combination of PMDK with the TML algorithm, obtained by nesting a PMDK transaction inside a TML transaction; i.e. each transaction additionally starts a PMDK transaction. All reads and writes to memory are replaced by txPMDK read and write operations. Moreover, when a transaction aborts or commits, the operation calls a txPMDK abort or commit, respectively. Finally, PMDK-TML includes allocation and recovery operations, which call txPMDK allocation and recovery, respectively. The recovery operation additionally sets glb to 0.

A read-only transaction t may call PRead_t at line 16 while another transaction t′ is executing PWrite_t′ at line 14 on the same location. Since txPMDK does not guarantee thread safety for these calls, the value returned by PRead_t must not be passed back to the client, and indeed it is not. First, note that if transaction t is read-only, then loc_t is even. Moreover, a read-only transaction only returns the value read by PRead_t (line 19) if no other transaction has acquired the lock since t executed TxBegin_t. In the scenario described above, t′ must have incremented glb by successfully executing the CAS at line 11 as part of the first write operation executed by t′, changing the value of glb. This means that t aborts, since the test at line 18 fails.

PMDK-NORec. We present PMDK-NORec (combining NOrec and PMDK) in Fig. 6, where we highlight the calls to txPMDK. These calls are the only changes we have made to the NOrec algorithm. As with TML, NOrec is based on a single global counter, glb, whose value is read and stored in a transaction-local variable loc when a transaction begins (TxBegin). There is an in-flight writing transaction if glb is odd. Unlike TML, NOrec performs lazy write-back, and hence utilises transaction-local read and write sets. A transaction only performs the write-back at commit time, once it "acquires" the glb lock. Prior to write-back and before responding to reads, it ensures that the read set is consistent using a per-location validate operation. We eschew details of the NOrec synchronisation mechanisms and refer the interested reader to the original paper [18].

The transformation from txPMDK to PMDK-NORec is similar to that for PMDK-TML. We ensure that a PMDK transaction is started when a PMDK-NORec transaction begins, and that this PMDK transaction is either aborted or committed before the PMDK-NORec transaction completes. We introduce TxAlloc and Recovery operations that are identical to those of PMDK-TML, and replace all calls that read from and write to memory by PRead and PWrite operations, respectively.

As with PMDK-TML, a PRead executed by one transaction (at line 12, line 15 or line 31) may race with a PWrite (at line 43) executed by another. However, since PWrite operations are only executed after a transaction takes the glb lock (at line 40), any transaction with a racy PRead is revalidated. If validation fails, the associated transaction is aborted.

# 4 A Declarative Correctness Criterion

We present a declarative correctness criterion for TM implementations. Unlike prior definitions such as (durable) opacity and TMS1/2, which are defined in terms of histories of invocations and responses, we define dynamic durable opacity (ddOpacity) in terms of execution graphs, as is standard in the weak memory setting. Our models are inspired by prior work on declarative specifications for

```
Init: glb = 0

 1 TxBegin_t ≜
 2   do loc_t := glb
 3   until even(loc_t)
 4   PBegin_t
 5
 6 TxAlloc_t ≜
 7   return PAlloc_t
 8
 9 TxRead_t(x) ≜
10   if x ∈ dom(wrSet_t) then
11     return wrSet_t(x)
12   v_t := PRead_t(x)
13   while loc_t ≠ glb
14     loc_t := Validate_t
15     v_t := PRead_t(x)
16   rdSet_t := rdSet_t ∪ {x ↦ v_t}
17   return v_t
18
19 Recovery ≜
20   foreach t ∈ TXId:
21     PRecovery_t
22   glb := 0

23 TxWrite_t(x, v) ≜
24   wrSet_t := wrSet_t ∪ {x ↦ v}
25
26 Validate_t ≜
27   while true
28     time_t := glb
29     if odd(time_t) then goto 28
30     foreach x ↦ v ∈ rdSet_t:
31       if PRead_t(x) ≠ v
32       then PAbort_t; return abort
33     if time_t = glb
34     then return time_t
35
36 TxCommit_t ≜
37   if wrSet_t.isEmpty
38   then PCommit_t
39        return
40   while ¬cas(glb, loc_t, loc_t + 1)
41     loc_t := Validate_t
42   foreach x ↦ v ∈ wrSet_t:
43     PWrite_t(x, v)
44   PCommit_t
45   glb := loc_t + 2
46   return
```
Fig. 6: Pseudo-code for PMDK-NORec, with our additions made w.r.t. NOrec highlighted red

transactional memory, which focussed on specifying relaxed transactions [22,14]. However, these prior works do not describe crashes or allocation.

Executions and Events. The traces of memory accesses generated by a program are commonly represented as a set of executions, where each execution G is a graph comprising: 1. a set of events (graph nodes); and 2. a number of relations on events (graph edges). Each event e corresponds to either a transactional action (e.g. marking the beginning of a transaction) or a memory access (read/write) within a transaction.

Definition 1 (Events). An event is a tuple a = ⟨n, τ, t, l⟩, where n ∈ ℕ is an event identifier, τ ∈ TId is a thread identifier, t ∈ TXId is a transaction identifier and l ∈ Lab is an event label.

A label may be B to mark the beginning of a transaction; A to denote a transactional abort; (M, x, 0) to denote a memory allocation yielding x initialised with value 0; (R, x, v) to denote reading value v from location x; (W, x, v) to denote writing v to x; C to mark the beginning of the transactional commit process; or S to denote a successful commit.

The functions tid, tx and lab respectively project the thread identifier, transaction identifier and label of an event. The functions loc, val_r and val_w respectively project the location, the read value and the written value of a label, where applicable, and are lifted to events by defining e.g. loc(a) = loc(lab(a)).

Notation. Given a relation r and a set A, we write r?, r⁺ and r* for the reflexive, transitive and reflexive-transitive closures of r, respectively. We write r⁻¹ for the inverse of r; r|_A for r ∩ (A × A); [A] for the identity relation on A, i.e. {(a, a) | a ∈ A}; irreflexive(r) for ∄a. (a, a) ∈ r; and acyclic(r) for irreflexive(r⁺). We write r₁; r₂ for the relational composition of r₁ and r₂, i.e. {(a, b) | ∃c. (a, c) ∈ r₁ ∧ (c, b) ∈ r₂}. When A is a set of events, we write A_x for {a ∈ A | loc(a) = x}, and r_x for r|_{A_x}. Analogously, we write A_t for {a ∈ A | tx(a) = t}. The 'same-transaction' relation, st ⊆ E × E, is the equivalence relation st ≜ {(a, b) ∈ E × E | tx(a) = tx(b)}.

Definition 2. An execution, G ∈ Exec, is a tuple (E, po, clo, rf, mo), where:

– E is a set of events (Def. 1);
– po ⊆ E × E is the program order, ordering the events of each thread;
– clo is the client order, a partial order on transactions imposed by the client;
– rf ⊆ E × E is the reads-from relation, relating each read to the write it takes its value from;
– mo ⊆ E × E is the modification order, ordering the writes to each location.
Given a relation r ⊆ E × E, we write r_T for the lifting of r to transaction classes: r_T ≜ st; (r \ st); st. For instance, when (w, r) ∈ rf, w is a transaction t₁ event and r is a transaction t₂ event, then all events in t₁ are rf_T-related to all events in t₂. We write r_I for the restriction of r to its intra-transactional edges (within a transaction): r_I ≜ r ∩ st; and write r_E for the restriction of r to its extra-transactional edges (across transactions): r_E ≜ r \ st. Analogously, we write r_i for the restriction of r to its intra-thread edges: r_i ≜ {(a, b) ∈ r | tid(a) = tid(b)}; and write r_e for its extra-thread edges: r_e ≜ r \ r_i.

In the context of an execution G (we use the "G." prefix to make this explicit), the reads-before relation is rb ≜ rf⁻¹; mo.

Lastly, we write Commit for the events of committing transactions, i.e. those that have reached the commit stage: Commit ≜ dom(st; [C]). We define the sets of aborted events, Abort, and (commit-)successful events, Succ, analogously. We define the set of commit-pending events as CPend ≜ Commit \ (Abort ∪ Succ), and the set of pending events as Pend ≜ E \ (CPend ∪ Abort ∪ Succ).

Given an execution G = (E, po, clo, rf, mo), we write G|_A for (E ∩ A, po|_{E∩A}, clo|_{E∩A}, rf|_{E∩A}, mo|_{E∩A}). We further impose certain "well-formedness" conditions on executions, used to delimit transactions and restrict allocations. For example, we require that events of the same transaction are executed by the same thread and that each transaction t contains exactly one begin event. In particular, these conditions ensure that in the context of a well-formed execution G we have 1. G.Succ ⊆ G.Commit; 2. each t contains at most a single abort or success event (|G.E_t ∩ (A ∪ S)| ≤ 1) and thus G.(Succ ∩ Abort) = ∅; and 3. G.E = G.(Pend ⊎ Abort ⊎ CPend ⊎ Succ), i.e. the sets G.Pend, G.Abort, G.CPend and G.Succ are pairwise disjoint.

Execution Consistency. The definition of (well-formed) executions above puts very few constraints on the rf and mo relations. Such restrictions, and thus the permitted behaviours of a transactional program, are determined by defining the set of consistent executions, defined separately for each transactional consistency model. The existing literature includes several definitions of well-known consistency models, including serialisability (SER) [41], snapshot isolation (SI) [9,44] and parallel snapshot isolation (PSI) [10,43].

Serialisability (SER). The serialisability (SER) consistency model [41] is one of the most well-known transactional consistency models, as it provides strong guarantees that are intuitive to understand and reason about. Specifically, under SER, all concurrent transactions must appear to execute atomically, one after another, in a total sequential order. The existing declarative definitions of SER [9,10,50] are somewhat restrictive in that they only account for fully committed (complete) transactions, i.e. they do not support pending or aborted transactions. Under the assumption that all transactions are complete, an execution (E, po, clo, rf, mo) is deemed serialisable (i.e. SER-consistent) if:

– rf_I ∪ mo_I ∪ rb_I ⊆ po (ser-int)
– clo ∪ rf_T ∪ mo_T ∪ rb_T is acyclic (ser-ext)

The ser-int axiom enforces intra-transactional consistency, ensuring that e.g. a transaction observes its own writes by requiring rf_I ⊆ po (i.e. intra-transactional reads respect the program order). Analogously, the ser-ext axiom guarantees extra-transactional consistency, ensuring the existence of a total sequential order in which all concurrent transactions appear to execute atomically one after another. This total order is obtained by an arbitrary extension of the (partial) 'happens-before' relation, which captures synchronisation resulting from transactional orderings imposed by client order (clo) or conflict between transactions (rf_T ∪ mo_T ∪ rb_T). Two transactions are conflicted if they both access (read or write) the same location x, and at least one of these accesses is a write. As such, the inclusion of rf_T ∪ mo_T ∪ rb_T enforces conflict-freedom of serialisable transactions. For instance, if transactions t₁ and t₂ both write to x via events w₁ and w₂ such that (w₁, w₂) ∈ mo, then t₁ must commit before t₂, and thus the entire effect of t₁ must be visible to t₂.
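As a worked example of how rb_T rules out non-serialisable behaviour, consider the classic execution below (our example, not from the paper), in which two transactions each read the initial value of one location and write the other.

```
% Locations x and y are initialised to 0 by a visible allocation.
%   t1: (R,x,0) ; (W,y,1)        t2: (R,y,0) ; (W,x,1)
% t1 reads x before t2's write to x, so (r_x, w_x) \in rb; symmetrically
% for y. Lifting rb to transactions gives:
\[
  t_1 \xrightarrow{\;\mathit{rb}_\mathsf{T}\;} t_2
  \qquad\text{and}\qquad
  t_2 \xrightarrow{\;\mathit{rb}_\mathsf{T}\;} t_1 ,
\]
% a cycle in rb_T that violates (ser-ext): no total order can place both
% t1 before t2 and t2 before t1, so the execution is not serialisable.
```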

Opacity. We do not stipulate that all transactions commit successfully, and allow for both aborted and pending transactions. As such, we opt for the stronger notion of transactional correctness known as opacity. In what follows we describe our notion of opacity over executions (formalised in Def. 3), and later relate it to the existing notion of opacity over histories [27], proving that our characterisation of opacity is equivalent to the existing one (see Thm. 1). Further intuitions are provided in the extended version of this paper [46].

Definition 3 (Opacity). An execution G = (E, po, clo, rf, mo) is opaque if:

– dom(rf_T) ⊆ Vis (vis-rf)
– rf_I ∪ mo_I ∪ rb_I ⊆ po (int)
– clo ∪ rf_T ∪ mo_T ∪ (rb_T; [Vis]) is acyclic (ext)

where Vis ≜ Succ ∪ CPendRF with CPendRF ≜ dom([CPend]; rf_T).

The existing definition of opacity [27] does not account for memory allocation and assumes that all locations accessed (read/written) by a transaction are initialised with some value (typically 0). In our setting, we make no such assumption and extend the notion of opacity to dynamic opacity to account for memory allocation. More concretely, our goal is to ensure that accesses in visible transactions are valid, in that they are on locations that have previously been allocated in a visible transaction. We define an execution to be dynamically opaque (Def. 4) if its visible write accesses are valid, i.e. are mo-preceded by a visible allocation.

Definition 4 (Dynamic opacity). An execution G is dynamically opaque if it is opaque (Def. 3) and G.(W ∩ Vis) ⊆ rng([M ∩ Vis]; G.mo).

We next use the above definitions to define (dynamic, durable) opacity over execution histories. In the context of persistent memory, where executions may crash (e.g. due to a power failure) and resume thereafter upon recovery, a history is a sequence of events (Def. 1) partitioned into different eras separated by crash markers (recording a crash occurrence), provided that the threads in each era are distinct, i.e. thread identifiers from previous eras are not reused after a crash.

Definition 5 (Histories). A history, H ∈ Hist, is a pair (E, to), where E comprises events and crash markers, E ⊆ Event ∪ Crash with Crash ≜ {(n, ↯) | n ∈ ℕ}, and to is a total order on E, such that the thread identifiers used in each era (the events between two consecutive crash markers) are distinct from those used in earlier eras.
A history (E′, pto) is a prefix of history (E, to) if E′ ⊆ E, pto = to|_{E′} and dom(to; [E′]) ⊆ E′.

The client order induced by a history H = (E, to), denoted by clo(H), is the partial order on TXId defined by clo(H) ≜ [S ∪ A]; to_T; [B]. We define history opacity as a prefix-closed property (cf. [27]), designating a history H as opaque if every prefix (E, pto) of H induces an opaque execution. The notion of dynamic opacity over histories is defined analogously.

Definition 6. A history H is opaque if for each prefix H_p = (E, pto) of H, there exist rf, mo such that (E, pto_i, clo(H_p), rf, mo) is opaque (Def. 3). H is dynamically opaque if for each prefix H_p = (E, pto) of H, there exist rf, mo such that (E, pto_i, clo(H_p), rf, mo) is dynamically opaque (Def. 4).

We define durable opacity over histories: a history H is durably opaque if the history obtained from H by removing crash markers is opaque. We define dynamic durable opacity analogously.

Definition 7. A history (E, to) is durably opaque if (E \ Crash, to|_{E\Crash}) is opaque. A history (E, to) is dynamically and durably opaque if (E \ Crash, to|_{E\Crash}) is dynamically opaque.

Finally, we show that our definitions of history (durable) opacity are equivalent to the original definitions in the literature (see [46] for the proof).

Theorem 1. History opacity as defined in Def. 6 is equivalent to the original notion of opacity [27]. History durable opacity as defined in Def. 7 is equivalent to the original notion of durable opacity [6].

# 5 Operationally Proving Dynamic Durable Opacity

We develop an operational specification, ddTMS (§5.1), and prove it correct against ddOpacity (§5.2). In particular, we show that every history (i.e. observable trace) of ddTMS satisfies ddOpacity. As ddTMS is a concurrent operational specification, it serves as a basis for validating the correctness of txPMDK as well as our concurrent extensions PMDK-TML and PMDK-NORec.

# 5.1 ddTMS: The dTMS2 Automaton Extended with Allocation

ddTMS is based on dTMS2, an operational specification that guarantees durable opacity [6]. dTMS2 is in turn based on the TMS2 automaton [20], which is known to satisfy opacity [33]. Furthermore, the ddTMS commit operation includes the simplification described by Armstrong et al. [1], omitting a validity check when committing read-only transactions. In what follows we present ddTMS as a transition system.

ddTMS state. Formally, the state of ddTMS is given by the variables in Fig. 7. Following dTMS2, ddTMS keeps track of a sequence of memory stores, mems, one for each committed writing transaction since the last crash. This allows us to determine whether reads are consistent with previously committed write operations. Each committing transaction that contains at least one write adds a new memory version to the end of the memory sequence. As we shall see, mems tracks allocated locations, since it maps every allocated location to a value different from ⊥.

Each transaction t is associated with several variables: pc_t, beginIdx_t, rdSet_t, wrSet_t and alSet_t. The pc_t denotes the program counter, ranging over a set of program-counter values ensuring each transaction is well-formed and that each transactional operation takes effect between its invocation and response. The beginIdx_t ∈ ℕ denotes the begin index, set to the index of the most recent memory version when the transaction begins; this is used to ensure the real-time ordering property between transactions. The rdSet_t ∈ Loc ⇀ Val is the read set and wrSet_t ∈ Loc ⇀ Val is the write set, recording the values read and written by the transaction during its execution, respectively. (We use S ⇀ T to denote a partial function from S to T.) Finally, alSet_t ⊆ Loc denotes the allocation set, containing the set of locations allocated by the transaction t. We use s.beginIdx, s.rdSet, s.wrSet and s.alSet to refer to the begin index, read set, write set and allocation set of a state s, respectively.

```
mems ∈ Mem ≜ Seq⟨Loc → Val_⊥⟩       Val_⊥ ≜ Val ∪ {⊥}, where ⊥ ∉ Val
S ∈ State ≜ TXId → TState
s ∈ TState ≜ ℕ × (Loc ⇀ Val) × (Loc ⇀ Val) × P(Loc)
    // the local begin index, read set, write set and allocation set
PC ∈ PCMap ≜ TXId → PCVal
Invs ≜ {TxBegin, TxRead(l), TxWrite(l, v), TxAlloc, TxCommit | l ∈ Loc, v ∈ Val}
Resps ≜ {TxBegin, TxRead(l, v), TxWrite(l, v), TxAlloc(l), TxCommit, Abort | l ∈ Loc, v ∈ Val}
PCVal ≜ {init, ready, aborted, committed, fault} ∪ {Π(i) | i ∈ Invs} ∪ {∆(TxCommit)}
α ∈ Action ≜ {inv(i), res(r), ε, ↯ | i ∈ Invs, r ∈ Resps}

Initially: PC₀ ≜ λt. init    S₀ ≜ λt. (0, ∅, ∅, ∅)    mems₀ ≜ [λx. ⊥]
```

Fig. 7: ddTMS state

The read set is used to determine whether the values read by the transaction are consistent with its version of memory (using validIdx). The write set, on the other hand, is required because writes are modelled using deferred update semantics: writes are recorded in the transaction's write set and are not published to any shared state until the transaction commits.
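To make the deferred-update discipline concrete, here is a small C++ toy (our illustration, not the ddTMS model itself): writes are buffered in a transaction-local write set and only published at commit, while reads consult the write set first and log what they read for later validation.

```
// A toy of deferred-update semantics (ours, not the ddTMS model itself):
// writes go to a transaction-local write set and only reach shared memory
// at commit; reads consult the write set first and are logged for later
// validation.
#include <map>

struct DeferredTxn {
  std::map<int, int> rdSet, wrSet;  // loc -> val

  int read(const std::map<int, int> &shared, int loc) {
    if (auto it = wrSet.find(loc); it != wrSet.end())
      return it->second;       // read own (deferred) write
    int v = shared.at(loc);    // assumes loc is allocated in shared memory
    rdSet[loc] = v;            // recorded for validation
    return v;
  }

  void write(int loc, int val) { wrSet[loc] = val; }  // not yet visible

  void commitTo(std::map<int, int> &shared) {
    for (const auto &[loc, val] : wrSet) shared[loc] = val;  // write-back
  }
};
```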

ddTMS Global Transitions. ddTMS is specified by the transition system shown in Fig. 8, where the ddTMS global transitions are given at the top and the per-transaction transitions at the bottom. The global transitions may either take a per-transaction step (rule (S)), match a transaction fault (rule (F)), crash (rule (X)), or behave chaotically due to a fault (rule (C)).

Note that a crash transition models both a crash and a recovery. It sets the program counter of every live transaction to aborted, preventing them from performing any further actions after the crash. Since transaction identifiers are not reused, the program counters of completed transactions need not be modified. After restarting, it must not be possible for any new transaction to interact with stale memory states from before the crash. Thus, we reset the memory sequence to a singleton sequence containing the last memory state prior to the crash.

Following the design of txPMDK (and our concurrent extensions PMDK-TML and PMDK-NORec), we do not check for reads and writes to unallocated memory within the library, and instead delegate such checks to the client. An execution of txPMDK (or of PMDK-TML and PMDK-NORec) that accesses unallocated memory is assumed to be faulty. In particular, a read or write of unallocated memory induces a fault (rule (F)). Once a fault is triggered, the program counter of each transaction is set to fault and recovery is impossible.


Fig. 8: The ddTMS global transitions (above) with its per-transaction transitions (below), where

InvOps ≜ {TxWrite(l, v), TxRead(l) | l ∈ Loc, v ∈ Val} ∪ {TxAlloc, TxCommit}

From a faulty state the system behaves chaotically, i.e. it is possible to generate any history using rule (C).

ddTMS Per-Transaction Transitions. The system contains externally visible transitions for invoking an operation (rules IB and IOp), which set the program counters to ∆(a), where a is the operation being performed. This allows the histories of the system to contain operation invocations without corresponding matching responses.

For the begin, allocation, read and write operations, an invocation can be followed by a single transition (rules DB, DA, DR-E, DR-I, DR-A and DW) that performs the operation combined with the corresponding response. Following an invocation, the commit operation is split into internal do actions ((DC-RO) and (DC-W)) and an external response (rule RC). Finally, after a read/write invocation, the system may perform a fault transition for a read (rule FR) or a write (rule FW). The main change from dTMS2 is the inclusion of an allocation procedure. The design of ddTMS allows the executing transaction, t, to tentatively allocate a location l within its transaction-local allocation set, alSet_t. This allocation in ddTMS is optimistic: correctness of the allocation is only checked when t performs a read or commits.

Successful (non-faulty) read and write operations take allocations into account as follows. (1) A read operation of transaction t reads from a prior write (rule (DR-I)) or allocation (rule (DR-A)) performed by t itself. In this case, the operation may only proceed if the location l is in the allocation or write set of t. The effect of the operation is to return the value of l in the write set (if it exists) or 0 if l only exists in the allocation set. (2) A read operation of transaction t reads from a write or allocation performed by another transaction (rule (DR-E)). Note that, as with dTMS2 and TMS2, in ddTMS a read-only transaction may serialise with any memory index n after beginIdx_t. Moreover, within validIdx, in addition to ensuring that t's read set is consistent with the memory index n (second conjunct), we must also ensure that t's allocation set is consistent with memory index n (third conjunct) by ensuring that none of the locations in the allocation set have been allocated at memory index n. (3) A write of transaction t successfully performs its operation (rule (DW)), which can only happen if the location l being written has been allocated, either by t itself (first disjunct) or by a prior transaction (second disjunct). A writing transaction must serialise after the last memory index in mems; thus the second disjunct checks allocation against the last memory index.

A successful (non-faulty) commit is split into two cases: (1) t is a read-only transaction (rule (DC-RO)), where both alSet_t and wrSet_t are empty. In this case, the transaction simply commits. (2) t has performed an allocation or a write (rule (DC-W)). Here, we check that t is valid with respect to the last memory in mems using validIdx. The commit appends a new memory to the memory sequence mems. The update also ensures that all pending allocations in alSet_t take effect before applying the writes from t's write set.

# 5.2 Soundness of ddTMS

We state our main theorem relating ddTMS to ddOpacity. As the models are inherently different, we need several definitions to transform ddTMS histories into histories compatible with ddOpacity.

An execution of a labelled transition system (LTS) is an alternating sequence of states and actions, i.e. a sequence of the form s₀ a₁ s₁ a₂ … sₙ₋₁ aₙ sₙ such that for each 0 < i ≤ n there is a transition from sᵢ₋₁ to sᵢ labelled aᵢ, and s₀ is an initial state of the LTS. Suppose σ is an execution of ddTMS. We let AH_σ = a₁ a₂ … aₙ be the action history corresponding to σ, and EH_σ be the external history of σ, which is AH_σ restricted to non-ε actions. Let FF_σ be the longest fault-free prefix of EH_σ. We generate the history (in the sense of Def. 5) corresponding to FF_σ as follows. First, we construct the labelled history LH_σ of σ from FF_σ by removing all invocation actions except commit invocations (leaving only responses, commit invocations and crashes). Then we replace each such action aᵢ with label α, executed by transaction t, by the event (i, t, t, L(α)), where L(res(TxBegin)) = B, L(res(TxAlloc(l))) = (M, l, 0), L(res(TxRead(l, v))) = (R, l, v), L(res(TxWrite(l, v))) = (W, l, v), L(res(Abort)) = A, L(inv(TxCommit)) = C, and L(res(TxCommit)) = S. Similarly, we replace each crash action aᵢ = ↯ by the pair (i, ↯). Note that in this construction, for simplicity, we conflate threads and transactions, but this restriction is straightforward to generalise. Finally, let the ordered history of σ, denoted OH_σ, be the total order corresponding to LH_σ.
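The relabelling L is purely mechanical; the following C++ sketch (ours; the Action type is a hypothetical stand-in for the invocations/responses of Fig. 7) makes the case analysis explicit.

```
// A sketch (ours) of the relabelling L from external ddTMS actions to the
// event labels of Def. 1; the Action type is a hypothetical stand-in for
// the invocations/responses of Fig. 7.
#include <variant>

struct B {};                      // transaction begin
struct A {};                      // abort
struct C {};                      // start of commit (from inv(TxCommit))
struct S {};                      // successful commit
struct M { int loc; int init; };  // allocation label (M, l, 0)
struct R { int loc; int val; };   // read label (R, l, v)
struct W { int loc; int val; };   // write label (W, l, v)
using Label = std::variant<B, A, M, R, W, C, S>;

struct Action {
  enum Kind { BeginRes, AllocRes, ReadRes, WriteRes, AbortRes,
              CommitInv, CommitRes } kind;
  int loc = 0, val = 0;
};

Label L(const Action &a) {
  switch (a.kind) {
    case Action::BeginRes:  return B{};
    case Action::AllocRes:  return M{a.loc, 0};
    case Action::ReadRes:   return R{a.loc, a.val};
    case Action::WriteRes:  return W{a.loc, a.val};
    case Action::AbortRes:  return A{};
    case Action::CommitInv: return C{};  // commit invocations are retained
    case Action::CommitRes: return S{};
  }
  return A{};  // unreachable
}
```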

Theorem 2. For any execution σ of ddTMS, the ordered history OH_σ satisfies ddOpacity.

The definitions of (dynamic) durable opacity can be lifted to the level of systems in the standard manner, providing a notion of correctness for implementations [28].

# 6 Modelling and Validating Correctness in FDR4

FDR4 [26] is a model checker for CSP [29] that has recently been used to verify linearisability [38], as well as opacity and durable opacity [23]. We similarly provide an FDR4 development, which allows proofs of refinement to be checked automatically up to certain bounds. This is in contrast to manual methods of proving correctness of concurrent objects [21,19], which require a significant amount of human input (though such manual proofs are unbounded).

An overview of our FDR4 development [47] is given in Fig. 9. We derive two specifications from ddTMS. The first is an FDR4 model of ddTMS itself, based on prior work [38,23], but containing the extensions described in §5.1. The second is ddTMS-Seq, which restricts ddTMS to a sequential crash-free specification. We use ddTMS-Seq to obtain (lower-bound) liveness-like guarantees, which strengthen traditional deadlock- or divergence-freedom proofs of refinement. These lower-bound checks ensure our models contain at least the traces of ddTMS-Seq.

Fig. 9: Overview of FDR4 checks

Fig. 10: Summary of upper-bound checks (total time in seconds: compilation + model exploration). The timeout (TO) is set to 1000 seconds of compilation time.

Fig. 10 summarises our experiments on the upper-bound checks, where the times shown combine the compilation and model-exploration times. Each row represents an experiment that bounds the number of transactions (#txns), locations (#locs), values (#val) and the size of the persistency and store buffers (#buf). The times reported are for an Apple M1 device with 16GB of memory. The first row depicts a set of experiments where the implementations execute directly on NVM, without any buffers. As we discuss below, these tests are sufficient for checking lower bounds. The baseline for our checks sets each parameter to two, and Fig. 10 allows us to see the cost of increasing each parameter in turn. Note that all models time out when the number of transactions is increased to three, so these times are not shown. Also note that for txPMDK (which is single-threaded), the checks for PSC also cover PTSOsyn, since PTSOsyn is equivalent to PSC in the absence of races [31]. Nevertheless, it is interesting to run the single-threaded experiments on the PTSOsyn model to understand the impact of the memory model on the checks.

In our experiments we use FDR4's built-in partial-order reduction features to make the upper-bound checks feasible. This has a huge impact on model-checking speed; for instance, the check for PMDK-TML with two transactions, two locations, two values and a buffer size of two reduces from over 6000 seconds (1 hour and 40 minutes) to under 7 seconds, almost a 1000-fold improvement. This speed-up makes it feasible to use FDR4 for rapid prototyping when developing programs that use txPMDK, even for the relatively complex PTSOsyn memory model.

# 7 Related Work

Crash Consistency. Several authors have defined notions of atomicity for concurrent objects that take persistency into account (see [4] for a survey). None of these conditions are suitable in our setting, as they define consistency for concurrent operations (of concurrent data structures) as opposed to transactional memory.

Approaches and semantics for crash-consistent transactions stretch back to the mid-1970s, when the problem was considered in the database setting [24,34]. Since then, a myriad of definitions have been developed for particular applications (e.g. distributed systems, file systems, etc.). For plain reads and writes, one of the first studies of persistency models focussed on NVM is by Pelley et al. [42]. Since then, several semantic models for real hardware (Intel and ARM) have been developed [50,49,31,12,48]. For transactional memory, there are only a few notions that combine crash consistency with the ACID guarantees required for concurrent durable transactions. Raad et al. [50] define persistent serialisability under relaxed memory, which does not handle aborted transactions. As we have already discussed, Bila et al. [6] define durable opacity, but this is defined in terms of (totally ordered) histories as opposed to partially ordered graphs. Neither persistent serialisability nor durable opacity handles allocation.

Validating the txPMDK Implementation. Even without a clear consistency condition, a range of papers have explored the correctness of the C/C++ implementation. Bozdogan et al. [8] built a sanitiser for persistent memory and used it to uncover memory-safety violations in txPMDK. Fu et al. [25] built a tool for testing persistent key-value stores and uncovered consistency bugs in the PMDK libraries. Liu et al. [36] built a tool for detecting cross-failure races in persistent programs, and uncovered a bug in PMDK's libpmemobj library (see 'Bug 4' in their paper). These works are at a different level of abstraction from ours, since they focus on the code itself and do not provide any description of the design principles behind PMDK.

Raad et al. [45] and Bila et al. [7] have developed logics for reasoning about programs over the Px86-TSO model (which, we recall, is equivalent to PTSOsyn). However, these logics have thus far only been applied to small examples. Extending these logics to cover a proof by simulation and a full (manual) proof of correctness of PMDK, PMDK-TML and PMDK-NORec would be a significant undertaking, but an interesting avenue for future work.

Transactional Memory (TM). Several works have studied the semantics of TM [15,22,44,43]. Our work differs from these in that they do not account for persistency guarantees and crash consistency. Moreover, while earlier works [44,43] merely propose a model for weak isolation (i.e. mixing transactional and non-transactional accesses), [15,22] formalise weak isolation in various hardware and software TM platforms, albeit without validating their semantics.

Several approaches to crash consistency have recently been proposed; for a survey and comparison of techniques (in addition to transactions) see [3]. OneFile [52], Romulus [16], and Trinity and Quadra [51] together describe a set of algorithms that aim to improve the efficiency of txPMDK by reducing the number of fence instructions. Liu et al. [35] present DudeTM, a persistent TM design that uses a shadow copy of NVM in DRAM, which is shared amongst all transactions; their approach comprises three key steps. Zardoshti et al. [56] present an alternative technique for making STMs persistent by instrumenting STM code with additional logging and flush instructions. However, none of these works define any formal correctness guarantees, and hence none offers proofs of correctness. In particular, the role of allocation and its interaction with reads and writes is generally unclear.

As well as defining durable opacity, Bila et al. [6] develop a persistent version of the TML STM [17] by introducing explicit undo logging and flush instructions. They then prove this to be durably opaque via the dTMS2 specification. More recently, Bila et al. [5] have developed a technique for transforming both an STM and its corresponding opacity proof by delegating reads/writes of memory locations controlled by the TM to an abstract library that is later refined to use volatile and non-volatile memory. Neither of these works uses txPMDK, and both assume a sequentially consistent memory model.

# 8 Conclusions and Future Work

Our main contribution is validating the correctness of txPMDK via the development of declarative (ddOpacity) and operational (ddTMS) consistency criteria. We provide an abstraction of txPMDK and show that it satisfies ddTMS, and hence ddOpacity by extension. Additionally, we develop PMDK-TML and PMDK-NORec as two concurrent extensions of txPMDK based on existing STM designs, and show that these also satisfy ddTMS (and hence ddOpacity). All of our models are validated under the PSC and PTSOsyn memory models using FDR4.

As with most accepted existing transactional models (with or without persistency), we assume strong isolation, where each non-transactional access behaves like a singleton transaction (a transaction with a single access). That is, even ignoring persistency, there are no accepted definitions or models for mixing non-transactional and transactional accesses, and all existing transactional models (including opacity and serialisability) assume strong isolation. Indeed, PMDK transactions are specifically designed to be used in a purely transactional setting and are not meant to be combined with non-transactional accesses; they would have undefined semantics otherwise. Consequently, as we do not consider mixing transactional code with non-transactional code, RMW (read-modify-write) instructions are irrelevant in our setting: since non-transactional accesses are treated as singleton transactions, RMW instructions behave as transactions and their atomicity is guaranteed by the transactional semantics.

One threat to the validity of our work is that the model-checking results are for a small number of transactions, locations, values and buffer sizes (see Fig. 10). However, we have found these sizes adequate for validating all of our examples: when errors are deliberately introduced, FDR validation fails and counter-examples are automatically generated. Currently, we do not know whether a small-model theorem holds for durable opacity in general. This is a separate line of work and a general question that we believe is out of the scope of this paper. Specifically, our focus here is on making PMDK transactions concurrent, providing a clear specification for PMDK (and its concurrent variations) with dynamic allocation, and validating correctness of the results under a realistic memory model.

# References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Artifact Report: Intel PMDK Transactions: Specification, Validation and Concurrency<sup>⋆</sup>

Azalea Raad<sup>1</sup>, Ori Lahav<sup>2</sup>, John Wickerson<sup>1</sup>, Piotr Balcer<sup>3</sup>, and Brijesh Dongol<sup>4</sup>(B)

> <sup>1</sup> Imperial College London, London, UK
> <sup>2</sup> Tel Aviv University, Tel Aviv, Israel
> <sup>3</sup> Intel, Gdansk, Poland
> <sup>4</sup> University of Surrey, Guildford, UK (b.dongol@surrey.ac.uk)

Abstract. This report extends §6 of the main paper by providing further details of the mechanisation effort.

# 1 Modelling and Validating Correctness in FDR4

FDR4 [4] is a model checker for CSP [5] that has recently been used to verify linearisability [7], as well as opacity and durable opacity [3]. We similarly provide an FDR4 development, which allows proofs of refinement to be checked automatically up to certain bounds. This is in contrast to manual methods of proving correctness of concurrent objects [2,1], which require a significant amount of human input (though such manual proofs are unbounded). FDR4 uses a variety of underlying model-checking paradigms and partial-order reduction techniques [4], depending on the structure of the files to be verified. FDR4 builds on FDR3, but the exact implementation details of FDR4 are not publicly available, since it is a commercial product (available free for academic use).

The CSP files corresponding to this report may be downloaded from [8].

### 1.1 Modelling Details

One of the most challenging aspects of the FDR4 development is the modelling work itself. Our algorithms execute over a shared memory, but the CSP formalism is based on communicating processes with no notion of shared state. Thus, for each location we must explicitly define handler processes that communicate through channels to update and return the values of components (e.g. the addresses, read/write sets) of each model. Moreover, the implementations (txPMDK, PMDK-NORec and PMDK-TML), the specification (ddTMS) and the underlying memory models (PSC and PTSOsyn) we consider are non-trivial, significantly increasing the challenge of the modelling effort. Although constructing the models is challenging, once they have been developed they can be combined in a modular fashion. We have taken advantage of this feature to combine our implementations with different memory models during development. The combination of txPMDK with TML/NOrec also takes advantage of this modularity.

<sup>⋆</sup> Raad is funded by a UKRI fellowship MR/V024299/1, EPSRC grant EP/X037029/1 and VeTSS. Lahav is supported by the Israel Science Foundation (grant 814/22) and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 851811). Wickerson is funded by EPSRC grant EP/R006865/1. Dongol is funded by EPSRC grants EP/Y036425/1, EP/X037142/1, EP/X015149/1, EP/V038915/1, EP/R025134/2 and VeTSS.

This modularity also means that our models are reusable. One could use our models to check other developments, e.g. those that use txPMDK to implement other failure-atomic data structures, or verify redesigns of txPMDK over different memory models. Specifically, we use a top-level CSP process (which may comprise an interleaved composition of processes for each transaction) to model the most general client. Each transaction process begins a transaction, and then calls an unbounded number of reads, writes and allocations at nondeterministically chosen locations and with nondeterministically chosen values. An in-flight transaction process may also nondeterministically choose to terminate by calling commit instead of calling a read, write or allocation. Each operation call produces an externally visible invocation event, and when the operation terminates, an externally visible response is generated. Some operations may respond with an abort, in which case the transaction process itself terminates.
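The shape of this most general client can be sketched in ordinary code as well. The following C++ toy (ours; the TM interface is a hypothetical stand-in for the CSP channels) drives one transaction through a nondeterministic mix of operations until it commits or aborts.

```
// A toy rendering (ours; the TM interface is a hypothetical stand-in for
// the CSP channels) of the most general client: one transaction performs a
// nondeterministic mix of reads, writes and allocations at arbitrary
// locations/values, and nondeterministically terminates by committing;
// any abort ends the transaction process immediately.
#include <cstdlib>
#include <optional>

struct TM {
  virtual bool begin() = 0;
  virtual std::optional<int> read(int loc) = 0;  // nullopt = aborted
  virtual bool write(int loc, int val) = 0;      // false = aborted
  virtual std::optional<int> alloc() = 0;
  virtual bool commit() = 0;
  virtual ~TM() = default;
};

void clientTxn(TM &tm, int numLocs, int numVals) {
  if (!tm.begin()) return;
  for (;;) {
    switch (std::rand() % 4) {  // nondeterministic choice of next operation
      case 0: if (!tm.read(std::rand() % numLocs)) return; break;
      case 1: if (!tm.write(std::rand() % numLocs, std::rand() % numVals))
                return;
              break;
      case 2: if (!tm.alloc()) return; break;
      case 3: tm.commit(); return;  // nondeterministic termination
    }
  }
}
```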

Additionally, there is an externally visible crash event that synchronises with all processes. At the level of the abstraction (i.e. ddTMS), this simply terminates all in-flight transactions and resets the memory sequence (as detailed by rule (X)). At the level of the implementation, all in-flight transactions are terminated and, additionally, the store and persistency buffers are cleared. This means that when execution resumes, the value of each location is taken from NVM. Immediately after a crash (and before any other processes are started), the recovery process corresponding to the algorithm is executed. Note that transaction identifiers are never reused.

We omit further details of our FDR4 models, since they are provided as supplementary material [8], and refer the interested reader to prior works [7,3].

#### 1.2 Overview of Development

An overview of our FDR4 development is given in Fig. 1. We derive two specifications from ddTMS. The first is an FDR4 model of ddTMS itself, based on prior work [7,3] but with the extensions required for ddTMS. The second is ddTMS-Seq, which restricts ddTMS to a sequential, crash-free specification. We use ddTMS-Seq to obtain (lower-bound) liveness-like guarantees, which strengthen traditional deadlock- and divergence-freedom proofs of refinement. These lower-bound checks ensure that our models contain at least the traces of ddTMS-Seq.


Fig. 1: Overview of FDR4 checks


Fig. 2: Summary of upper-bound checks (total time in seconds: compilation + model exploration). The timeout (TO) is set to 1000 seconds of compilation time.



Description of Tests. The file Refinement.csp comprises six tests, as detailed in Figs. 9 and 10 of the paper. There are three upper-bound checks, which show that PMDK, PMDK-TML and PMDK-NORec are refinements of ddTMS, validating soundness:
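These are trace-refinement assertions; assuming the ddTMS specification process is named ddTMS (our naming, by analogy with the lower-bound checks below; the exact identifiers are those used in Refinement.csp [8]), they take the form:

```
ddTMS [T= PMDK
ddTMS [T= FinalTML
ddTMS [T= FinalNOrec
```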


Each of these tests can be run against the memory models NVM (which contains no crashes), PSC and PTSOsyn, by commenting/uncommenting the relevant lines at the end of the file MemoryP.csp.

Additionally, there are three lower-bound checks, which show that ddTMS-Seq is a refinement of each of PMDK, PMDK-TML and PMDK-NORec:

```
PMDK       [T= SeqFinalTMS
FinalTML   [T= SeqFinalTMS
FinalNOrec [T= SeqFinalTMS
```
Each of these tests can be run against the memory models NVM and PSC, as defined in the file MemoryP.csp. Note that the test against PTSOsyn times out. However, the tests above are sufficient, since PTSOsyn reduces to PSC in the absence of data races (e.g. sequential executions).

Each check in FDR4 is split into two phases: (1) a compilation phase that builds the models; and (2) a model exploration phase. The characteristics of the upper- and lower-bound checks are distinct: when naively checking the upper bound, compilation is almost instantaneous but model exploration times can be significant; for the lower-bound checks, these times are swapped.

In general, the lower bounds take much longer to verify than the upper bounds, since FDR4 is optimised for verifying that abstract (low-detail) specifications are refined by concrete (high-detail) implementations. The lower-bound checks use the more complex models as the specification, leading to the creation of very large, space-inefficient models, which puts pressure on the available system memory. However, the lower-bound checks for PSC and PTSOsyn are superseded by the corresponding checks over NVM, since the memory models PSC and PTSOsyn are both supersets of NVM; that is, any trace over NVM must also be a trace of PSC and of PTSOsyn. For two transactions, two locations and two values, the checks for PMDK, PMDK-TML and PMDK-NORec take 12.16, 17.36 and 56.02 seconds, respectively.

#### 1.3 Summary of Results

Fig. 2 summarises our experiments on the upper-bound checks, where the times shown combine the compilation and model exploration times. Each row represents an experiment that bounds the number of transactions (#txns), locations (#locs), values (#val) and the size of the persistency and store buffers (#buf). The times reported are for an Apple M1 device with 16 GB of memory. The first row depicts a set of experiments where the implementations execute directly on NVM, without any buffers. As we discuss below, these tests are sufficient for checking lower bounds. The baseline for our checks sets the value of each parameter to two, and Fig. 2 allows us to see the cost of increasing each parameter. Note that all models time out when increasing the number of transactions to three, thus these times are not shown. Also note that for txPMDK (which is single-threaded), the checks for PSC also cover PTSOsyn, since PTSOsyn is equivalent to PSC in the absence of races [6]. Nevertheless, it is interesting to run the single-threaded experiments on the PTSOsyn model to understand the impact of the memory model on the checks.

In our experiments we use FDR4's built-in partial order reduction features to make the upper-bound checks feasible. This has a huge impact on the model checking speed; for instance, the check for PMDK-TML with two transactions, two locations, two values and buffer size of two reduces from over 6000 seconds (1 hour and 40 minutes) to under 7 seconds, which is almost a 1000-fold improvement! This speed-up makes it feasible to use FDR4 for rapid prototyping when developing programs that use txPMDK, even for the relatively complex PTSOsyn memory model.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Specifying and Verifying Persistent Libraries

Léo Stefanesco¹(B), Azalea Raad², and Viktor Vafeiadis¹

¹ MPI-SWS, Kaiserslautern, Germany
² Imperial College London, United Kingdom
leo.stefanesco@mpi-sws.org

Abstract. We present a general framework for specifying and verifying persistent libraries, that is, libraries of data structures that provide some persistency guarantees upon a failure of the machine they are executing on. Our framework enables modular reasoning about the correctness of individual libraries (horizontal and vertical compositionality) and is general enough to encompass all existing persistent library specifications, ranging from hardware architectural specifications to correctness conditions such as durable linearizability. As case studies, we specify the FliT and Mirror libraries, verify their implementations over Px86, and use them to build higher-level durably linearizable libraries, all within our framework. We also specify and verify a persistent transaction library that highlights some of the technical challenges which are specific to persistent memory compared to weak memory, and how they are handled by our framework.

# 1 Introduction

Persistent memory (PM), also known as non-volatile memory (NVM), is a new kind of memory, which can be used to extend the capacity of regular RAM, with the added benefit that its contents are preserved after a crash (e.g. a power failure). Employing PM can boost the performance of any program with access to data that needs to survive power failures, be it a complex database or a plain text editor.

Nevertheless, doing so is far from trivial. Data stored in PM is mediated through the processors' caching hierarchy, which generally does not propagate all memory accesses to the PM in the order issued by the processor, but rather performs these accesses on the cache and only propagates them to the memory asynchronously when necessary (i.e. upon a cache miss or when the cache has reached its capacity limit). Caches, moreover, do not preserve their contents upon a power failure, which results in rather complex persistency models describing when and how stores issued by a program are guaranteed to survive a power failure. To ensure correctness of their implementations, programmers have to use low-level primitives, such as flushes of individual cache lines, fences that enforce ordering of instructions, and non-temporal stores that bypass the cache hierarchy.

These primitives are often used to implement higher-level abstractions, packaged into persistent libraries, i.e. collections of data structures that must guarantee to preserve their contents after a power failure. Persistent libraries can be thought of as the analogue of concurrent libraries for persistency. And just as concurrent libraries require a specification, so do persistent libraries.

The question naturally arises: what is the right specification for persistent libraries? Prior work has suggested a number of candidate definitions, such as durable linearizability, buffered durable linearizability [17], and strict linearizability [1], which are all extensions of the well-known correctness condition for concurrent data structures (i.e. linearizability [15]). In general, these definitions stipulate the existence of a total order among all executed library operations, a contiguous prefix of which is persisted upon a crash: the various definitions differ in exactly what this prefix should be, e.g. whether it is further constrained to include all fully executed operations.

Even though these specifications have a nice compositionality property, we argue that none of them are the right specification pattern for every persistent concurrent library. While for high-level persistent data structures, such as stacks and queues, a strong specification such as durable or strict linearizability would be most appropriate, this is certainly not the case for a collection of low-level primitives. Take, for instance, a library whose interface simply exposes the exact primitives of the underlying platform: memory accesses, fences and flushes. Their semantics, recently formalized in [30,19,28] in the case of the Intel-x86 architecture and in [31,5] in the case of the ARMv8 architecture, quite clearly do not fit into the framework of the durable linearizability definitions. More generally, there are useful concurrent libraries (especially in the context of weak memory consistency) that are not linearizable [26]; it is, therefore, conceivable that making those libraries persistent will require weak specifications.

Another key problem with attempting to specify persistent libraries modularly is that they often break the usual abstraction boundaries. Indeed, some models such as epoch persistency [6,24] provide a global persistency barrier that affects all memory locations, and therefore all libraries using them. Such global operations also occur at higher abstraction layers: persistent transactional libraries often require memory locations to be registered with the library in order for them to be used inside transactions. As such, to ensure compatibility with such transactional libraries, implementers of other libraries must register all locations they use and ensure that any component libraries they use do the same.

In this paper, we introduce a general declarative framework that addresses both of these challenges. Our framework provides a very flexible way of specifying persistent libraries, allowing each library to have a very different specification, be it durable linearizability or a more complex specification in the style of the hardware architecture persistency models. Further, to handle libraries that have a global effect (such as the persistent barriers above) or, more generally, that make some assumptions about the internals of all other libraries, we introduce a tag system, allowing us to describe these assumptions modularly.

Our framework supports both horizontal and vertical compositionality. That is, we can verify an execution containing multiple libraries by verifying each library separately (horizontal compositionality). Moreover, we can completely verify the implementation of a library over a set of other libraries using the specifications of its constituent libraries, without referring to their implementations (vertical compositionality). To achieve the latter, we define a semantic notion of substitution in terms of execution graphs, which replaces each library node by a suitably constrained set of nodes (its implementation).

For simplicity, in §2, we develop a first version of our framework over the classical notion of an execution history [15], which we extend with a notion of crashes. This basic version of our framework includes full support for weak persistency models but assumes an interleaving semantics of concurrency, i.e. sequential consistency (SC) [23].

Subsequently, in §3 we generalise and extend our framework to handle weak consistency models such as x86-TSO [32] and RC11 [22], thereby allowing us to represent hardware persistency models such as Px86 [30] and PARMv8 [31] in our framework. To do so, we rebase our formal development over execution graphs, using Yacovet [26] as a means of specifying the consistency properties of concurrent libraries.

We illustrate the utility of our framework by encoding in it a number of existing persistency models, ranging from actual hardware models such as Px86 [30], to general-purpose correctness conditions such as durable linearizability [17]. We further consider two case studies, chosen to demonstrate the expressiveness of our framework beyond the kind of case studies that have been worked out in the consistency setting.

First, in §4 we use our framework to develop the first formal specifications of the FliT [35] and Mirror [10] libraries, and establish the correctness not only of their implementations against their respective specifications, but also of their associated constructions for turning a linearizable library into a durably linearizable one. This generic theorem is new compared to the case studies in [26], and leverages our 'semantic' approach in §3. Moreover, our proofs of these constructions are the first to establish this result in a weak consistency setting.

Second, in §5 we specify and prove an implementation of a persistent transactional library Ltrans, which provides a high-level construction to persist a set of writes atomically. The Ltrans library illustrates the need for a 'well-formedness' specification (in addition to its consistency and persistency specifications) that requires clients of the Ltrans library to ensure, e.g., that Ltrans writes appear only inside transactions. Moreover, it demonstrates the use of our tagging system to enable other libraries to interoperate with it.

Contributions and Outline. The remainder of this article is organised as follows.



§5 We specify a persistent transactional library Ltrans, develop an implementation of Ltrans (over the Intel-x86 architecture) and verify it against its specification. We then consider two case studies of vertical and horizontal composition in our framework using Ltrans.

We conclude and discuss related and future work in §6. The full proofs of all theorems stated in the paper are given in the technical appendix.

# 2 A General Framework for Persistency

We present our framework for specifying and verifying persistent libraries, which are collections of methods that operate on durable data structures. Following Herlihy et al. [15], we will represent program histories over a collection of libraries Λ as Λ-histories, i.e. as sequences of calls to the methods of Λ, which we will then gradually enhance to model persistency semantics. Throughout this section, we assume an underlying sequential consistency semantics; in §3 we will generalize our framework to account for weaker consistency models.

In the following, we assume the following infinite domains: Meth of method names, Loc of memory locations, Tid of thread identifiers, and Val ⊇ Loc ∪ Tid of values. We let m range over method names, x over memory locations, t over thread identifiers, and v over values. An optional value v⊥ ∈ Val⊥ is either a value v ∈ Val or ⊥ ∉ Val.

#### 2.1 Library Interfaces

A library interface declares a set of method invocations of the form m(v). Some methods are designated as constructors; a constructor returns a location pointing to the new library instance (object), which is passed as an argument to other library methods. An interface additionally contains a function, loc, which extracts these locations from the arguments and return values of its method calls.

Definition 1. A library interface L is a tuple ⟨M, Mc, loc⟩, where the set of method invocations M is a subset of P(Meth × Val*), Mc ⊆ M is the set of constructors, and loc : M × Val⊥ → P(Loc) is the location function.

Example 1 (Queue library interface). The queue library interface, LQueue, has three methods: a constructor QueueNew(), which returns a new empty queue; QueueEnq(x, v), which adds value v to the end of queue x; and QueueDeq(x), which removes the head entry of queue x. We define loc(QueueNew(), x) = loc(QueueEnq(x, _), _) = loc(QueueDeq(x), _) = {x}.
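Spelled out against Def. 1 (our transcription of Example 1), the queue interface is the tuple

$$\mathsf{L}_{\mathsf{Queue}} = \langle M, M_c, \mathit{loc}\rangle,\quad M = \{\mathtt{QueueNew}()\} \cup \{\mathtt{QueueEnq}(x,v),\ \mathtt{QueueDeq}(x) \mid x\in \mathsf{Loc},\ v\in \mathsf{Val}\},\quad M_c = \{\mathtt{QueueNew}()\}.$$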

A collection Λ is a set of library interfaces with disjoint method names. When Λ consists of a single library interface L, we often write L instead of {L}.

#### 2.2 Histories

Given a collection Λ, an event e ∈ Events(Λ) of Λ is either a method invocation m(v)^t, with m(v) ∈ ⋃_{L∈Λ} L.M and t ∈ Tid, or a method response (return) event ret(v)^t.

A Λ-history is a sequence of events of Λ whose projection to each thread is an alternating sequence of invocation and return events which starts with an invocation.

Definition 2 (Sequential event sequences). A sequence of events e_1 ... e_n is sequential if all its odd-numbered events e_1, e_3, ... are invocation events and all its even-numbered events e_2, e_4, ... are return events.

Definition 3 (Histories). A Λ-history is a finite sequence of events H ∈ Events(Λ)*, such that for every thread t, the sub-sequence H[t] comprising only the events of t is sequential. The set Hist(Λ) denotes the set of all Λ-histories.

When clear from the context, we refer to occurrences of events in a history by their corresponding events. For example, if H = e_1 ... e_n and i < j, we say that e_i precedes e_j and that e_j succeeds e_i. Given an invocation m(v)^t in H, its matching return (when it exists) is the first event of the form ret(v)^t that succeeds it (they share the same thread). A call is a pair m(v)^t : v⊥ of an invocation and either its matching return v⊥ ∈ Val (complete call) or v⊥ = ⊥ (incomplete call).
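For instance, using the queue interface of Example 1, the following is a history in which an enqueue and a dequeue overlap (our illustration):

$$H = \mathtt{QueueNew}()^{t_0}\cdot \mathtt{ret}(q)^{t_0}\cdot \mathtt{QueueEnq}(q,1)^{t_1}\cdot \mathtt{QueueDeq}(q)^{t_2}\cdot \mathtt{ret}()^{t_1}$$

Each per-thread projection alternates invocations and returns; the calls of t_0 and t_1 are complete, whereas QueueDeq(q)^{t_2} has no matching return and is thus an incomplete call QueueDeq(q)^{t_2} : ⊥.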

A library (specification) comprises an interface and a set of consistent histories. The latter captures the allowed behaviors of the library, which is a guarantee made by the library implementation.

Definition 4. A library specification (or simply a library) L is a tuple ⟨L, Sc⟩, where L is a library interface, and Sc ⊆ Hist(L) denotes its set of consistent histories.

#### 2.3 Linearizability

Linearizability [15] is a standard way of specifying concurrent libraries that have a sequential specification S, denoting a set of finite sequences of complete calls. Given a sequential specification S, a concurrent library L is linearizable under S if each consistent history of L can be linearized into a sequential one in S, while respecting the happens-before order, which captures causality between calls. It is sufficient to consider consistent executions because inconsistent executions are, by definition, guaranteed by the library to never happen. Happens-before is defined as follows.

Definition 5 (Happens-Before). A method call C_1 happens before another method call C_2 in a history H, written C_1 ≺_H C_2, if the response of C_1 precedes the invocation of C_2 in H. When the choice of H is clear from the context, we drop the H subscript from ≺.


A history H is linearizable under a sequential specification S if there exists a linearization (in the order-theoretic sense) of ≺_H that belongs to S. The subtlety is the treatment of incomplete calls, which may or may not have taken effect. We write compl(H) for the set of histories obtained from a history H by appending zero or more matching return events. We write trunc(H) for the history obtained from H by removing its incomplete calls. We can then define linearizability as follows [14].

Definition 6. A sequential history H_ℓ is a sequentialization of a history H if there exists H′ ∈ trunc(compl(H)) such that H_ℓ is a linearization of ≺_{H′}. A history H is linearizable under S if there exists a sequentialization of H that belongs to S. A library L is linearizable under S if all its consistent histories are linearizable under S.
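Continuing the history H sketched after Def. 3 (our illustration), the incomplete dequeue may be resolved either way: compl(H) contains both H itself and H · ret(1)^{t_2}. Truncating the former drops the dequeue, giving the sequentialization

$$\mathtt{QueueNew}(){:}q \cdot \mathtt{QueueEnq}(q,1){:}()$$

while the latter yields QueueNew():q · QueueEnq(q,1):() · QueueDeq(q):1; both are FIFO-consistent, so either witnesses linearizability under the queue specification below.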

For instance, we can specify the notion of linearizable queues as those that are linearizable under the following sequential queue specification, SQueue.

Example 2 (Sequential queue specification). The behaviors of a sequential queue, SQueue, are expressed as a set of sequential histories as follows. Given a history H of LQueue and a location x ∈ Loc, let H[x] denote the sub-history containing calls c such that loc(c) = {x}. We define SQueue as the set of all sequential histories H of LQueue such that for all x ∈ Loc, H[x] is of the form QueueNew()^{t_0} : x · e_1 ⋯ e_n, where each QueueDeq call in e_1 ⋯ e_n returns the value of the k-th QueueEnq call, if it exists and precedes the QueueDeq, where k is the number of preceding QueueDeq calls returning non-null values; otherwise, it returns null.
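For instance, the following sequential history is in SQueue, since dequeues return the enqueued values in FIFO order and null once the queue is empty (our illustration):

$$H[x] = \mathtt{QueueNew}()^{t_0}{:}x \cdot \mathtt{QueueEnq}(x,1){:}() \cdot \mathtt{QueueEnq}(x,2){:}() \cdot \mathtt{QueueDeq}(x){:}1 \cdot \mathtt{QueueDeq}(x){:}2 \cdot \mathtt{QueueDeq}(x){:}\mathtt{null}$$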

#### 2.4 Adding Failures

Our framework so far does not support reasoning about persistency, as it lacks the ability to describe the persistent state of a library after a failure. Our first extension is thus to extend the set of events of a collection, Events(Λ), with another type of event, a crash event, written ζ.

Crash events allow us to specify the durability guarantees of a library. For instance, a library that does not persist any of its data may specify that a history with crash events is consistent if all of its sub-histories between crashes are (independently) consistent. In other words, in such a library, the method calls before a crash have no effect on the consistency of the history after the crash. We modify the definition of happens-before accordingly by treating a crash event both as an invocation and as a return event. We also assume that, after a crash, the thread ids of the new threads are distinct from those of all the pre-crash threads. For libraries that do persist their data, a useful generic specification is durable linearizability [17], defined as follows.

Definition 7. Given a history H, let ops(H) denote the sub-history obtained from H by removing all its crash markers. A history H is durably linearizable under S if there exists a sequentialization H_ℓ of ops(H) such that H_ℓ ∈ S.

Intuitively, this ensures that operations persist before they return, and they persist in the same order as they take efect before a crash.

Although durable linearizability can specify a large range of persistent data structures, it can be too strong. For example, consider a (memory) register library Lwreg that only guarantees that writes to the same location are persisted in the order they are observed by concurrent reads. The Lwreg methods comprise RegNew() to allocate a new register, RegWrite(x, v) to write v to register x, and RegRead(x) to read from register x. The sequential specification Swreg is simple: once a register is allocated, a read R on x returns the latest value written to x, or 0 if R happens before all writes. The associated durable linearizability specification requires that writes be persisted in the linearization order; however, this is often not the case on existing hardware, e.g. in Px86 (the Intel-x86 persistency model) [30].

A more relaxed and realistic specification would consider two linearizations of the events: the standard volatile order ≺ and a persistent order nvo expressing the order in which events are persisted. The next sections will handle this more refined model; this paragraph only gives a quick taste of the kind of models that are implemented by hardware. To capture the same-location guarantees, we stipulate a per-location ordering on writes that is respected by both linearizations. Specifically, we require an ordering mo of the write calls such that for all locations x: 1) restricting mo to x, written mo_x, totally orders the writes to x; and 2) mo_x ⊆ ≺ and mo_x ⊆ nvo. Given a history H, we can then combine these two linearizations by using ≺ after the last crash and nvo before.

Formally, a history H with n−1 crashes can be decomposed into n (crash-free) eras, i.e. H = H_1 · ... · H_n where each H_i is crash-free. Let us write ≺_i for ≺ ∩ (H_i × H_i), and so forth. We then consider k-sequentializations of the form H^k_ℓ = H^{(1)}_ℓ · ... · H^{(k−1)}_ℓ · H^{(k)}_ℓ, where H^{(k)}_ℓ is a sequentialization of H_k w.r.t. ≺_k, and H^{(i)}_ℓ is a sequentialization of H_i w.r.t. nvo_i for i < k. We can now specify our weak register library as follows, where H comprises n eras:

$$H \in \mathsf{L}_{\mathsf{wreg}}.\mathsf{S}_{\mathsf{c}} \iff \forall k \le n.\ \exists H^k_\ell\ k\text{-seq. of } H.\ H^k_\ell \in S_{\mathsf{wreg}}.$$

Example 3. The following history is valid according to this specification but not according to the durably linearizable one:

$$W_{t_1}(x,1) \cdot W_{t_2}(y,1) \cdot R_{t_3}(y) \cdot \mathtt{ret}_{t_3}(1) \cdot R_{t_3}(x) \cdot \mathtt{ret}_{t_3}(0) \cdot \zeta \cdot R_{t_4}(y) \cdot \mathtt{ret}_{t_4}(0) \cdot R_{t_4}(x) \cdot \mathtt{ret}_{t_4}(1)$$

While the writes to x (W_{t_1}(x,1)) and y (W_{t_2}(y,1)) are executing, thread t_3 observes the new value (1) of y but the old value (0) of x; i.e. ≺ must order W_{t_2}(y,1) before W_{t_1}(x,1). By contrast, after the crash the new value (1) of x but the old value (0) of y is visible; i.e. nvo must order the two writes in the opposite order to ≺ (W_{t_1}(x,1) before W_{t_2}(y,1)).

Persist Instructions. The persistent registers described above are too weak to be practical, as there is no way to control how writes to different locations are persisted. In realistic hardware models such as Px86, this control is afforded to the programmer using per-location persist instructions (e.g. CLFLUSH), ensuring that all writes on a location x persist before a write-back on x. Here, we consider a coarser (stronger) variant, denoted by PFENCE, which ensures that all writes (on all locations) that happen before a PFENCE are persisted. Later in §3 we describe how to specify the behavior of per-location persist operations.

Formally, we specify PFENCE by extending the specification of Lwreg as follows: given a history H, a write call c_w and a PFENCE event c_f, if c_w ≺_H c_f, then (c_w, c_f) ∈ nvo.

Example 4. Consider the history obtained from Example 3 by adding a PFENCE:

$$\begin{array}{c} W_{t_1}(x,1) \cdot W_{t_2}(y,1) \cdot R_{t_3}(y) \cdot \mathtt{ret}_{t_3}(1) \cdot R_{t_3}(x) \cdot \mathtt{ret}_{t_3}(0) \cdot \mathsf{PFENCE}_{t_4}() \cdot \mathtt{ret}_{t_4}() \cdot \zeta \\ R_{t_4}(y) \cdot \mathtt{ret}_{t_4}(0) \cdot R_{t_4}(x) \cdot \mathtt{ret}_{t_4}(1) \end{array}$$

This history is no longer consistent according to the extended specification of Lwreg: as the PFENCE has completed (returned), all its ≺-previous writes must have persisted and thus must be visible after the crash, which is not the case for W_{t_2}(y,1).

#### 2.5 Adding Well-formedness Constraints

Our next extension is to allow library specifications to constrain the usage of the library methods by the client of the library. For example, a library for a mutual exclusion lock may require that the "release lock" method is only called by a thread that previously acquired the lock and has not released it in between. Another example is a transactional library, which may require that transactional read and write methods are only called within transactions, i.e. between a "transaction-begin" and a "transaction-end" method call.

We call such constraints library well-formedness constraints, and extend library specifications with another component, Swf ⊆ Hist(L), which records the set of well-formed histories of the library. Ensuring that a program produces only well-formed histories of a certain library is an obligation of the clients of that library, so that the library implementation can rely upon well-formedness being satisfied.

#### 2.6 Tags and Global Specifications

The goal of our framework is not only to specify libraries in isolation, but also to express how a library can enforce persistency guarantees across other libraries. For example, consider a library Ltrans for persistent transactions, where all operations wrapped within a transaction persist together atomically; i.e. either all or none of the operations in a transaction persist.

The Ltrans methods are: PTNewReg to allocate a register that can be accessed (read/written) within a transaction; PTBegin and PTEnd to start and end a transaction, respectively; PTRead(x) and PTWrite(x, v) to read from and write to Ltrans register x, respectively; and PTRecover to restore the atomicity of transactions whose histories were interrupted by a crash.

Consider the snippet below, where PEnq(q, 33) (enqueuing 33 into persistent queue q) and PSetAdd(s, 77) (adding 77 to persistent set s) are wrapped within an Ltrans transaction and thus should take effect atomically, and at the latest by the end of the call to PTEnd.

```
PTBegin();
  PEnq(q, 33);
  PSetAdd(s, 77);
PTEnd();
```
Such guarantees are not offered by existing hardware primitives, e.g. on the Intel-x86 or ARMv8 [30,31] architectures. As such, to ensure atomicity, the persistent queue and set implementations cannot directly use hardware reads/writes; rather, they must use those provided by the transactional library, whose implementation could use e.g. an undo-log to provide atomicity.

Our framework as described so far cannot express such cross-library persistency guarantees. The difficulty is that the transactional library relies on other libraries using certain primitives. This, however, is against the spirit of compositional specification, which precludes the transactional library from referring to other libraries (e.g. the queue or set libraries). Specifically, there are two challenges. First, both the well-formedness requirements and the consistency guarantees of Ltrans must apply to any method call that is designed to use (transitively) the primitives of Ltrans. Second, we must formally express atomicity ("all operations persist atomically") without Ltrans knowing what it means for a method of an arbitrary library to persist. In other words, Ltrans needs to introduce an abstract notion of 'having persisted' for an operation, and guarantee that all methods in a transaction 'persist' atomically.

To remedy this, we introduce the notion of tags. Specifically, to address the first challenge, the transactional library provides the tag t to designate those operations that are 'transaction-aware' and as such must be used inside a transaction. To address the second challenge, the transaction library provides the tag p_tr, denoting an operation that has abstractly persisted. The specification of Ltrans then guarantees that all operations tagged with t inside a transaction persist atomically, in that either they are all tagged with p_tr or none of them are. Dually, using the well-formedness condition, Ltrans requires that all operations tagged with t appear inside a transaction. Note that as the persistent queue and set libraries tag their operations with t, verifying their implementations incurs related proof obligations; we will revisit this later when we formalize the notion of library implementations.

Remark 1 (Why bespoke persistency?). The reader may question why 'having persisted' is not a primitive notion in our framework, as in an existing model of Px86 [19] where histories track the set P of persisted events. This is because associating a Boolean ('having persisted') flag with an operation may not be sufficient to describe whether it has persisted. To see this, consider a library Lpair with operations Write(x, l, r) (writing (l, r) to pair x), Read_l(x) and Read_r(x) (reading the left and right components of x, respectively). Suppose Lpair is implemented by storing the left component in an Ltrans register and the right component in an Lwreg register. The specification of Lpair would need to track the persistence of each component separately, and hence a single set P of persisted events would not suffice.

Let us see how libraries can use these tags in global well-formedness and consistency specifications. The dilemma is that, on the one hand, the specification of Ltrans needs to refer to events from other libraries, but on the other hand, it should not depend on other libraries, so as to preserve encapsulation. Our idea is to anonymize these external events such that the global specification depends only on their relevant tags. A library should only rely on the tags it introduces itself, as well as the tags of the libraries it uses.

We now revisit several of our definitions to account for tags and global specifications. A library interface now additionally holds the tags it introduces as well as those it uses. For instance, the Ltrans library described above depends on no tags and introduces the tags t and p_tr.

Definition 8 (Interfaces). An interface is a tuple L = ⟨M, Mc, loc, Tagsnew, Tagsdep⟩, where M, Mc, and loc are as in Def. 1, Tagsnew is the set of tags L introduces, and Tagsdep is the set of tags L uses. The set of tags usable by L is Tags(L) ≜ L.Tagsnew ∪ L.Tagsdep.

We next define the notion of tagged method invocations, where a method invocation is associated with a set of tags. Hereafter, our notions of events, histories (and so forth) use tagged method invocations (rather than method invocations).

Definition 9. Given a library interface L, a tagged method invocation is of the form m(v)^T_t : v⊥, where the new component is a set of tags T ⊆ Tags(L).

A global specification of a library interface L is a set of histories with some "anonymized" events. These are formalized using a designated library interface, ⋆_L (with a single method ⋆), which can be tagged with any tag from Tags(L).

Definition 10. Given an interface L, the interface ⋆_L is ⟨{⋆}, ∅, ∅, ∅, Tags(L)⟩.

Now, given any history H ∈ Hist({L} ∪ Λ), let π_L(H) ∈ Hist({L, ⋆_L}) denote the anonymization of H such that each non-L event e in H labelled with a method m(v)^T_t : v⊥ of L′ ∈ Λ is replaced with ⋆^T_t of ⋆_L if T ≠ ∅, and is discarded otherwise. It is then straightforward to extend the notion of libraries with global specifications as follows.

Definition 11. A library specification L is a tuple ⟨L, Λtags, Sc, Swf, Tc, Twf⟩, where L, Sc and Swf are as in Def. 4; Tc, Twf ⊆ Hist({L, ⋆_L}) are the globally consistent and globally well-formed histories, respectively; and Λtags denotes the tag-dependencies, i.e. a collection of libraries that provide all tags that L uses: L.Tagsdep ⊆ ⋃_{L′∈Λtags} L′.Tagsnew. Both Twf and Tc contain the empty history.

In the context of a history, we write ⌊t⌋ for the set of events or calls tagged with the tag t (we consider a return event tagged the same way as its unique matching invocation).

For the Ltrans library, the globally well-formed set Ltrans.Twf comprises histories H such that for each thread t, the projection H[t] restricted to PTBegin, PTEnd and t-tagged events is of the form described by the regular expression (PTBegin · ⌊t⌋* · PTEnd)*. In particular, transaction nesting is disallowed in our simple Ltrans library.
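For instance, under this condition the per-thread projection of the client snippet from earlier is globally well-formed, since (our illustration)

$$\mathtt{PTBegin} \cdot \mathtt{PEnq}(q,33) \cdot \mathtt{PSetAdd}(s,77) \cdot \mathtt{PTEnd} \ \in\ (\mathtt{PTBegin}\cdot\lfloor \mathsf{t}\rfloor^{*}\cdot\mathtt{PTEnd})^{*},$$

where the two t-tagged calls match ⌊t⌋; by contrast, a t-tagged call occurring outside any PTBegin/PTEnd pair is rejected.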

To define global consistency, we need to know when two operations are part of the same transaction. Given a history H, we define the same-transaction relation, strans, relating pairs of events e, e′ ∈ ⌊t⌋ ∪ PTEnd ∪ PTBegin executed by the same thread t such that there is no PTBegin or PTEnd executed by t between them. The set Ltrans.Tc of globally consistent histories contains the histories H such that ∀(e, e′) ∈ strans, e ∈ ⌊p_tr⌋ ⇔ e′ ∈ ⌊p_tr⌋, and all completed PTEnd calls are in ⌊p_tr⌋. Since the PTEnd call is related to all events inside its transaction, this specification does express that (1) a transaction persists by the time the call to PTEnd finishes, and (2) all its events persist atomically.

Finally, we need to define the local consistency predicate Ltrans.Sc describing the behavior of the registers provided by Ltrans. This is where we define the concrete meaning of 'having persisted' for these registers. Let S be the sequential specification of a register. Let H ∈ Hist(Ltrans) be a history decomposed into k eras as H = H_1 · H_2 · ... · H_k. Then H ∈ Ltrans.Sc if all events are tagged with t, and there exists a ≺-linearization H_ℓ of ((H_1 · H_2 · ... · H_{k−1}) ∩ ⌊p_tr⌋) · H_k such that H_ℓ ∈ S, where ⌊p_tr⌋ is the set of events of H tagged with p_tr. In other words, a write operation is seen after a crash if it has persisted. The requirement that such operations must appear within transactions, and the guarantee that they persist at the same time in a transaction, are covered by the global specifications.

#### 2.7 Library Implementations

We have described how to specify persistent libraries in our framework, and next describe how to implement persistent libraries. This is formalized by the judgment Λ ⊢ I : L, stating that I is a correct implementation of library L that only uses calls to the collection of libraries Λ. As usual in such 'layered' frameworks [13,26], the base layer, which represents the primitives of the hardware, is specified as a library, keeping the framework uniform. This judgment can be composed vertically as follows, where I[I_L] denotes replacing the calls to library L in I with their implementations given by I_L (which in turn calls the libraries Λ′):

$$\frac{\Lambda,\mathsf{L}\vdash I:\mathsf{L}'\qquad \Lambda'\vdash I_L:\mathsf{L}}{\Lambda,\Lambda'\vdash I[I_L]:\mathsf{L}'}$$
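For instance, instantiating this rule with the libraries developed below (the transactional library of Fig. 1 and the min-max counter of Example 6; the instantiation is ours), one composition step yields:

$$\frac{\{\mathsf{L}_{\mathsf{trans}}\}\vdash I_{\mathsf{mmcnt}}:\mathsf{L}_{\mathsf{mmcnt}}\qquad \{\mathsf{L}_{\mathsf{wreg}},\mathsf{L}_{\mathsf{Queue}}\}\vdash I_{\mathsf{trans}}:\mathsf{L}_{\mathsf{trans}}}{\{\mathsf{L}_{\mathsf{wreg}},\mathsf{L}_{\mathsf{Queue}}\}\vdash I_{\mathsf{mmcnt}}[I_{\mathsf{trans}}]:\mathsf{L}_{\mathsf{mmcnt}}}$$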

As we describe later, this judgment denotes contextual refinement and is impractical to prove directly. We define a stronger notion that is compositional and more practical to use.

Definition 12 (Implementation). Given a collection Λ of libraries and a library L, an implementation I of L over Λ is a map I : L.M × Val⊥ → P(Hist(Λ)) such that: 1) it is downward-closed: if H ∈ I(m(v)^t, v⊥) and H′ is a prefix of H, then H′ ∈ I(m(v)^t, ⊥); and 2) each history in I(m(v)^t : v⊥) only contains events by thread t.

Intuitively, I(m(v), v⊥) contains the histories corresponding to a call m(v) with outcome v⊥, where v⊥ = ⊥ denotes that the call has not terminated yet and v⊥ = v ∈ Val denotes the return value. Downward-closure means that an implementation contains all partial histories. We use a concrete programming language to write these implementations; its syntax and semantics are standard and given in the appendix [34].

```
globals log := Q.new()

method PTNewReg() := alloc(1)

method PTRead(l) := read(l)

method PTWrite(l, v) :=
  // log the old value of l, so that an incomplete
  // transaction can be undone upon recovery
  Q.append(log, (l, read(l)));
  write(l, v)

method PTBegin() := FENCE()

method PTEnd() :=
  Q.append(log, COMMITTED);
  FENCE()

method PTRecover() :=
  // collect in w the entries of the trailing incomplete
  // transaction, discarding those of committed ones
  let w = Q.new() in
  while (x := Q.pop(log))
    if (x = COMMITTED)
      w := Q.new();
    else
      Q.append(w, x);
  // undo the incomplete transaction by restoring old values
  while ((l, v) = Q.pop(w))
    write(l, v)
```
Fig. 1. Implementation of Ltrans

For example, the implementation of Ltrans over Lwreg and LQueue is given in Fig. 1. The idea is to keep an undo-log as a persistent queue that tracks the values of the variables before they are overwritten by a transaction. At the end of a transaction, and after all its writes have persisted, we write the sentinel value COMMITTED to the log to indicate that the transaction completed successfully. After a crash, the recovery routine PTRecover reads back the undo-log and undoes the operations of incomplete transactions by writing their previous values.
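To illustrate our reading of Fig. 1, suppose the client snippet of §2.6 crashes after both transactional writes but before PTEnd appends its sentinel. Writing ℓ_q and ℓ_s for the (hypothetical) registers updated inside PEnq and PSetAdd, and v_q, v_s for their pre-transaction values, the log then ends without a trailing COMMITTED:

$$\mathit{log} = \cdots \cdot \mathtt{COMMITTED} \cdot (\ell_q, v_q) \cdot (\ell_s, v_s)$$

so PTRecover collects the last two entries into w and writes v_q and v_s back, erasing every trace of the interrupted transaction.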

Histories and Implementations. An implementation I of L over Λ is correct if for all histories H ∈ Hist({L} ∪ Λ′) that use library L as well as those in Λ′, and all histories H′ obtained by replacing calls to L methods with their implementations in I, if H′ is consistent, then so is H (it satisfies the L specification).

We define the action H · I of an implementation I on an abstract history H in a 'relational' way: H′ ∈ H · I when we can match each operation m′(v) in H′ with some operation f(m′(v)) in H, in such a way that the collection f⁻¹(m(v)^t : v⊥) of operations corresponding to some call m(v)^t : v⊥ in H agrees with I(m(v)^t : v⊥).

Definition 13. Let I be an implementation of L over Λ; let H ∈ Hist({L} ∪ Λ′) and H′ ∈ Hist(Λ ∪ Λ′) be two histories. Given a map f : {1, ..., |H′|} → {1, ..., |H|}, we say that H′ (I, f)-matches H if the following hold:

1. f is surjective;
2. events that do not belong to L are mapped to themselves, i.e. are left untouched;
3. f preserves the thread-local order of events;
4. for each call m(v)^t : v⊥ in H, the events of f⁻¹(m(v)^t : v⊥) are consecutive in the history of thread t and form a history in I(m(v)^t : v⊥).
The action of I on a history H is defined as follows:

$$H \cdot I \overset{\triangle}{=} \{ H' \mid \exists f.\ H'\ (I, f)\text{-matches}\ H \}.$$

Condition 1 ensures that all events of the abstract history are matched with an implementation event; condition 2 ensures that the events that do not belong to the library being implemented (L) are left untouched; and condition 3 ensures that the thread-local order of events in the implementation agrees with the one in the specification. The last condition (4) states that the events corresponding to the implementation of a call m(v) are consecutive in the history of the executing thread t, and correspond to the implementation I.

Well-formedness and Consistency. Recall that libraries specify both how they should be used (well-formedness), and what they guarantee if used correctly (consistency). Using these specifications (expressed as sets of histories) to define implementation correctness is more subtle than one might expect. Specifically, if we view a program using a library L as a downward-closed set of histories in Hist(L), we cannot assume all its histories are in the set L.Swf of well-formed histories, as the semantics of the program will contain unreachable traces (see [26]). To formalize reachability at a semantic level, we define hereditary consistency, stating that each step in the history was consistent, and thus the current 'state' is reachable.

Definition 14 (Consistency). A history H ∈ Hist(Λ) is consistent if for all L ∈ Λ, H[L] ∈ L.Sc and π_L(H) ∈ L.Tc. It is hereditarily consistent if all prefixes H[1..k] are consistent, for k ≤ |H|.

This definition uses the 'anonymization' operator π_L defined in §2.6 to test that the history H follows the global consistency predicates of every L ∈ Λ.

We further require that programs using libraries respect encapsulation, defined below, stating that locations obtained from a library constructor are only used by that library instance. Specifically, the first condition ensures that distinct constructor calls return distinct locations. The second condition ensures that a non-constructor call e of L uses locations that have been allocated by an earlier call c (c ≺ e) to an L constructor.

Definition 15 (Encapsulation). A history H ∈ Hist(Λ) is encapsulated if the following hold, where C denotes the set of calls to constructors in H:

1. distinct constructor calls in C return distinct locations; and
2. for every non-constructor call e of a library L ∈ Λ, every location in loc(e) is returned by some constructor call c ∈ C of L with c ≺ e.

We can now define when a history of Λ is immediately well-formed: it must be encapsulated and be well-formed according to each library in Λ and all the tags it uses.

Definition 16. A history H ∈ Hist(Λ) is immediately well-formed if the following hold:

1. H is encapsulated (Def. 15); and
2. for every L ∈ Λ, H[L] ∈ L.Swf and π_L(H) ∈ L.Twf.
We finally have the notions required to define a correct implementation.

Implementation Correctness. As usual, an implementation is correct if all behaviors of the implementation are allowed by the specification. In our setting, this means that if a concrete history is hereditarily consistent, then so is the abstract history. Moreover, assuming the abstract history is well-formed, all corresponding concrete histories should also be well-formed; this corresponds to the requirement that the library implementation uses its dependencies correctly, under the assumption that the program itself uses its libraries correctly.

Definition 17 (Correct implementation). An implementation I of L over Λ is correct, written Λ ⊢ I : L, if for all collections Λ′, all 'abstract' histories H ∈ Hist({L} ∪ Λ′) and all 'concrete' histories H′ ∈ H · I ⊆ Hist(Λ ∪ Λ′), the following hold:

1. if H′ is hereditarily consistent, then so is H; and
2. if H is immediately well-formed, then so is H′.
This definition is similar to contextual refinement in that it quantifies over all contexts: it considers histories that use arbitrary libraries as well as those that concern I directly. We now present a more convenient, compositional method for proving an implementation correct, which allows one to consider only the libraries and tags that are used by the implemented library.

#### 2.8 Compositionally Proving Implementation Correctness

Recall that in this section we present our framework in a simplified sequentially consistent setting; later in §3 we generalize our framework to the weak memory setting. We introduce the notion of compositional correctness, simplifying the global correctness conditions in Def. 17. Specifically, while Def. 17 considers histories with arbitrary libraries that may use tags introduced by L, our compositional condition requires one to prove that only those L methods that are L-tagged satisfy L.Tc.

Definition 18 (Compositional correctness). An implementation I of L over Λ is compositionally correct if the following hold:

1. I preserves well-formedness, as in Def. 17;
2. I is correct in isolation, i.e. the consistency condition of Def. 17 holds with Λ′ = ∅; and
3. the global consistency predicates of the libraries in Λtags are maintained for the L operations tagged with their tags.
The preservation of well-formedness (condition 1) does not change compared to its counterpart in Def. 17, as in practice this condition is easy to prove directly. Condition 2 requires one to prove that the implementation is correct in isolation (without Λ′). Condition 3 requires one to prove that the global consistency requirements are maintained for all dependencies of the implementation. In practice, this corresponds to proving that those L operations tagged with existing tags in Λ obey the global specifications associated with these tags. Intuitively, the onus is on the library that uses a tag for its methods to prove the associated global consistency predicate: we need not consider unknown methods tagged with tags in L.Tagsnew.

Finally, we show that compositional correctness suffices: it implies correctness.

Theorem 1 (Correctness). If an implementation I of L over Λ is compositionally correct (Def. 18), then it is also correct (Def. 17).

Example 5 (Transactional Library Ltrans). Consider the implementation Itrans of Ltrans over Λ = {Lwreg, LQueue} given in Fig. 1, and suppose we wish to show that Itrans is compositionally correct. Our aim here is only to outline the proof obligations that must be discharged; later in §5 we give a full proof in the more general weak memory setting.


Example 6 (A Client of Ltrans). To see how the global consistency specifications work, consider a simple min-max counter library, Lmmcnt, tracking the maximal and minimal integers it has been given. Lmmcnt is to be used within Ltrans transactions, and provides four methods: mmNew() to construct a min-max counter, mmAdd(x, n) to add integer n to the min-max counter, and mmMin(x) and mmMax(x) to read the respective values.

Fig. 2. Implementation Immcnt of Lmmcnt

We present the Immcnt implementation over Ltrans in Fig. 2. The idea is simply to track two integers denoting the minimal and maximal values of the numbers that have been added. Interestingly, even though these are stored in Ltrans registers, the implementation does not begin or end transactions: this is the responsibility of the client, so as to avoid nesting transactions, and it is enforced by Lmmcnt using a global well-formedness predicate. Moreover, the mmAdd operation is tagged with t from the Ltrans library, ensuring that it behaves well w.r.t. transactions. A non-example is a version of Immcnt where the minimum is kept in an Ltrans register, but the maximum in a "normal" Lwreg register; this breaks the atomicity guarantee of transactions.

Formally, the interface Lmmcnt has the four methods above, where mmNew is the only constructor. The set of used tags is Tagsdep = {t, p_tr}, and all Lmmcnt methods are tagged with t, as they all use primitives from Ltrans. The consistency predicate is defined using the obvious sequential specification Smmcnt, which states that calls to mmMin return the minimum of all integers previously given to mmAdd in the sequential history (and analogously for mmMax). We lift this to (concurrent) histories as follows. A history H ∈ Hist(Lmmcnt), decomposed into n eras as H = E_1 · ... · E_n, is in Lmmcnt.Sc if there exists E_ℓ ∈ Smmcnt that is a ≺-linearization of E_1[p_tr] · E_2[p_tr] · ... · E_{n−1}[p_tr] · E_n (recall that E[p_tr] denotes the sub-history of events tagged with p_tr, that is, persisted events). The global specification and well-formedness conditions of Lmmcnt are trivial. Because Lmmcnt uses the tag t of Ltrans, a well-formed history of Lmmcnt must satisfy Ltrans.Twf, which requires that all operations tagged with t be inside transactions, and Ltrans.Tc guarantees that Lmmcnt operations persist atomically in a transaction.

When proving that the implementation in Fig. 2 satisfies Lmmcnt using compositional correctness, one proof obligation is to show that, given histories H ∈ Hist({Ltrans, Lmmcnt, ⋆_Ltrans}) and H′ ∈ H · Immcnt ⊆ Hist({Ltrans, ⋆_Ltrans}), if π_Ltrans(H′) ∈ Ltrans.Tc, then π_Ltrans(H) ∈ Ltrans.Tc. This corresponds precisely to the fact that min-max counter operations persist atomically in a transaction, assuming the primitives they use do as well.

#### 2.9 Generic Durable Persistency Theorems

We consider another family of libraries with persistent reads/writes guaranteeing the following:

if one replaces regular (volatile) reads/writes in a linearizable implementation with persistent ones, then the implementation obtained is durably linearizable.

We consider two such libraries: FliT [35] and Mirror [10]. Thanks to our framework, we formalise the statement above for the first time, and prove it for both FliT and Mirror against a realistic consistency (concurrency) model (see §4).

# 3 Generalization to Weak Memory

This section sketches how we generalize the framework presented in the previous section to the weak memory setting, where the events generated by a program are not totally ordered. For lack of space, the technical details, which largely follow those of the previous section, are relegated to the Appendix [34]. The purpose of this section is to give an idea of how executions, a standard tool in the semantics of weak memory, generalize the histories used in the previous section, and to give enough context for the case studies that follow.

Unlike the histories that we discussed in the previous section, in which events are totally ordered by a notion of time, events in executions are only partially ordered, reflecting that instructions executed in parallel are not naturally ordered. Formally, an execution is thus a set of events equipped with a partial order which represents the ordering between events from the same thread. This partial order, written po, for program order, is depicted with black arrows in Fig. 3, where it orders minimally the initial event, and the two events of each thread according to the source code:

$$a = x; \ y = 2 \parallel a = y; \ x = 5$$

Additional edges indicate, for each read event returning the value v, the write event that provided the value v: in that case, an rf-edge from the write event to the read event is added to the execution.

To be able to reason about synchronization, the notion of happens-before needs to be adapted to this setting. It is defined using po and an additional type of edge, synchronizes-with, written sw, which denotes that two events synchronize with each other, and in particular that one happens before the other. Usually, sw ⊆ rf; for example, between a release-write and an acquire-read in the C11 memory model. Given these sw edges, the happens-before order they induce, which generalizes ≺ from the previous section, is defined as the transitive closure (po ∪ sw)+. This is not sufficient, however, because we consider partial executions G where the focus is on a subset of the libraries in some unknown global execution G′, that is, G = G′⇃L. Therefore, external events (in G′ but not in G) may induce happens-before relations between events of G, yet we want to specify library L without referring to any such execution G′ that contains it. To solve this issue, we use the technique of [26] and add a final type of edge to executions: hb, which corresponds to both the external and the internal synchronization. Because of the latter, it must contain the internal synchronization: po ∪ sw ⊆ hb.

To summarize, an execution is a tuple ⟨E, po, rf, sw, hb⟩ comprising a set E of events and the relations we just described. A library specification is the same as in the previous section, mutatis mutandis. The sets of executions that are part of specifications are defined using a formalism developed in the weak memory model literature: a set S of executions is described with conditions about relations built from po, rf, etc. Given a set V of events, we denote by [V] the identity relation on V, and we denote by R_1; R_2 the standard composition of two relations R_1 and R_2. For example, if R denotes the set of read events of an execution and W the set of write events, the condition [W]; rf; [R] ⊆ sw states that if there is an rf-edge between two events e_1 ∈ W and e_2 ∈ R of an execution, there must also be an sw synchronization edge between e_1 and e_2.

As in the previous section, the tag system allows a library specification to state which events must have persisted in a valid execution. The semantics of a program is the set of executions that contain events from all the libraries used by the program, and whose happens-before order satisfies hb = (po ∪ sw)+, as there is no external synchronization in the executions of a whole program. The Appendix [34] details how our framework is defined in this more general setting.

# 4 Case Study: Durable Linearizability with FliT and Mirror

We consider a family of libraries that provide a simple interface with persistent memory accesses (reads and writes), allowing one to convert any linearisable implementation to a durably linearisable one by replacing regular (volatile) accesses with persistent ones supplied by the library. Specifically, we consider two such libraries, FliT [35] and Mirror [10]; we specify them both in our framework, prove their implementations sound against their respective specifications, and further prove their general result for converting data structures.

#### 4.1 The FliT Library

FliT [35] is a persistent library that provides a simple interface very close to Px86, but with stronger persistency guarantees, which make it easier to implement durable data structures. Specifcally, a FliT object ℓ can be accessed via

```
method wrπ(ℓ, v) :
  if π = p then
    fetch-and-add(flit-counter(ℓ), 1);
    write(ℓ, v);
    flushopt(ℓ);
    fetch-and-add(flit-counter(ℓ), −1);
  else
    sfence;
    write(ℓ, v);

method rdπ(ℓ) :
  local v = read(ℓ);
  if π = p ∧ flit-counter(ℓ) > 0 then
    flushopt(ℓ);
  return v;

method finishOp :
  sfence;
```


write and read methods, wrπ(ℓ, v) and rdπ(ℓ), as well as standard read-modify-write methods. Each write (resp. read) operation has two variants, denoted by the type π ∈ {p, v}. This type specifies whether the write (resp. read) is persistent (π = p), in that its effects must be persisted, or volatile (π = v), in that its persistency has been optimized and offers weaker guarantees. The default access type is persistent (p), and the volatile accesses may be used as optimizations when weaker guarantees suffice. Wei et al. [35] introduce a notion of dependency between different operations as follows. If a (persistent or volatile) write w depends on a persistent write w′, then w′ persists before w. If a persistent read r reads from a persistent write w, then r depends on w and thus w must be persisted upon reading if it has not already persisted. Though simple, FliT provides a strong guarantee as captured by a general result for correctly converting volatile data structures to persistent ones: if one replaces every memory access in the implementation of a linearizable data structure with the corresponding persistent FliT access, then the resulting data structure is durably linearizable.

Compared to the original FliT development, our soundness proof is more formal and detailed: it is established against a formal specification (rather than an English description) and with respect to the formal Px86 model.

FliT Interface. The FliT interface uses the p tag from Px86 and contains a single constructor, new, allocating a new FliT location, as well as the three other methods below, the last two of which are durable:


We write R and W respectively for the read and write events, and add the superscript π (e.g. R<sup>p</sup>) to denote such events with the given persistency mode.

FliT Specification. We develop a formal specification of FliT in our framework, based on its original informal description. The correctness of FliT executions is described via a dependency relation that contains the program order and the total execution (linearization) order restricted to persistent write-read operations on the same location. Note that this dependency notion is stronger than the customary definitions that use an rf relation (as in the Px86 specification) instead of lin, because a persistent read may not read directly from a persistent write w, but rather from another later (lin-after w) write.

Definition 19 (FliT execution correctness). A FliT execution G is correct if there exists a 'reads-from' relation rf, a total order lin ⊇ G.hb on G.E, and an order nvo such that:

1. Each read event reads from the most recent previous write to the same location:

rf = ⋃ℓ∈Loc ([Wℓ]; lin; [Rℓ]) \ (lin; [Wℓ]; lin)


Px86 implementation of FliT. The implementation of the FliT methods is given in Fig. 4. Whereas a naive implementation of this interface would have to issue a flush instruction both after persistent writes and in persistent reads, the implementation shown associates each location with a counter to avoid performing superfluous flushes when reading from a location whose value has already persisted. Specifically, a persistent write on ℓ increments its counter before writing to and flushing it, and decrements the counter afterwards. As such, persistent reads only need to issue a flush if the counter is positive (i.e. if there is a concurrent write that has not executed its flush yet).
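To see the counter discipline in isolation, here is a minimal single-location sketch in Python (class and method names are ours; the lock stands in for the atomic fetch-and-add and the flush is a stub, so this mirrors only the control flow of Fig. 4, not its persistency semantics):

```python
import threading

class FlitCell:
    def __init__(self):
        self.value = 0
        self.counter = 0                    # plays the role of flit-counter(ℓ)
        self._lock = threading.Lock()

    def _flush(self):
        pass                                # placeholder: no persistent memory here

    def wr_p(self, v):
        with self._lock: self.counter += 1  # fetch-and-add(flit-counter(ℓ), 1)
        self.value = v                      # write(ℓ, v)
        self._flush()                       # flushopt(ℓ)
        with self._lock: self.counter -= 1  # fetch-and-add(flit-counter(ℓ), −1)

    def rd_p(self):
        v = self.value                      # local v = read(ℓ)
        if self.counter > 0:                # a concurrent write may be unflushed
            self._flush()                   # flushopt(ℓ)
        return v
```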

Theorem 2. The implementation of FliT in Fig. 4 is correct.

FliT and Durable Linearizability. Given a data structure implementation I, let p(I) denote the implementation obtained from I by 1) replacing reads/writes in the implementation with their corresponding persistent FliT instructions, and 2) adding a call to finishOp right before the end of each method. We then show that given an implementation I, if I is linearizable, then p(I) is durably linearizable<sup>3</sup>. We assume that all method implementations are single-threaded, i.e. all plain executions I(m(v)) are totally ordered.

Theorem 3. If Px86 ⊨ I : Lin(S), then FliT ⊨ p(I) : DurLin(S).

#### 4.2 The Mirror Library

The Mirror [10] persistent library has similar goals to FliT. The main difference between the two is that Mirror operations do not offer two variants, and their operations are implemented differently from those of FliT. Specifically, in Mirror each location has two copies: one in persistent memory to ensure durability,

<sup>3</sup> The definition here is the same as in §2, as hb-linearizations of the execution still yield sequential executions.

and one in volatile memory for fast access. As such, read operations are implemented as simple loads from volatile memory, while writes have a more involved implementation than those of FliT.

We present the Mirror specification and implementation in the technical appendix, where we also prove that its implementation is correct against its specification. As with FliT, we further prove that Mirror can be used to convert linearizable data structures to durably linearizable ones, as described above.

# 5 Case Study: Persistent Transactional Library

We revisit the Ltrans transactional library, develop its formal specification and verify its implementation (Fig. 1) against it. Recall the simple Ltrans implementation in Fig. 1 and that we do not allow for nested transactions. The implementation uses an undo-log which records the former values of persistent registers (locations) modified in a transaction. If, after a crash, the recovery mechanism detects a partially persisted transaction (i.e. the last entry in the undo-log is not COMMITTED), then it can use the undo-log to restore registers to their former values. The implementation uses a durably linearizable queue library<sup>4</sup> Q, and assumes that it is externally synchronized: the user is responsible for ensuring that no two transactions are executed in parallel. We formalize this using a global well-formedness condition.

Later, in §5.2, we develop a wrapper library LStrans for Ltrans that additionally provides synchronization using locks, and we prove that our implementation of this library is correct. To do this, we need to make small modifications to the structure of the specification: the specification in §2 requires that any 'transaction-aware operation' (i.e. those tagged with t) be enclosed in calls to PTBegin and PTEnd. Since LStrans wraps the calls to PTBegin and PTEnd, the well-formedness condition needs to be generalized to allow operations tagged with t to appear between calls to operations that behave like PTBegin and PTEnd. To that end, we add two new tags b and e to denote such operations, respectively.

#### 5.1 Specification

The Ltrans library provides four tags: 1) t for transaction-aware 'client' operations; 2) ptr for operations that have persisted using transactions; and 3) b and e for operations that begin and end transactions, respectively. We write R, W, B, E, RC respectively for the sets of events labeled with read, write, begin, end and recovery methods. As before, we write e.g. ⌊t⌋ for the set of events tagged with t. Note that while B denotes the set of the begin events of library Ltrans, ⌊b⌋ denotes the set of all events that are tagged with b, which includes B (of library Ltrans) as well as events of other (non-Ltrans) libraries that may be tagged with b; similarly for E and ⌊e⌋. As such, our local specifications below (i.e. local well-formedness

<sup>4</sup> For example, take any linearizable queue implementation and use the FliT library as described in §4.

and consistency) are defined in terms of B and E, whereas our global specifications are defined in terms of ⌊b⌋ and ⌊e⌋. As before, for brevity we write e.g. [t] as a shorthand for the relation [⌊t⌋]. We next define the 'same-transaction' relation strans:

strans ≜ [⌊b⌋ ∪ ⌊e⌋ ∪ ⌊t⌋]; (po ∪ po<sup>−1</sup>); [⌊b⌋ ∪ ⌊e⌋ ∪ ⌊t⌋] \ ((po; [e]; po) ∪ (po; [b]; po))
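On a finite execution, strans can be computed directly from this definition. The sketch below (Python, helper names ours; a simplification that treats po as an explicit set of pairs) keeps the symmetric po-connections between b/e/t-tagged events and removes the pairs with an intervening begin or end event:

```python
def strans(po, b, e, t):
    """po: set of (x, y) pairs; b, e, t: sets of tagged events."""
    evs = b | e | t
    # [⌊b⌋∪⌊e⌋∪⌊t⌋]; (po ∪ po⁻¹); [⌊b⌋∪⌊e⌋∪⌊t⌋]
    sym = {(x, y) for (x, y) in po if x in evs and y in evs}
    sym |= {(y, x) for (x, y) in po if x in evs and y in evs}
    # (po; [e]; po) ∪ (po; [b]; po): an e- or b-event strictly in between
    between = {(x, z) for (x, y1) in po for (y2, z) in po
               if y1 == y2 and (y1 in e or y1 in b)}
    return sym - between

# Two t-events inside one transaction are related; across an end event they are not.
po = {("b1", "t1"), ("t1", "t2"), ("t2", "e1"), ("b1", "t2"), ("b1", "e1"),
      ("t1", "e1"), ("e1", "t3"), ("t1", "t3"), ("t2", "t3"), ("b1", "t3")}
print(("t1", "t2") in strans(po, {"b1"}, {"e1"}, {"t1", "t2", "t3"}))  # True
print(("t2", "t3") in strans(po, {"b1"}, {"e1"}, {"t1", "t2", "t3"}))  # False
```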

An execution is locally well-formed if the following hold:


An execution is globally well-formed if client operations are inside transactions:


An execution is locally-consistent if there exists a relation rf satisfying:


An execution is globally-consistent if there exists an order nvo over ⌊t⌋ satisfying:


Theorem 4. The Ltrans implementation in Fig. 1 over Px86 is correct.

#### 5.2 Vertical Library Composition: Adding Internal Synchronization

We next demonstrate how our framework can be used for vertical library composition, where the implementation of one library comprises calls to other libraries with non-trivial global specifications. To this end, we develop LStrans, a wrapper library around Ltrans that is meant to be simpler to use by providing synchronization internally: rather than the user ensuring synchronization for Ltrans, one can use LStrans to prevent two transactions from executing in parallel. More formally, the well-formedness condition (3) of Ltrans becomes a correctness guarantee of LStrans. We consider a simple implementation of LStrans that uses a global lock acquired at the beginning of each transaction and released at the end, as shown below.

```
globals lock := L.new()
method LPTBegin() := L.acq(lock); PTBegin()
method LPTEnd()   := PTEnd(); L.rel(lock)
```

Theorem 5. The implementation of LStrans above is correct.

Using compositional correctness, the main proof obligation is the condition stipulating that the implementation be well-formed, ensuring that Ltrans is used correctly by the LStrans implementation. This is straightforward, as we can assume there exists an immediate prefix that is consistent. The existence of the hb-ordering of calls to PTBegin and PTEnd follows from the consistency of the global lock used by the implementation.

#### 5.3 Horizontal Library Composition

We next demonstrate how our framework can be used for horizontal library composition, where a client program comprises calls to multiple libraries. To this end, we develop a simple library, Lcntr, providing a persistent counter to be used in sequential (single-threaded) settings: if a client uses Lcntr in concurrent settings, it must call its methods within critical sections. Lcntr provides three operations to create (NewCounter), increment (CounterInc) and read (CounterRead) a counter. The specification and implementation of Lcntr are given in [34].

As Lcntr uses the tags of Ltrans, we define Lcntr.Λtags ≜ {Ltrans}. All of its operations are tagged with t. As such, Lcntr inherits the global well-formedness condition of Ltrans, meaning that Lcntr operations must be used within transactions (i.e. hb-between operations respectively tagged with b and e). Putting it all together, the following client code snippet uses Lcntr in a correct way, even though Lcntr has no knowledge of the existence of LStrans.

```
c = NewCounter(); LPTBegin(); CounterInc(c); CounterInc(c); LPTEnd();
```

Specifically, the above is an instance of horizontal library composition (as the client comprises calls to both LStrans and Lcntr), facilitated in our framework through global specifications.

# 6 Conclusions, Related and Future Work

We presented a framework for specifying and verifying persistent libraries, and demonstrated its utility and generality by encoding existing correctness notions within it and proving the correctness of the FliT and Mirror libraries, as well as a persistent transactional library.

Related Work. The most closely related body of work to ours is [26]. However, while their framework can be used to specify only the consistency guarantees of a library, ours can be used to specify both consistency and persistency guarantees. More generally, our tag system extends the expressivity of [26] with support for global effects such as some types of fences.

Existing literature includes several works on formal persistency models, both for hardware [25,30,31,5,6,19,29,28] and software [4,21,11], as well as correctness conditions for persistent libraries such as durable linearizability [17]. As we showed in §3, such models can be specified in our framework.

There has also been work [33] on specifying libraries using an operational approach instead of the declarative approach that we advocate here. While it is not generic in the memory model, it supports weak memory, with a fragment of the C++11 memory model, and supports synchronization that is internal and external to the library. Another framework for formalizing the behavior of concurrent objects in the presence of weak memory is [18], which is more syntactic than our framework: they use a process calculus, which allows them to handle callbacks between the library and the client. Extending our framework, which is more semantic, to handle this setting would probably require shifting from executions/histories to something similar to game semantics.

Additionally, there are several works on implementing and verifying algorithms that operate on NVM. [9] and [36] respectively developed persistent queue and set implementations in Px86. [8] provided a formal correctness proof of the implementation in [36]. All three of [8,36,9] assume that the underlying concurrency model is SC [23], rather than that of Px86 (namely TSO). As we demonstrated in §4–§5, we can use our framework to verify persistent implementations modularly while remaining faithful to the underlying concurrency model. [27,2] have developed persistent program logics for verifying programs under Px86. [20] recently formalized the consistency and persistency semantics of the Linux ext4 file system, and developed a model-checking algorithm and tool for verifying the consistency and persistency behaviors of ext4 applications such as text editors.

Recently, and independently of this work, Bodenmüller et al. [3] have proved the correctness of the FliT library under TSO. They used an operational approach, modeled the libraries and the memory and persistency models operationally using automata, and proved a simulation result using KIV, a specialized proof assistant. As in this paper, they proved that a linearizable library using FliT becomes durably linearizable.

Future Work. We believe our framework will pave the way for further work on verifying persistent libraries, whether manually (as done here), possibly with the assistance of an interactive theorem prover and/or program logics such as those of [7,27,2], or automatically via model checking. The work of [7] uses the framework of [26] to specify data structures in a program logic, and it would be natural to extend it to our framework for persistency. Existing work in the latter research direction, e.g. [12,20], has so far only considered low-level properties, such as the absence of races or the preservation of user-supplied invariants. It has not yet considered higher-level functional correctness properties, such as durable linearizability and its variants. We believe our framework will be helpful in that regard. In a more theoretical direction, it would be interesting to understand how our compositional correctness theorem fits into general settings for abstract logical relations such as [16].

#### Acknowledgments

We thank the anonymous reviewers for their feedback. This work has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 101003349). Raad is funded by a UKRI fellowship MR/V024299/1, by the EPSRC grant EP/X037029/1 and by VeTSS.

# References



Shachar Itzhaky<sup>1</sup>(B), Sharon Shoham<sup>2</sup>, and Yakir Vizel<sup>1</sup>

<sup>1</sup> Technion, Haifa, Israel. shachari@cs.technion.ac.il
<sup>2</sup> Tel-Aviv University, Tel Aviv-Yafo, Israel

Abstract. Hyperproperties specify the behavior of a system across multiple executions, and are an important extension of regular temporal properties. So far, such properties have resisted comprehensive treatment by software model-checking approaches such as IC3/PDR, due to the need to find not only an inductive invariant but also a total alignment of different executions that facilitates simpler inductive invariants. We show how this treatment is achieved via a reduction from the verification problem of ∀∗∃∗ hyperproperties to Constrained Horn Clauses (CHCs). Our starting point is a set of universally quantified formulas in first-order logic (modulo theories) that encode the verification of ∀∗∃∗ hyperproperties over infinite-state transition systems. The first-order encoding uses uninterpreted predicates to capture (1) the witness function for existential quantification over traces, (2) the alignment of executions, and (3) the corresponding inductive invariant. Such an encoding was previously proposed for k-safety properties. Unfortunately, finding a satisfying model for the resulting first-order formulas is beyond reach for modern first-order satisfiability solvers. Previous works tackled this obstacle by developing specialized solvers for the aforementioned first-order formulas. In contrast, we show that the same problems can be encoded as CHCs and solved by existing CHC solvers. CHC solvers take advantage of the unique structure of CHC formulas and handle the combination of quantifiers with theories and uninterpreted predicates more efficiently. Our key technical contribution is a logical transformation of the aforementioned sets of first-order formulas to equi-satisfiable sets of CHCs. The transformation to CHCs is sound and complete, and applying it to the first-order formulas that encode verification of hyperproperties leads to a CHC encoding of these problems. We implemented the CHC encoding in a prototype tool and show that, using existing CHC solvers for solving the CHCs, the approach already outperforms state-of-the-art tools for hyperproperty verification by orders of magnitude.

# 1 Introduction

Hyperproperties [15] are properties that relate multiple execution traces, either taken from a single program or from multiple programs. Checking such properties is known as relational verification, and is essential when reasoning about security policies, program equivalence, concurrency protocols, etc. Existing specification languages for hyperproperties [14,6,43] extend standard ones, e.g., temporal logic or Hoare logic, with (explicit or implicit) quantification over traces. This shifts the focus from properties of individual traces to properties of sets of traces. For example, k-safety [15] is a class of hyperproperties where k universal quantifiers are used to define a relational invariant over states originating from k traces.

This paper addresses verification of hyperproperties with ∀∗∃∗ quantification over traces and a body of the form □ϕ (where □ stands for "globally"). This fragment captures many hypersafety (e.g., the aforementioned k-safety) and hyperliveness properties, and was shown by [8] to express a wide class of properties of interest, including generalized non-interference (GNI) [38].

Verification of hyperproperties is more challenging than verification of single-trace properties and, as a result, has gained a lot of attention in recent years. Unlike single-trace properties, verification of properties of k traces requires the discovery of relational inductive invariants, which define the relation between states of k execution traces. Since the construction of invariants that hold between any k reachable states is hard (or even impossible, depending on the assertion logic), proving hyperproperties often hinges on finding an alignment of any k traces such that the invariant only needs to describe aligned states.

In the case of k-safety properties, an alignment of traces is often given by a self composition [5,44] of the program, composing different copies of the program (or several different programs) together, e.g., by running the different copies in lockstep [48] or by more sophisticated composition schemes, e.g., [24]. While self composition reduces k-safety verification to standard safety verification, this reduction requires choosing the alignment of the different copies a priori. The choice of alignment, however, has a significant effect on the complexity of the inductive invariants themselves, as demonstrated by [41]. This renders the standard reduction from k-safety verification to safety verification, based on a fixed alignment, impractical in many cases. As a result, finding a good alignment as part of relational verification has been a topic of interest in recent years [43,27,45,6,8].

In the case of hyperliveness properties that stem from the use of existential quantification over traces (i.e. ∀∗∃∗ properties), complexity rises further. Verifying such hyperliveness properties calls for finding "witness" traces that match the universally quantified traces, in addition to the relational invariant and alignment. This reduces verification of ∀∗∃∗ properties to the problem of inferring three ingredients: (i) a witness function for existential quantification over traces, (ii) an alignment of traces, and (iii) a corresponding relational inductive invariant. These ingredients are all interdependent: different witnesses call for different alignments and give rise to different invariants, with different levels of complexity. It is therefore desirable to search for the combination of all three simultaneously, which is the focus of this paper.

We propose a novel reduction from verification of hyperproperties with a ∀∗∃∗ quantification prefix over infinite-state transition systems to satisfiability of Constrained Horn Clauses (CHCs) [11,10], also known as CHC-SAT. Importantly, the reduction does not fix any of the aforementioned verification ingredients, in particular the alignment, a priori. Instead, it is based on a CHC encoding of their joint requirements. The unique structure of CHCs makes it possible to adopt software model checking techniques (e.g. interpolation [39], IC3/PDR [32,35]) for solving them. Our reduction thus allows us to use state-of-the-art CHC solvers [28,33,31,49] to achieve a highly efficient hyperproperty verification procedure.

While it is known that safety verification can be reduced to CHC-SAT, we are the first to show how inferring the combination of a witness function, a trace alignment and an inductive invariant for hyperproperties of the ∀∗∃∗-fragment can be reduced to CHC-SAT.

The first step of our reduction to CHC-SAT is an encoding of the joint requirements of the witness-alignment-invariant ingredients as a set of universally quantified formulas in first-order logic (FOL) modulo theories, where uninterpreted predicates capture the witness, alignment and invariant, and first-order theories (e.g., arithmetic and arrays) are used for modeling the transition system and the requirements. Such an encoding was proposed by [41] for the problem of finding an invariant together with an alignment in the context of verification of k-safety properties (the universally quantified subset of this fragment). We extend their FOL encoding to ∀∗∃∗ properties, based on the game semantics introduced in [8].

Unfortunately, the resulting FOL formulas are beyond what modern first-order satisfiability solvers can handle, due to a combination of quantifiers with theories and uninterpreted predicates. In particular, the FOL formulas are not in the form of CHCs. As a result, previous works [41,45] that used a similar encoding could not rely on a (single) CHC-SAT query to find the alignment and invariant simultaneously. Instead, [41] resorted to an enumeration of potential alignments, using a separate CHC-SAT query to search for an inductive invariant (in a restricted language) for each candidate alignment. [45] developed a specialized solver that is able to handle these non-CHC formulas directly.

In contrast to previous works, we introduce a second step where we transform the set of universally quantified FOL formulas to a set of universally quantified CHCs. This step, which is also the key technical contribution of the paper, allows us to use any CHC solver for hyperproperty verification, and to benefit from current and future developments in this lively area of research. We emphasize that the transformation to CHCs is surprising since it allows us to overcome a seemingly unavoidable obstacle: a disjunction of atomic formulas involving unknown predicates, which arises from the encoding of a choice between different alignment and witness options.

We implemented the reduction of ∀∗∃∗-hyperproperty verification to CHC-SAT in a tool called HyHorn, on top of Z3 [23], using Spacer [31] as a CHC solver. Our results show that HyHorn is very efficient in verifying ∀∗∃∗-hyperproperties, outperforming the state of the art [45,8,41] by orders of magnitude.

Our main contributions are:

– We develop a satisfiability-preserving transformation of first-order formulas of a certain form to CHCs. The transformation is accompanied by a bidirectional translation of solutions.

```
pre (a1 < a2 ∧ b1 > b2)
squaresSum(int a, int b) {
  assume(0 < a < b);
  int c = 0;
  while (a < b) { c += a*a; a++; }
  return c;
}
post (c1 > c2)

a1 < a2 ∧ b1 > b2  →  ∀π1 : ¬(a < b), π2 : ¬(a < b) · □(c1 > c2)

                                  (a)

(1) Init(V1) ∧ Init(V2) ∧ a2 > a1 ∧ b2 < b1 → Inv(V1, V2)
(2) Inv(V1, V2) ∧ A{1}(V1, V2) ∧ Tr(V1, V1′) ∧ V2 = V2′ → Inv(V1′, V2′)
(3) Inv(V1, V2) ∧ A{2}(V1, V2) ∧ V1 = V1′ ∧ Tr(V2, V2′) → Inv(V1′, V2′)
(4) Inv(V1, V2) ∧ A{1,2}(V1, V2) ∧ Tr(V1, V1′) ∧ Tr(V2, V2′) → Inv(V1′, V2′)
(5) Inv(V1, V2) ∧ A{1}(V1, V2) → a1 < b1
(6) Inv(V1, V2) ∧ A{2}(V1, V2) → a2 < b2
(7) Inv(V1, V2) ∧ A{1,2}(V1, V2) → (a1 < b1 ∧ a2 < b2) ∨ (a1 ≥ b1 ∧ a2 ≥ b2)
(8) Inv(V1, V2) → ((a1 ≥ b1 ∧ a2 ≥ b2) → c1 > c2)
(9) Inv(V1, V2) → A{1}(V1, V2) ∨ A{2}(V1, V2) ∨ A{1,2}(V1, V2)

                                  (b)
```
Fig. 1: (a) A program that computes the sum of squares of the integer interval [a, b), with a 2-safety specification for it, and (b) its first-order encoding.


# 2 Overview

We illustrate our approach for verifying hyperproperties by reduction to CHC-SAT. We start with the simpler case of k-safety properties, followed by the more general case of ∀∗∃∗ hyperproperties.

#### 2.1 Motivating Example

As a means of highlighting the challenges in verifying hyperproperties, and, in particular, in reducing the problem to CHC solving, we present the example program squaresSum and its 2-safety specification from [41] in Fig. 1a. Given positive integers a < b, the program computes the sum of squares of all integers in the interval [a, b). squaresSum is monotone in the sense that as the input interval increases, so does the output c. Formally, this is a 2-safety property that requires that whenever two traces satisfy the pre-condition [a2, b2) ⊂ [a1, b1), they also satisfy the post-condition c1 > c2, where variable indices correspond to the traces that they represent. This is a special case of k-safety, where the relational property is checked at the end of the executions. More generally, we consider k-safety properties where the relational property is specified at designated observation points (explained in Sec. 3).

To verify the 2-safety property, a prominent approach is to reduce the problem to a regular safety verification problem by composing the program with itself (known as "self composition"). There are (infinitely) many possibilities for aligning the traces in the composed system, and the alignment chosen has a direct impact on the complexity of the inductive invariant needed to establish safety. For example, if the two traces of squaresSum are aligned in lockstep, then initially c1 = c2, after one step c1 < c2, and only later on c1 > c2. Showing that c1 > c2 at the end requires tracking the difference c1 − c2, which is a complex value because it involves the sum of squares itself. This cannot be captured by an inductive invariant in first-order logic using theories currently supported by automated solvers (e.g., linear arithmetic) and is therefore beyond reach for state-of-the-art solvers. On the other hand, if the second trace, whose input is the smaller interval, "waits" for a1 and a2 to coincide before proceeding in lockstep, then the property that c1 > c2 becomes inductive (except for the first step), greatly simplifying the inductive invariant. It is therefore important to consider the alignment and the (relational) inductive invariant together.
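This contrast can be observed concretely. The following small Python experiment (entirely ours, with explicit states (a, b, c) and a self loop at final states) runs the two alignments on the inputs [1, 5) and [2, 4): under lockstep the difference c1 − c2 first goes negative, whereas under the wait-then-lockstep alignment c1 > c2 holds at every aligned step.

```python
def step(a, b, c):
    """One loop iteration of squaresSum, with a self loop once a >= b."""
    return (a + 1, b, c + a * a) if a < b else (a, b, c)

s1, s2 = (1, 5, 0), (2, 4, 0)   # [a2, b2) = [2, 4) strictly inside [a1, b1) = [1, 5)

# Lockstep alignment: c1 - c2 dips below zero, so c1 > c2 is not inductive.
t1, t2 = s1, s2
for _ in range(4):
    t1, t2 = step(*t1), step(*t2)
    print('lockstep c1 - c2 =', t1[2] - t2[2])    # -3, -8, 1, 17

# Wait-then-lockstep: trace 1 runs alone until a1 = a2, then both step together.
t1, t2 = s1, s2
while t1[0] < t2[0]:
    t1 = step(*t1)
while t1[0] < t1[1] or t2[0] < t2[1]:
    t1, t2 = step(*t1), step(*t2)
    print('aligned  c1 > c2 :', t1[2] > t2[2])    # True at every aligned step
```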

The requirements that the alignment and inductive invariant need to satisfy can be formulated in first-order logic [41]. To do so, we denote the program variables by V = ⟨a, b, c⟩. We express the initial states and program steps as formulas over V (and its primed variant V′): Init(V) ≜ a > 0 ∧ b > a ∧ c = 0, Tr(V, V′) ≜ a < b ∧ c′ = c + a · a ∧ a′ = a + 1 ∧ b′ = b. To reason about two traces, we use two copies of V, denoted V1 and V2. We introduce "unknown" predicates Inv, A{1}, A{2}, A{1,2} over ⟨V1, V2⟩ to capture the inductive invariant and the desired alignment of the traces. {Au}u define an arbiter that, when Au is satisfied, schedules the steps of the traces according to u (for example, schedule u = {1} stands for a step in trace 1 and a stutter in trace 2). The arbiter therefore determines the alignment of the traces. The inductive invariant Inv relates states of the two copies of the program, making it relational.

The problem of searching for the alignment and the inductive invariant simultaneously is then posed as a satisfiability problem (modulo the theory of arithmetic) of the formulas in Fig. 1b. To ensure that the arbiter, which determines the alignment, does not avoid violations of the post-condition by making one of the traces stutter forever s.t. it never reaches its final state, formulas 5-7 require that the arbiter only schedule a trace if it has not exited the loop, unless both traces have exited the loop (in which case both are scheduled). This "validity" requirement means that, at the latest, the arbiter must schedule a trace when the other reaches the final state. Formulas 1-4 then ensure that all states that are reachable, subject to the steps permitted by the arbiter, satisfy Inv. Specifically, the first formula ensures the initiation condition of the inductive invariant: the invariant satisfies the pre-condition and includes all the initial states

of the composed system. Formulas 2-4 ensure the consecution of the invariant under every choice the arbiter makes. The 8th formula ensures the safety of the invariant, and the last formula mandates that there is always at least one choice that is enabled, so that the system never reaches a "stuck" state.

An interpretation of the unknown predicates Inv, A{1}, A{2}, A{1,2} defines an arbiter and a corresponding inductive invariant. A possible solution is

$$\begin{aligned} A_{\{1\}}(V_1, V_2) &\triangleq a_1 < a_2 \lor (b_2 \le a_1 < b_1) & A_{\{2\}}(V_1, V_2) &\triangleq \bot \\ A_{\{1,2\}}(V_1, V_2) &\triangleq (a_1 = a_2 \land a_1 < b_2) \lor a_1 \ge b_1 \\ Inv(V_1, V_2) &\triangleq 0 < a_1 \le b_1 \land 0 < a_2 \le b_2 \land \bigl((a_1 < a_2 \land c_1 \ge c_2) \lor (a_1 \ge a_2 \land c_1 > c_2)\bigr) \end{aligned}$$

This solution captures the arbiter that makes the second trace wait until a1 = a2, then makes both traces proceed together until the second one exits its loop, at which point the first trace continues to execute alone until it also exits its loop and both traces are again (vacuously) scheduled together. The solution for Inv captures the corresponding inductive invariant previously discussed.

#### 2.2 Challenges in Encoding Hyperproperty Verification as CHC-SAT

The formulas of Fig. 1b, with the exception of the last one, are constrained Horn clauses. That is, when the implications in these formulas are converted to disjunctions, at most one predicate application appears positively in each clause.

Alas, the presence of the last formula precludes direct application of existing CHC solvers. The problem is the disjunction on the right-hand side of the implication. Such a disjunction appears to be crucial for a correct encoding of the problem. The reason is that uninterpreted predicates designate semantic relations. With such predicates denoting the choice of schedule, it is easy to drop into a vacuous solution where some states have no corresponding choice and are essentially "stuck", unsoundly making a post-condition violation unreachable. Encoding the requirement that every state have a schedule results in clauses with multiple occurrences of positive literals, capturing inherent disjunctions over the possible choices, which are not Horn. In particular, these disjunctions cannot be eliminated by renaming [37].

Previous works tackled this obstacle either by employing explicit enumeration of alignments that satisfy the non-Horn clause to avoid the disjunction [41], or by developing specialized techniques that are able to handle such disjunctions [45].

#### 2.3 Our Approach: Transformation to CHC

In this paper, we show that the problem of searching for an alignment together with a (relational) inductive invariant can be encoded using CHCs, allowing us to reduce the problem to CHC-SAT without fixing the alignment a priori.

A key insight of our reduction to CHC-SAT is the use of "doomed" states as a way to avoid the problematic disjunction over all choices of schedules. We refer to a given state as "doomed" if it necessarily reaches a state that violates

```
D{1}(V1, V2) ∧ D{2}(V1, V2) ∧ D{1,2}(V1, V2) ∧ Init(V1) ∧ Init(V2) ∧ a2 > a1 ∧ b2 < b1 → ⊥

¬(a1 ≥ b1 ∧ a2 ≥ b2 → c1 > c2) → D{1}(V1, V2)
¬(a1 ≥ b1 ∧ a2 ≥ b2 → c1 > c2) → D{2}(V1, V2)
¬(a1 ≥ b1 ∧ a2 ≥ b2 → c1 > c2) → D{1,2}(V1, V2)

¬(a1 < b1) → D{1}(V1, V2)
¬(a2 < b2) → D{2}(V1, V2)
¬(a1 < b1 ∧ a2 < b2) ∧ ¬(a1 ≥ b1 ∧ a2 ≥ b2) → D{1,2}(V1, V2)

D{1}(V1′, V2′) ∧ D{2}(V1′, V2′) ∧ D{1,2}(V1′, V2′) ∧ Tr(V1, V1′) ∧ V2 = V2′ → D{1}(V1, V2)
D{1}(V1′, V2′) ∧ D{2}(V1′, V2′) ∧ D{1,2}(V1′, V2′) ∧ V1 = V1′ ∧ Tr(V2, V2′) → D{2}(V1, V2)
D{1}(V1′, V2′) ∧ D{2}(V1′, V2′) ∧ D{1,2}(V1′, V2′) ∧ Tr(V1, V1′) ∧ Tr(V2, V2′) → D{1,2}(V1, V2)
```

Fig. 2: CHC encoding of Fig. 1a.

the hyperproperty along every valid alignment (as opposed to some alignment, as in the direct encoding). Importantly, due to this conjunctive nature, doomed states lend themselves to a Horn encoding. If an initial state is identified as doomed (i.e., the CHCs are unsatisfiable), then the property is violated and a counterexample can be retrieved. Otherwise, if the set of initial states does not intersect the set of doomed states, then the hyperproperty is proved. Moreover, given an interpretation of the unknown predicates in which the initial states are not doomed, an alignment and a corresponding inductive invariant can be retrieved.

Based on this insight, in Sec. 4 we develop a general transformation of formulas of a certain form to an equi-satisfiable set of CHCs. Furthermore, we provide a transformation of solutions between the two formulations (in both directions). The first-order formulas to which the transformation is applicable follow the overall structure of the formulas in Fig. 1b, but are somewhat more general. For example, some of the unknown predicates may have additional arguments, which turn out to be useful when considering a broader class of hyperproperties beyond k-safety (∀∗∃∗).

In Sec. 5 we apply the transformation of Sec. 4 to reduce k-safety verification to CHC-SAT. Applying the transformation to the formulas encoding our running example (Fig. 1b), we obtain the set of CHCs depicted in Fig. 2 over the unknown predicates D{1}, D{2}, D{1,2}.

In the CHCs of Fig. 2, an unknown predicate Du represents states that are "doomed" if schedule u is chosen. The first CHC requires that no initial state that satisfies the pre-condition is completely doomed, i.e., for every such state there is a schedule for which it is not doomed. The remaining CHCs encode the properties of doomed states for each schedule. For example, the CHCs with D{1} in the head (the right-hand side of the implication) imply that a state is doomed for schedule {1} if: (a) it violates the post-condition, (b) it has already exited the loop and hence trace 1 cannot be the only trace to be scheduled, or (c) it is the pre-state of a transition taken by trace 1 leading to a post-state that is doomed for every choice u.

A solution to the CHCs in Fig. 2 can be obtained from the solution to the formulas in Fig. 1b by Du ≜ ¬(Inv ∧ Au) for every u ∈ {{1}, {2}, {1, 2}}.

More generally, in Sec. 4, we show a bi-directional transformation of solutions.
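To make the reduction tangible, the sketch below feeds the CHCs of Fig. 2 to Z3's Horn engine (Spacer) through its Python API. The encoding of Init, Tr and the post-condition follows Sec. 2.1; all names are ours, and this is an illustration of the encoding rather than the HyHorn implementation (note also that the required invariant involves nonlinear arithmetic, so solver behavior on this instance may vary).

```python
from z3 import *

a1, b1, c1, a2, b2, c2 = Ints('a1 b1 c1 a2 b2 c2')
a1p, b1p, c1p, a2p, b2p, c2p = Ints('a1p b1p c1p a2p b2p c2p')
V, Vp = [a1, b1, c1, a2, b2, c2], [a1p, b1p, c1p, a2p, b2p, c2p]

# Doomed-state predicates, one per schedule {1}, {2}, {1,2}.
D1, D2, D12 = [Function(n, *([IntSort()] * 6 + [BoolSort()]))
               for n in ('D1', 'D2', 'D12')]

init1 = And(a1 > 0, b1 > a1, c1 == 0)
init2 = And(a2 > 0, b2 > a2, c2 == 0)
tr1 = And(a1 < b1, c1p == c1 + a1 * a1, a1p == a1 + 1, b1p == b1)  # trace 1 steps
tr2 = And(a2 < b2, c2p == c2 + a2 * a2, a2p == a2 + 1, b2p == b2)  # trace 2 steps
fr1 = And(a1p == a1, b1p == b1, c1p == c1)                         # trace 1 stutters
fr2 = And(a2p == a2, b2p == b2, c2p == c2)                         # trace 2 stutters
bad = Not(Implies(And(a1 >= b1, a2 >= b2), c1 > c2))               # post violated
doomedp = And(D1(*Vp), D2(*Vp), D12(*Vp))                          # post-state doomed

s = SolverFor('HORN')
s.add(ForAll(V, Implies(And(D1(*V), D2(*V), D12(*V),
                            init1, init2, a2 > a1, b2 < b1), False)))
for D in (D1, D2, D12):
    s.add(ForAll(V, Implies(bad, D(*V))))
s.add(ForAll(V, Implies(Not(a1 < b1), D1(*V))))
s.add(ForAll(V, Implies(Not(a2 < b2), D2(*V))))
s.add(ForAll(V, Implies(And(Not(And(a1 < b1, a2 < b2)),
                            Not(And(a1 >= b1, a2 >= b2))), D12(*V))))
s.add(ForAll(V + Vp, Implies(And(doomedp, tr1, fr2), D1(*V))))
s.add(ForAll(V + Vp, Implies(And(doomedp, fr1, tr2), D2(*V))))
s.add(ForAll(V + Vp, Implies(And(doomedp, tr1, tr2), D12(*V))))
print(s.check())  # 'sat' would mean no initial state is doomed: the property holds
```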

#### 2.4 Beyond k-Safety

Our transformation to CHCs is not limited to an encoding of k-safety; it also generalizes to hyperproperties that use ∀∗∃∗ quantification over traces, as presented in Sec. 6.

Hyperproperties with existential trace quantification become meaningful in the presence of nondeterminism in the program. For an example of such a property, consider a nondeterministic variant of squaresSum where the assignment c += a * a is replaced by if (*) c += a * a. That is, the increment of c may nondeterministically be skipped. We may now wish to verify that, if [a2, b2) = [a1, b1), then for every trace from input [a1, b1) there exists a trace from input [a2, b2) such that when both terminate, c1 ̸= c2. This is a ∀∃-hyperproperty.

To verify such properties, a "witness" function is needed that maps the universally quantified traces to the corresponding existentially quantified traces such that the body of the formula holds for the combination of the traces. Even if a witness function is known, to verify that the combination of the traces satisfies the body of the formula, we still need to find a proper alignment of the traces and an inductive invariant. As in the case of k-safety, these components are all interdependent, making it desirable to search for all of them together.

In general, the witness function for the existentially quantified traces may need to depend on the full universally quantified traces. However, [8] defines a sound but incomplete game semantics, in which the witness function essentially constructs the existentially quantified traces step by step, in response to moves of a "falsifier" who reveals the universally quantified traces step by step.

We show in Sec. 6.1 that the problem of searching for a step-by-step witness function, an alignment and a (relational) inductive invariant can be encoded in first-order logic, and that the encoding is amenable to our transformation to CHCs. This results in a sound and complete CHC encoding of the game semantics of [8] for transition systems whose branching degree is bounded by a constant, which we henceforth refer to as "finite branching".

The idea in the ∀∗∃∗ first-order encoding is to let the unknown predicates Au specify not only the schedules chosen by the arbiter but also the choice of existentially quantified traces for the witness function. To do so, we assign a unique label to each of the possible transitions, and use these labels to identify the transitions along the traces. In this encoding, instead of u denoting a schedule only, it now denotes both a schedule and a choice of labels identifying the next transitions in the existentially quantified traces according to the witness function. Furthermore, the Au predicates receive additional arguments that represent the next labels along the universally quantified traces.

For example, in the nondeterministic variant of squaresSum, there are at most two possible transitions at each control location. We therefore introduce two labels to distinguish between these possibilities: i for "increment" and s for "skip". The predicates that describe the schedules and the choices of existentially quantified traces for the ∀∃-hyperproperty of interest are:

$$A_{\{1\},\mathrm{i}},\; A_{\{2\},\mathrm{i}},\; A_{\{1,2\},\mathrm{i}},\; A_{\{1\},\mathrm{s}},\; A_{\{2\},\mathrm{s}},\; A_{\{1,2\},\mathrm{s}}.$$

They are defined over ⟨V1, V2, a⟩, where a ranges over the possible labels.

Note that in this encoding, the Au predicates are no longer defined over ⟨V1, V2⟩ only, but have additional arguments for the labels of the universally quantified traces, while Inv does not. Thus, the reduction to CHCs applies our transformation in a more general setting than Fig. 1b. Furthermore, since u denotes both a schedule and a choice of labels for the existentially quantified traces, the number of Au predicates depends on the number of labels. To ensure that there are finitely many predicates, we require the transition system to have a finite branching degree (otherwise, the space of possible labels becomes infinite).

Finally, in Sec. 6.2, we extend our approach to handle infinite (or unbounded) branching in the transition system, which can result, for example, from reading an input from an infinite domain. To do so, we introduce another first-order encoding that roughly replaces the infinitely many concrete choices of transitions by finitely many abstract choices. Unlike the cases of k-safety and ∀∗∃∗ hyperproperties with finite branching, the resulting encoding is sound but incomplete w.r.t. the game semantics. By applying our transformation, we obtain a sound (albeit incomplete) reduction to CHC-SAT.

# 3 Background

We use first-order logic to model systems and their properties. Throughout the paper, we fix a background first-order theory T and denote its signature by Σ.

Transition Systems. A (symbolic, labeled) transition system is a tuple TS = (V, a, Init, Tr), where V is a vocabulary, i.e., a vector of (logical) variables, each associated with a sort from Σ, denoting state variables; a is a label variable; Init is a formula over Σ with free variables V; and Tr is a formula over Σ with free variables V ∪ {a} ∪ V′, where V′ consists of the primed variants of V.

A state of TS is a valuation to V, and we denote by S the set of all such valuations; L is the set of values that a can take, called labels; S0 ⊆ S is the set of initial states, which consists of all valuations that satisfy Init; and R ⊆ S × L × S is the transition relation, which consists of the valuations for the composite vocabulary V ∪ {a} ∪ V′ that satisfy Tr. For simplicity, we assume that R is total, i.e., ∀s ∈ S ∃ℓ ∈ L, s′ ∈ S · (s, ℓ, s′) ∈ R.<sup>3</sup> We say that TS is deterministic when ∀s ∈ S, ℓ ∈ L · |{s′ | R(s, ℓ, s′)}| = 1, and that it has finite branching when L is finite. A trace of TS is an infinite sequence of states t = s0, s1, . . . such that for every i ≥ 0 there exists ℓ ∈ L such that (si, ℓ, si+1) ∈ R. We denote by t[i] the i'th state in t. We further denote the set of traces that start from a state s by T(s), and the set of all traces of TS by T.
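For intuition, the nondeterministic squaresSum variant of Sec. 2.4 can be rendered as an explicit-state instance of these definitions. The sketch below (representation ours) uses states (a, b, c), labels L = {'i', 's'} for "increment" and "skip" (so the system has finite branching), and the self loop of footnote 3 to keep the transition relation total:

```python
def init(state):
    """Init: 0 < a < b and c = 0."""
    a, b, c = state
    return 0 < a < b and c == 0

def step(state, label):
    """Tr as a function from (state, label) to the successor state."""
    a, b, c = state
    if a >= b:                       # self loop once the loop has exited
        return state
    if label == 'i':                 # increment: c += a*a
        return (a + 1, b, c + a * a)
    return (a + 1, b, c)             # label 's': skip the increment

def trace(state, labels):
    """Unfold the finite prefix of a trace chosen by a label sequence."""
    out = [state]
    for l in labels:
        state = step(state, l)
        out.append(state)
    return out

print(trace((1, 4, 0), ['i', 's', 'i']))  # [(1,4,0), (2,4,1), (3,4,1), (4,4,10)]
```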

Hyperproperties and their specification. We consider a fragment of the relational logic OHyperLTL [6], which we call ∀∗∃∗-OHyperLTL, with formulas of the form: φ = ψ → ∀π1 : ξ1, . . . , πl : ξl · ∃πl+1 : ξl+1, . . . , πk : ξk · □ϕ

<sup>3</sup> w.l.o.g.; Tr can always be replaced by Tr ∨ ((∀a ∀V′ · ¬Tr) ∧ V′ = V), which corresponds to adding self loops to states that have no outgoing transition.

where πi are trace variables whose intended valuations are taken from T; ξi are (non-temporal) formulas with free variables V that determine observation points along the k traces, where the traces must synchronize; ψ is a pre-condition that is assumed to hold initially; and ϕ needs to hold globally when all traces reach the observation points (which they must synchronize on before moving on). Vj denotes a copy of V where all variables are indexed by j. We refer to the variables in Vj as the state variables of the j'th trace (namely, πj). When l = k, i.e., all quantifiers are universal, φ is a k-safety property. A relational pre/post specification, as used in our motivating example, is a special case of a k-safety property where the observation points are the final states (which are augmented with self loops). For example, Fig. 1a presents the ∀∗∃∗-OHyperLTL specification of the motivating example. When l < k, the formula also includes existential quantifiers, extending expressiveness to include some hyperliveness properties. An example of a security hyperliveness property that can be expressed in ∀∗∃∗-OHyperLTL is generalized non-interference (GNI) [38]. GNI requires that for any two traces π1 and π2 there exists a trace π3 whose high (secret) inputs agree with π1 and whose low (public) inputs and outputs agree with π2.

∀∗∃∗-OHyperLTL formulas are interpreted over transition systems. Intuitively, φ holds in a transition system if from every k initial states that jointly satisfy the pre-condition ψ, for every l traces from the first l states there exist corresponding k−l traces from the remaining k−l states s.t. the composed states of all traces globally satisfy ϕ, when the traces are projected to their observation points. Formally, given a transition system TS and φ as above, we refer to a tuple (s1, . . . , sk) of k states of TS as a composed state. A composed state defines a valuation to V1 ∪ . . . ∪ Vk, where sj is the valuation of Vj. A composed state is initial if si ∈ S0 for every 1 ≤ i ≤ k. We say that TS |= φ if for every initial composed state s = (s1, . . . , sk) such that s |= ψ the following holds: for every t1, . . . , tl ∈ T(s1) × · · · × T(sl) there exist tl+1, . . . , tk ∈ T(sl+1) × · · · × T(sk) such that (|t1|)ξ1, . . . , (|tk|)ξk |= □ϕ, where (|ti|)ξi is the projection (filtering) of trace ti to states satisfying ξi. The semantics of □ϕ is that t′1, . . . , t′k |= □ϕ if ∀i ≤ min(|t′1|, . . . , |t′k|) · (t′1[i], . . . , t′k[i]) |= ϕ. Note that the semantics is oblivious to the transition labels, since labels are only implicit in traces. Labels become useful in Sec. 6, where we use them to identify transitions along traces.
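The projection (|t|)ξ has a direct executable reading on finite trace prefixes; a short sketch (ours), where ξ is a predicate on states:

```python
# (|t|)_ξ: keep exactly the states at which the observation formula ξ holds.
def project(trace, xi):
    return [s for s in trace if xi(s)]

# For a pre/post specification, ξ selects final states, here ¬(a < b).
trace = [(1, 4, 0), (2, 4, 1), (3, 4, 5), (4, 4, 14), (4, 4, 14)]
print(project(trace, lambda s: not (s[0] < s[1])))  # [(4, 4, 14), (4, 4, 14)]
```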

Remark 1. To simplify the presentation, we consider hyperproperties defined w.r.t. a single transition system. The extension to multiple transition systems is straightforward. Similarly, □ϕ can be generalized to any temporal safety property via the standard automata-theoretic approach to model checking.

Constrained Horn Clauses (CHCs) are defined over a signature Σ′ that extends Σ with a set P of (uninterpreted) predicates. Symbols in Σ are called interpreted, while the predicates in P are uninterpreted (sometimes called unknown). First-order formulas over Σ are called constraints. A CHC is a first-order formula of the form ∀X · ⋀i Pi(Xi) ∧ φ(X) → H(XH), where X is a vector of (logical) variables; Pi ∈ P (not necessarily distinct, i.e., it is possible that Pi1 = Pi2 for

```
(a)  α(V) → Inv(V)
     Inv(V) ∧ β(V) → ⊥
  ∀  Inv(V) ∧ Au(V, W) ∧ γu(V, W) → ⊥
  ∀  Inv(V) ∧ Au(V, W) ∧ δu(V, V′, W) → Inv(V′)
     Inv(V) → ⋁u∈U Au(V, W)

(b)  ⋀u∈U Du(V, W) ∧ α(V) → ⊥
  ∀  β(V) → Du(V, W)
  ∀  γu(V, W) → Du(V, W)
  ∀  ⋀u′∈U Du′(V′, W′) ∧ δu(V, V′, W) → Du(V, W)

(a row prefixed by ∀ abbreviates one formula for each u ∈ U)
```

Fig. 3: Formula scheme before (a) and after (b) the transformation.

i1 ̸= i2); H is either ⊥ or a predicate from P; Xi, XH ⊆ X; and φ is a constraint. The universal quantification over X is often omitted.

A set of CHCs (or, more generally, first-order formulas) is satisfiable (modulo T) if it has a satisfying model M such that the projection of M onto Σ is a model of T. A solution to a set of CHCs maps every predicate in P to a formula over Σ that defines it, such that substituting all occurrences of the predicates by their definitions results in formulas that are valid modulo T. If a set of CHCs has a solution then it is satisfiable. However, the converse may not hold, due to the limited expressive power of first-order formulas.

# 4 General Transformation to CHCs

In this section we describe a satisfiability-preserving transformation that lets us convert a set of formulas adhering to a specific FOL scheme into an equi-satisfiable set of CHCs. An extended version, with step-by-step details of the transformation, appears in [34]. Later we show how verification of a ∀∗∃∗-OHyperLTL property can be captured by a set of formulas of the aforementioned scheme, where this transformation then allows us to reason about the correctness of the ∀∗∃∗-OHyperLTL property by deciding the satisfiability of the CHCs.

Consider the scheme in Fig. 3a for a set of formulas over a signature Σ′ that extends the signature Σ of the background theory by unknown predicates Inv and {Au}u∈U, for some finite set U. V, V′, W denote disjoint vocabularies, i.e., vectors of (logical) variables that are implicitly universally quantified. A row prefixed by ∀ indicates |U| formulas, where u is substituted by all corresponding values from U. α, β, γu, δu designate constraints (with no occurrence of Inv or Au).

At a high level, formulas 1 and 4 in Fig. 3a use Inv to capture an inductive invariant of the "states" (valuations to V) reachable from α by "transitions" of δu, restricted according to a choice u ∈ U of an "arbiter" {Au}u. Formula 2 establishes the fact that the reachable states are disjoint from some "bad states" β. Formulas 3 make it possible to enforce that the arbiter meets certain requirements, and formula 5 ensures that the arbiter makes a choice for every "state" in Inv.

Example 1. For our running example, we have V = ⟨V1, V2⟩ = ⟨a1, b1, c1, a2, b2, c2⟩, V′ = ⟨V1′, V2′⟩ = ⟨a1′, b1′, c1′, a2′, b2′, c2′⟩, and W = ⟨⟩ (the extra vocabulary W will

come into use later in the paper). U is the set of arbitration choices {{1}, {2}, {1, 2}}, and the corresponding completion of the constraint holes α, β, γu, δu is easily discernible. (Note that a constraint on the right of → corresponds to its negation on the left.)

Note that the last formula in Fig. 3a is not a CHC, since its head is a disjunction of unknown predicates. To remedy this shortcoming, we transform the formulas in Fig. 3a into the set of CHCs in Fig. 3b. The CHCs obtained for the running example are included in the extended version of the paper [34]. The transformation ensures:

Theorem 1. The set of formulas in Fig. 3a is equi-satisfiable with the system of CHCs in Fig. 3b. Furthermore, there is an efficient translation of models of the former to models of the latter, and vice versa.

Proof. The extended version of the paper [34] includes a stepwise transformation that shows how the CHCs in Fig. 3b are obtained from the formulas in Fig. 3a, where each step preserves equi-satisfiability and models. Here, due to space constraints, we only describe the final translation between models, which we have verified with Z3:

$$\begin{array}{l|l} \text{Given } Inv, A_u \models \text{Fig. 3a} & \text{Given } D_u \models \text{Fig. 3b} \\ \hline D_u(\mathcal{V}, \mathcal{W}) \triangleq \neg(Inv(\mathcal{V}) \land A_u(\mathcal{V}, \mathcal{W})) & Inv(\mathcal{V}) \triangleq \forall \mathcal{W} \cdot \bigvee_{u \in U} \neg D_u(\mathcal{V}, \mathcal{W}) \\ & A_u(\mathcal{V}, \mathcal{W}) \triangleq \neg D_u(\mathcal{V}, \mathcal{W}) \end{array}$$
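As a small sanity check of this table (our own toy instance, with an empty W and U = {u1, u2}; it is not the Z3 verification script referred to in the proof), Z3 confirms that translating via Du ≜ ¬(Inv ∧ Au) and back via ⋁u ¬Du recovers Inv exactly, because this arbiter satisfies formula 5 of Fig. 3a (Inv → ⋁u Au):

```python
from z3 import *

x = Int('x')                          # stand-in for the vocabulary V (W is empty)
Inv = x >= 0                          # a candidate invariant
A = {'u1': x < 10, 'u2': x >= 10}     # arbiter choices covering all states

D = {u: Not(And(Inv, A[u])) for u in A}   # direction Fig. 3a -> Fig. 3b
Inv_back = Or([Not(D[u]) for u in A])     # direction Fig. 3b -> Fig. 3a

s = Solver()
s.add(Inv != Inv_back)                # look for a state where the round trip differs
print(s.check())                      # unsat: Inv is recovered exactly
```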

# 5 Encoding k-Safety Verification as CHCs

In this section we address the problem of verifying k-safety properties via a CHC encoding. To this end, we start with a natural, non-Horn encoding of the problem, as described in the previous section and in previous works [41,45,8], and apply our transformation to obtain an equi-satisfiable system of CHCs.

Consider the k-safety formula: φ = ψ → ∀π1 : ξ1, . . . , πk : ξk · □ϕ. This formula holds in a transition system TS if, starting from initial composed states that satisfy the pre-condition ψ, the observable states along every tuple of k traces satisfy ϕ, when the observable states are reached synchronously. Note that a pre/post specification, as used in our motivating example, is a special case of such a formula where the observable states are the final states. Verifying φ corresponds to finding (1) an alignment of the traces that synchronizes the observation points defined by ξ1, . . . , ξk, and (2) an inductive invariant that establishes that ϕ holds whenever ξ1, . . . , ξk hold. Note that the invariant needs to be inductive along the aligned traces, including intermediate states between observation points. As different alignments give rise to different inductive invariants, it is desirable to find both of them simultaneously [41].

As before, we model the alignment using an arbiter that schedules a subset ∅ ≠ M ⊆ {1, . . . , k} of the traces to make a step based on the current composed state s_1 · · · s_k. The arbiter may be nondeterministic, but it must choose at least

$$\begin{array}{l|l}
\text{(a)} & \text{(b)} \\
\hline
\bigwedge_i Init(V_i) \wedge \psi(\mathcal{V}) \to Inv(\mathcal{V}) & \bigwedge_i Init(V_i) \wedge \psi(\mathcal{V}) \wedge \bigwedge_M D_M(\mathcal{V}) \to \bot \\
Inv(\mathcal{V}) \wedge Bad(\mathcal{V}) \to \bot & Bad(\mathcal{V}) \to D_M(\mathcal{V}) \\
Inv(\mathcal{V}) \wedge A_M(\mathcal{V}) \wedge \neg valid_M(\mathcal{V}) \to \bot & \neg valid_M(\mathcal{V}) \to D_M(\mathcal{V}) \\
Inv(\mathcal{V}) \wedge A_M(\mathcal{V}) \wedge \delta_M(\mathcal{V}, \mathcal{V}') \to Inv(\mathcal{V}') & \delta_M(\mathcal{V}, \mathcal{V}') \wedge \bigwedge_{M'} D_{M'}(\mathcal{V}') \to D_M(\mathcal{V}) \\
Inv(\mathcal{V}) \to \bigvee_M A_M(\mathcal{V}) & 
\end{array}$$

Fig. 4: k-safety formula scheme before (a) and after (b) the transformation.

one set M. Furthermore, the arbiter must respect the synchronization of the observation points: it must not let a trace proceed beyond its observation point before the other traces have reached theirs. This motivates the following definition.

Definition 1 (valid schedules). M is a valid schedule for a composed state s_1 · · · s_k if either of the following two conditions holds: 1. ∀i ∈ M · s_i ̸|= ξ_i, or 2. ∀i ∈ M · s_i |= ξ_i and M = {1, . . . , k}.

Intuitively, the observation points act as a "barrier". All traces must reach the observation point before any of them can progress past it; and when they do, they do it simultaneously.<sup>4</sup>

To reason about composed states, we define a vocabulary V = V_1 ∪ · · · ∪ V_k that consists of the state variables of all traces. We encode the arbiter using a family of unknown predicates {A_M(V)}_M for every ∅ ≠ M ⊆ {1, . . . , k} and the inductive invariant using an unknown predicate Inv(V). We express the situation where all traces reach an observable state but ϕ does not hold using the constraint Bad(V) ≜ ⋀_i ξ_i(V_i) ∧ ¬ϕ(V). The joint steps of the traces as determined by the schedule M are given by the following constraint:

$$\begin{array}{c}
\Delta_M(\mathcal{V}, \mathcal{V}', a_1, \dots, a_k) \triangleq \bigwedge_{i \in M} Tr(V_i, a_i, V'_i) \wedge \bigwedge_{i \notin M} V_i = V'_i \\
\delta_M(\mathcal{V}, \mathcal{V}') \triangleq \exists a_1, \dots, a_k \cdot \Delta_M(\mathcal{V}, \mathcal{V}', a_1, \dots, a_k)
\end{array}$$

Note that the label variables are existentially quantified<sup>5</sup>, indicating that any labeled transition can be used. The definition of a valid schedule is captured by:

$$valid_M(\mathcal{V}) \triangleq \begin{cases} \bigwedge_{i \in M} \neg \xi_i(V_i) & M \neq \{1, \dots, k\} \\ \left(\bigwedge_{i \in M} \neg \xi_i(V_i)\right) \vee \left(\bigwedge_{i \in M} \xi_i(V_i)\right) & M = \{1, \dots, k\} \end{cases} \tag{1}$$
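As an illustration, the following z3py sketch (our own; trace indices are 0-based, and xi[i] builds trace i's observation-point predicate over its copy of the variables) constructs valid_M per Eq. (1), instantiated with the observation points of Example 2 below:

```python
from z3 import *

def valid_M(M, k, xi, V):
    # Eq. (1): no scheduled trace is at its observation point, or (for the
    # full schedule only) all of them are.
    none_at_obs = And(*[Not(xi[i](V[i])) for i in M])
    if set(M) != set(range(k)):
        return none_at_obs
    all_at_obs = And(*[xi[i](V[i]) for i in M])
    return Or(none_at_obs, all_at_obs)

# Observation points of Example 2 below: xi_i is the negated loop condition.
a1, b1, a2, b2 = Ints('a1 b1 a2 b2')
xi = [lambda v: Not(v[0] < v[1]), lambda v: Not(v[0] < v[1])]
V = [(a1, b1), (a2, b2)]
print(simplify(valid_M([0], 2, xi, V)))     # a1 < b1
print(simplify(valid_M([0, 1], 2, xi, V)))  # both traces at exit, or neither
```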

Fig. 4a formalizes the joint requirements on the arbiter and the inductive invariant that ensure that φ holds. The following theorem summarizes the soundness of the encoding, which is a slight generalization of the encoding in [41] (where only pre/post specifications are considered):

<sup>4</sup> The requirement that all traces leave the observation point in tandem obviates the need to record which of them have already made a step since the last observation point.

<sup>5</sup> Since δ_M appears on the left-hand side of an implication, the existential quantifiers can be pushed outside as universal quantifiers, resulting in quantifier-free bodies.

```
 1 sum = 0;
 2 b = *;
 3 if (b > 0) {
 4   i = 0;
 5   while (i < n - 1) {
 6     sum = sum + A[i];
 7     i++;
 8   }
 9 }
10 else {
11   i = 1;
12   while (i < n) {
13     y = *;
14     sum = sum + A[i] + y;
15     i++;
16   }
17 }
```

$$(A_1 = A_2 \land n_1 = n_2) \to \forall \pi_1 : pc = 5 \cdot \exists \pi_2 : (pc = 5 \lor pc = 12) \cdot \Box(b_2 \le 0 \land sum_1 = sum_2)$$
Fig. 5: Example for a ∀∃ hyperproperty.

Theorem 2. The set of formulas in Fig. 4a is satisfiable if TS |= φ.

Example 2. Applying the scheme of Fig. 4a to the program and ∀∗∃∗-OHyperLTL specification of the 2-safety property from Fig. 1a results in Fig. 1b, except for moving constraints to the right-hand side of the implication when it assists readability. Note that in this example, the observation points ξ_i of both traces correspond to the condition for exiting the loop (which is the negated loop condition). As a result, valid_{i} ≜ a_i < b_i for i ∈ {1, 2} and valid_{1,2} ≜ (a_1 < b_1 ∧ a_2 < b_2) ∨ (¬(a_1 < b_1) ∧ ¬(a_2 < b_2)).

The set of formulas in Fig. 4a fits the general scheme of Fig. 3a; thus, it is amenable to our general satisfiability-preserving transformation, which yields the CHCs in Fig. 4b. Since the transformation is satisfiability preserving, we obtain the following as a corollary of Thms. 1 and 2:

Corollary 1. The system of CHCs in Fig. 4b is satisfiable if TS |= φ.

Whereas A_M(V) in Fig. 4a describes the states where choosing schedule M leads to successful verification with Inv as an inductive invariant, D_M(V) in Fig. 4b can be understood as describing states where choosing M would prevent the verification from going through, in the sense that no inductive invariant would exist. In other words, these states are "doomed" if M is chosen, hence the choice of notation. If the set of CHCs in Fig. 4b is satisfiable, it proves that initial states that satisfy the pre-condition are not doomed. This intuition can be interpreted in a dual manner: if the initial states are not doomed, then there exists an alignment for which a safe inductive invariant exists.

# 6 Encoding ∀∗∃∗ Hyperproperties as CHCs

In this section we consider the more general case of ∀∗∃∗-OHyperLTL specifications. Throughout the section, TS is a transition system, and we fix a formula:

$$\varphi = \psi \to \forall \pi_1 : \xi_1, \dots, \pi_l : \xi_l \cdot \exists \pi_{l+1} : \xi_{l+1}, \dots, \pi_k : \xi_k \cdot \Box \phi$$

In order to encode the problem of deciding whether TS |= φ as a satisfiability problem, we follow [8] and consider a game semantics, which is natural due to the alternation of quantifiers. The ∀ and ∃ quantifiers are "demonic" and "angelic", thus controlled by the falsifier and the verifier, respectively.

In the following, we introduce the game semantics of [8] for ∀∗∃∗-OHyperLTL. We then encode the truth of φ in TS under the game semantics as a satisfiability problem, and use the transformation from Sec. 4 to obtain a system of CHCs that is satisfiable if TS satisfies φ according to the game semantics.

Example 3. To illustrate the game semantics, we use the example in Fig. 5, which accompanies this section. The presented program computes the sum of an array slice, nondeterministically choosing between the slice A[0..n−2] and A[1..n−1]. For the second variant, an arbitrary integer can be added to each summand. This allows the program to fulfill the specification at the bottom, which requires that for every execution there is a corresponding execution of the second variant (where b_2 ≤ 0) such that the sums at lines 5 and 12 align at every iteration. The specification is valid because y at line 13 can always be chosen to compensate for the deviation due to the index i not being the same.

Considering the game semantics, the falsifier first has to choose a value for b, which can be either positive or nonpositive. If it is nonpositive, then the verifier wins the game vacuously because ξ_1 ≜ (pc = 5) is never reached. If the choice is positive, then the verifier must choose a nonpositive value to satisfy b_2 ≤ 0 from the specification. In subsequent steps, the verifier must select a scheduling that aligns pc_1 = 5 and pc_2 = 12 at every iteration, and select a value for y such that after both assignments (lines 6 and 14) sum_1 = sum_2 is satisfied. By following these choices, the verifier manages to satisfy sum_1 = sum_2 at all observation points, which gives it a winning play.

Safety games are played between a verifier, whose goal is to avoid bad states, and a falsifier, who tries to reach a bad state. Formally, the game is a tuple G = (VS, FS, S_0, δ_V, δ_F, B), where VS are verifier states, in which the verifier moves, FS are falsifier states, in which the falsifier moves, and VS ∩ FS = ∅. The game states are S = VS ∪ FS. S_0 ⊆ S is a set of initial states, and B ⊆ S is a set of bad states. δ_V ⊆ VS × S defines the possible moves of the verifier, and δ_F ⊆ FS × S those of the falsifier. It is assumed that δ_V, δ_F are total, i.e., there is at least one move for each player from every state. A play is a sequence of game states σ_0, σ_1, . . . such that σ_0 ∈ S_0, and for every i ≥ 0, (σ_i, σ_{i+1}) ∈ δ_V ∪ δ_F. The play is winning for the verifier if it is infinite and σ_i ∉ B for every i ≥ 0. A (memoryless) strategy for the verifier is a function χ : VS → S such that (σ, χ(σ)) ∈ δ_V for every σ ∈ VS. χ is a winning strategy for the verifier if all the plays in which the verifier moves according to χ are winning for the verifier.
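For intuition, on finite graphs a safety game can be solved by the classic backward fixpoint below; this is a minimal Python sketch of ours (the games in this paper are infinite-state, so this is purely illustrative). The states it marks as losing for the verifier are exactly the "doomed" states in the sense of Sec. 5.

```python
def verifier_wins(states, vs, delta_v, delta_f, bad):
    # Least fixpoint of the states from which the falsifier can force a
    # visit to `bad`; the verifier wins a safety game from every other state.
    doomed = set(bad)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in doomed:
                continue
            if s in vs:
                lose = all(t in doomed for t in delta_v[s])  # every move doomed
            else:
                lose = any(t in doomed for t in delta_f[s])  # some move doomed
            if lose:
                doomed.add(s)
                changed = True
    return {s for s in states if s not in doomed}

# Toy game: the falsifier at 'f' can only move to 'v'; the verifier at 'v'
# can avoid 'bad' by moving back to 'f', so both 'f' and 'v' are winning.
print(verifier_wins(states={'f', 'v', 'bad'}, vs={'v'},
                    delta_v={'v': ['f', 'bad']},
                    delta_f={'f': ['v'], 'bad': ['bad']},
                    bad={'bad'}))  # {'f', 'v'}
```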

Game semantics for ∀∗∃∗-OHyperLTL Let φ be as above. The game that captures the semantics of φ is defined with respect to a deterministic labeled transition system TS = (V, a, Init, Tr). (We can always determinize TS by extending the set of labels without affecting the semantics; this step may introduce infinitely many labels, which do not require any special treatment in the definition of the game, but whose CHC encoding will be addressed in Sec. 6.2.)

The game for φ and TS proceeds in rounds, where in each round the falsifier makes a move and the verifier responds. The falsifier states are composed states (of k traces), and the verifier states augment them with a record of the falsifier's last move. The bad states are falsifier states where all traces are in their observation points but ϕ does not hold. The falsifier is responsible for choosing the transitions that define the ∀ traces t_{1..l} assigned to π_{1..l}. The verifier responds by choosing the transitions of the ∃ traces t_{l+1..k} assigned to π_{l+1..k}. Here the labels of the transitions come into play: the players specify the transitions of choice by picking a label ℓ ∈ L for each trace. (Since TS is deterministic, transitions are uniquely identified by labels.) The traces then need to be aligned s.t. they synchronize on their observation points defined by ξ_i. The alignment does not affect the winner of the play, as long as it is a valid alignment. However, as in the case of k-safety, the alignment is instrumental for obtaining a winning strategy that has a simple description. As a result, the choice of the (valid) alignment is also left to the verifier. Altogether, a move of the falsifier consists of picking labels ℓ_1, . . . , ℓ_l ∈ L for the ∀ trace variables; a move of the verifier consists of picking a valid subset ∅ ≠ M ⊆ {1, . . . , k} of the traces to progress (as in Sec. 5) as well as labels ℓ_{l+1}, . . . , ℓ_k ∈ L for the ∃ trace variables, and proceeding to the resulting composed state.<sup>6</sup> In this manner, the verifier iteratively "reads off" the states of t_{1..l}, properly aligned, and generates the traces t_{l+1..k}, while avoiding the bad states. If the verifier can do so indefinitely, then this proves that φ holds.

Formally, the components of the game are as follows (here, M represents a valid schedule according to Definition 1):

$$\begin{array}{ll}
FS = S^k & VS = S^k \times L^l \\
S_0 = \{\mathbf{s} \in S_0^k \mid \mathbf{s} \models \psi\} & B = \{\mathbf{s} \in FS \mid \mathbf{s} \not\models \phi \text{ and } s_i \models \xi_i \text{ for every } 1 \le i \le k\} \\
\delta_F = \{(\mathbf{s}, (\mathbf{s}, \ell^{\forall})) \mid \mathbf{s} \in FS,\ \ell^{\forall} \in L^l\} & \delta_V = \{((\mathbf{s}, \ell^{\forall}), \mathbf{s}') \mid \mathbf{s} \stackrel{M,\ell}{\leadsto} \mathbf{s}' \text{ for a valid schedule } M \text{ and } \ell^{\exists} \in L^{k-l}\}
\end{array}$$

The notation s ⇝^{M,ℓ} s′ indicates that s′ is obtained from s by taking the transition with label ℓ_i from s_i whenever i ∈ M, and stuttering otherwise, where ℓ = ⟨ℓ_1, . . . , ℓ_k⟩. We refer to it as a transition of the composed system according to schedule M labeled ℓ. The labels are split into ℓ^∀ = ⟨ℓ_1, . . . , ℓ_l⟩ and ℓ^∃ = ⟨ℓ_{l+1}, . . . , ℓ_k⟩. Formally, s ⇝^{M,ℓ} s′ ⟺ ⋀_{i∈M} R(s_i, ℓ_i, s′_i) ∧ ⋀_{i∉M} s_i = s′_i.
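The composed transition relation ⇝ is straightforward to realize executably; a small Python sketch of ours (R(s_i, ℓ_i) returns the unique successor, since TS is deterministic):

```python
def composed_step(s, M, labels, R):
    # s ~(M, labels)~> s': trace i (1-based) moves via R with label
    # labels[i-1] if i is scheduled by M, and stutters otherwise.
    return tuple(R(s[i - 1], labels[i - 1]) if i in M else s[i - 1]
                 for i in range(1, len(s) + 1))
```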

Example 4. In the example of Fig. 5, the labels of transitions are integer values that reflect the choice of * at lines 2 and 13 (and have no effect on other states). The verifier and falsifier specify their moves using these labels. For example, in order to ensure that sum_1 = sum_2 is satisfied at every iteration, the verifier selects the transition label ℓ = A[i − 1] − A[i] at line 13, which sets the value of y accordingly; after both assignments at lines 6 and 14, sum_1 = sum_2 holds.

The game semantics of ∀∗∃∗-OHyperLTL is based on the winner of the verification game:

Definition 2 (Game Semantics for ∀∗∃∗-OHyperLTL [8]). Let TS be a transition system and φ a ∀∗∃∗-OHyperLTL formula. TS satisfies φ according

<sup>6</sup> In [8], steps of the verifier are split into two. Our definition is more precise in the sense that a winning strategy in the game of [8] implies a winning strategy in our game.

to the game semantics, denoted TS |=_G φ, if the verifier has a winning strategy in the verification game G_{TS,φ}.

Theorem 3 (as shown in [8]). If TS |=_G φ then TS |= φ.

#### 6.1 CHC Encoding of the Game with Finite Branching

To encode the verification game for φ and TS, we introduce unknown predicates {A_u}_{u∈U} that describe the strategy of the verifier, as well as an unknown predicate Inv that encodes an inductive invariant that ensures that the strategy is winning. We first consider the case where the set of labels L is finite, i.e., TS has finite branching. This makes it possible to define U as the set of all possible concrete choices of the verifier and to introduce a predicate A_u per possible choice of the verifier. To do so, we define U = M × L^{k−l}, where M = P({1, . . . , k}) \ {∅} is the set of possible schedules, and L^{k−l} are the choice labels for constructing the traces assigned to {π_i}_{i=l+1..k}. Note that U is finite in this case. For each u = ⟨M, ℓ^∃⟩ ∈ U, the predicate A_u describes the verifier states in which the verifier chooses u for its move. Recall that verifier states consist of both the underlying composed state, captured by the composed state vocabulary V defined as before, and the last move of the falsifier, captured by label variables ⟨a_1, . . . , a_l⟩. We denote L^∀ = ⟨a_1, . . . , a_l⟩, L^∃ = ⟨a_{l+1}, . . . , a_k⟩ and L = L^∀ ∪ L^∃ = ⟨a_1, . . . , a_k⟩. Then, the A_u predicates are defined over V ∪ L^∀. The Inv predicate is defined over V only, as it describes a set of falsifier states.
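In the finite-branching case, U can be enumerated explicitly; a small Python sketch of ours for U = M × L^{k−l}:

```python
from itertools import chain, combinations, product

def arbiter_choices(k, l, labels):
    # U = M x L^(k-l): all pairs of a nonempty schedule M over {1..k} and a
    # tuple of labels for the k-l existentially quantified traces.
    traces = range(1, k + 1)
    schedules = chain.from_iterable(combinations(traces, r)
                                    for r in range(1, k + 1))
    return [(set(M), ell)
            for M in schedules for ell in product(labels, repeat=k - l)]

# k = 2 traces, l = 1 universal, two labels: 3 schedules x 2 labels = 6 choices
print(len(arbiter_choices(2, 1, [0, 1])))  # 6
```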

The formulas in Fig. 6a formalize the requirements that ensure that {A_u}_u defines a winning strategy for the verifier, while accounting for the alternating choices of the falsifier (ℓ^∀) and the verifier (⟨M, ℓ^∃⟩) in every round, where

$$\begin{aligned}
\Delta_M(\mathcal{V}, \mathcal{V}', \mathcal{L}) &\triangleq \bigwedge_{i \in M} Tr(V_i, a_i, V'_i) \wedge \bigwedge_{i \notin M} V_i = V'_i \\
\delta_{M, \ell^{\exists}}(\mathcal{V}, \mathcal{V}', \mathcal{L}^{\forall}) &\triangleq \Delta_M(\mathcal{V}, \mathcal{V}', \mathcal{L})\big[\mathcal{L}^{\exists} \mapsto \ell^{\exists}\big] \qquad\quad Bad(\mathcal{V}) \triangleq \bigwedge_i \xi_i(V_i) \wedge \neg\phi(\mathcal{V})
\end{aligned}$$

Δ_M is the formula expression of ⇝^{M,ℓ} from above. That is, s, s′, ℓ (valuations to V, V′, L) satisfy Δ_M if the composed system according to M has a transition from s to s′ labeled ℓ. δ_{M,ℓ^∃} is then the projection of Δ_M to a concrete choice of labels ℓ^∃ for the existentially quantified traces; the labels for the universals, captured by L^∀, remain free.

Theorem 4. The set of formulas in Fig. 6a is satisfiable if TS |=_G φ.

Proof. A solution for Fig. 6a induces a winning strategy χ for the verifier in the game for φ and TS: χ(s, ℓ^∀) = s′ for s |= Inv, where s′ is reached by choosing ⟨M, ℓ^∃⟩ (i.e., s, s′, ℓ^∀ |= δ_{M,ℓ^∃}) such that s, ℓ^∀ |= A_{M,ℓ^∃}; such s′ must exist because the last formula states that there must always be a choice for the verifier in falsifier states that satisfy Inv. For s ̸|= Inv, χ(s, ℓ^∀) is defined arbitrarily. In the other direction, given a winning strategy for the verifier, we define the interpretation of Inv to be its winning region and the interpretation of A_{M,ℓ^∃} to consist of the verifier states (s, ℓ^∀) where the strategy chooses s′ such that s, s′, ℓ^∀ |= δ_{M,ℓ^∃}.

Fig. 6: A game formula scheme before (a) and after (b) the transformation, where ∀ abbreviates ∀⟨M, ℓ^∃⟩ ∈ U.

Remark 2. For k-safety properties, the encoding in Fig. 6a, based on the game semantics, is equivalent to the encoding in Fig. 4a (Sec. 5). In particular, in this case, the set L^∃ is empty, which means that ℓ^∃ = ⟨⟩, resulting in a game with finite branching, namely only the choices of the schedule M. Note that for such properties, the benefits of the game semantics are less obvious since if TS |= φ, then every strategy is winning for the verifier.

Encoding safety games in general The game encoding in Fig. 6a and Thm. 4 are stated here for the specific safety games corresponding to ∀∗∃∗-OHyperLTL verification in order to avoid additional notational burden. However, the result is applicable to a more general class of safety games where the moves of the players are organized in rounds, each of which comprises a move of the falsifier followed by a move of the verifier. Furthermore, the states of the verifier are "intermediate states" defined as VS = FS × Ω, where Ω is a set of auxiliary states used to record the last falsifier move. The initial and bad states are falsifier states. The verifier moves to a new state according to the previous state together with the auxiliary state, while the falsifier is only allowed to choose the auxiliary part of the state. Therefore, δ_F ⊆ {⟨ŝ, ⟨ŝ, ω⟩⟩ | ŝ ∈ FS, ω ∈ Ω}. The encoding extends to such games, where ⋀_i Init(V_i) ∧ ψ(V) is replaced by an encoding of S_0; Bad is replaced by an encoding of B; δ_{M,ℓ^∃}(V, V′, L^∀) is replaced by an encoding of δ_V ∘ δ_F as a formula where the falsifier state variables and the choices of the falsifier are free; and valid_M(V) is replaced by a guard encoded over the same free variables that ensures that the verifier step is applicable. Accordingly, our subsequent results (including the CHC encoding) extend to any such game.

Applying our transformation to the formulas in Fig. 6a results in the CHCs in Fig. 6b. Intuitively, A_{M,ℓ^∃} describes the winning strategy for the verifier: for "safe" states, represented by Inv, and given a move made by the falsifier, if the verifier chooses to move according to ⟨M, ℓ^∃⟩, then it stays in the "safe" region. In contrast, D_{M,ℓ^∃} represents "doomed" states. Namely, if the verifier chooses to move according to ⟨M, ℓ^∃⟩ from a state in D_{M,ℓ^∃}, then the falsifier can force reaching a bad state for every choice of the verifier in the next steps of the game.

Corollary 2. The set of CHCs in Fig. 6b is satisfiable if TS |=_G φ.

Example 5. The example in Fig. 5 fits the case of finite branching if we assume that the integer values in the array A and those of sum and y are bounded modulo 2^m, and so are the labels L. This means that the falsifier has 2^m possible steps at each game state, and the verifier has 3 · 2^m (3 being the number of possible schedules over {1, 2}). In the next subsection we explain how to encode the problem when the integers are considered to be unbounded.

#### 6.2 CHC Encoding of the Game with Infinite Branching

The set of formulas in Fig. 6a, and the corresponding system of CHCs in Fig. 6b, is well defined when the set U is finite. However, if L is infinite, so is U. In this case, instead of using L^{k−l} to specify the traces chosen by the verifier, we define a finite, abstract set of composed labels, denoted L^♯, to be used by the verifier (the falsifier will continue to use the concrete labels to specify its transitions of choice). Each abstract label in L^♯ is a relational predicate p with free variables V (the composed vocabulary) that relates the states of different traces. Thus, the vector of individual existential choices ℓ^∃ of the verifier is now replaced with a single choice of a (relational) predicate p ∈ L^♯ over all the copies. In contrast to the use of concrete labels to specify the (unique) next transition for each trace individually, an abstract label p ∈ L^♯ determines the next transitions for the ∃ traces by relating their post-states to the rest of the composed post-state.

Specifically, given a set of labels ℓ^∀ for the ∀ traces and a schedule M, a predicate p ∈ L^♯ is used as a restriction (inspired by the homonymous concept from [8]) of the transitions of the composed system according to schedule M with ∀-choices ℓ^∀, restricting the set of aforementioned transitions to those where the composed post-state satisfies p.

Example 6. In Fig. 5, at line 13, a nondeterministic integer value is assigned to variable y. Since the set of integers is infinite, assigning a unique label ℓ to each integer results in an infinite set L. To specify the choices of the verifier, we therefore define a finite set of abstract labels. An example of such a set is L^♯ = {sum_1 = sum_2, sum_1 = y_2, sum_1 < y_2, sum_1 = sum_2 + A_2[i_2] + y_2}. The restriction sum_1 = sum_2 can result in an empty set of transitions (we will return to this point later in the section); but the restrictions sum_1 = y_2, sum_1 < y_2 and sum_1 = sum_2 + A_2[i_2] + y_2 always define a nonempty set of transitions when pc_2 = 13 and a schedule {2} ⊆ M is chosen: those transitions that choose a value for y_2 such that the predicate holds after the transition; there is always at least one such value. In fact, for sum_1 = y_2 and sum_1 = sum_2 + A_2[i_2] + y_2 there is exactly one such value, while for sum_1 < y_2 the set of values (transitions) is infinite. Note that there are transitions that are not selected by any restriction (those that assign to y_2 a value such that none of the predicates hold).

Thus, the abstract labels define a space of underapproximations of the transitions of the composed system. This is an underapproximation since some (combinations of) individual transitions of TS may not be allowed by any p ∈ L^♯.

The verifier uses p ∈ L^♯ to specify the transitions of the traces assigned to the existentially quantified variables π_{l+1..k}. We then require that all of the composed post-states reached by the verifier's choice ⟨M, p⟩ are winning for the verifier. This amounts to proving that all restricted traces satisfy □ϕ, which would mean that there exist traces that do, as long as the restrictions do not lead to an empty set of traces. Therefore, to ensure soundness of the encoding, we require that the restrictions be nonempty. Nonemptiness of the restrictions also ensures that the choices of the falsifier are never restricted, since the choices of the falsifier are always singletons (based on the concrete labels).

Rather than limiting the set of predicates used as abstract labels, we ensure nonemptiness by applying the restrictions only when the resulting set of transitions is nonempty; otherwise, the full set of transitions is considered. Technically, this is accounted for by special considerations in the construction of the CHC encoding, as detailed below.

CHC encoding We adapt the formulas in Fig. 6a to the case of abstract labels. We define U = M × L^♯. The formulas from Fig. 6a carry over, except that the definition of δ_{M,ℓ^∃} from the finite-branching case is now replaced with δ_{M,p}, which captures the transitions according to the abstract labels, as defined below.

For a schedule ∅ ≠ M ⊆ {1..k} and p ∈ L^♯, we define allowed_{M,p}, a formula that is satisfied by s, ℓ^∀ when some transition is possible from s with scheduling M and ∀-choice ℓ^∀ such that the target composed state satisfies p. This means that the restriction to p is nonempty. δ_{M,p} then applies the restriction of the composed post-state to p only when allowed (otherwise all transitions remain):

$$\begin{aligned}
allowed_{M,p}(\mathcal{V}, \mathcal{L}^{\forall}) &\triangleq \exists \mathcal{V}', \mathcal{L}^{\exists} \cdot \Delta_M(\mathcal{V}, \mathcal{V}', \mathcal{L}) \wedge p(\mathcal{V}') \\
\delta_{M,p}(\mathcal{V}, \mathcal{V}', \mathcal{L}^{\forall}) &\triangleq \exists \mathcal{L}^{\exists} \cdot \Delta_M(\mathcal{V}, \mathcal{V}', \mathcal{L}) \wedge \left( allowed_{M,p}(\mathcal{V}, \mathcal{L}^{\forall}) \to p(\mathcal{V}') \right)
\end{aligned}$$
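A single-step, one-existential-trace sketch of these two formulas in z3py (our own, loosely mirroring Example 6; the transition assigns the label to y2' and keeps sum2, and the restriction p relates the post-state to trace 1's sum):

```python
from z3 import *

sum1, sum2, sum2p, y2p, lab = Ints('sum1 sum2 sum2p y2p lab')

Delta = And(y2p == lab, sum2p == sum2)  # Delta_M for M = {2}, label `lab`
p = sum1 == sum2p + y2p                 # an abstract label p(V')

allowed = Exists([sum2p, y2p, lab], And(Delta, p))
delta_p = Exists([lab], And(Delta, Implies(allowed, p)))

# QE shows the restriction is never empty here: some label always realizes p.
print(Tactic('qe')(allowed).as_expr())  # True
```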

The resulting encoding is sound but, unlike the case of finite branching, not complete.

Theorem 5. If the set of formulas in Fig. 6a adapted to L^♯ is satisfiable, then TS |=_G φ.

Example 7. Going back to the example in Fig. 5, choosing schedule M = {2} and restriction ℓ^♯ = (sum_1 = sum_2 + A_2[i_2] + y_2) when pc_2 = 13 ensures that the unique value of y_2 that satisfies the restriction is selected. With this value chosen, the assignment of the next line will produce a value of sum_2 that is equal to that of sum_1. This gives rise to the following winning strategy (at every iteration): (i) schedule {1} with any restriction until pc_1 = 7; (ii) schedule {2} until pc_2 = 13, then schedule {2} again with ℓ^♯ = (sum_1 = sum_2 + A_2[i_2] + y_2), then {2} again with any restriction; (iii) conclude the iteration by scheduling {1, 2}. As explained, the inductive invariant sum_1 = sum_2 is preserved by this behavior, and there are no "stuck" states (since, by construction of δ_{M,p}, empty restrictions are lifted to the full set of transitions).

As a corollary of Thm. 5 (combined with Thm. 3), satisfiability of the aforementioned formulas ensures that TS |= φ. To obtain an equi-satisfiable CHC encoding, we apply the transformation of Sec. 4. The resulting CHC encoding consists of the formulas in Fig. 6b, adapted to use L^♯ in the same way the formulas in Fig. 6a are adapted.

Corollary 3. If the set of CHCs in Fig. 6b adapted to L^♯ is satisfiable, then TS |=_G φ.

# 7 Evaluation

We implemented our CHC-encoding approach in a tool called HyHorn, on top of Z3 [23] (4.12.0) through its Python API, using Spacer [35,31] as a CHC solver. HyHorn takes as input a CFG, or several CFGs, whose transitions are annotated with two-vocabulary first-order formulas, and constructs a formula expressing the transition relation Tr. The specification is provided as: (i) a quantifier prefix ∀∀, ∀∃, or ∀∀∃, (ii) observation points ξ_i, and (iii) a safety condition ϕ that must hold globally at all observations. From that, the CHC encoding (Sec. 5, Sec. 6) is constructed and passed to Spacer for solving. HyHorn supports all first-order theories supported by Spacer (in our experiments, we used the theories of integer arithmetic and arrays). HyHorn further provides the option to apply predicate abstraction with a user-provided set of predicates, in the same way as [8]. The abstraction is incorporated into the CHC encoding using the implicit abstraction encoding [13]. Notably, many of the benchmarks shown here are solved by HyHorn even without abstraction, that is, directly over the concrete state.
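For readers unfamiliar with this workflow, the following minimal z3py sketch (ours; a toy counter system rather than one of the encodings above) shows the last step of such a pipeline: handing CHCs to Spacer through Z3's Fixedpoint API.

```python
from z3 import *

x, xp = Ints('x xp')
inv = Function('inv', IntSort(), BoolSort())  # the unknown predicate

fp = Fixedpoint()
fp.set(engine='spacer')
fp.register_relation(inv)
fp.declare_var(x, xp)
fp.rule(inv(x), x == 0)                          # initial states
fp.rule(inv(xp), [inv(x), x < 5, xp == x + 1])   # transition
print(fp.query(And(inv(x), x > 5)))  # unsat: the "bad" states are unreachable
print(fp.get_answer())               # Spacer's certificate, an inductive invariant
```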

Several tools already exist in the area of hyperproperty verification, and the objective of our evaluation is to compare with them. Still, the field is not mature enough to have a standardized specification format (as is the case with SMT-LIB and SV-COMP, to name a few). As a result, each tool has its own, opinionated format, which varies from logical formulas to control-flow graphs. This makes it technically difficult to compare results of multiple solutions. In particular, benchmarks taken from previous work come in a range of formats, dictated by the tools that introduced them. A few of the benchmarks were translated by previous authors and, thanks to their efforts, are available in more than one format. For the majority of them, manual work is required for translating the benchmarks; more importantly, there is no single accepted translation, and the translation can introduce artifacts in the evaluation.

This forced us to prioritize the comparisons in our experiments. We chose to focus on comparing with the tools most closely related to our work. These



Table 1: Experimental results for k-safety properties. Time is measured in seconds. "—" represents timeouts after 20 minutes. "/" denotes benchmarks not present in the respective tool's suite.

In benchmark names, [FV19] refers to [27]; [BF22] refers to [8].

are HyPA [8], Pdsc [41], and PCSat [45]. HyPA is the most recent tool and has already collected benchmarks from various previous papers (including Weaver [27]); Pdsc and PCSat both use the same first-order encoding as our starting point and thus are also relevant. HyPA's benchmarks include, in particular, ∀∗∃∗ examples such as GNI, and Pdsc targets non-trivial alignments; as such, all of its benchmarks have non-lockstep alignments.

Benchmarks For the evaluation of our approach we use the full sets of benchmarks from HyPA [8] and Pdsc [41]. The benchmarks of HyPA are divided into k-safety benchmarks, which are adopted from [43,27,41,45], and ∀∗∃∗ benchmarks, which include refinement properties for compiler optimizations, general refinement of nondeterministic programs, and generalized non-interference (GNI). For two benchmarks, we include both a simplified version as given in [8] and the original example. The benchmarks of Pdsc include more non-lockstep examples, as well as all of the comparator benchmarks from [43]. The comparator examples consist of both safe and unsafe instances. Weaver [27] considers 12 additional (sequential) k-safety benchmarks. As an additional test case, we manually translated the running example from Weaver, which is a 3-safety property with a nontrivial alignment, and tested it with HyHorn; HyHorn solved it in 2.25 seconds when provided with a few simple predicates (inequalities between program variables). We believe that being the running example makes it a good representative of the remaining 12. This brings our benchmark suite to a total of 112 k-safety examples (16 in Table 1 plus 96 comparator benchmarks).

Experiments To demonstrate the effectiveness of HyHorn we compare to HyPA [8], the most recent approach to formal verification of ∀∗∃∗-hyperproperties, which

employs a construction using automata. To exhibit the benefits of the direct CHC encoding we also compare the k-safety examples to PCSat [45] and Pdsc [41]. Both encode the k-safety problem using FOL formulas as in Fig. 4a. PCSat uses a specialized solver for pfwCSP (a fragment of FOL that includes these formulas), while Pdsc solves the FOL formulas by enumerating alignments and using a CHC solver for each alignment. We do not compare to game solvers since, as reported by [8], state-of-the-art infinite-state game solvers, such as [26,2], which work without user-provided predicates, are unable to solve the benchmarks we consider.

We run HyHorn on the full set of benchmarks, and each of the other tools on the ones included in their benchmark suite. This is because each tool has its own input format: HyPA and Pdsc each has its own representation for the transition system and the property; PCSat accepts pfwCSP instances that are constructed manually. Some of the benchmarks are common to several tools.

All experiments are run on an AMD EPYC 74F3 with 32 GB of memory. HyPA and PCSat are executed in Docker using their published artifacts<sup>7</sup>.

Results The performance measurements of the tools for the k-safety benchmarks and for the ∀∗∃∗ benchmarks are shown in Table 1. The results for the comparator examples are deferred to the extended version of the paper [34]. HyHorn is tested in two modes: with predicate abstraction ("PA") and without ("concrete"). HyPA and Pdsc require predefined predicates (the same predicates are used in all tools), while PCSat does not, but uses hints to solve 'array insert' and 'squares sum'. HyHorn solves almost all of the benchmarks with PA in under a second, outperforming previous approaches by up to two orders of magnitude; it also solves most of the benchmarks quickly without PA, especially the ∀∗∃∗ properties. In particular, HyHorn solves the two array benchmarks, while HyPA and PCSat do not support arrays and only solve simplified versions with integers. The runtime of HyHorn (both with and without predicates) on the comparator examples is similar to the runtime of Pdsc (see [34]), where HyHorn solves some benchmarks that Pdsc does not. (The other tools do not include these benchmarks.) On the unsafe examples, HyHorn provides a concrete counterexample, while Pdsc is only able to determine that there is no inductive invariant and alignment expressible with the given set of predicates.

# 8 Related Work

There is a large body of work studying verification of hyperproperties. While earlier verification techniques mostly focus on k-safety properties, or specific examples such as program equivalence, monotonicity, and determinism [5,44,3,30,43,47,24,27,41,1], lately verification of non-safety hyperproperties has been studied [4,16,45,7,8]. Below we discuss the works closest to ours.

<sup>7</sup> We evaluated HyHorn in Docker as well. There were no meaningful differences in runtime.

k-Safety Automatic verification of k-safety properties can be achieved by reducing the problem to a standard safety verification problem by means of self-composition [5], product programs [3], and their derivatives [47,24]. Recently, however, it was identified that the alignment of the different copies has a substantial effect on the complexity of the verification problem [41,27,12]. Our approach is most related to the technique of Shemer et al. [41], which uses a semantic alignment that chooses which copy of the system performs a move based on the composed state of the different copies. They suggest an algorithm that iterates through the set of possible semantic alignments, such that in each iteration a CHC solver tries to prove the property, with the chosen alignment, using predicate abstraction. Unlike [41], HyHorn delegates the search for the alignment to the CHC solver, together with the search for the invariant, making the algorithm less dependent on predicate abstraction. Moreover, while [41] is restricted to k-safety only, our technique can handle k-safety as well as the more general ∀∗∃∗-OHyperLTL.

∀∗∃∗ Hyperproperties Recently, verification of ∀∗∃∗ hyperproperties has been studied, targeting both finite and infinite systems [45,16,8]. Unno et al. [45] present an approach based on an encoding of hyperproperty verification as satisfiability of formulas in FOL that extend Horn form with disjunctions, existential quantification, and well-founded relations. Deciding satisfiability of the generated set of formulas is based on a variant of the CEGIS framework. HyHorn is different as it encodes ∀∗∃∗-OHyperLTL verification as a set of CHCs, which does not require a specialized solver and can use any off-the-shelf CHC solver. Coenen et al. [16] suggested a game-based approach for verification of ∀∗∃∗ properties over finite-state systems, which was then extended by Beutner et al. [8] to handle infinite-state systems. Similarly to [8], we use game semantics to solve ∀∗∃∗ problems, but we do not require building the game graph in order to solve the game, instead reducing the game solution to satisfiability of CHCs. It is important to note that in the case of an infinite branching degree, while the approach in [8] explicitly checks for emptiness of restrictions in hindsight, i.e., after they are used in a strategy, and removes them iteratively if needed, HyHorn embeds the emptiness requirements into the set of CHCs. Recently, [7] extended the game-based approach to use prophecy variables as a way to achieve completeness of the reduction to games. Extending our approach to this case is a promising avenue for future research.

Relational CHCs [40] present a method for discovering relational solutions to CHCs. Their setting is different: the inputs are CHCs that serve as the definition of the transitions, and synchronization is between sets of unknown predicates; at the current stage, only lock-step semantics is considered. Furthermore, their algorithm extends and modifies Spacer [35], while our approach can use any CHC solver without modification.

Infinite-State Game Solving Our approach for verifying ∀∗∃∗ hyperproperties is based on the game semantics of ∀∗∃∗-OHyperLTL proposed in [16,8]. However, we do not propose a general game-solving algorithm. Instead, we use the game semantics to come up with a first-order encoding of hyperproperty verification problems, which is then reduced to CHC solving. This allows us to use any CHC solver when solving the hyperproperty game. There is a large body of work on solving infinite-state games [21,9,46,26]. The game-solving approach in [46] uses three-valued predicate abstraction to reduce the problem to finite-state game solving and requires iteratively refining the controllable predecessor operator when computing candidate winning states. The approach in [26] targets games defined over the theory of linear real arithmetic and is based on an unrolling of the game and the use of Craig interpolants [18] to synthesize a winning strategy. The game solver in [2] is not restricted to a given FOL theory, but requires an interpolation procedure in order to compute sub-goals that are used to inductively split a game into sub-games. As reported by [8], game-solving approaches [26,2], which work without a provided set of predicates, are unable to handle the infinite-state games for the benchmarks we consider. Moreover, the approaches in [26,16,2,8] cannot handle games that are defined using formulas over the theory of arrays, which are part of our benchmarks. The approach of [9] to solving games over infinite graphs is based on a reduction of games (including safety games) to CHCs. However, unlike the reduction presented in this paper, in [9] the games are encoded in a different fragment of Horn, namely ∀∃-Horn, where the head predicates can contain existential quantifiers. More recently (and concurrently with our work), [25] proposed a new reduction of game solving to CHC solving. Their approach handles safety games in which the branching degree of the "safe" player (the verifier in our setting) is bounded. In contrast, our encoding also supports infinite branching via the restrictions mechanism. Moreover, they do not support predicate abstraction, which is crucial for solving some of our benchmarks.

Restrictions as Underapproximations The use of restrictions as underapproximations of the transition relation, inspired by [8], corresponds to the use of must hyper-transitions [36] in abstract transition systems [42,19] and games [20,22]. Similarly to [29,17], we use such underapproximations to replace an existential quantifier by universal quantification within the restriction.

# 9 Conclusion

We introduced a translation of a family of non-Horn first-order formulas to CHCs. This translation led to the first CHC encoding of a simultaneous inference of an invariant and an alignment for verifying k-safety properties. While the transformation itself is rather simple, identifying it was not straightforward and eluded previous works on the topic. We have further extended the CHC encoding to infer a witness function for existentially quantified traces arising in the verification of ∀∗∃∗-OHyperLTL properties. Our experiments exhibit significant improvement over state-of-the-art hyperproperty verifiers thanks to the existence of advanced off-the-shelf CHC solvers, whose efficacy is expected to improve even

further. The approach shows promising capabilities in solving (many) hyperproperty verification problems completely automatically. In some cases, predicates still have to be provided by the user, a limitation that we hope to overcome in the future by automatic inference of predicates. Applying (or extending) the transformation to obtain CHC encodings for other verification fragments is an interesting direction for future work.

Acknowledgment The research leading to these results has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement No [759102-SVIS]). This research was partially supported by the Israeli Science Foundation (ISF) grant No. 2875/21 and No. 2117/23, and by the NSF-BSF grant No. 2018675.

# References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Program Analysis**

# Maximal Quantified Precondition Synthesis for Linear Array Loops

Sumanth Prabhu S<sup>1,2</sup>, Grigory Fedyukovich<sup>3</sup>, and Deepak D'Souza<sup>2</sup>

<sup>1</sup> Tata Consultancy Services Research, Pune, India
<sup>2</sup> Indian Institute of Science, Bengaluru, India
<sup>3</sup> Florida State University, Tallahassee, USA

sumanth.prabhu@tcs.com, grigory@cs.fsu.edu, deepakd@iisc.ac.in

Abstract. Precondition inference is an important problem with many applications in verification and testing. Finding preconditions can be tricky as programs often have loops and arrays, which necessitates finding quantified inductive invariants. However, existing techniques have limitations in finding such invariants, especially when preconditions are missing. Further, maximal (or weakest) preconditions are often required to maximize the usefulness of preconditions, so the inferred inductive invariants have to be adequately weak. To address these challenges, we present an approach for maximal quantified precondition inference using an infer-check-weaken framework. Preconditions and inductive invariants are inferred by a novel technique called range abduction, and then checked for maximality and weakened if required. Range abduction attempts to propagate the given quantified postcondition backwards and then strengthen or weaken it as needed to establish inductiveness. Weakening is done in a syntax-guided fashion. Our evaluation on a set of public benchmarks demonstrates that the technique significantly outperforms existing techniques in finding maximal preconditions and inductive invariants.

# 1 Introduction

Many practical problems in software development, verification, and testing rely on good and nontrivial preconditions for programs. Preconditions can be viewed as a constraint on a program's input, or used to filter out input values of a program at run-time. When performing verification in a backward fashion, preconditions are used to summarize loops and functions. The maximal (or logically weakest) precondition is desirable in all these applications. Such preconditions can be derived by various methods [45,14,54,13,53,25,3,46].

However, precondition inference is known to be difficult for programs with unbounded loops, as it requires reasoning about possible behaviors in any loop iteration. This necessitates the inference of inductive invariants that describe a set of states from which a new iteration can begin and cannot escape. This task becomes particularly challenging in the presence of data structures like arbitrarily-sized arrays. When reasoning about array elements, solvers are

Fig. 1: An overview of our infer-check-weaken framework.

expected to support quantifiers, but existing techniques [30,40,34,32,22] have many limitations.

We present a new technique to automatically infer maximal quantified preconditions for deterministic programs that manipulate arrays and have linear array loops. These loops are non-nested, terminating loops with unique counter variables. The postconditions can have either universal or existential quantification. Since such programs can model many practical programs, several techniques target them, but for assertion checking and not precondition inference [39,10]. Moreover, we show in this paper that precondition inference is undecidable for this class of programs.

An overview of our algorithm is shown in Fig. 1. The algorithm operates in an "infer-check-weaken" framework. Our algorithm views the problem as solving a system of constrained Horn clauses (CHCs), which are logical systems to represent the verification conditions of programs, with a missing precondition pre. A valid solution for this system is inferred by an abduction-based algorithm, i.e., by systematically answering questions like "what state at the beginning of the iteration could yield a given state at the end of the iteration?" The solution is then checked for maximality by inferring another precondition (cpre) for a system that uses the same CHC encoding of the loop and the complemented (i.e., negated) postcondition. If the solution is not maximal, it is weakened incrementally in a counterexample-guided loop.

The inference algorithm begins with the weakest possible candidate solution and propagates the given quantified postcondition towards the program's entry point. In the process, it strengthens the candidate solution using our novel technique called range abduction. Range abduction finds a strengthening of quantified formulas by reduction (wherever possible) to abduction over quantifier-free formulas. The obtained formulas are combined with the range formula [22] that essentially represents a boundary between the indices of arrays that are already processed and indices that are yet to be processed. Such a predicate can be obtained using lightweight static analysis over the structure of the CHCs. The inference algorithm uses the Houdini technique [24] to weaken a solution.
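To illustrate the weakening ingredient, here is a minimal Houdini-style loop in z3py (our own sketch over a toy integer transition system, not the paper's implementation): conjuncts that are not implied initially or not preserved by a transition, relative to the remaining candidates, are dropped until a fixpoint.

```python
from z3 import *

def houdini(candidates, init, trans, vs, vs_next):
    # Drop candidates that fail the initiation or consecution check until
    # the remaining conjunction is inductive.
    sub = list(zip(vs, vs_next))
    kept = dict(candidates)
    changed = True
    while changed:
        changed = False
        inv = And(*kept.values()) if kept else BoolVal(True)
        for name, c in list(kept.items()):
            s = Solver()
            s.add(Or(And(init, Not(c)),
                     And(inv, trans, Not(substitute(c, sub)))))
            if s.check() == sat:
                del kept[name]
                changed = True
    return kept

x, xp = Ints('x xp')
print(houdini({'nonneg': x >= 0, 'small': x <= 1},
              init=(x == 0), trans=(xp == x + 1), vs=[x], vs_next=[xp]))
# keeps only 'nonneg': x <= 1 is not preserved by the increment
```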

Intuitively, range abduction for linear array loops seeks to pose two integer abduction queries: one over indices that are modified and one over indices that are not modified. Integer abduction has been used in invariant inference [17,18], precondition inference [27,16], and specification synthesis [2,50]. At the lower level, abduction

is often implemented using quantifier elimination, but in our setting the formulas must use quantifiers over array indices that should not be eliminated. Range abduction is designed specifically for this application.

Although efficient, range abduction does not guarantee maximality, and our inference is followed by two additional steps: maximality checking and weakening. The maximality checker tries to determine whether all the states outside the current precondition lead to a violation of the assertion. If they do, the current precondition is maximal. Otherwise, there is at least one state that can be added to the current precondition, and hence an attempt to weaken the precondition is made. The weakening module weakens a precondition and infers an inductive invariant for it using a syntax-guided-synthesis-based method.

A prior framework [50] to find specifications (including preconditions) follows a similar approach by iteratively inferring solutions. But it is based on integer abduction and maximality checking using an SMT solver, and it is applicable only to array-free programs. Furthermore, it does not guarantee maximality in some cases [29]. We experimentally observed that extending the SMT-based maximality-checking algorithm of [50] with quantified formulas over arrays makes the tool diverge. This motivated us to design a new maximality checker based on the complement system and range abduction.

We have implemented our algorithm in a tool called PreQSyn, which takes CHCs as input. On a challenging set of 32 benchmarks, PreQSyn significantly outperforms a prior maximal quantified precondition inference tool, P-Gen [54]. PreQSyn automatically found 31 preconditions and proved 21 of them to be maximal, while in contrast P-Gen found only 2 maximal preconditions and in most cases did not find any preconditions. We also show that a variety of existing array verification tools like VeriAbs [15], Spacer [32], and FreqHorn [22] find it hard to even verify the preconditions we discovered for these benchmarks. Our tool can not only solve them by finding preconditions, but also finds the maximal ones in most of the cases.

The core contributions of this paper are:


In the rest of the paper, we motivate the problem with an example in Sect. 2. The necessary background on abduction and CHCs is provided in Sect. 3. A proof of undecidability of the problem is in Sect. 4. Sect. 5 presents an overview of our inference algorithm and an illustration on the example. The details of range abduction are in Sect. 6, while Sect. 7 has the maximality checking and weakening algorithms. Our experimental evaluation can be found in Sect. 8 and related work in Sect. 9, and we conclude with limitations and future work in Sect. 10.

```
int N = nondetInt();
int A[N], B[N], C[N];
assume(pre(A, B, C, N)); // goal: find maximal pre
for (int i = 0; i < N; i++)
  if (2*i < N) C[i] = i;
  else A[i] = C[i];
assert(∀j. 0 ≤ j < N =⇒ A[j] == B[j]); // postcondition
```
Fig. 2: C-like example with a universally quantifed postcondition and no precondition.

$$\mathsf{pre}(N, A, B, C) \land i = 0 \implies \mathsf{inv}(i, N, A, B, C) \tag{C_1}$$

$$\mathsf{inv}(i, N, A, B, C) \land i < N \land 2*i < N \land C' = \mathrm{store}(C, i, i) \land i' = i + 1 \implies \mathsf{inv}(i', N, A, B, C') \tag{C_2}$$

$$\mathsf{inv}(i, N, A, B, C) \land i < N \land 2*i \ge N \land A' = \mathrm{store}(A, i, C[i]) \land i' = i + 1 \implies \mathsf{inv}(i', N, A', B, C) \tag{C_3}$$

$$\mathsf{inv}(i, N, A, B, C) \land \neg(i < N) \land \neg(\forall j.\ 0 \le j < N \implies A[j] = B[j]) \implies \bot \tag{C_4}$$

Fig. 3: CHC encoding of program in Fig. 2.

```
int N = nondetInt();
int A[N], B[N], C[N];
assume(cpre(A, B, C, N));
for (int i = 0; i < N; i++)
  if (2*i < N) C[i] = i;
  else A[i] = C[i];
assert(∃j. 0 ≤ j < N ∧ A[j] != B[j]); // complemented post
```
Fig. 4: The program used to check maximality; the postcondition is complemented and has no precondition.

# 2 Motivating Example

We motivate the problem with the program shown in Fig. 2, with three finite-length statically allocated arrays A, B, and C, each of size N. The arrays are accessed sequentially in the loop: the cells in the first half of C are assigned their corresponding indices, and the remaining elements of C are copied to the corresponding positions in A. The program ends with the postcondition stating the pairwise equality of A and B. Our goal is to find the maximal precondition under which the postcondition holds. Intuitively, such a precondition must be universally quantified because it must express that the arrays A, B, and C are properly initialized up to an arbitrary length N.

Further, in order to prove that the postcondition indeed holds after the loop has terminated, we have to show that there exists an inductive invariant that is also universally quantified. To confirm that the precondition is logically the weakest, we need to formally prove that any attempt to extend it by a single point leads to a violation of the postcondition. Thus, the solution we target should have two properties: 1) it should allow us to find an inductive invariant

for the loop, and 2) any of its weakening results in a counterexample that violates the assertion.

The only publicly available tool to find quantified preconditions, P-Gen [54], which is based on predicate abstraction, is unable to solve this program. The last candidate precondition it tries to refine is N = 3 ∧ A[0] = B[0] ∧ A[1] = B[1] ∧ A[2] = B[2], which does not constrain the value of array C, thus allowing the program to violate the postcondition, e.g., when C[2] ≠ B[2] initially and A[2] is overwritten with C[2] in the else-branch when i = 2.

Fig. 3 shows a system of CHCs over relations pre and inv, representing the verification conditions of the program in Fig. 2. For brevity, we do not mention the universal quantification over all program variables including arrays, which is implicit. In particular, the first CHC identifies the initial value of the counter i but does not give any constraints over A, B, or C (which are essentially deferred to pre). The next two CHCs encode the loop body, corresponding to the two possible branches in the body of the loop. The last CHC encodes that no state satisfying the negation of the assertion is reachable.

The missing precondition makes the CHC system in Fig. 3 different from the CHC systems that appear in verification tasks. Hence, existing CHC solvers are not directly applicable here, as they can return the strongest solution: ⊥. For instance, Spacer [32] (Z3 v4.12.2) returns the solution pre ↦ λN, A, B, C. ⊥ and inv ↦ λi, N, A, B, C. ⊥. Such vacuous solutions are not of much use in the applications mentioned earlier.

The CHC system also represents a maximal specification problem, with pre being the specification of an initialization function. However, existing maximal precondition synthesis techniques [2,50,29] do not support synthesizing quantified preconditions over arrays.

Our algorithm takes the input CHC system and works in an infer-check-weaken fashion, as shown in Fig. 1. First, the infer module strengthens and weakens the postcondition from the last CHC via range abduction and Houdini, resp., to find the following precondition (a detailed illustration follows in Sect. 5.2):

$$\lambda N, A, B, C.\; \forall j\, (0 \le j < N \land 2*j < N \implies A[j] = B[j]) \;\land\; \forall j\, (0 \le j < N \land 2*j \ge N \implies B[j] = C[j]).$$

We note that this is the maximal precondition for this problem instance; in general we may not always find the maximal precondition in the first iteration. In any case, we need to check the maximality of the inferred precondition. Our maximality checker does this by trying to find a precondition for the complement of the postcondition (called the "complement program", see Fig. 4). This is achieved by calling the infer module again, albeit with an existentially quantified postcondition. By using the existentially quantified structure of the postcondition, the infer module discovers the following precondition (see Sect. 6.4 for details):

$$\lambda N, A, B, C.\; \exists j\, (0 \le j < N \land 2*j \ge N \land B[j] \ne C[j]).$$

The maximality checker now tries to determine whether all the points that are outside the precondition of the original program are indeed in the precondition of the complement program. For the example program, this is encoded as the

following formula:

$$\begin{aligned} \forall N, A, B, C.\; \neg\big(&\forall j\, (0 \le j < N \land 2*j < N \implies A[j] = B[j]) \;\land \\ &\forall j\, (0 \le j < N \land 2*j \ge N \implies B[j] = C[j])\big) \\ &\implies \exists j\, (0 \le j < N \land 2*j \ge N \land B[j] \ne C[j]). \end{aligned}$$

If the formula were valid, the current precondition would be maximal, since all the states outside it would violate the property (as they would be in the precondition of the complement program). In this example, the implication is not valid, because N = 3, A = [0, 0, 0], B = [1, 0, 0], C = [0, 0, 0] is a counterexample to validity. Our approach then weakens the precondition of the complement program based on the counterexample to the validity check:

$$\lambda N, A, B, C.\; \exists j\, (0 \le j < N \land 2 \ast j \ge N \land B[j] \ne C[j]) \lor \exists j\, (0 \le j < N \land 2 \ast j < N \land A[j] \ne B[j]).$$

The checker now conducts a successful validity check, and the algorithm terminates.

# 3 Background

This paper builds largely on foundations of Satisfiability Modulo Theories (SMT) problems. SMT aims to determine the existence of an assignment to the variables of a first-order logic formula that makes it true. We will be dealing with the logical setting *L* of linear integer arithmetic (LIA) with arrays. The signature of the logic includes a finite set of uninterpreted relation symbols *R*. Each symbol in *R* has an associated arity and an associated type, which indicates a type (integer or array) for each argument of the relation.

We write φ(x₁, . . . , xₖ) (where each xᵢ is a variable with an associated integer/array type) to denote a formula of this logic that does not use any of the relation symbols in *R* and whose free variables are among {x₁, . . . , xₖ}. For convenience, we also write φ(⃗x) to denote the same. For a formula φ(⃗x) and an assignment σ which maps the variables in ⃗x to concrete integers/arrays, we write σ |= φ to denote that φ evaluates to ⊤ under σ, and say σ satisfies φ, or that σ is a model of φ. A formula φ is logically weaker than a formula ψ (denoted ψ =⇒ φ) if every model of ψ also satisfies φ. Hence ψ =⇒ ⊥ denotes that ψ is unsatisfiable.
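Both checks reduce directly to SMT queries. As a quick illustration (a minimal sketch using Z3's Python API; the formulas are illustrative, not taken from the paper), one can decide whether ψ =⇒ φ by testing ψ ∧ ¬φ for unsatisfiability:

```
from z3 import Ints, And, Not, Solver, unsat

# psi ==> phi (phi is logically weaker than psi) iff psi /\ not(phi) is unsatisfiable
x, y = Ints('x y')
psi = And(x > 0, y == x + 1)   # illustrative formulas, not from the paper
phi = y > 1

s = Solver()
s.add(psi, Not(phi))
print(s.check() == unsat)      # True: every model of psi also satisfies phi
```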

An interpretation for a relation symbol r ∈ *R* of arity k is defined as a map of the form λx₁ . . . xₖ. φ(x₁, . . . , xₖ), where φ is a well-typed first-order formula that does not contain any symbols from *R*.

We now present formal definitions of the concepts that will be used in the rest of the paper.

#### 3.1 Abduction

Definition 1. Let ⃗x and ⃗y be vectors of variables such that the variables in ⃗x are also present in ⃗y. Let α(⃗y) and β(⃗y) be formulas without any relation symbols from *R*, with free variables in ⃗y. Let r be an uninterpreted relation in *R* of arity equal to the length of ⃗x. Consider a formula of the form r(⃗x) ∧ α(⃗y) =⇒ β(⃗y). The abduction problem is to find an interpretation φ(⃗x) for r, such that:

$$\begin{array}{rl} 1. & \varphi(\vec{x}) \land \alpha(\vec{y}) \not\implies \bot, \; and \\ 2. & \varphi(\vec{x}) \land \alpha(\vec{y}) \implies \beta(\vec{y}). \end{array}$$

Intuitively, the problem of abduction is to find a formula φ that together with α entails the formula β in a non-trivial manner. One can see that a given abduction problem may have multiple solutions, but we are interested in a maximal one (i.e. the logically weakest), whenever a solution exists. The techniques in [17,16,2] compute such a maximal solution for first-order theories that admit quantifier elimination. This solution is succinctly presented in the lemma below.

Lemma 1. Let r(⃗x) ∧ α(⃗y) =⇒ β(⃗y) be an abduction problem where the underlying first-order theory has a method QE(⃗v, ψ) whose result is a ⃗v-free formula constructed by (existential) quantifier elimination of the variables ⃗v from the formula ψ. Suppose that the given instance has a solution. Then, the following formula φ(⃗x) forms a maximal solution for the abduction problem:

$$
\varphi(\vec{x}) \stackrel{\text{def}}{=} \neg(\text{QE}(\vec{y} \backslash \vec{x}, \alpha \wedge \neg \beta))\,.
$$

Example 1. Consider an instance of the abduction problem r(x) ∧ y = 0 =⇒ x > y. Then φ(x) is computed as follows:

$$\begin{aligned} \varphi(x) &= \neg(\text{QE}(\{x, y\} \setminus \{x\},\; y = 0 \land x \le y)) \\ &= \neg(\text{QE}(y,\; y = 0 \land x \le y)) = \neg(x \le 0) = x > 0. \end{aligned}$$
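For theories admitting quantifier elimination, Lemma 1 is directly executable. The following sketch replays Example 1 with Z3's qe tactic via its Python API (an illustration of the lemma, not the paper's implementation):

```
from z3 import Ints, Exists, And, Not, Tactic, simplify

x, y = Ints('x y')
# alpha /\ not(beta): (y = 0) /\ not(x > y), i.e. (y = 0) /\ x <= y
residual = Exists([y], And(y == 0, x <= y))

# QE({x, y} \ {x}, alpha /\ not(beta)): eliminate y
eliminated = Tactic('qe')(residual).as_expr()   # yields x <= 0

# phi(x) = not(QE(...)) is the maximal solution
print(simplify(Not(eliminated)))                # Not(x <= 0), i.e. x > 0
```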

#### 3.2 Modeling Programs With Constrained Horn Clauses

Constrained Horn clauses (CHCs) [37,28,57,35,51,23,31,50] are becoming increasingly popular as an intermediate logical representation of programs and their proof obligations. Dealing directly with CHCs as opposed to program statements is convenient and allows for easier creation and handling of various SMT formulas and constructed invariants.

Definition 2. A CHC (in the logic *L*) is a formula in *L* that has the form of one of the following three implications:

$$\forall \vec{x\_1}. \left(\varphi\_1(\vec{x\_1}) \implies r\_1(\vec{x\_1})\right) \tag{1}$$

$$\forall \vec{x}\_1, \vec{x}\_2. \left( r\_1(\vec{x}\_1) \land \varphi\_2(\vec{x}\_1, \vec{x}\_2) \implies r\_2(\vec{x}\_2) \right) \tag{2}$$

$$\forall \vec{x}\_1. \left( r\_1(\vec{x}\_1) \land \varphi\_3(\vec{x}\_1) \implies \bot \right) \tag{3}$$

where:

1. $r\_1, r\_2$ are uninterpreted relation symbols in *R*, where $r\_1$ and $r\_2$ may coincide, and
2. $\varphi\_1, \varphi\_2, \varphi\_3$ are formulas in *L* that do not contain any relation symbols from *R*.


We introduce some auxiliary notation below for convenience. For a CHC C of one of the above forms, body(C) denotes the antecedent of the implication, head(C) denotes its consequent, and rel(ψ) denotes the relation symbols from *R* occurring in a formula ψ.

A CHC of type (1) is called a fact, and one of type (3) is called a query. For simplicity, for a query C, we write rel(head(C)) = rel(⊥) = ⊥. In the literature, the CHCs we are considering are called linear, as there is at most one relation symbol in the body of a CHC. A system of CHCs is a finite non-empty set of CHCs.

We assume that our precondition inference problem is represented by a system of CHCs S without any facts<sup>4</sup> and that there is a designated relation pre (or cpre) that appears in rel(body(C)) for some CHC C in S and does not appear in rel(head(C′)) for any other CHC C′ in S. Furthermore, we assume that there is a single query in S with a constraint of the form φ ∧ ρ, where ¬ρ is the postcondition in the inference problem.

CHCs allow for flexibility of program encoding. For instance, it is safe to assume that each constraint is in Conjunctive Normal Form (CNF). For if a CHC had the following form:

$$r\_1(\vec{x}\_1) \land (\varphi\_1(\vec{x}\_1, \vec{x}\_2) \lor \varphi\_2(\vec{x}\_1, \vec{x}\_2)) \implies r\_2(\vec{x}\_2),$$

it can be transformed into two CHCs:

$$\begin{aligned} r\_1(\vec{x}\_1) \land \varphi\_1(\vec{x}\_1, \vec{x}\_2) &\implies r\_2(\vec{x}\_2),\\ r\_1(\vec{x}\_1) \land \varphi\_2(\vec{x}\_1, \vec{x}\_2) &\implies r\_2(\vec{x}\_2). \end{aligned}$$
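The transformation is meaning-preserving, which can be confirmed on its propositional skeleton (a small sanity check in Z3's Python API, with the relations and constraints abstracted as Booleans):

```
from z3 import Bools, And, Or, Implies, Solver, unsat

# propositional skeleton of the CNF split: original CHC vs. the two split CHCs
r1, r2, phi1, phi2 = Bools('r1 r2 phi1 phi2')
original = Implies(And(r1, Or(phi1, phi2)), r2)
split    = And(Implies(And(r1, phi1), r2),
               Implies(And(r1, phi2), r2))

s = Solver()
s.add(original != split)        # satisfiable iff the two formulas differ
print(s.check() == unsat)       # True: the split preserves the meaning
```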

Definition 3 (CHC Solution and Satisfiability). A solution to a system of CHCs S is a map *M* that provides an interpretation for each relation symbol in *R*, such that for each CHC C in S, (body(C) =⇒ head(C))[*M*/*R*]<sup>5</sup> is valid. In this case we say *M* is inductive at C. We say S is satisfiable if there exists a solution to it.

Definition 4 (Maximal Precondition). Let S be a system of CHCs for a precondition inference problem. We call a solution *M* to S (precondition) maximal if there is no solution *M*′ to S with *M*′(pre) strictly logically weaker (i.e. w.r.t. the implication partial order) than *M*(pre). *M*(pre) is also called the weakest precondition.

<sup>4</sup> A fact CHC represents the initial condition of the program. Since pre takes the place of the initial condition in our task, there is no fact CHC.

<sup>5</sup> For a formula ψ and terms/formulas p and q, we write ψ[q/p] to denote ψ after all instances of p are replaced by q. For a set of terms/formulas P and a mapping *M* from P to other terms/formulas, ψ[*M*/P] denotes the simultaneous replacement of all p₁, p₂, . . . ∈ P by *M*(p₁), *M*(p₂), . . ., respectively.

We now define certain terms that will be used in the weakening of a precondition (Sect. 7).

Definition 5 (Complement System). Given a system of CHCs S, we define a complement system S̄ to be the system obtained from S by replacing ρ by ¬ρ in the query CHC.

Definition 6 (CHC Extension). Given a system of CHCs S with pre and an interpretation ψ for pre, we define S_ψ, an extension of S w.r.t. ψ, to be the system obtained from S by replacing pre by ψ in the CHC C ∈ S where rel(body(C)) = {pre}.

Lemma 2. Given S with pre and its extension S_ψ, if *M* is a solution to S_ψ then *M*′ = {r ∈ *R*. if r = pre then ψ else *M*(r)} is a solution to S.

To encode program executions, we borrow the notion of CHC unrolling from [21]. Essentially, a CHC unrolling is a symbolic representation of a set of program executions starting from a state satisfying ψ. If the unrolling is satisfiable then the execution terminates in a state satisfying the postcondition.

Definition 7 (Unrolling of CHCs). Given an extended CHC system S_ψ over *R*, let C₀, . . . , C_k be a (k+1)-length sequence of CHCs in S_ψ, with C₀ being a fact, C_k being a query, and rel(head(Cᵢ)) = rel(body(Cᵢ₊₁)) for each i. Then, a k-length unrolling of S_ψ is defined as below:

$$\pi\_{\langle C\_0,\dots,C\_k\rangle} \stackrel{\text{def}}{=} \bigwedge\_{0 \le i < k} body(C\_i)(\vec{x}\_i, \vec{x}\_{i+1}) \wedge (body(C\_k)[\neg \rho/\rho])(\vec{x}\_k)$$

Example 2. Consider the CHC system from Fig. 3. Let ψ be:

$$(\forall j. \, 0 \le j < N \implies A[j] = B[j] = 0 \land C[j] = 1) \land N = 1$$

Then π⟨C₁,C₂,C₄⟩, which is a 3-length unrolling of S_ψ, is the following satisfiable formula:

$$\begin{aligned} \pi\_{\langle C\_1, C\_2, C\_4\rangle} = {} & (\forall j.\, 0 \le j < N \implies A[j] = B[j] = 0 \land C[j] = 1) \land N = 1 \land i = 0 \land {} \\ & i < N \land 2 \ast i < N \land C' = store(C, i, i) \land i' = i + 1 \land {} \\ & \neg (i' < N) \land (\forall j.\, 0 \le j < N \implies A[j] = B[j]). \end{aligned}$$
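This satisfiability check can be replayed with an SMT solver. The sketch below encodes the unrolling of Example 2 in Z3's Python API (our rendering of the CHC bodies, assuming the encoding of Fig. 3; Z3's model-based quantifier instantiation is typically able to find the model):

```
from z3 import Ints, Int, Array, IntSort, Store, ForAll, Implies, And, Not, Solver

N, i, i1 = Ints('N i i1')
j = Int('j')
A = Array('A', IntSort(), IntSort())
B = Array('B', IntSort(), IntSort())
C = Array('C', IntSort(), IntSort())
C1 = Array('C1', IntSort(), IntSort())   # stands for C'

s = Solver()
# psi and the fact's constraint i = 0
s.add(ForAll([j], Implies(And(0 <= j, j < N),
                          And(A[j] == 0, B[j] == 0, C[j] == 1))),
      N == 1, i == 0)
# body of C2 (then-branch) and the counter update
s.add(i < N, 2 * i < N, C1 == Store(C, i, i), i1 == i + 1)
# body of the query C4 with the postcondition asserted (rho replaced by not rho)
s.add(Not(i1 < N), ForAll([j], Implies(And(0 <= j, j < N), A[j] == B[j])))
print(s.check())  # expected: sat, i.e. this execution reaches the postcondition
```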

Our technique addresses deterministic programs. A non-deterministic program in our context has an initial state that can both satisfy and violate the postcondition. More formally,

Definition 8 (Non-deterministic Modulo Postcondition CHCs). Let S be a system of CHCs that has pre and is extendable by a formula ψ. We call S non-deterministic modulo postcondition if there exists a uniquely satisfiable formula ψ for which there are at least two satisfiable unrollings π⟨C₀,...,C_ℓ⟩ and π̄⟨C̄₀,...,C̄_m⟩ corresponding to the extensions S_ψ and S̄_ψ, respectively. Otherwise, we say S is deterministic.<sup>6</sup>

<sup>6</sup> An example is presented in [49].


We assume that the CHCs represent terminating programs. Hence, for any initial state of a program encoded in CHCs, there exists an unrolling either satisfying or violating the postcondition.

Definition 9 (Terminating CHCs). Let S be a system of CHCs with pre and extendable by a formula ψ. We say S is terminating if there does not exist an infinite-length unrolling for S_⊤ and S̄_⊤ (i.e. S and S̄ extended by ψ = ⊤).

#### 3.3 Linear Array Loop Programs

Though our algorithms work at the level of CHCs, we are motivated to target CHCs representing linear array loop programs (or "linear loops" for short) that model real-world programs in existing array program verification works [9,39,10]. These are terminating programs with non-nested loops. We now present the syntax of a linear loop.

$$\begin{aligned} \textit{program} & \rightarrow \mathtt{assume}(pre(V, A));\; \textit{stmts};\; \textit{post};\\ \textit{stmts} & \rightarrow \textit{assign} \mid \textit{forloop} \mid \textit{stmts}; \textit{stmts} \\ \textit{assign} & \rightarrow v = f(V, A) \;\mid\; a[i] = f(V, A) \;\mid\; \mathtt{if}\,(\phi(V))\; \{\textit{assign}\}\; \mathtt{else}\; \{\textit{assign}\} \\ & \quad\mid\; \textit{assign}; \textit{assign} \\ \textit{forloop} & \rightarrow \mathtt{for}\; (i = l(V);\; c(i, V);\; i = h(i))\; \{\textit{assign}\} \\ \textit{post} & \rightarrow \mathtt{assert}(\forall x.\, R(x, V) \implies Q(x, V, A)) \;\mid\; \mathtt{assert}(\exists x.\, R(x, V) \land Q(x, V, A)) \end{aligned}$$

Here V and A are disjoint sets of integer and array variables, respectively, i ∈ V is a loop counter, v ≠ i ∈ V is an integer variable, f is a term over V and A such that any access to A is done by i, l is an integer term over V, h is an integer term over i which results in a monotonically increasing (or decreasing) assignment, c is a guard of the form i < u or i > u for some integer term u over V, and φ is a boolean predicate. The postcondition is given as a condition in assert, where R is a predicate in LIA over the quantified variable x and integer variables that represents a range of array elements, and Q is a property over an array with array read-access done only by x. For example, the formula ∀x. 0 ≤ x < N =⇒ a[x] = 42 is in this form, where a is an array variable.

The precondition (and inductive invariants) inferred by our algorithm will have the same quantification as the postcondition. Further, it can be a conjunction in the case of universal quantification and a disjunction in the case of existential quantification. Specifically, we consider preconditions (and inductive invariants) of the form described in (4). Such a form has been found effective in inferring inductive invariants in existing works for array programs like [38,33,32,22].

$$\bigwedge\left(\forall x.\,R(x,V)\implies Q(x,V,A)\right)\quad\text{or}\quad\bigvee\left(\exists x.\,R(x,V)\land Q(x,V,A)\right)\tag{4}$$

A formal description of CHCs that represent linear loop programs is given in Sect. 6.1.

# 4 Undecidability of Maximal Precondition Inference for Linear Loops

Although linear loops and postconditions have syntactic restrictions, inference of maximal preconditions for such programs in the considered form (i.e. (4)) is still undecidable. In this section we prove this result.

We reduce the halting problem of two-counter machines [42] to the maximal precondition inference problem. Recall that a two-counter machine M = (c₁, c₂, I) has two counters c₁ and c₂, which are initially set to 0, and a finite set of instructions I = {ι₁, . . . , ιₙ}, where each instruction is of type inc or decjz, with a designated halt instruction ι_h. Given a two-counter machine M = (c₁, c₂, I), deciding whether it halts, i.e. whether the halt instruction ι_h ∈ I is reached, is undecidable.

Theorem 1. The problem of computing the maximal precondition for linear array loop programs in the form described in (4) is uncomputable.

Proof Sketch. We construct a linear array loop program with a single loop whose body simulates the execution of one transition of a two-counter machine, and an array records the locations the machine can reach after the transition.<sup>7</sup>

The undecidability of the problem notwithstanding, many real-life programs, like industrial battery controllers [9], adhere to linear array loop structures. Consequently, techniques like [39,10] have been developed to address such programs, but they focus on assertion checking rather than precondition inference. The existing precondition inference technique [54] finds it challenging to infer preconditions for such programs (details in Sect. 8). Motivated by these challenges, we propose a sound technique that infers maximal preconditions.

# 5 Inferring Preconditions and Invariants by Abduction

In this section, we give an overview of our approach for abductive inference of preconditions and inductive invariants. We first explain its basic principles, and then demonstrate them on the running example.

#### 5.1 Overview

We assume that the input system of CHCs represents a precondition inference problem, i.e. it has no facts, a single query, and a designated relation (pre or cpre) for the precondition. Since we are interested in precondition inference for array programs, we assume that the query has a quantified constraint ρ.

The high-level algorithm is given in Algorithm 1. It is called InferAbd and is inspired by an earlier work on specification synthesis [50]. InferAbd incrementally attempts to discover an interpretation for each uninterpreted predicate

<sup>7</sup> All proofs are in [49].

Algorithm 1: InferAbd(S, *M*, W)

Input: S – set of CHCs over *R*, W ⊆ *R* – current subset (initially empty) of relations with invariants/preconditions, *M* – mapping from *R* to predicates, initially λ⃗x. ⊤
Output: *M* – invariants/preconditions of S
1 if W = ∅ then
2   W ← {r | ∃C. rel(body(C)) = r ∧ rel(head(C)) = ⊥ ∧ C ∈ S};
3 Worklist ← {C | C ∈ S ∧ rel(body(C)) ∈ W};
4 while ∃C ∈ Worklist. CheckSAT(¬(body(C) =⇒ head(C))[*M*/*R*]) do
5   let F be (body(C) =⇒ head(C))[*M*/*R*];
6   *M*(rel(body(C))) ← *M*(rel(body(C))) ∧ Abduce(F, args(body(C)), S);
7   *M*(rel(body(C))) ← Houdini(S, *M*, W);
8 if no *M*(·) was strengthened or weakened then
9   if W = *R* then return *M*;
10  W ← {r | ∃C ∈ S. rel(body(C)) = r ∧ rel(head(C)) ∈ W};
11 return InferAbd(S, *M*, W);

in *R* by propagating the assertion backward, strengthening it when needed to establish inductiveness, or weakening if something went wrong during the inference of inductive invariants.

InferAbd (Algorithm 1) constructs a solution *M* for a system of CHCs S recursively. *M* initially maps all the predicates in *R* to ⊤. At each call, the algorithm searches for a CHC C (line 3) such that *M* is not inductive at C. This inductiveness check is reduced to a satisfiability check, which is performed by an SMT solver (line 4). If the negated implication is satisfiable then *M* is not inductive at the corresponding C, and thus *M* needs strengthening.
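The check on line 4 is a standard one. The following minimal sketch shows it on a toy integer CHC inv(x) ∧ x < 10 ∧ x′ = x + 1 =⇒ inv(x′) with candidate M(inv) = (x ≤ 10) (an illustration only, not the array system of Fig. 3):

```
from z3 import Ints, And, Implies, Not, Solver

x, x1 = Ints('x x1')
def inv(v):                       # candidate interpretation M(inv)
    return v <= 10

s = Solver()
# M is inductive at the CHC iff the negated implication is unsatisfiable
s.add(Not(Implies(And(inv(x), x < 10, x1 == x + 1), inv(x1))))
print(s.check())  # unsat: inductive here; sat would trigger strengthening
```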

Note that in the first call of InferAbd, the initial *M* is inductive for all the CHCs except the query, thus interpretations will first be created for the predicates that appear in the body of the query. In subsequent calls, these interpretations can be either strengthened or propagated through the bodies of the CHCs where they appear in the heads, towards the precondition.

In InferAbd, we write ψ ← Abduce(F, ⃗x, S) to denote an invocation of a new abduction algorithm (Algorithm 2) that obtains a formula ψ over variables ⃗x which makes F valid. InferAbd uses Abduce because existing abduction solvers have limited support for arrays. In order to support arrays and quantifiers, Abduce abstracts quantified formulas over arrays and integers into quantifier-free formulas only over integers. To do this, Abduce considers two abduction queries for a CHC in S: 1) for the array element that is being rewritten (if any), and 2) for all other elements that are not changed. The formal description of Abduce is in Sect. 6 along with an illustration. However, by doing this "arrays-to-integers" reduction, Abduce could introduce some imprecision, which is fixed by running the Houdini algorithm (details in Sect. 6.3).

InferAbd may not terminate because the series of strengthening predicates obtained in each iteration may diverge. But the recursion in InferAbd can be easily augmented by a threshold condition that forces termination with an unknown result after reaching a predetermined recursion depth.

Theorem 2. Whenever Algorithm 1 terminates, it returns a solution *M* to S.

#### 5.2 Approach in Action

We demonstrate the precondition inference approach on the example from Sect. 2 and Fig. 3.

Synthesizing an invariant for inv. The algorithm begins by obtaining an initial candidate interpretation for inv from the query CHC. The predicate is the query constraint (i.e. the postcondition ¬ρ) with a slight modification:

$$\mathsf{inv} \stackrel{\text{cand}}{\mapsto} \lambda i, N, A, B, C.\; \forall j\, (0 \le j < N \land j < i \implies A[j] = B[j]).$$

The modification includes dropping the loop condition and strengthening by conjoining a range formula [21] to the antecedent (j < i here). In simple terms, the range formula is a predicate that represents the boundary between indices that are modified and not modified. It can be (j < i) or (j > i), based on whether the loop counter is increasing or decreasing, respectively (formal definition in Definition 11).

Our algorithm then checks if any of the CHCs in Worklist are not valid. In this case, the second CHC is not valid. The algorithm then follows backward reasoning and attempts to update the current interpretation of inv by abductive strengthening to make it inductive, using a series of SMT checks and quantifier elimination queries.

The algorithm does abductive strengthening by posing two queries. The first one is to accommodate the write to the i-th element of the array. This strengthening for the second CHC is posed as an abduction query for ψ₁ that is constructed by restricting the formula to only the single cell of the array that is rewritten in the loop:

$$
\psi\_1(A, B, C, j) \land A'[j] = A[j] \land B'[j] = B[j] \land C'[j] = j \implies A'[j] = B'[j].
$$

Here, all the array terms (like A[j], A′[j], B[j], etc.) are further replaced by fresh integer variables, which allows us to use a standard abduction solver and get the following solution:

$$
\psi\_1 \mapsto \lambda A, B, C, j. \, A[j] = B[j].
$$

Intuitively, ψ₁ gives the weakest precondition on A[j], B[j], and C[j] before the loop iteration writing the j-th cell, such that the desired postcondition holds for A′[j] and B′[j] after the iteration.
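This "arrays-to-integers" reduction can be replayed concretely. In the sketch below (our rendering in Z3's Python API, with fresh integers a, b, c, a1, b1, c1 standing for A[j], B[j], C[j], A′[j], B′[j], C′[j]), the abduction is solved via quantifier elimination as in Lemma 1:

```
from z3 import Ints, Exists, And, Not, Tactic, simplify

# select-substitution: a = A[j], b = B[j], c = C[j]; primed likewise
a, b, c, a1, b1, c1, j = Ints('a b c a1 b1 c1 j')

alpha = And(a1 == a, b1 == b, c1 == j)   # store/frame constraints of the CHC body
beta  = a1 == b1                         # lowered postcondition A'[j] = B'[j]

# maximal solution over the source variables: not(QE(primed vars, alpha /\ not beta))
psi1 = simplify(Not(Tactic('qe')(Exists([a1, b1, c1],
                                        And(alpha, Not(beta)))).as_expr()))
print(psi1)   # a == b, i.e. A[j] = B[j] after applying the inverse substitution
```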

The second abduction query accommodates all the other elements in the range 0 ≤ j < N ∧ j ≠ i that are unaffected in the i-th iteration:

$$
\psi\_2(A, B, C, j) \land A'[j] = A[j] \land \mathbf{C'[j] = C[j]} \land B'[j] = B[j] \implies A'[j] = B'[j].
$$


The delta w.r.t. the first query is shown in bold. This query also has the same solution as ψ₁:

$$
\psi\_2 \mapsto \lambda A, B, C, j. \, A[j] = B[j].
$$

To build the new invariant candidate from ψ₁ and ψ₂, we split the array range into two segments based on the range formula and its negation:

$$\begin{aligned} \mathsf{inv} \stackrel{\text{cand}}{\mapsto} \lambda A, B, C, i.\; &\forall j\, (0 \le j < N \land j < i \implies A[j] = B[j]) \land {} \\ &\forall j\, (0 \le j < N \land j < i \land 2 \ast j < N \implies A[j] = B[j]) \land {} \\ &\forall j\, (0 \le j < N \land j \ge i \land 2 \ast j < N \implies A[j] = B[j]). \end{aligned}$$

The second conjunct is derived from ψ₂ and the range formula (j < i), whereas the third conjunct is from ψ₁ and the negation of the range formula (j ≥ i). If the CHC has any additional constraints (like 2 ∗ i < N here), they are added to the antecedent as well.

While validating this candidate, the algorithm goes over the CHCs again and checks the implications: it now turns out to be not inductive for the third CHC. The algorithm thus repeats the abductive strengthening and poses two queries:

$$\begin{aligned} \psi\_3(A, B, C, j) \land B'[j] &= B[j] \land C'[j] = C[j] \land A'[j] = C[j] \implies A'[j] = B'[j],\\ \psi\_4(A, B, C, j) \land B'[j] &= B[j] \land C'[j] = C[j] \land A'[j] = A[j] \implies A'[j] = B'[j], \end{aligned}$$

getting the following next candidate, that is subsequently validated:

$$\begin{aligned} \mathsf{inv} \mapsto \lambda A, B, C, i.\; &\forall j\, (0 \le j < N \land j < i \implies A[j] = B[j]) \land {} \\ &\forall j\, (0 \le j < N \land j < i \land 2 \ast j < N \implies A[j] = B[j]) \land {} \\ &\forall j\, (0 \le j < N \land j \ge i \land 2 \ast j < N \implies A[j] = B[j]) \land {} \\ &\forall j\, (0 \le j < N \land j < i \land 2 \ast j \ge N \implies A[j] = B[j]) \land {} \\ &\forall j\, (0 \le j < N \land j \ge i \land 2 \ast j \ge N \implies B[j] = C[j]). \end{aligned}$$

Synthesizing pre. Finally, the precondition is obtained from the solution for inv. Because the first CHC initializes the counter i to zero, all the conjuncts with j < i in the antecedent simplify to true, and the rest simplify to:

$$\begin{aligned} \mathbf{pre} &\mapsto \lambda N, A, B, C. \forall j (0 \le j < N \land 2 \ast j < N \implies A[j] = B[j]) \land \\ &\forall j (0 \le j < N \land 2 \ast j \ge N \implies B[j] = C[j]). \end{aligned}$$

# 6 Range Abduction

In this section, we present our technique, called range abduction, for inferring quantified invariants and, subsequently, quantified preconditions. We define the Abduce method for quantified formulas over arrays and linear arithmetic that can be used in the general algorithm of abductive invariant synthesis. Its core features include the capability to selectively apply quantifier elimination, such that it keeps all quantifiers that are explicit in the abducible formula. As its main computational vehicle, the method uses quantifier elimination over linear arithmetic on formulas produced from the actual abducibles by over-approximating (as precisely as possible) the array computation.

#### 6.1 Preliminaries

CHCs. We first formally describe the CHC structure that we support, corresponding to linear loops. We assume that the inputs are given as CHCs whose bodies are in CNF (otherwise, they can be transformed following Sect. 3.2). For each CHC, we consider two disjoint vectors of source (resp., destination) variables, ⃗v and ⃗a (resp., ⃗v′ and ⃗a′), such that only ⃗a (resp., ⃗a′) consists of array variables.

We allow only a single index to access elements of all arrays a ∈ ⃗a in each CHC C, and without loss of generality we assume that it is an integer variable i ∈ ⃗v (usually, a loop counter).<sup>8</sup> For simplicity, we also introduce a set of temporary integer variables ⃗t that store elements selected from arrays and can be used in other parts of C (e.g., to compute the next value to be written to an array b′ via some function f). Thus, we assume that only three possible types of constraints are used to equate arrays (or their elements), and that they appear in recursive CHCs, that is:

$$\begin{aligned} \mathsf{in}\mathbf{v}\_1(\vec{v}, \vec{a}) \land \left[ (a' = a \land)^\* \right] \land \left[ (t = a[i] \land)^\* \right] \land \\ \left[ (b' = \mathrm{store}(b, i, f(\vec{v}, \vec{t})) \land)^\* \right] \land \varphi(\vec{v}, \vec{v}', \vec{t}) \implies \mathsf{in}\mathbf{v}\_2(\vec{v}', \vec{a}') \end{aligned} \tag{5}$$

where ∗ is the Kleene star, a, b ∈ ⃗a, a′, b′ ∈ ⃗a′, t ∈ ⃗t, and φ is over only non-array variables. Note that sequences of stores (e.g., nested ones) could be supported after some sort of CHC normalization, e.g., by introducing temporary uninterpreted predicates and splitting the CHC. Symbols inv₁ and inv₂ might refer to the same predicate.

Queries. There is only a single query among the CHCs, and it has the form of either of the two implications:

$$\mathsf{inv}(\vec{v}, \vec{a}) \land \varphi(\vec{v}) \land \exists x. (R(x, \vec{v}) \land Q(x, \vec{v}, \vec{a})) \implies \bot \tag{6}$$

$$\mathsf{inv}(\vec{v},\vec{a}) \land \varphi(\vec{v}) \land \forall x. (R(x,\vec{v}) \implies Q(x,\vec{v},\vec{a})) \implies \bot \tag{7}$$

In the body of the query, there is a quantifier-free conjunct φ and a quantified formula with subformulas R and Q. Formula φ could represent the termination condition of the array-processing loop/recursion (captured in the other CHCs). The subformulas R and Q could represent, respectively, a range of elements in

<sup>8</sup> In practice, the restrictions on array accesses and the shape of the CHC can be relaxed, but this requires more careful handling than we propose in this paper. Our implementation supports it, but the paper omits it to maintain simplicity of presentation.

an array (giving a condition over the possible indices of the array), and a property over an array element (indexed using the variable x). We restrict read-accesses of arrays to the quantified variable x only.

The quantified formula in the query determines an initial candidate interpretation for the inv predicate in the query. For instance, in the cases (6) and (7), respectively:

$$\mathsf{inv} \stackrel{\text{cand}}{\mapsto} \lambda\vec{v}, \vec{a}.\; \forall x.\, \left(R(x, \vec{v}) \implies \neg Q(x, \vec{v}, \vec{a})\right) \tag{8}$$

$$\mathsf{inv} \stackrel{\text{cand}}{\mapsto} \lambda \vec{v}, \vec{a}.\; \exists x.\, (R(x, \vec{v}) \land \neg Q(x, \vec{v}, \vec{a})) \tag{9}$$

Applying Algorithm 1. We assume that an iteration of the algorithm deals with a mapping *M* and the following CHC, where *M*(inv₁) might currently be ⊤, but *M*(inv₂) is quantified:

$$\mathsf{inv}\_1(\vec{v},\vec{a}) \land \varphi(\vec{v},\vec{a},\vec{v}',\vec{a}') \implies \mathsf{inv}\_2(\vec{v}',\vec{a}')\tag{10}$$

Abductive strengthening is needed when the implication is not valid under the substitution of the interpretations of inv₁ and inv₂ (line 6 of Algorithm 1), thus necessitating finding ψ such that the following is valid:

$$
\psi \land \varphi \implies \mathcal{M}(inv\_2) \tag{11}
$$

Intuitively, for ψ our algorithm reuses the quantified structure of *M*(inv₂). For all quantifier-free conjuncts of *M*(inv₂), strengthening is done following simple abduction, as e.g. in [17]. For quantified formulas, the algorithm is trickier. In the rest of this section, we assume that the algorithms are strengthening w.r.t. formulas ψ∃ and ψ∀ having the forms (9) and (8), respectively.

#### 6.2 Core Technique

The basic principle behind our quantified abductive strengthening is the preservation of the range. That is, if the quantified formula on the right side of (11) has form (9) or (8), then it intuitively means that some property Q(x, ⃗v, ⃗a) should hold either for all elements of the array(s) (when quantification is universal), or for some elements of the arrays ⃗a (when quantification is existential), determined by R(x, ⃗v). Thus, an interpretation of the predicate on the left side of (11) should also constrain the elements of (some) arrays belonging to the same range.

Since by our syntax restrictions we allow elements of arrays b ∈ ⃗a to be rewritten using only a single index i, each constraint b′ = store(b, i, f(⃗v, ⃗t)) can be safely replaced in the CHC body by:

$$b'[i] = f(\vec{v}, \vec{t}) \qquad \text{and} \qquad \forall j.\; i \neq j \implies b'[j] = b[j]$$
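This store decomposition is a theory-level equivalence, which can be confirmed with an SMT solver (a sanity check in Z3's Python API, for a single store):

```
from z3 import Array, IntSort, Ints, Int, Store, ForAll, Implies, And, Not, Solver

b  = Array('b',  IntSort(), IntSort())
b1 = Array('b1', IntSort(), IntSort())   # stands for b'
i, v = Ints('i v')                       # v stands for f(vec v, vec t)
j = Int('j')

lhs = b1 == Store(b, i, v)
rhs = And(b1[i] == v, ForAll([j], Implies(i != j, b1[j] == b[j])))

s = Solver()
s.add(Not(lhs == rhs))   # the two encodings are equivalent iff this is unsat
print(s.check())         # expected: unsat
```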

In the following, we are going to use the auxiliary mapping to reduce abduction over array and integer variables to purely integer abduction.

Definition 10. Let ⃗a, ⃗n, and ⃗s be sets of array variables, integer variables, and integer terms, respectively, all of the same cardinality. A bijection *SS* : ⃗s → ⃗n is called a select-substitution w.r.t. index i, if for every a ∈ ⃗a, there exists n ∈ ⃗n such that *SS*(a[i]) = n.

Algorithm 2: Abduce(F, ⃗x, S)


The pseudocode of our range abduction is given in Algorithm 2. Below we discuss its details.

Universally-quantified formulas (8). The abduction query of the form (11) can be decomposed (line 1) into two stronger abduction queries, with abducibles ψ₁ and ψ₂:

$$\psi\_1 \land \left[ (b'[i] = f(\vec{v}, \vec{t}) \land)^\* \right] \dots \implies (R(i, \vec{v}) \implies Q(i, \vec{v}, \vec{a})) \tag{12}$$

$$\psi\_2 \land \left[ (b'[i] = b[i] \land)^\* \right] \dots \implies \left( R(i, \vec{v}) \implies Q(i, \vec{v}, \vec{a}) \right) \tag{13}$$

Since *M*(inv₂) is universally quantified and due to our syntactic restrictions, only the i-th elements of any source arrays are relevant for the abduction query. Thus, without loss of generality, our algorithm lowers the (possibly) universally quantified formula in *M*(inv₂) to a quantifier-free formula over the i-th array elements, and further replaces all array access terms of the form a[i] by integer terms using a select-substitution *SS*, essentially boiling the problem down to two abduction queries over pure integer arithmetic with abducibles ψ′₁ and ψ′₂ (lines 3, 4).

After the abduction solver returns ψ′₁ and ψ′₂ for the integer arithmetic queries, the *SS*⁻¹ mapping is applied to replace integer terms by array terms a[i] to get ψ₁ and ψ₂, which constitute solutions to queries (12) and (13) (line 5).

It finally remains to re-introduce the universal quantifier, over a fresh variable j, to ψ₁[j/i] and ψ₂[j/i] to get a solution to our main abduction query (11). There are several ways to do it. One way is to not introduce quantifiers for ψ₁, as the query (12) captures the effect of a single store to the i-th element of an array. For ψ₂, then, the quantifier's range would span the whole original range except i. However, this way, seemingly obvious, does not work in practice, because the produced invariant is unlikely to be inductive.

Another way is to split the range into two segments with the border at i. This intuitively corresponds to the range formula computation of [22], i.e., the sub-array that has already been processed in the loop encoded by the CHC, and the sub-array that remains to be processed. The former restricts the range of ψ₂ (lines 10, 14) and the latter that of ψ₁ (lines 9, 13). More formally:

Definition 11. For an inductive CHC C with loop counter i, where i is in the interval [l, u], and a free variable j, the range formula is j < i when i ≥ l is inductive at C, and j > i when i ≤ u is inductive at C.

In Algorithm 2, the range formula is returned by ComputeRangeFormula. Additionally, GetCondition adds predicates that are present in the constraint of the CHC (like 2 ∗ j < N here) after substituting the loop counters in them by the quantified variables.

Existentially-quantified formulas (9). Similar to the universally quantified case, the abduction query (11) for existential quantification is decomposed into two abduction queries. Queries (12) and (13) in this case have the form:

$$\begin{aligned} \psi\_1 \land \left[ \left( b'[i] = f(\vec{v}, \vec{t}) \wedge \right)^\* \right] \dots & \implies \left( R(i, \vec{v}) \wedge Q(i, \vec{v}, \vec{a}) \right) \\ \psi\_2 \land \left[ \left( b'[i] = b[i] \wedge \right)^\* \right] \dots & \implies \left( R(i, \vec{v}) \wedge Q(i, \vec{v}, \vec{a}) \right) \end{aligned}$$

The remainder of the algorithm in this case is the same as in the universally quantified case, with the exception that we disjoin the two quantified solutions to the abduction queries before checking whether the result is inductive.

#### 6.3 Houdini Algorithm

The strengthening performed by Algorithm 2 might result in a candidate invariant that is too strong for already-validated CHCs. To resolve this, Algorithm 1 weakens the candidate invariants using an existing algorithm called Houdini [24] (line 7). Given a set of relations W and a mapping *M*, Houdini recursively weakens *M* until it is inductive at each CHC C with rel(head(C)) ∈ W. It does this by finding a counterexample to inductiveness and dropping the conjuncts that do not satisfy the counterexample.
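The following is a minimal Houdini-style loop on a toy integer CHC inv(x) ∧ x < 9 ∧ x′ = x + 1 =⇒ inv(x′) (an illustration of the drop-conjuncts idea; the paper's Houdini operates on the quantified candidates of this section):

```
from z3 import Ints, And, Not, Solver, substitute, sat, is_true

x, x1 = Ints('x x1')
conjuncts = [x <= 9, x >= 0, x == 0]    # candidate invariant, as conjuncts

def cex(conjs):
    """A model witnessing non-inductiveness, or None if inductive."""
    s = Solver()
    s.add(And(conjs), x < 9, x1 == x + 1,
          Not(And([substitute(c, (x, x1)) for c in conjs])))
    return s.model() if s.check() == sat else None

m = cex(conjuncts)
while m is not None:
    # drop every conjunct falsified by the counterexample's post-state
    conjuncts = [c for c in conjuncts
                 if is_true(m.eval(substitute(c, (x, x1)), model_completion=True))]
    m = cex(conjuncts)
print(conjuncts)   # [x <= 9, x >= 0]: the non-inductive conjunct x == 0 is dropped
```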

#### 6.4 Illustration of Existentially Quantifed Precondition Inference

We end this section by illustrating Algorithm 1 on the existentially quantified postcondition from Fig. 4. The CHCs of this program are given in Fig. 5.

The algorithm chooses an initial candidate for inv from the query. The loop condition is dropped as in the universally quantified case, but the range formula is not

$$\begin{aligned} &\mathsf{cpre}(N, A, B, C) \land i = 0 \implies \mathsf{inv}(i, N, A, B, C) \\ &\mathsf{inv}(i, N, A, B, C) \land i < N \land 2 \ast i < N \land C' = store(C, i, i) \land i' = i + 1 \implies \mathsf{inv}(i', N, A, B, C') \\ &\mathsf{inv}(i, N, A, B, C) \land i < N \land 2 \ast i \ge N \land A' = store(A, i, C[i]) \land i' = i + 1 \implies \mathsf{inv}(i', N, A', B, C) \\ &\mathsf{inv}(i, N, A, B, C) \land \neg(i < N) \land \neg(\exists j.\, 0 \le j < N \land A[j] \ne B[j]) \implies \bot \end{aligned}$$

Fig. 5: CHC encoding of program in Fig. 4.

conjoined for an existential postcondition, as this often results in a precondition that is too strong, viz. ⊥.

$$\mathsf{inv} \stackrel{\text{cand}}{\mapsto} \lambda i, N, A, B, C.\; \exists j.\; 0 \le j < N \land A[j] \ne B[j].$$

Algorithm 1 now checks if either the second or the third CHC in the Worklist is not inductive. Since the third CHC is not inductive, Abduce is called. The results of the two abduction queries corresponding to the i-th element and the non-i-th elements, i.e. ψ₁ and ψ₂, will be B[j] ≠ C[j] and A[j] ≠ B[j], respectively. Further, quantification and range formulas are added, which results in the candidate:

$$\begin{aligned} \mathsf{inv} \stackrel{\text{cand}}{\mapsto} \lambda i, N, A, B, C.\; &\exists j.\, 0 \le j < N \land A[j] \ne B[j] \;\land \\ &\big(\exists j.\, 0 \le j < N \land j \ge i \land 2 \ast j \ge N \land B[j] \ne C[j] \;\lor \\ &\;\;\exists j.\, 0 \le j < N \land j < i \land 2 \ast j \ge N \land A[j] \ne B[j]\big) \end{aligned}$$

Now, the Houdini algorithm finds that the candidate is not inductive at the third CHC. For instance, it finds a counterexample to validity of the form:

$$\begin{aligned} A[j] &\neq B[j] \text{ for } j = i \text{, otherwise } A[j] = B[j] \\ B[j] &\neq C[j] \text{ for } j = i + 2 \text{, otherwise } B[j] = C[j] \end{aligned}$$

It drops the conjunct ∃j. 0 ≤ j < N ∧ A[j] ≠ B[j], which does not satisfy the counterexample. The remaining disjuncts are found to be inductive at the third and second CHCs:

$$\begin{aligned} \mathsf{inv} \mapsto \lambda i, N, A, B, C. \exists j. \, 0 \le j < N \land j \ge i \land 2 \ast j \ge N \land B[j] \ne C[j] \lor \\ \exists j. \, 0 \le j < N \land j < i \land 2 \ast j \ge N \land A[j] \ne B[j] \end{aligned}$$

Finally, the precondition is computed from the first CHC by substituting i = 0, resulting in:

$$\mathsf{cpre} \mapsto \lambda N, A, B, C.\; \exists j.\; 0 \le j < N \land 2 \ast j \ge N \land B[j] \ne C[j].$$

# 7 Maximal Preconditions

The interpretation of pre generated by Algorithm 1 is guaranteed to be a precondition by Theorem 2, but it could be non-maximal. That is, it may exclude

Algorithm 3: MaximalPrecond(S, pre)

Input: S – set of CHCs over *R*, pre ∈ *R* – precondition relation
Output: *M*(pre) – maximal precondition for S
1 *M* ← InferAbd(S, {r ∈ *R*. ⊤}, ∅);
2 *M̄* ← InferAbd(S̄, {r ∈ *R*. ⊤}, ∅);
3 F ← ¬(¬*M*(pre) =⇒ *M̄*(cpre));
4 while CheckSAT(F) do
5   ctm ← GetModel(F); // ctm is of the form ⋀₀≤ᵢ≤ₙ vᵢ = cᵢ
6   u ← UnrollCHC(S, ctm);
7   if u is from S_ctm then
8     *M* ← Weaken(S, *M*(pre) ∨ ctm)
9   else
10    *M̄* ← Weaken(S̄, *M̄*(cpre) ∨ ctm)
11  F ← ¬(¬*M*(pre) =⇒ *M̄*(cpre));
12 return *M*(pre);

some initial states from which the postcondition holds. In this section, we propose a technique that checks whether a precondition is maximal (i.e. logically weakest). If not, it incrementally weakens the precondition in a loop until it becomes maximal.

#### 7.1 Overview

Algorithm 3 gives a description of the maximality checker. Given a precondition inference problem via a system of CHCs S, it returns a maximal precondition on termination. It first generates a precondition for S using Algorithm 1. In order to check whether the precondition is maximal, the algorithm infers another precondition for the complement CHC system S̄ (line 2). Recall from Definition 5 that this system has the same structure as S, except that the postcondition in the query is complemented. To avoid confusion, we consider pre of this system to be substituted by another uninterpreted relation cpre with the same arity. For example, Fig. 5 is the complement CHC system of Fig. 3.

The maximality check is performed next by checking whether all the states that are outside *M*(pre) are in *M̄*(cpre) (line 4). Intuitively, if all the states in ¬*M*(pre) are in *M̄*(cpre), then those states violate the postcondition, as *M̄*(cpre) is the precondition of the complemented postcondition. The validity check is reduced to a satisfiability check by negation, and a model of the satisfiability check is called a counterexample-to-maximality, or CTM.

The algorithm uses the CTM to determine which of pre or cpre has to be weakened, by invoking the method UnrollCHC (line 6). Intuitively, UnrollCHC performs a task similar to executing the program represented by the CHCs with the CTM as the initial state. More precisely, UnrollCHC finds unrollings (Definition 7) of different lengths for the extensions S_ctm and S̄_ctm and terminates when an unrolling is satisfiable. It then returns whether the unrolling was from S_ctm or S̄_ctm. For a deterministic CHC system (Definition 8), a satisfiable unrolling exists for either S_ctm or S̄_ctm.

In the next step, the algorithm weakens pre if the unrolling is from S_ctm, or cpre if the unrolling is from S̄_ctm. The weakening is performed by Weaken, which is called with the appropriate CHC system and the current interpretation of the precondition (lines 8, 10). Weaken generalizes the precondition and finds inductive invariants. This loop of checking for a CTM and weakening one of the preconditions continues till the maximal precondition is found.

Theorem 3. The precondition returned by Algorithm 3, when it terminates, is maximal when S is deterministic and terminating.

Example 3. In Sect. 5.2 and Sect. 6.4, Algorithm 1 found the following interpretations for pre and cpre:

$$\begin{aligned} \mathsf{pre} \mapsto \lambda N, A, B, C.\; &\forall j.\, (0 \le j < N \land 2 \ast j < N \implies A[j] = B[j]) \land {} \\ &\forall j.\, (0 \le j < N \land 2 \ast j \ge N \implies B[j] = C[j]). \end{aligned}$$

$$\mathsf{cpre} \mapsto \lambda N, A, B, C.\; \exists j.\; 0 \le j < N \land 2 \ast j \ge N \land B[j] \ne C[j].$$

The reader may notice that cpre is not maximal; hence it is not yet possible to conclude whether pre is maximal. We now illustrate how Algorithm 3 determines this.

After finding the interpretations, Algorithm 3 checks the validity of the following formula:

$$\begin{aligned} \neg\big(&\forall j.\, (0 \le j < N \land 2 \ast j < N \implies A[j] = B[j]) \land \forall j.\, (0 \le j < N \land 2 \ast j \ge N \implies B[j] = C[j])\big) \\ &\implies \exists j.\, (0 \le j < N \land 2 \ast j \ge N \land B[j] \ne C[j]) \end{aligned}$$

Since this formula is not valid (its negation is satisfiable), the algorithm deduces that at least one among *M*(pre) and *M̄*(cpre) is not maximal. Suppose it gets the following satisfiability model, or CTM:

$$N = 1 \land A[0] = 0 \land B[0] = 1 \land C[0] = 0.$$

UnrollCHC finds that the CHCs violate the property when the CTM is taken as the initial state. Hence cpre, the precondition of the negation of the property, can be weakened by at least one point, viz. the CTM.
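The maximality check of this example can be replayed with an SMT solver; a model of the negated implication is exactly a CTM (a sketch in Z3's Python API; the check relies on Z3's quantifier reasoning, which typically handles such instances):

```
from z3 import Int, Array, IntSort, ForAll, Exists, Implies, And, Not, Solver

N, j = Int('N'), Int('j')
A = Array('A', IntSort(), IntSort())
B = Array('B', IntSort(), IntSort())
C = Array('C', IntSort(), IntSort())

pre  = And(ForAll([j], Implies(And(0 <= j, j < N, 2*j <  N), A[j] == B[j])),
           ForAll([j], Implies(And(0 <= j, j < N, 2*j >= N), B[j] == C[j])))
cpre = Exists([j], And(0 <= j, j < N, 2*j >= N, B[j] != C[j]))

s = Solver()
s.add(Not(Implies(Not(pre), cpre)))   # any model is a counterexample-to-maximality
print(s.check())                      # expected: sat, e.g. the CTM shown above
```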

Algorithm 4: Weaken(S, *M*(p) ∨ ctm)

Input: S – set of CHCs over *R*, *M*(p) ∨ ctm – precondition interpretation to weaken
Output: *M*′ – a solution to S with *M*(p) ∨ ctm =⇒ *M*′(p)
1 G ← ConstructGrammar(S, *M*(p));
2 while ⊤ do
3   ψ ← NextCandidate(G);
4   if CheckSAT(¬(ctm =⇒ ψ)) then continue;
5   φ ← *M*(p) ∨ ψ;
6   for i ∈ [0 · · · n] where *R* = {r₀ = p, r₁, · · · } do
7     *M*′(rᵢ) ← InvInfer(S_φ, *M*′, rᵢ), or φ for r₀;
8   if ∃C ∈ S_φ. CheckSAT(¬(body(C) =⇒ head(C))[*M*′/*R*]) then continue;
9   return *M*′;

#### 7.2 Weakening of Precondition

Once the precondition that has to be weakened is determined, a trivial weakening is to add the CTM to the current interpretation. However, this may cause non-termination, as there can be infinitely many CTMs. In this section, we propose a heuristic in Algorithm 4 that can accelerate the weakening process.

Algorithm 4 works in two stages. First, it finds a formula that is more general than the trivial solution *M*(p) ∨ ctm (lines 3–5). To do this, it enumerates (line 3) a formula from an input grammar (a sample grammar is given in [49]) and then checks whether it is weaker. Then, it finds inductive invariants *M*′ (line 7) for the extended system (recall Definition 6) using a slightly modified version of range abduction (an algorithmic description is in [49]). By Lemma 2, φ and *M*′ together form a solution to the input system S.

Theorem 4. Algorithm 4 returns a solution *M*′ to S, and *M*(p) ∨ ctm =⇒ *M*′(p).

Example 4. We continue the illustration of Example 3. Algorithm 4 is called with the complement CHC system (Fig. 5), *M̄*(cpre) ↦ λN, A, B, C. ∃j. 0 ≤ j < N ∧ 2 ∗ j ≥ N ∧ B[j] ≠ C[j], and ctm = (N = 1 ∧ A[0] = 0 ∧ B[0] = 1 ∧ C[0] = 0). Suppose that the algorithm samples ψ as ∃j. 0 ≤ j < N ∧ 2 ∗ j < N ∧ A[j] ≠ B[j], based on the constraints from the query and the second CHC. Since the check at line 4 passes, φ will be assigned:

$$\exists j. \, 0 \le j < N \land 2 \ast j \ge N \land B[j] \ne C[j] \lor \exists j. \, 0 \le j < N \land 2 \ast j < N \land A[j] \ne B[j].$$

InvInfer uses the postcondition to compute the part of the invariant over the already-processed range and φ to compute the part over the range still to be processed. It then adds j < i to the former and j ≥ i to the latter, and disjoins them (due to existential quantification) to get:

$$\begin{aligned} \mathsf{inv} \mapsto \lambda i, N, A, B, C. \exists j. 0 \le j < i \land A[j] \ne B[j] \lor \\ \exists j. i \le j < N \land 2 \ast j \ge N \land B[j] \ne C[j] \lor \\ \exists j. i \le j < N \land 2 \ast j < N \land A[j] \ne B[j] \end{aligned}$$


Since this is inductive at all CHCs, the algorithm returns with φ and *M*′. Algorithm 3 then performs its check and finds that pre is maximal.

# 8 Evaluation

Tool. We implemented our algorithms in a tool called PreQSyn on top of the FreqHorn framework [22]. Our tool takes as input a precondition-inference problem encoded as a set of CHCs. It uses Z3 [44] to solve SMT queries. Quantifier elimination is performed using the solver from [20] that uses model-based projection [5]. On a successful execution, our tool infers maximal preconditions and inductive invariants for the loops.

Evaluation Goals. We evaluate PreQSyn on the following research questions:


Benchmarks and Configuration. We use 32 precondition inference problems, with 29 universally and 3 existentially quantified postconditions. Since none of the benchmarks from [54] had quantified postconditions, we derived a majority (26/32) of the benchmarks from the existing verification benchmarks of [22], which have been collected from various sources like SV-COMP. In particular, we considered 48 benchmarks from the public repository of [22] that have multiple loops, i.e., the first loop has an array initialization, and the other loops involve various types of array processing like copying, modifying, filtering, and searching among the elements. We then excluded the first (initialization) loop from each benchmark, thus targeting the necessity of synthesizing a quantified precondition that would intuitively describe how the arrays need to be initialized in order to meet the postcondition. We further excluded benchmarks that gave repetitive problems (8/48) or did not meet our syntactic restrictions (viz. had non-quantified postconditions (6/48), nested loops (5/48), or non-linear expressions (3/48)). We added 6 new benchmarks to test different features of our tool.

We performed the experiments on an Ubuntu 20.04 machine with a 2.5 GHz processor and 16 GB memory. A timeout of 100s was given to all the tools.

RQ1. PreQSyn inferred a precondition for 31/32 benchmarks. The failed benchmark timed out in the inductiveness check. Out of the 31 preconditions, 22 were proved to be maximal automatically. All the successful benchmarks completed within 5 seconds. Overall, PreQSyn solved 31 CHC tasks with universally quantified and 30 with existentially quantified postconditions, corresponding to pre and cpre.


On manual inspection of the 9 benchmarks for which PreQSyn found a precondition but was unable to prove maximality, 5 were found to be non-deterministic. However, the inferred preconditions for them were sufficiently weak. The remaining 4 failed in different stages of weakening. Among these benchmarks, we found that 3/4 preconditions (i.e. pre) were actually maximal.<sup>9</sup>

RQ2. We ran P-Gen (with Z3 v2.0 as its SMT solver) on semantically equivalent C programs manually constructed from the CHCs. P-Gen found only 2/32 preconditions as maximal. Both of them were existentially quantified. It timed out on 5/32 benchmarks. On the remaining 25/32 benchmarks it exited without finding a precondition. Overall, PreQSyn inferred significantly more preconditions than P-Gen due to the generalization capability of range abduction.

RQ3. We tried to replace our invariant inference technique with an existing one, thus evaluating the need for our invariant discovery. Existing state-of-the-art CHC solving tools can handle arrays to some extent, namely: Spacer [32] (Z3 v4.8.10), a PDR-based invariant inference tool, and FreqHorn [22] (v0.6), a SyGuS-based invariant inference tool. So we pose to them the simpler problem of inferring invariants given the preconditions. Furthermore, we also pose this as an assertion checking problem to VeriAbs [15] (v1.4.2), a portfolio solver that targets linear loops, the gold winner of the SV-COMP 2022 ReachSafety category [4], and the winner of the array category for several years.

To create invariant inference and verification problems, we consider the 42 precondition inference problems, corresponding to pre and cpre, for which PreQSyn was able to find the maximal preconditions. The 42 precondition inference problems were converted manually to verification problems by using the maximal preconditions. For Spacer, the CHCs were annotated with the maximal interpretations of pre and cpre; for VeriAbs, semantically equivalent C programs with the maximal preconditions encoded as loops were provided; and for FreqHorn, the original CHCs were provided as input.

Out of the 42 problems, 21 each with universally and existentially quantified postconditions, VeriAbs solved 37, FreqHorn solved 20, and Spacer solved 11.

RQ4. We disabled the Houdini algorithm at line 7 of Algorithm 1, and PreQSyn found preconditions for 27 benchmarks, compared to 31 with the full range abduction algorithm. Out of these 27, only 6 were proved maximal. We conclude that weakening by Houdini is useful, especially when postconditions are existentially quantified. We also extended the SMT-based maximality checking algorithm from [50], but it was unsuccessful in proving the 21 problems that our maximality checker proved.

# 9 Related Work

The problem of precondition inference appears in multiple applications and has been the subject of numerous works. Broadly, these works can be classified as

<sup>9</sup> Detailed results of evaluation with timings can be found in [49].

static [45,14,54,13], dynamic [53,25,3,41], and a mix of both [46]. Our technique falls into the first category. The two works closest to ours are [14] and [54], which compute maximal quantified preconditions for array programs using abstract interpretation and CEGAR-based predicate abstraction, respectively. Unlike the technique in [14], our work does not require predefined abstract domains. The technique in [54] computes over-approximations of safe and unsafe states (i.e. over-approximations of pre and cpre) and then refines them till they become disjoint. The over-approximations are computed using predicate abstraction, and the predicates required for the refinement of the abstraction are derived from a set of heuristic rules. Our technique differs from theirs in several ways: we rely on abduction-based techniques to infer necessary predicates, while they rely on minimal unsat cores; we infer quantified inductive invariants that witness the correctness of the inferred preconditions, while their technique does not; finally, we target quantified postconditions while they consider only quantifier-free postconditions.

The problem of inferring universally quantified inductive invariants has received considerable attention. The inference is made using methods such as abstract interpretation [30], predicate abstraction using Skolem constants [40] and interpolation [34], an extension of IC3 for arrays [32], and syntax-guided synthesis [22]. These techniques, apart from being restricted to universal quantification, also expect a precondition. Our technique overcomes these limitations by inferring preconditions, including existentially quantified ones.

Many techniques verify programs with arrays by transforming them to a sound abstraction without explicitly generating inductive invariants. The abstraction can be obtained by considering all the array elements as a single cell [7], or as multiple fixed cells and then converting to array-free nonlinear CHCs [43], by over-approximating unknown loop bounds with a smaller known bound [39], by accelerating entire transition relations [8], by using CHC transformations [6,35], or by induction-based techniques [9,10,11]. The portfolio solver VeriAbs [1] used in our experiment predominantly used the shrinking technique [39] to verify, which does not generate invariants. The tool also has induction-based techniques [9,10,11] that implicitly generate invariants, but these are not given to the user. RAPID [26] translates the semantics of the input program into formulas in trace logic. The formulas are then verified using a theorem prover. Though sound lemmas are used to translate loops, it currently does not support the extraction of invariants from the lemmas. Apart from the inability to generate explicit invariants, all of these techniques need preconditions to verify the programs.

Our technique works on CHCs, which have gained much attention in recent years for different verification and inference tasks [57,36,51,21,19,50,32,22]. Most of these techniques do not handle arrays, and those that do, do not generate maximal preconditions.

The core part of our algorithm uses abductive inference. Abduction has been used for programs without arrays to infer invariants [17,18], preconditions [27,16], and specifications [2,50]. The technique in [56] finds specifications over uninterpreted functions by overcoming the limitations of integer abduction engines through a data-driven approach. In contrast, our technique extends the abduction itself to quantified formulas over arrays.

Recent works in specification synthesis use artifacts like input-output examples, comments in the code, partial code snippets, and user-supplied constraints and languages to infer specifications [12,55,47]. In comparison, our work uses the entire program and a postcondition expressed as a logical formula to find maximal preconditions.

# 10 Limitations and Future Work

The restriction on array access statements simplifies the conversion between array and integer terms in range abduction. However, this can be relaxed to support terms like A[B[i]] and A[i+1], among others, by enhancing the select-substitution (recall Definition 10).

The restriction on the form of postconditions, inductive invariants, and preconditions is required for effective range abduction and SMT checks. Our approach can easily support alternating quantifiers if the structure of the postcondition is close to that of the inductive invariant.

For non-deterministic programs, Algorithm 4 will not terminate when a CTM has two satisfiable unrollings, one from S_ctm and one from S̄_ctm (recall Definition 8). Hence, the maximality check will be inconclusive. Nevertheless, Algorithm 1 can still generate preconditions (with inductive invariants) for such programs, often maximal ones, as observed in our experiments. We extend our approach to non-deterministic CHCs in [48].

In the case of non-terminating programs, an initial state with a non-terminating execution can be added to either pre or cpre, as there will be inductive invariants for both. If it is added to the latter, the maximality check could wrongly conclude that pre is maximal when it is not. Therefore, relaxing this restriction affects the soundness of the maximality check. An interesting future direction for maximality checking would be to extend the work presented in [29] to incorporate array handling.

# Data Availability and Artifact

The artifact accompanying the paper is publicly available at [52].

# Acknowledgement

We would like to thank Pamina Georgiou for helping us with the RAPID tool [26].

# References

1. Afzal, M., Asia, A., Chauhan, A., Chimdyalwar, B., Darke, P., Datar, A., Kumar, S., Venkatesh, R.: VeriAbs: Verification by abstraction and test generation. In: 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 1138–1141. IEEE (2019)




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Verified Inlining and Specialisation for PureCake

Hrutvik Kanabar¹(✉), Kacper Korban², and Magnus O. Myreen²

> <sup>1</sup> University of Kent, Canterbury, UK hrk32@cantab.ac.uk <sup>2</sup> Chalmers University of Technology, Gothenburg, Sweden kacper.f.korban@gmail.com myreen@chalmers.se

Abstract. Inlining is a crucial optimisation when compiling functional programming languages. This paper describes how we have implemented and verified function inlining and loop specialisation for PureCake, a verified compiler for a Haskell-like (purely functional, lazy) programming language. A novel aspect of our formalisation is that we justify inlining by pushing and pulling let-bindings. All of our work has been mechanised in the HOL4 interactive theorem prover.

Keywords: verified compilation · function inlining · loop optimisation · functional programming · machine-checked proofs

# 1 Introduction

It can be tricky to generate high-quality code from lazy, purely functional programs for a number of reasons. One of these reasons is that functional programming encourages a brief declarative style that makes heavy use of shorthands (e.g., for partially-applied functions) and higher-order functions [8]. Producing good code from such input requires a well-developed inliner, as noted [17] by the developers of the Glasgow Haskell Compiler (GHC):

"One of the trickiest aspects of a compiler for a functional language is the handling of inlining. [...] Efective inlining is particularly crucial in getting good performance."

This paper is about implementing and verifying an inliner that can specialise loops for PureCake, an end-to-end verified compiler for a Haskell-like language [10].

The inliner by example. The following simple example demonstrates what our inliner does. Imagine that a programmer is to write a function that increments every element of a list of integers. The programmer should write:

suc\_list = map (+1)

Here, the programmer has relied on the library function map below to perform the necessary list traversal.


```
map f [] = []
map f (x:xs) = f x : map f xs
```

To generate high-quality code for suc\_list, the compiler must both inline and specialise map. Our inliner takes the definition of suc\_list above and produces the following code.

```
suc_list =
  let map' xs =
    case xs of
      [] -> []
      (y:ys) -> y + 1 : map' ys
  in map'
```
In particular, the inliner has combined the following code transformations:


Contributions. Our work adds verified inlining and loop specialisation to PureCake. Our inliner is capable of optimisations such as the one above. More specifically, we make the following contributions:


All of our work is mechanised using the HOL4 interactive theorem prover, and our development is open-source.<sup>3</sup> To the best of our knowledge, ours is the first verified inliner for a lazy functional programming language, and the first verified loop specialiser for any functional language.

# 2 The Inliner by Example

We begin with a high-level explanation of how our inliner works, before diving into verification details in later sections. We will show the transformations the

<sup>3</sup> https://github.com/cakeml/pure, see also our artifact hosted on Zenodo [9].

inliner performs step-by-step. As a running example, we use the code from the previous section with one modification: we lift (+1) to a separate function add1 for clarity. The input code after this modification is as follows:

```
suc_list = map add1
add1 i = i + 1
map f [] = []
map f (x:xs) = f x : map f xs
main = ...
```
Our inliner is installed very early in the PureCake compiler, directly after parsing and binding group analysis. Binding group analysis processes the program above into the code below, breaking up the mutually recursive bindings into a nesting of let-expressions. Note that there is no dependency between add1 and map, so their definitions could be reordered; for this example we put add1 first.

```
18 let add1 i = i + 1 in
19 let map f l = case l of
20                 [] -> []
21                 (x:xs) -> f x : map f xs in
22 let suc_list = map add1 in
23 let main = ... in main
```
The inliner receives this program as input. As it traverses the program, it records known definitions that it may wish to inline later on. In particular, it maintains a mapping from names to their definitions, which starts off empty. Therefore, after processing line 18 (i.e., the definition of add1), the mapping contains only the definition of add1, that is, \i -> i + 1.

The inliner then moves to line 19, the let-expression that defines map. The definition of map is recursive, so the inliner analyses it to determine whether any of its arguments remain constant over all recursive calls. In the case of map, it finds that the first argument, f, remains constant. This means that it can loop specialise map to produce the following equivalent definition.

```
let map f =
    let map' l = case l of
                   [] -> []
                   (x:xs) -> f x : map' xs
    in map'
in ...
```
Our inliner does not alter the definition of map in the program, but it does add this equivalent definition to its mapping of known definitions. We will very soon see why it is useful to pull out the constant argument f.
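To make the constant-argument check concrete, the following is a minimal, self-contained sketch in Haskell; the Exp type and the helpers calls and constantArgs are our hypothetical illustration, not PureCake's mechanised HOL4 analysis, and the sketch elides case-expressions.

```
-- Decide which parameters of a recursive function `f` stay constant, i.e.,
-- are passed as the same variable in every recursive call.
data Exp = Var String | Lam String Exp | App Exp Exp

-- Collect the argument lists of every call to `f` inside an expression.
calls :: String -> Exp -> [[Exp]]
calls f e = go e []
  where
    go (App g a) args = go g (a : args) ++ go a []
    go (Var x) args | x == f, not (null args) = [args]
    go (Lam _ b) _ = go b []
    go _ _ = []

-- Parameter p at position i is constant if every call passes `Var p` at i.
constantArgs :: String -> [String] -> Exp -> [Bool]
constantArgs f params body =
  [ all (passesSame p i) (calls f body) | (p, i) <- zip params [0 ..] ]
  where
    passesSame p i args = case drop i args of
      Var x : _ -> x == p
      _         -> False
```

For map, constantArgs "map" ["f", "l"] applied to the body would yield [True, False]: f is constant and l is not, which is exactly what licenses pulling f out as above.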


The inliner moves on to the definition of suc\_list on line 22.

let suc\_list = map add1 in ...

After pulling out the constant argument f above, the inliner considers map to be a single-argument function. Therefore, the application map add1 here seems fully applied and the inliner will rewrite it. First, it transforms map add1 into the following.

```
let suc_list =
    let f = add1 in
    let map' l = case l of
                   [] -> []
                   (x:xs) -> f x : map' xs
    in map'
in ...
```
Notice the use of a binding let f = add1 to assign the constant argument f of map. Then, the inliner recurses into this expression, replacing f by add1 in the second row of the pattern match:

(x:xs) -> add1 x : map' xs

The inliner recurses again into the modified subexpression add1 x, and realises that add1 (which is mapped to \i -> i + 1) is fully applied. Therefore, it inlines add1 too:

(x:xs) -> (let i = x in i + 1) : map' xs

Once again, the inliner recurses on the modified subexpression, turning the innermost i into x:

(x:xs) -> (let i = x in x + 1) : map' xs

The final code produced by the inliner is below. The definition of suc\_list has been rewritten so extensively that it now resembles a copy of map which has been specialised to the add1 function.

```
41 let add1 i = i + 1 in
42 let map f l = case l of
43                 [] -> []
44                 (x:xs) -> f x : map f xs in
45 let suc_list =
46   let f = add1 in
47   let map' l = case l of
48                  [] -> []
49                  (x:xs) -> (let i = x in x + 1) : map' xs
50   in map'
51 in let main = ... in main
```

Some dead code remains, e.g., let f = add1 (line 46) and let i = x (line 49). We perform a simple dead code elimination pass immediately after the inliner to remove these.

Single-pass optimisation. Note that our inliner does not make multiple passes over input code, in contrast to the presentation above. It performs a single top-down pass over its input, calling itself recursively only on function applications or variables that it has successfully rewritten. The depth of this recursion is bounded by a simple user-configurable recursion limit.
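The shape of that single pass can be sketched as follows; this is our Haskell illustration under simplifying assumptions (PureCake's real definition lives in HOL4 and is more selective about what it rewrites, as § 6 details):

```
import qualified Data.Map as Map

-- A single top-down inlining pass with a recursion limit k: the pass only
-- burns fuel when it recurses into code it has itself rewritten; everywhere
-- else it performs plain structural recursion.
data Exp = Var String | Lam String Exp | App Exp Exp | Let String Exp Exp

inline :: Int -> Map.Map String Exp -> Exp -> Exp
inline k m (Var x)
  | k > 0, Just e <- Map.lookup x m = inline (k - 1) m e   -- non-structural: decrement
  | otherwise                       = Var x
inline k m (Let x e1 e2) =
  Let x (inline k m e1) (inline k (Map.insert x e1 m) e2)  -- remember the definition
inline k m (Lam x e) = Lam x (inline k (Map.delete x m) e) -- shadowing drops the entry
inline k m (App f a) = App (inline k m f) (inline k m a)
```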

# 3 Setting: PureCake

We implement and verify our inlining and specialisation optimisations as part of the verified compiler PureCake. In this section, we describe both the PureCake project at a high level and the key aspects of its formalisation on which we rely.

What is PureCake? PureCake [10] is an end-to-end verified compiler for a Haskell-like language known as PureLang. Here, a "Haskell-like" language is one which: is purely functional with monadic effects; evaluates lazily; and has a syntax resembling that of Haskell. PureCake compiles PureLang to the CakeML language, which is call-by-value and ML-like, and has an end-to-end verified compiler [12,14]. CakeML targets machine code, so PureCake and CakeML can be composed to produce end-to-end guarantees for the compilation of PureLang to machine code [10, §6].

The PureCake compiler is designed to be realistic: it accepts a featureful input language and generates performant code. This makes it an ideal setting for verified inlining and specialisation optimisations. We add these to PureCake as PureLang-to-PureLang transformations.

Formalisation details. PureLang is formalised using two ASTs: compiler expressions and semantic expressions, denoted ce and e respectively [10, §3.2]. The compiler implementation uses compiler expressions, and their semantics is given by desugaring into semantic expressions (denoted desugar, of type ce → e).

The call-by-name operational semantics of PureLang is defined over its simpler semantic expressions [10, §3.3]. This semantics admits an equational theory [10, §3.4] which is sound and complete with respect to contextual equivalence. Its equivalence relation, $e_1 \cong e_2$, is based on an untyped applicative bisimulation from Abramsky's lazy λ-calculus [1] and is proved congruent via Howe's method [7], i.e., expressions composed of equivalent subexpressions are themselves equivalent.

PureCake's compiler passes are verified in two stages.



Composition of the two stages produces the overall proof that the action of the compiler implementation preserves semantics. A key benefit of this approach is that heuristics remain an implementation detail in stage 2, and can be changed without incurring the significant proof obligations of stage 1.

Approach and paper outline. We can now describe more precisely the steps we took to add inlining and loop specialisation to the PureCake compiler.


# 4 Inlining as a Relational Envelope

In this section, we define a relation which characterises all the inlinings that we wish to perform. We then prove that any code transformation contained within this relational envelope must preserve semantics.

#### 4.1 Understanding the relation

We begin by describing the intuition behind our relation.

Inlining is not substitution. Inlining is a more complex transformation than substitution or β-conversion. If we were to view inlining as a special case of these, we would generate unsatisfactory code. In particular, consider the example below: inlining based on substitution must replace all three occurrences of f with its definition; inlining based on β-conversion would remove the let-binding.

let f i = 5 in f 1 : map f xs ++ map f ys

By contrast, a real inliner must be able to choose whether to inline a definition per use of that definition. In other words, the inliner should decide which usages of a given definition are rewritten on a case-by-case basis. For the example above, a real inliner should produce the code below. Note that it chooses to inline the function f only at the usage which fully applies it.

let f i = 5 in (\i -> 5) 1 : map f xs ++ map f ys

Of course, a real inliner would further transform (\i -> 5) 1 into 5 (this is in fact a β-conversion). For clarity in this example, we do not show that step.

Inlining is a series of let transformations. The key intuition behind our inlining transformations is as follows. We push let-bindings into expressions as far as possible, rewrite the result, then pull the bindings out again. We illustrate this by example below, starting from the same initial code as above.

let f i = 5 in f 1 : map f xs ++ map f ys

We now push in the let-binding which defines f to produce a series of equivalent expressions. First, we push it in one step past the list constructor (:):

(let f i = 5 in f 1) : (let f i = 5 in map f xs ++ map f ys)

Next, we push it in through the function application f 1:

(let f i = 5 in f) (let f i = 5 in 1) : (let f i = 5 in map f xs ++ map f ys)

Now, we choose to rewrite the use of f under the first let f i = 5 to \i -> 5:

(let f i = 5 in (\i -> 5)) (let f i = 5 in 1) : (let f i = 5 in map f xs ++ map f ys)

Note that we have chosen not to perform any other rewrites of f, because other uses of f are not fully applied.

We can now reverse the pushing in of let-bindings, i.e., we pull them out instead. The final result is as follows, where f is inlined exactly as we wanted:

let f i = 5 in (\i -> 5) 1 : map f xs ++ map f ys

Stacking let transformations. Above, our example shows how we can inline a single let-binding: we push it inwards, use it for rewriting, and pull it outwards back to its original position. We can generalise this straightforwardly to handle

a list of let-bindings. This mimics the implementation of a real inliner, which must carry with it a collection of definitions it may wish to inline.

Consider the following example, in which an inliner attempts to rewrite the expression g 3 + 7 and carries the definitions f i = 5; h i = 2; g i = f i + 1.

```
let f i = 5 in
let h i = 2 in
let g i = f i + 1 in
  g 3 + 7
```
Just as with a single let-binding, we can push in the stack of let-bindings, rewrite, and pull them out again. This produces the following expression.

```
let f i = 5 in
let h i = 2 in
let g i = f i + 1 in
  (\i -> (\i -> 5) i + 1) 3 + 7
```

The only complication in generalising to a stack of let-bindings is that some definitions can depend on others. In the example above, the definition of g depends on f. This is why we model the bindings as a list: this preserves scoping correctly, ensuring we do not break any dependencies between definitions.

Note that this intuition of pushing in and pulling out of let-bindings applies only to the formalisation that justifies our inlining rewrites. The implementation of our inliner performs no such push/pull transformations: as one might expect, it merely carries around a simple (unordered) map of variable names to their definitions. This map represents exactly the set of definitions that the inliner may wish to use for rewriting at usage sites.

#### 4.2 Defining a Semantics-Preserving Envelope

We now describe an inductive relation, $l \Vdash e_1 \rightsquigarrow e_2$, which characterises all of the inlining transformations that we perform. We prove that any transformation described by the relation lies within the equational theory of PureLang ($\cong$, § 3). Therefore, the relation describes only semantics-preserving transformations.

The relation $l \Vdash e_1 \rightsquigarrow e_2$ should be read as follows: expression $e_1$ can be transformed into expression $e_2$ under the definitions in the list l. Both $e_1$ and $e_2$ are PureLang semantic expressions, and l is a list of definitions. Each such definition is of the form $x \leftarrow e$, associating name x with semantic expression e. We will first describe the formal meaning of $l \Vdash e_1 \rightsquigarrow e_2$, which is best understood via its soundness theorem, Theorem 1. Then, in the following subsections, we describe key parts of the definition of ⇝.

Theorem 1 relates derivations of $l \Vdash e_1 \rightsquigarrow e_2$ with $\cong$, PureLang's equational theory, assuming pre and lets\_ok. The definitions of pre and lets\_ok are shown in Figure 1: they enforce distinct variable names between both the expression $e_1$ and each of the definitions in l, to avoid inadvertent clashes or capture.

$$\begin{array}{l}
\mathsf{vars\_ok}\vphantom{x}\mathsf{vars\_of}\ l \stackrel{\text{def}}{=} \bigcup \big\{ \{x\} \cup \mathsf{freevars}\ e \mid \mathsf{mem}\ (x \leftarrow e)\ l \big\} \\[0.5ex]
\mathsf{pre}\ l\ e \stackrel{\text{def}}{=} \mathsf{barendregt}\ e \,\wedge\, \mathsf{boundvars}\ e \mathbin{\#} \mathsf{vars\_of}\ l \\[0.5ex]
\mathsf{lets\_ok}\ [\,] \stackrel{\text{def}}{=} \mathsf{T} \\[0.5ex]
\mathsf{lets\_ok}\ ((x \leftarrow e) :: l) \stackrel{\text{def}}{=} \\
\quad x \notin \mathsf{freevars}\ e \,\wedge\, (\{x\} \cup \mathsf{freevars}\ e) \mathbin{\#} \{ y \mid \exists e'.\ \mathsf{mem}\ (y \leftarrow e')\ l \} \,\wedge\, \mathsf{lets\_ok}\ l
\end{array}$$

Fig. 1. The definition of pre and lets\_ok. Here, the # predicate returns true only for disjoint sets: $s_1 \mathbin{\#} s_2 \stackrel{\text{def}}{=} (s_1 \cap s_2 = \emptyset)$.

Theorem 1. Soundness of $l \Vdash e_1 \rightsquigarrow e_2$.

$$\vdash\ l \Vdash e_1 \rightsquigarrow e_2 \;\wedge\; \mathsf{pre}\ l\ e_1 \;\wedge\; \mathsf{lets\_ok}\ l \;\Rightarrow\; \mathsf{lets}\ l\ e_1 \,\cong\, \mathsf{lets}\ l\ e_2$$

where $\mathsf{lets}\ [\,]\ e \stackrel{\text{def}}{=} e$ and $\mathsf{lets}\ ((x \leftarrow e') :: l)\ e \stackrel{\text{def}}{=} \mathbf{let}\ x = e'\ \mathbf{in}\ (\mathsf{lets}\ l\ e)$.

In particular, expressions $e_1$ and $e_2$ related in the context of definitions l produce equal expressions (according to $\cong$) under the stack of let-bindings corresponding to l. The latter correspondence is encapsulated by the definition of lets, which nests let-bindings. This theorem is proved by induction over the derivation of $l \Vdash e_1 \rightsquigarrow e_2$. In the upcoming subsections, we will examine key rules of ⇝ and their cases in this inductive proof.
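As a small illustration, here is our Haskell rendering of the lets wrapper over a toy expression type (the type is ours, for illustration only):

```
-- Fold a list of definitions into nested let-bindings around a body.
data Exp = Var String | Let String Exp Exp  -- other constructors elided

lets :: [(String, Exp)] -> Exp -> Exp
lets [] e = e
lets ((x, e') : l) e = Let x e' (lets l e)
```

For example, lets [("f", ef), ("g", eg)] body builds let f = ef in let g = eg in body, matching the nesting order in the definition above.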

When the inliner is first invoked, it is passed an entire PureLang program and has no knowledge of any definitions. In other words, its mapping of variable names to known definitions is empty, corresponding to the list l being empty ([]). In this case, we can simplify Theorem 1 by instantiating $l \mapsto []$ and unfolding the definitions of pre l and lets\_ok l. This produces the following theorem:

Theorem 2. Soundness of $[\,] \Vdash e_1 \rightsquigarrow e_2$.

$$\vdash\ [\,] \Vdash e_1 \rightsquigarrow e_2 \;\wedge\; \mathsf{barendregt}\ e_1 \;\wedge\; \mathsf{closed}\ e_1 \;\Rightarrow\; e_1 \cong e_2$$

We can read this as follows: if we can transform some closed $e_1$ which satisfies barendregt into some $e_2$ according to ⇝, then $e_1$ and $e_2$ are equivalent. The barendregt predicate restricts the variable naming convention within $e_1$ to avoid problems with variable capture, because PureLang has explicit names. In particular, barendregt is the well-known Barendregt variable convention, which enforces unique free/bound variable names across an entire program [3].

The precise definition of barendregt is not necessary here. Suffice it to say that in order to discharge this assumption, our inliner implementation will rely on a freshening pass. This pass α-renames programs such that they obey the Barendregt variable convention, and therefore satisfy barendregt.
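A freshening pass can be sketched as follows; this is a toy Haskell version of ours, which assumes that appending a numeric suffix eventually yields an unused name (PureCake's freshen is the mechanised analogue):

```
import qualified Data.Map as Map
import qualified Data.Set as Set

-- α-rename every binder to a name outside `used`, so that the output obeys
-- the Barendregt variable convention; `sub` maps old names to new ones.
data Exp = Var String | Lam String Exp | App Exp Exp

freshen :: Set.Set String -> Map.Map String String -> Exp -> (Exp, Set.Set String)
freshen used sub (Var x) = (Var (Map.findWithDefault x x sub), used)
freshen used sub (App f a) =
  let (f', used1) = freshen used  sub f
      (a', used2) = freshen used1 sub a
  in (App f' a', used2)
freshen used sub (Lam x body) =
  let x' = head [n | i <- [(0 :: Int) ..], let n = x ++ show i, Set.notMember n used]
      (body', used') = freshen (Set.insert x' used) (Map.insert x x' sub) body
  in (Lam x' body', used')
```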


Reflexivity. We must allow the inliner to choose whether to rewrite a usage site on a case-by-case basis (§ 4.1). Therefore, the inliner must be allowed not to inline, i.e., it must be able to leave an expression unchanged. The ⇝ relation therefore has a reflexivity rule:

$$\frac{}{l \Vdash e \rightsquigarrow e}\ \text{REFL}$$

The REFL case of the proof of Theorem 1 boils down to showing the equation $\mathsf{lets}\ l\ e \cong \mathsf{lets}\ l\ e$, which is trivial by reflexivity of $\cong$.

Inlining. The simplest rule for inlining uses a definition found in the list l (where mem denotes list membership) to rewrite a variable:

$$\frac{\mathsf{mem}\ (x \leftarrow e)\ l}{l \Vdash \mathbf{var}\ x \rightsquigarrow e}\ \text{INLINE}$$

In particular, if l associates name x with definition e, then the variable var x can be replaced by expression e. The INLINE case of Theorem 1 requires establishing:

$$\vdash\ \mathsf{mem}\ (x \leftarrow e)\ l \;\wedge\; \mathsf{lets\_ok}\ l \;\wedge\; \mathsf{pre}\ l\ (\mathbf{var}\ x) \;\Rightarrow\; \mathsf{lets}\ l\ (\mathbf{var}\ x) \cong \mathsf{lets}\ l\ e$$

Proof outline. We first derive a lemma that allows us to duplicate a let-binding from l, assuming lets\_ok (defined in Figure 1 precisely so that it enables this lemma):

$$\vdash\ \mathsf{mem}\ (x \leftarrow e)\ l \;\wedge\; \mathsf{lets\_ok}\ l \;\Rightarrow\; \mathsf{lets}\ l\ e' \cong \mathsf{lets}\ l\ (\mathbf{let}\ x = e\ \mathbf{in}\ e') \qquad \text{(Let-dup)}$$

Equipped with the Let-dup lemma, we proceed as follows:

$$\begin{aligned}
\mathsf{lets}\ l\ (\mathbf{var}\ x) &\cong \mathsf{lets}\ l\ (\mathbf{let}\ x = e\ \mathbf{in}\ \mathbf{var}\ x) && \text{(Let-dup)} \\
&\cong \mathsf{lets}\ l\ e && \text{(trivial)}
\end{aligned}$$

⊓⊔

Let. We can now inline known definitions, but we must be able to learn those definitions in the first place. The rule LET allows us to add a let-bound definition to the stack l, using the append operator (++).

$$\frac{l \Vdash e_1 \rightsquigarrow e_1' \qquad (l \mathbin{+\!+} [x \leftarrow e_1]) \Vdash e_2 \rightsquigarrow e_2'}{l \Vdash (\mathbf{let}\ x = e_1\ \mathbf{in}\ e_2) \rightsquigarrow (\mathbf{let}\ x = e_1'\ \mathbf{in}\ e_2')}\ \text{LET}$$

Proof outline. LET case of Theorem 1.

$$\begin{aligned}
\mathsf{lets}\ l\ (\mathbf{let}\ x = e_1\ \mathbf{in}\ e_2)
&\cong \mathsf{lets}\ (l \mathbin{+\!+} [x \leftarrow e_1])\ e_2 && \text{(definition of lets)} \\
&\cong \mathsf{lets}\ (l \mathbin{+\!+} [x \leftarrow e_1])\ e_2' && \text{(ih for } e_2 \text{)} \\
&\cong \mathsf{lets}\ l\ (\mathbf{let}\ x = e_1\ \mathbf{in}\ e_2') && \text{(definition of lets)} \\
&\cong \mathbf{let}\ x = (\mathsf{lets}\ l\ e_1)\ \mathbf{in}\ (\mathsf{lets}\ l\ e_2') && \text{(push in lets)} \\
&\cong \mathbf{let}\ x = (\mathsf{lets}\ l\ e_1')\ \mathbf{in}\ (\mathsf{lets}\ l\ e_2') && \text{(ih for } e_1 \text{)} \\
&\cong \mathsf{lets}\ l\ (\mathbf{let}\ x = e_1'\ \mathbf{in}\ e_2') && \text{(pull out lets)}
\end{aligned}$$

Above, we can push and pull lets through let because the precondition pre enforces sufficiently distinct variable names.

Note that this rule records the unmodified expression $e_1$ in the stack of known definitions l. It could instead use the ⇝-transformed expression $e_1'$. The proof strategy with this modification is essentially unchanged, except we must reverse our applications of the inductive hypotheses.

Congruences. We must be able to apply ⇝ within subexpressions. Therefore, we have several congruence rules, such as the following:

$$\frac{l \Vdash e_1 \rightsquigarrow e_1' \qquad l \Vdash e_2 \rightsquigarrow e_2'}{l \Vdash (e_1 \cdot e_2) \rightsquigarrow (e_1' \cdot e_2')}\ \text{APP-CONG}
\qquad
\frac{l \Vdash e \rightsquigarrow e'}{l \Vdash (\lambda x.\ e) \rightsquigarrow (\lambda x.\ e')}\ \text{LAM-CONG}$$

$$\frac{l \Vdash e_i \rightsquigarrow e_i' \ \ (\text{for each } i) \qquad l \Vdash e \rightsquigarrow e'}{l \Vdash (\mathbf{letrec}\ \overline{x_n = e_n}\ \mathbf{in}\ e) \rightsquigarrow (\mathbf{letrec}\ \overline{x_n = e_n'}\ \mathbf{in}\ e')}\ \text{LETREC-CONG}$$

Each such case in Theorem 1 requires showing that we can push/pull lets into/out of subexpressions. Once again, the precondition pre permits this by enforcing sufficiently distinct variable names. The remainder of the proof follows from congruence of $\cong$.

Simplification. The following rule allows ⇝ to carry out any transformation that preserves $\cong$:

$$\frac{l \Vdash e_1 \rightsquigarrow e_2 \qquad e_2 \cong e_2'}{l \Vdash e_1 \rightsquigarrow e_2'}\ \text{SIMP}$$

The SIMP case in Theorem 1 is a direct consequence of the transitivity of $\cong$.

This rule permits the inliner to modify (and in particular, simplify) generated expressions during its operation. There are two important uses of this ability:

– Turning fully applied λ-abstractions into a stack of let-bindings. This allows recursive applications of inlining (see rule TRANS below).

$$(\lambda x_1.\ \lambda x_2.\ \ldots\ \lambda x_n.\ e) \cdot e_1 \cdot e_2 \cdot \ldots \cdot e_n \;\cong\; \mathsf{lets}\ [x_1 \leftarrow e_1,\ x_2 \leftarrow e_2,\ \ldots,\ x_n \leftarrow e_n]\ e \tag{1}$$

– Freshening the names of bound variables (i.e., α-renaming). This happens directly before application of the rule TRANS below.

Transitivity. To permit recursion into recently inlined expressions, ⇝ has a transitivity rule:

$$\frac{l \Vdash e_1 \rightsquigarrow e_2 \qquad l \Vdash e_2 \rightsquigarrow e_3 \qquad \mathsf{pre}\ l\ e_2}{l \Vdash e_1 \rightsquigarrow e_3}\ \text{TRANS}$$

In particular, $e_1$ can be transformed to $e_3$ if there is some intervening $e_2$ which can act as a stepping stone.

Unusually, we require precondition pre to hold of the intermediate expression $e_2$. This is demanded by the proof of Theorem 1, in which we can only instantiate inductive hypotheses if we first establish pre. Unfortunately, $l \Vdash e_1 \rightsquigarrow e_2$ and pre l $e_1$ are not enough to derive pre l $e_2$. Fortunately, we can freshen bound variable names (i.e., α-rename) sufficiently to establish pre, and justify this freshening using rule SIMP above.

Specialisation. The ⇝ relation must be able to support loop specialisation, as described for the map function in § 2. Therefore, it has a rule SPEC which permits conversion of a letrec into a let, as long as there is a proof that the conversion preserves $\cong$.

$$\frac{\begin{array}{c}
l \Vdash e_1 \rightsquigarrow e_1' \qquad (\forall e.\ \mathbf{letrec}\ x = e_1\ \mathbf{in}\ e \,\cong\, \mathbf{let}\ x = e_2\ \mathbf{in}\ e) \\
(l \mathbin{+\!+} [x \leftarrow e_2]) \Vdash e_3 \rightsquigarrow e_3' \qquad \mathsf{disjoint\_names}\ e_2\ e_3 \qquad x \notin \mathsf{freevars}\ e_2
\end{array}}{l \Vdash \mathbf{letrec}\ x = e_1\ \mathbf{in}\ e_3 \;\rightsquigarrow\; \mathbf{letrec}\ x = e_1'\ \mathbf{in}\ e_3'}\ \text{SPEC}$$

That is, if we can $\cong$-convert some letrec x = $e_1$ to some let x = $e_2$, then we can append $x \leftarrow e_2$ to the stack of known definitions when processing the letrec body $e_3$. Again, we require restrictions on variable naming: the variables bound in $e_2$ and $e_3$ must be disjoint, and the bound variable x must not appear free in $e_2$.

Proof outline. SPEC case of Theorem 1.


⊓⊔

# 5 Specialisation of Recursive Bindings

Our example in § 2 showed that our inliner can specialise applications of recursive functions such as map to known arguments such as add1. This is possible whenever constant arguments such as f can be pulled out of the recursion. That is, whenever we can transform recursive functions like map:

```
let map f l =
  case l of
    [] -> []
    (x:xs) -> f x : map f xs
```

into equivalent code which makes the constant argument explicit using map':

```
let map f =
  let map' l =
    case l of
      [] -> []
      (x:xs) -> f x : map' xs
  in map'
```
In this section, we describe how we prove correctness of such transformations. Critically, our proofs can be used in the SPEC rule of ⇝ from the previous section.

#### 5.1 Understanding Specialisation

Like ⇝, our specialisation transformation is justified using equational reasoning. We illustrate the equational steps below, again noting that the implementation is much more direct. We use the map example of § 1, eliding parts not relevant to specialisation. The input is therefore as follows:

let map f l = ... f x ... map f xs ...

We first make a local copy of the recursive definition map, named map':

let map = let map' f l = ... f x ... map' f xs ... in map'

We then η-expand the final usage of the copy map':

let map = let map' f l = ... f x ... map' f xs ... in \f l -> map' f l

Next, we pull out the new λ-abstractions to the top-level:

let map f l = let map' f l = ... f x ... map' f xs ... in map' f l

We then α-rename the constant argument in the copy (here, f becomes g):

let map f l = let map' g l = ... g x ... map' g xs ... in map' f l

The first major step (transform 1) replaces the constant argument g with the known value to which the function map' is always applied, namely f:

let map f l = let map' g l = ... f x ... map' f xs ... in map' f l

The second major step (transform 2) deletes the now-unused argument g. It removes the argument from both the definition of map' and all calls to map':


let map f l = let map' l = ... f x ... map' xs ... in map' l

We push back in some of the top-level λ-abstractions, in this case just l:

let map f = let map' l = ... f x ... map' xs ... in \l -> map' l

Finally, η-contraction removes the λ-abstraction over l:

let map f = let map' l = ... f x ... map' xs ... in map'

Most of the steps are straightforwardly justified in PureLang's equational theory. However, the steps marked transform 1 and transform 2 are more involved. We discuss these below.

#### 5.2 Key Lemmas for Specialisation

Both transform 1 and transform 2 require a substitution-like traversal of the entire subexpression under consideration. It is not clear how to justify these traversals using simple equational reasoning in PureLang's theory. Therefore, we resort to more cumbersome simulation proofs, establishing $\cong$ by appealing to its definition in terms of PureLang's operational semantics.

For transform 1, we prove a theorem of the following form. Here, call\_with\_arg holds only if every application of f in e is applied to var y after n arguments, and the names f and y are never rebound within e.

$$\begin{array}{l}
\vdash\ \mathsf{call\_with\_arg}\ f\ \overline{x_n}\ y\ e \;\wedge\; \ldots \\
\quad \Rightarrow\ \mathbf{letrec}\ f = (\lambda \overline{x_n}.\ \lambda y.\ e)\ \mathbf{in}\ ((\mathbf{var}\ f) \cdot \overline{e_1}{}_n \cdot (\mathbf{var}\ w) \cdot \overline{e_2}{}_m) \\
\quad \cong\ \mathbf{letrec}\ f = (\lambda \overline{x_n}.\ \lambda y.\ e[\mathbf{var}\ w / y])\ \mathbf{in}\ ((\mathbf{var}\ f) \cdot \overline{e_1}{}_n \cdot (\mathbf{var}\ w) \cdot \overline{e_2}{}_m)
\end{array}$$

Though the variable w is free in the theorem above, it is a closed constant expression in most parts of the proof, which simplifies the derivation of this theorem. This is because $\cong$ is defined over open terms in terms of closing substitution and a relation over closed terms. The proof of this theorem is a large simulation based on the semantics of PureLang.

For transform 2, we prove a theorem with a similar shape. This time, remove\_call\_arg is an inductive relation that ensures y never appears in $e_1$ and relates $e_1$ to a second expression $e_2$ in which the relevant argument has been removed from each application of f.

$$\begin{array}{l}
\vdash\ \mathsf{remove\_call\_arg}\ f\ \overline{x_n}\ y\ \overline{z_m}\ e_1\ e_2 \;\wedge\; \ldots \\
\quad \Rightarrow\ \mathbf{letrec}\ f = (\lambda \overline{x_n}.\ \lambda y.\ \lambda \overline{z_m}.\ e_1)\ \mathbf{in}\ ((\mathbf{var}\ f) \cdot \overline{e_3}{}_n \cdot (\mathbf{var}\ y) \cdot \overline{e_4}{}_m) \\
\quad \cong\ \mathbf{letrec}\ f = (\lambda \overline{x_n}.\ \lambda \overline{z_m}.\ e_2)\ \mathbf{in}\ ((\mathbf{var}\ f) \cdot \overline{e_3}{}_n \cdot \overline{e_4}{}_m)
\end{array}$$

We prove this theorem by a large simulation too. The simulation strategy is necessary because letrec causes (potentially non-terminating) recursion.

# 6 Implementing a Correct Inliner

In this section, we describe the implementation of our inliner and the proof that its action lies within the ⇝ relation described in § 4. We also touch on three other transformations mentioned previously: specialisation, freshening of bound variables, and dead code elimination. Our inliner relies on all three.

### 6.1 Preliminaries

We implement our inliner within a state monad with the following type:

$$\alpha\ \mathsf{M} \;\stackrel{\text{def}}{=}\; \mathsf{name\ set} \to (\alpha,\ \mathsf{name\ set})$$

Here, name set is the type of sets of variable names; we will see its usage shortly. This monad has the standard return/bind operators, and we will use Haskell-style do-notation to show definitions written within the monad.
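In Haskell, a state monad of exactly that shape could be written as follows (our sketch of the idea; the compiler defines the HOL4 analogue):

```
import qualified Data.Set as Set

-- The monad α M = name set → (α, name set): a state monad whose state is
-- the set of variable names currently in use.
newtype M a = M { runM :: Set.Set String -> (a, Set.Set String) }

instance Functor M where
  fmap f (M g) = M $ \s -> let (a, s') = g s in (f a, s')

instance Applicative M where
  pure a = M $ \s -> (a, s)
  M mf <*> M ma = M $ \s ->
    let (f, s')  = mf s
        (a, s'') = ma s'
    in (f a, s'')

instance Monad M where
  M ma >>= k = M $ \s -> let (a, s') = ma s in runM (k a) s'
```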

The inliner itself has the following signature:

```
inline : (h : heuristic) → (k : num) → (m : (name ↦ ce)) → ce → ce M
```
In other words, the inliner transforms compiler expressions to compiler expressions within the state monad, requiring several other inputs:


### 6.2 Inliner implementation

The inliner traverses compiler expressions top-down. During the traversal, it performs two key operations: rewriting a variable to a known definition from memory, and adding a new definition to memory.

Rewriting a variable. There are two kinds of expressions in which the inliner will attempt to rewrite a variable. The first is a lone variable (of the form var x), and the second is an application of a variable to some arguments (of the form (var x) · . . .). The latter case is used to inline fully applied functions only.


In the lone variable case, the inliner is defined as follows:

$$\mathsf{inline}_h^k\ m\ (\mathbf{var}\ x) \;\stackrel{\text{def}}{=}\;
\begin{cases}
\mathsf{return}\ (\mathbf{var}\ x) & \text{if } x \notin \mathsf{domain}\ m \,\vee\, k = 0 \,\vee\, m(x) = \lambda y.\ \ldots \\
\mathsf{inline}_h^{k-1}\ m\ ce & \text{if } m(x) = ce
\end{cases}$$

That is, on encountering a free variable x the inliner does one of the following:


In the application case, the inliner is defined as follows:

$$\begin{array}{l}
\mathsf{inline}_h^k\ m\ ((\mathbf{var}\ x) \cdot ce_1 \cdot \ldots \cdot ce_n) \;\stackrel{\text{def}}{=}\; \mathbf{do} \\
\quad [ce_1', \ldots, ce_n'] \leftarrow \mathsf{map}\ (\mathsf{inline}_h^k\ m)\ [ce_1, \ldots, ce_n]; \\
\quad \mathbf{if}\ x \notin \mathsf{domain}\ m \,\vee\, k = 0\ \mathbf{then}\ \mathsf{return}\ ((\mathbf{var}\ x) \cdot ce_1' \cdot \ldots \cdot ce_n')\ \mathbf{else\ do} \\
\quad\quad ce \leftarrow \mathsf{freshen}\ m\ (m(x) \cdot ce_1' \cdot \ldots \cdot ce_n'); \\
\quad\quad \mathbf{case}\ \mathsf{convert\_to\_let}\ ce\ \mathbf{of} \\
\quad\quad\quad \mid\ \mathsf{None} \to \mathsf{return}\ ((\mathbf{var}\ x) \cdot ce_1' \cdot \ldots \cdot ce_n') \\
\quad\quad\quad \mid\ \mathsf{Some}\ ce' \to \mathsf{inline}_h^{k-1}\ m\ ce'
\end{array} \tag{2}$$

That is, on encountering a free variable x applied to n arguments the inliner does the following:


The conversion into let-bindings is critical: it allows the inliner to learn the definitions of the applied arguments $ce_1', \ldots, ce_n'$ for future inlining within the function body of m(x). Note that we only decrement the recursion limit when the size of the input expression may not have strictly decreased. This happens only when performing non-structural recursion, which occurs only when we recurse into a definition rewritten from memory.
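The conversion is essentially equation (1) from § 4.2 read left to right. A minimal Haskell sketch of ours (the name convertToLet and the toy Exp type are illustrative, not the compiler's definitions):

```
-- Convert a fully applied λ-abstraction into a stack of let-bindings,
-- returning Nothing when the λ is not fully applied.
data Exp = Var String | Lam String Exp | App Exp Exp | Let String Exp Exp

convertToLet :: Exp -> Maybe Exp
convertToLet e = go e []
  where
    -- peel off the application spine, collecting arguments left to right
    go (App f a) args = go f (a : args)
    go f args         = wrap f args
    -- emit one let-binding per λ-binder, consuming one argument each
    wrap (Lam x body) (a : as) = Let x a <$> wrap body as
    wrap (Lam _ _)    []       = Nothing  -- under-applied: bail out
    wrap body         []       = Just body
    wrap _            (_ : _)  = Nothing  -- over-applied: keep the sketch simple
```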

Remembering a new definition. The inliner can remember let- or letrec-bound expressions.

In the let case, it is defined as follows:

$$\begin{array}{l}
\mathsf{inline}_h^k\ m\ (\mathbf{let}\ x = ce_1\ \mathbf{in}\ ce_2) \;\stackrel{\text{def}}{=}\; \mathbf{do} \\
\quad ce_1' \leftarrow \mathsf{inline}_h^k\ m\ ce_1; \\
\quad \mathbf{let}\ m' = \mathsf{remember}_h\ m\ (x \leftarrow ce_1); \\
\quad ce_2' \leftarrow \mathsf{inline}_h^k\ m'\ ce_2; \\
\quad \mathsf{return}\ (\mathbf{let}\ x = ce_1'\ \mathbf{in}\ ce_2')
\end{array}$$

$$\mathsf{remember}_h\ m\ (x \leftarrow ce) \;\stackrel{\text{def}}{=}\; \mathbf{if}\ \mathsf{cheap}\ ce \,\wedge\, h\ ce\ \mathbf{then}\ m[x \mapsto ce]\ \mathbf{else}\ m$$

That is, the inliner recurses into $ce_1$ (without decrementing the recursion limit), before memorising the definition $x \leftarrow ce_1$ and recursing into $ce_2$ with the augmented memory. The function remember records the definition only when two conditions are satisfied: the definition is cheap, and the heuristic h returns true.

As the name suggests, cheap is a predicate that determines whether a definition is cheap to compute, and so will not slow the program down or cause loss of value sharing when inlined. The definition of cheap is as follows:

$$\mathsf{cheap}\ (\mathbf{var}\ x) \,=\, \mathsf{cheap}\ (\lambda x.\ e) \,=\, \mathsf{cheap}\ (op\ [\,]) \;\stackrel{\text{def}}{=}\; \mathsf{T} \qquad\quad \mathsf{cheap}\ \_ \;\stackrel{\text{def}}{=}\; \mathsf{F}$$

In the letrec case, the inliner must also perform specialisation. Its action is defined as follows:

$$\begin{array}{l}
\mathsf{inline}_h^k\ m\ (\mathbf{letrec}\ x = ce_1\ \mathbf{in}\ ce_2) \;\stackrel{\text{def}}{=}\; \mathbf{do} \\
\quad ce_1' \leftarrow \mathsf{inline}_h^k\ m\ ce_1; \\
\quad \mathbf{let}\ m' = \mathsf{remember\_rec}_h\ m\ (x \leftarrow ce_1); \\
\quad ce_2' \leftarrow \mathsf{inline}_h^k\ m'\ ce_2; \\
\quad \mathsf{return}\ (\mathbf{letrec}\ x = ce_1'\ \mathbf{in}\ ce_2')
\end{array}$$

$$\begin{array}{l}
\mathsf{remember\_rec}_h\ m\ (x \leftarrow ce) \;\stackrel{\text{def}}{=} \\
\quad \mathbf{if}\ \neg\,\mathsf{can\_specialise}\ (x \leftarrow ce) \,\vee\, \neg\, h\ ce\ \mathbf{then}\ m\ \mathbf{else} \\
\quad\quad \mathbf{let}\ ([w_1^{a_1} \ldots w_n^{a_n}],\ \lambda\, \overline{y_m}.\ ce') = \mathsf{extract\_const\_args}\ (x \leftarrow ce) \\
\quad\quad \mathbf{in}\ m[x \mapsto \mathsf{specialise}\ x\ [w_1^{a_1} \ldots w_n^{a_n}]\ (\lambda\, \overline{y_m}.\ ce')]
\end{array}$$
This mirrors the let case almost exactly. The key difference is the use of remember\_rec instead of remember: this does not check cheap, but does attempt specialisation (and bails out if specialisation fails). We examine specialisation in the upcoming subsection.

Heuristics. So far, we have implemented only one heuristic, based on expression size: the inliner only remembers definitions that are smaller than a user-configurable bound. Our implementation can accept any heuristic function as an input, making it straightforward to support new kinds of heuristic.
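A size-bounded heuristic of this kind amounts to a one-line predicate over an expression-size measure; a minimal sketch of ours in Haskell:

```
-- Remember a definition only when its size is below a configurable bound.
data Exp = Var String | Lam String Exp | App Exp Exp | Let String Exp Exp

size :: Exp -> Int
size (Var _)       = 1
size (Lam _ e)     = 1 + size e
size (App f a)     = 1 + size f + size a
size (Let _ e1 e2) = 1 + size e1 + size e2

sizeHeuristic :: Int -> Exp -> Bool
sizeHeuristic bound e = size e <= bound
```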

Implementing specialisation. Above, specialise transforms a letrec-binding into a let-binding before adding it to memory. We rely on two helper functions: can\_specialise and extract\_const\_args.

The test can\_specialise simply checks whether we are able to specialise a recursive body: the body must be a λ-abstraction with some constant arguments. Then, extract\_const\_args extracts these constant arguments. It accepts a definition $x \leftarrow ce$, where we know ce is a λ-abstraction of the form $\lambda\, \overline{x_n}.\ ce'$. It splits the formal parameters $\overline{x_n}$ into $x_1 \ldots x_m$ and $x_{m+1} \ldots x_n$, where m is the minimum number of arguments that x is invoked with recursively in the body. It further annotates $x_1 \ldots x_m$ with annotations $a_1 \ldots a_m$, which describe whether each argument remains constant over every recursive call. In the implementation of inline above, this has produced the annotated variables $w_i^{a_i}$ and left the remainder of the λ-abstraction untouched ($\lambda\, \overline{y_m}.\ ce'$).

Then, specialise is defined as follows.

$$\begin{array}{l}
\mathsf{specialise}\ f\ [w_1^{a_1} \ldots w_n^{a_n}]\ ce \;\stackrel{\text{def}}{=} \\
\quad \mathbf{let}\ (\overline{x_n},\ ce') = \mathsf{specialise\_each}\ f\ [w_1^{a_1} \ldots w_n^{a_n}]\ ce\ \mathbf{in} \\
\quad \mathbf{let}\ (\overline{y_i},\ \overline{z_j}) = \mathsf{drop\_common\_suffix}\ [w_1^{a_1} \ldots w_n^{a_n}]\ \overline{x_n}\ \mathbf{in} \\
\quad \lambda\, \overline{y_i}.\ \mathbf{letrec}\ f = (\lambda\, \overline{x_n}.\ ce')\ \mathbf{in}\ (\mathbf{var}\ f) \cdot (\mathbf{var}\ z_1) \cdot \ldots \cdot (\mathbf{var}\ z_j)
\end{array}$$

That is, it processes each annotated variable in turn, updating its call sites in the body ce (i.e., performing transform 1 and transform 2 from § 5 simultaneously using specialise\_each), producing a new set of formal parameters $\overline{x_n}$. It then determines which of these can be η-contracted (the final step in § 5) with a call to drop\_common\_suffix, and returns the new letrec which accepts the constant arguments $\overline{y_i}$ at the top level and already has the η-contracted constant arguments $\overline{z_j}$ applied directly.

Freshening and Dead-Let Elimination. Our inliner assumes that its input expression has a variable naming convention which is sufficient to prevent it from accidentally capturing variables during operation. Therefore, we only give the inliner expressions which obey the Barendregt variable convention, which asserts unique bound variable names and disjoint bound/free names [3]. This is achieved by freshening (α-renaming) bound variables directly before inlining, and by further freshening before recursing into subexpressions taken from the inliner's memory; for example, the inliner invokes freshen in eq. (2) above. This is precisely why the inliner carries around a name set in its state monad: this set contains all variable names (whether bound or free) of the input expression. Freshening avoids names in this set when inventing fresh names, and returns an updated set each time it runs.

The output of the inliner also contains various unused let-bindings. We showed such bindings in the example of § 2 (namely, f and i). To remove them, we run a dead-let elimination pass directly after the inliner.
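Dead-let elimination itself is a straightforward bottom-up traversal; in a pure, lazy language, dropping an unused binding cannot change behaviour. A minimal Haskell sketch of ours:

```
import qualified Data.Set as Set

-- After cleaning the body of a let, drop the binding if its bound name no
-- longer occurs free in that body.
data Exp = Var String | Lam String Exp | App Exp Exp | Let String Exp Exp

freeVars :: Exp -> Set.Set String
freeVars (Var x)       = Set.singleton x
freeVars (Lam x e)     = Set.delete x (freeVars e)
freeVars (App f a)     = Set.union (freeVars f) (freeVars a)
freeVars (Let x e1 e2) = Set.union (freeVars e1) (Set.delete x (freeVars e2))

deadLet :: Exp -> Exp
deadLet (Let x e1 e2)
  | Set.member x (freeVars e2') = Let x (deadLet e1) e2'
  | otherwise                   = e2'   -- the binding is dead: drop it
  where e2' = deadLet e2
deadLet (Lam x e) = Lam x (deadLet e)
deadLet (App f a) = App (deadLet f) (deadLet a)
deadLet e         = e
```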

Including these two auxiliary passes, the top-level definition of the inliner is as follows:

$$\begin{array}{l}
\mathsf{inliner}_h^k\ ce \;\stackrel{\text{def}}{=} \\
\quad \mathbf{let}\ (ce',\ names) = \mathsf{freshen}\ ce\ (\mathsf{boundvars}\ ce)\ \mathbf{in} \\
\quad \mathbf{let}\ (ce_i,\ \_) = \mathsf{inline}_h^k\ \emptyset\ ce'\ names\ \mathbf{in} \\
\quad \mathsf{dead\_let}\ ce_i
\end{array} \tag{3}$$

That is, the inliner freshens names, inlines definitions top-down starting with an empty (∅) memory, and then removes dead lets. Note that the top-level definition expects to receive only closed expressions, which is why it passes only bound variables (boundvars) to freshen. This respects our invariant that the name set contains all bound and free variable names, as there are no free variables.

### 6.3 Inliner correctness

In this section, we prove that the inliner implementation is correct. In the context of PureCake's proof strategy as described in § 3:


We then compose these results to produce our final soundness theorem: the output expression of the inliner is equivalent to its corresponding input.

Theorem 3. inline satisfies ⇝.

$$\begin{array}{l}
\vdash\ \mathsf{inline}_h^k\ m\ ce\ ns = (ce',\ ns') \,\wedge\, \mathsf{memory\_rel}\ ns\ l\ m \,\wedge\, \mathsf{barendregt}\ (\mathsf{desugar}\ ce) \\
\quad \wedge\; \mathsf{boundvars}\ ce \mathbin{\#} \mathsf{domain}\ m \,\wedge\, \mathsf{freevars}\ ce \cup \mathsf{boundvars}\ ce \subseteq ns \,\wedge\, \mathsf{wf}\ ce \\
\quad \Rightarrow\; l \Vdash (\mathsf{desugar}\ ce) \rightsquigarrow (\mathsf{desugar}\ ce')
\end{array}$$

That is, after desugaring compiler expressions into semantic expressions (desugar, see § 3), the action of the inliner for input ce, memory m, and name set ns lies within ⇝ for some stacked lets l when the following hold:


Proof outline. Induction over the implementation function inline. For each case of the proof, we apply rules of ⇝ to justify each atomic inlining operation. ⊓⊔

Theorem 4. Top-level correctness of inliner.

$$\vdash\ \mathsf{wf}\ ce \,\wedge\, \mathsf{closed}\ ce \;\Rightarrow\; (\mathsf{desugar}\ ce) \,\cong\, \mathsf{desugar}\ (\mathsf{inliner}_h^k\ ce)$$
Proof outline. Composition of Theorem 3 above with Theorem 2, the soundness theorem for ⇝. Unfolding the definition of inliner, we use the soundness theorem of freshen, the closed assumption, and the fact that inline is applied to the empty memory ∅ to discharge the preconditions of Theorem 3. ⊓⊔

# 7 Integration into the PureCake Compiler

We insert the inliner and its associated clean-up of dead let-bindings as PureLang-to-PureLang transformations early in the PureCake compiler: directly after parsing and binding group analysis, as shown in Figure 2. Elimination of dead lets happens directly afterwards.

Unusually, the inliner runs before type inference. Ideally, it would take place afterwards: it changes program structure significantly, and type inference should execute on code resembling user input to allow direct error reporting. The reasoning behind this design choice is PureCake's demand analysis, which facilitates strictness optimisations by annotating variables that can be evaluated eagerly. We found that running the inliner before demand analysis produces significantly better performance (§ 8, Figure 4). However, the soundness proof for demand analysis requires it to receive only well-typed input code. To run the inliner after type inference and before demand analysis, we would have to prove that it preserves well-typing, which is a significant undertaking due to PureLang's untyped AST. Future iterations of PureLang's AST are intended to be typed; therefore, we could consider proving type preservation in future work.

To update PureCake's compiler correctness theorem after integrating our inliner, we must establish that the inliner preserves both semantics and various syntactic invariants. We have already presented our proof of semantics preservation in § 6. The latter syntactic invariants guarantee that compiler expressions are closed and satisfy well-formedness properties which are checked as part of parsing. For example, PureLang forbids degenerate function applications to zero arguments: this can be expressed in the AST for PureLang compiler expressions but is ill-formed. Establishing preservation of the invariants is mostly mechanical, but quite tedious and long-winded.

# 8 Benchmarks

In this section we measure the efficacy of our inliner. In particular, we benchmark code generated by PureCake to determine how much the addition of the inliner reduces runtime and memory overhead.

Fig. 2. High-level structure of the PureCake compiler. The inliner and its associated clean-up are PureLang-to-PureLang passes which take place immediately after binding group analysis and before type inference.

Fig. 3. Graphs showing the performance impact of our inliner: the base-2 logarithm of the ratio of measurements (execution time or heap allocations) with and without the inliner enabled: $\log_2(m_{\text{disabled}}/m_{\text{enabled}})$. Error bars are too small to be visible.

Fig. 4. Graphs showing the performance impact of our inliner when executed after PureCake's demand analysis. Performance is clearly worse compared to Figure 3; therefore we do not pursue this approach.

Methodology. We evaluate the performance of several benchmark programs with and without the inliner enabled, using an Intel® Xeon® E-2186G and 64 GB RAM. We consider the same programs as presented by the PureCake developers in prior work [10, §7.1]. We also add a new suc\_list program, which repeatedly applies the suc\_list function shown in § 1 to a list of natural numbers. Like the PureCake developers, we measure wall-clock runtime and total heap allocations as reported by the CakeML runtime. Our measurements are facilitated by existing benchmarking scripts found in the PureCake development.

Results. Figure 3 shows our results, plotted as two bar graphs: the left shows runtime speedup, the right shows allocation reduction. In many cases, our inliner significantly improves performance; in no case does it worsen performance. The value for each bar is obtained by taking the base-2 logarithm of a ratio: the measurement without the inliner enabled (i.e., the longer duration or greater allocation) divided by the measurement with the inliner enabled. Expressed as a percentage, the most significant improvements are a ∼20% reduction in the runtime of life and a ∼15% reduction in the allocations of suc\_list.
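As a sanity check on reading the plots (our arithmetic, not a figure from the paper): a 20% runtime reduction means $m_{\text{enabled}} = 0.8\, m_{\text{disabled}}$, so the corresponding bar has height

$$\log_2\!\left(\frac{m_{\text{disabled}}}{m_{\text{enabled}}}\right) = \log_2\!\left(\frac{1}{0.8}\right) \approx 0.32$$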


Table 1. Line counts for each part of our development.

Inliner placement. We noted in § 7 that our inliner should run before PureCake's demand analysis. Here, we justify that design choice. In particular, we benchmark a version of the PureCake compiler which runs our inliner directly after demand analysis. The results are shown in Figure 4. The improvements in runtime and memory overhead shrink for several benchmarks, and in some cases runtime even worsens overall. Therefore, our inliner should run before demand analysis for maximum benefit.

Code size and compile times. Simple measurements of code size show that our inliner can produce significantly larger CakeML programs (∼50% increase); however, CakeML's efficient handling of inserted lets reduces the effect for binaries (< 15% overall increase). Compile times are unaffected: these remain dominated by PureCake's type-checking and CakeML's register allocation.

Line counts. Our work adds to PureCake significantly. Table 1 shows line counts for each part of our development, measured using wc -l.

# 9 Related Work

Verified inlining in functional languages. CakeML [12] compiles a subset of Standard ML (strict, impure) to several mainstream architectures with end-to-end guarantees. It performs function inlining in its second intermediate language, ClosLang, which has first-class closures. A flow analysis discovers invocations of known functions, and simultaneously inlines closed functions which themselves do not contain closures. Use of de Bruijn indices sidesteps reasoning about shadowing and freshening. As in our work, recursive applications of inlining improve the performance of higher-order functions; we go one step further with specialisation and the inlining of open terms which can contain λ-abstractions.

CertiCoq [2] verifiably compiles Gallina (the metalanguage of Coq) to C light, an intermediate language early in CompCert's pipeline. One of its passes [4] performs several shrink reductions simultaneously: transformations that only reduce code size. One such reduction is the inlining of functions which are applied exactly once; in this case, inlining is β-reduction, contrary to our discussion in § 4.1. The restriction to shrink reductions further removes the need for a recursion limit, as code size strictly decreases on each recursive call. Their verification relies on a more general rewrite system which permits inlining of functions that are used multiple times. A separate pass [16] further inlines small non-recursive functions which can be applied multiple times; here a key concern is the maintenance of A-normal form expressions. In all proofs, the Barendregt variable convention (i.e., barendregt) is used to avoid name clashes.

Pilsner [15] compiles a strict impure language to an idealised assembly, inlining select top-level functions in its intermediate representation. Recursive functions can be unrolled in this way, but not specialised. Again, the Barendregt variable convention is enforced. The focus here is on the novel proof technique of parametric inter-language simulations (PILS) to enable compositional compiler correctness, where PureCake focuses on mechanised whole-program compiler correctness for a realistic language.

Other verified inlining passes. CompCert [13] compiles a subset of C99, performing function inlining in its register transfer language (RTL). This control-flow graph (CFG) representation differs considerably from the functional PureLang; inlining considers only top-level function declarations in the RTL setting. Rather than using a recursion limit, CompCert guarantees termination by forbidding inlining of functions within their own bodies.

CompCert also performs lazy code motion [19] within RTL. A special case of this transformation is loop-invariant code motion, which loosely resembles our specialisation: both are concerned with moving constant expressions out of loops, but in our functional setting loops are expressed as recursive functions. Their verification uses translation validation [18]: an unverified tool transforms code, and then per-run automation proves that semantics has been preserved.

The Plutus Tx language from the Cardano blockchain platform resembles a subset of Haskell, and is compiled to a custom language known as Plutus Core. The compiler is implemented as a GHC plugin: GHC machinery first lowers Plutus Tx to a System F-like language, which is then optimised and compiled further. The compiler is verified using translation certification [11], which aims to make translation validation approaches less brittle by combining automated and manual proof. As in PureCake, syntactic relations are used to encapsulate semantics-preserving transformations: automated proof shows that unverified code transformations inhabit the relations, and manual proof shows that the relations preserve semantics. Translation certification is robust to evolving compiler implementations because the syntactic proofs are more amenable to automated verification than the semantic ones. A syntactic relation akin to § 4 justifies inlining; however, semantic verification is ongoing work at the time of writing. The Barendregt variable convention is enforced in this work too.

Verified optimisation of realistic Haskell-like languages. The CoreSpec project<sup>4</sup> tackles verified variants of Haskell as implemented by GHC. For example, GHC's dependent types extensions were proposed using formal specifications of the syntax, semantics, and typing rules of GHC's Core language [20]. The unverified tool hs-to-coq [6] translates Haskell code to Gallina (Coq's metalanguage), leveraging Coq's logic to enable equational reasoning about real-world programs. A future aim of the project is to derive Coq models of Core automatically from GHC's implementation, prove correctness of optimisations within Coq, and integrate the resulting verified code back into GHC as a plugin. Where CoreSpec focuses on accurate modelling of GHC at the cost of some trust, PureCake instead sacrifices faithfulness for end-to-end guarantees.

GHC's arity analysis pass [5] η-expands functions to avoid excessive thunk allocations. Its mechanised proof of correctness for a simplified Core language relies on an explicitly call-by-need semantics to show performance preservation, i.e., that η-expansion does not reduce value sharing.

# 10 Summary and Future Work

This paper has described our work on a verified inlining and loop specialisation pass for PureLang, a lazy functional programming language. First, we verified a syntactic relation which defines an envelope of permitted inlining transformations, independent of heuristic choices. We used a novel phrasing of inlining as the pushing in and pulling out of let-bindings to prove the relation sound using PureLang's equational theory. Our inliner implementation is then proven to remain within this envelope. We have integrated our work into the PureCake compiler, an end-to-end verified compiler, and demonstrated significant performance improvements. To the best of our knowledge, ours is the first verified function inliner for a lazy functional programming language, and the first verified loop specialiser for any functional language.

In future work, we intend to support loop unrolling and to develop better heuristics for deciding when to inline. Loop unrolling will probably involve augmenting the definition of lets so that it can hold both let-expressions and letrecs. Developing good heuristics will require careful experimentation with the compiler implementation. We do not expect adjustments to the inliner's heuristics to impact our correctness proofs in any significant way, since the proofs are designed to be independent of heuristic choices.

Acknowledgements. Hrutvik Kanabar was supported by the UK Research Institute in Verified Trustworthy Software Systems (VeTSS). Magnus Myreen was supported by Swedish Research Council grant 2021-05165.

Data availability statement. An artifact supporting the results presented in this paper is openly available on Zenodo [9]. The latest development version of PureCake is available on GitHub (https://github.com/cakeml/pure).

<sup>4</sup> https://deepspec.org/entry/Project/Haskell+CoreSpec

# References



# Suspension Analysis and Selective Continuation-Passing Style for Universal Probabilistic Programming Languages

Daniel Lundén<sup>1</sup>(✉), Lars Hummelgren<sup>2</sup>, Jan Kudlicka<sup>3</sup>, Oscar Eriksson<sup>2</sup>, and David Broman<sup>2,4</sup>

<sup>1</sup> Oracle, Stockholm, Sweden, daniel.lunden@oracle.com
<sup>2</sup> EECS and Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden, {larshum,oerikss,dbro}@kth.se
<sup>3</sup> Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway, jan.kudlicka@bi.no
<sup>4</sup> Computer Science Department, Stanford University, California, USA, broman@stanford.edu

Abstract. Universal probabilistic programming languages (PPLs) make it relatively easy to encode and automatically solve statistical inference problems. To solve inference problems, PPL implementations often apply Monte Carlo inference algorithms that rely on execution suspension. State-of-the-art solutions enable execution suspension either through (i) continuation-passing style (CPS) transformations or (ii) efficient, but comparatively complex, low-level solutions that are often not available in high-level languages. CPS transformations introduce overhead due to unnecessary closure allocations, a problem the PPL community has generally overlooked. To reduce this overhead, we develop a new, efficient selective CPS approach for PPLs. Specifically, we design a novel static suspension analysis technique that determines the parts of a program that require suspension, given a particular inference algorithm. The analysis allows us to selectively CPS transform the program only where necessary. We formally prove the correctness of the analysis and implement the analysis and transformation in the Miking CorePPL compiler. We evaluate the implementation for a large number of Monte Carlo inference algorithms on real-world models from phylogenetics, epidemiology, and topic modeling. The evaluation results demonstrate significant improvements across all models and inference algorithms.

Keywords: Probabilistic programming · Static analysis · Continuation-passing style.

# 1 Introduction

Probabilistic programming languages (PPLs), such as Anglican [50], Birch [36], WebPPL [18], Stan [10], Pyro [6], and Gen [11], make it possible to encode and solve statistical inference problems. Such inference problems are of significant interest in many research fields, including phylogenetics [43], computer vision [25], topic modeling [7], inverse graphics [20], and cognitive science [19]. A particularly appealing feature of PPLs is the separation between the inference problem specification (the language) and the inference algorithm used to solve the problem (the language implementation). This separation allows PPL users to focus solely on encoding their inference problems, while inference algorithm experts deal with the intricacies of inference implementation.

Implementations of PPLs apply many different inference algorithms. Monte Carlo inference algorithms, such as Markov chain Monte Carlo (MCMC) [16] and sequential Monte Carlo (SMC) [12], are popular due to their asymptotic correctness and relative ease of implementation for universal<sup>5</sup> PPLs. The central idea behind all Monte Carlo methods in PPLs is to execute probabilistic programs multiple times to generate samples that approximate the target distribution for the encoded inference problem. However, repeated execution is expensive, and PPL implementations must avoid unnecessary overhead.

Monte Carlo algorithms often need to suspend executions. For example, MCMC algorithms can suspend at random draws in the program to avoid unnecessary re-execution when proposing new executions, and SMC algorithms can suspend at likelihood updates to resample executions. Languages such as WebPPL [18] and Anglican [50], and the approach described by Ritchie et al. [41], apply continuation-passing style (CPS) transformations [3] to enable arbitrary suspension during execution. The main benefit of CPS transformations is that they are relatively easy to implement in functional programming languages. However, one disadvantage with CPS transformations is that high-performance low-level languages, without higher-order functions, do not support them. For this reason, there are also more direct low-level alternatives to CPS, including non-preemptive multitasking (e.g., coroutines [15]) and PPL control-flow graphs [30]. These more direct alternatives can additionally avoid much of the overhead resulting from CPS<sup>6</sup>, but are more complex to implement.

We consider how to bridge the performance gap between CPS-based PPLs and lower-level PPLs that rely on, e.g., direct implementations of coroutines. We consider optimizations at the level of the CPS transformation, and not the translation from CPS-based PPLs to lower-level representations. CPS overhead is a result of closure allocations for continuations. We make the important observation that PPLs do not require the arbitrary suspensions provided by full CPS transformations. Most Monte Carlo inference algorithms require suspension only in very specific parts of a program. Current state-of-the-art CPS-based PPLs do not consider inference-specific suspension requirements to reduce CPS overhead.
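To make the closure-allocation cost concrete, the following OCaml sketch (our own illustration; the paper's implementations target Miking CorePPL) contrasts a direct-style loop with its fully CPS-transformed counterpart. The CPS version allocates one continuation closure per list element, even though this particular loop never needs to suspend.

```ocaml
(* Direct style: a simple sum over a list; no per-element allocation. *)
let rec sum_direct acc = function
  | [] -> acc
  | x :: xs -> sum_direct (acc + x) xs

(* Full CPS: semantically equivalent, but every step allocates a fresh
   closure for the continuation, whether or not a suspension can occur. *)
let rec sum_cps xs k = match xs with
  | [] -> k 0
  | x :: xs -> sum_cps xs (fun s -> k (x + s))

let () =
  assert (sum_direct 0 [1; 2; 3] = 6);
  assert (sum_cps [1; 2; 3] (fun s -> s) = 6)
```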

We design a new static suspension analysis and a new selective CPS transformation for PPLs that together significantly reduce runtime overhead compared to a traditional full CPS transformation.

<sup>5</sup> A term that first appeared in Goodman et al. [17], indicating expressive PPLs where the number and types of random variables are not always known statically.

<sup>6</sup> Note that CPS only results in overhead if programs reify the continuations at runtime to, e.g., suspend computations. Traditional CPS-based compilers often only use CPS as an intermediate form during compilation, which does not result in runtime overhead.

Current state-of-the-art functional PPLs that use CPS for execution suspension can therefore greatly benefit from our new approach. The suspension analysis identifies all parts of a program that may require suspension as a result of applying a particular inference algorithm. We formalize the suspension analysis algorithm using a core PPL calculus equipped with a big-step operational semantics. Specifically, the challenge lies in capturing how suspension requirements propagate through the program in the presence of higher-order functions. Furthermore, we formalize the selective CPS transformation and justify its correctness when guided by the suspension analysis. Prior work on selective CPS for general-purpose programming languages, e.g., by Nielsen [38] and Asai and Uehara [4], focuses on analyses based on type systems and type inference. In contrast, we build our suspension analysis on 0-CFA [46], and it operates directly on an untyped calculus.

Overall, we (i) prove that the suspension analysis is correct, (ii) show that the resulting selective CPS transformation gives significant performance gains compared to using a full CPS transformation, and (iii) show that the overall approach is directly applicable to a large set of inference algorithms. Specifically, we evaluate the approach for the following inference algorithms: likelihood weighting, the SMC bootstrap particle filter, the SMC alive particle filter [24], aligned lightweight MCMC [29,49], and particle-independent Metropolis–Hastings [40]. We consider each inference algorithm for four real-world models from phylogenetics, epidemiology, and topic modeling.

We implement the suspension analysis and selective CPS transformation in Miking CorePPL [30,9]. Similarly to WebPPL and Anglican, the implementation supports the co-existence of many inference problems and applications of inference algorithms to these problems within the same program. However, compared to full CPS, such programs are more challenging to handle with selective CPS, as the CPS transformation of an inference problem also depends on the applied inference algorithm: different inference algorithms generally require different suspensions. To complicate things further, different inference problems may share some code, or the PPL user may apply two different inference algorithms to the same inference problem. The compiler must then apply different CPS transformations to different parts of the program, and sometimes even many different CPS transformations to separate copies of the same part of the program. To solve this, we develop an approach that, for any given Miking CorePPL program, extracts all possible inference problems and corresponding inference algorithm applications. This extraction procedure allows the correct application of selective CPS throughout the program.

In summary, we make the following contributions.


– We design a novel static suspension analysis that determines the parts of a program that require suspension, given a particular inference algorithm, and we prove the soundness of the analysis (Section 4).

– We design a selective CPS transformation for PPLs that, guided by the suspension analysis, significantly reduces runtime overhead resulting from unnecessary closure allocations (Section 5).

– We implement the suspension analysis and selective CPS transformation in the Miking CorePPL compiler. Unlike full CPS, selective CPS introduces challenges for probabilistic programs containing many inference problems and inference algorithm applications. We implement an approach that correctly applies selective CPS to such programs by extracting individual inference problems (Section 6).

Section 7 presents the evaluation and its results for the implementations in Miking CorePPL, Section 8 discusses related work in more detail, and Section 9 concludes. We first consider a motivating example in Section 2 and introduce the underlying PPL calculus in Section 3.

An extended version of the paper is available at arXiv [31]. We use the † symbol in the text to indicate that more information (e.g., proofs) is available in the extended version.

# 2 A Motivating Example

This section introduces the running example in Fig. 1 and uses it to present the basic idea behind PPLs and how inference algorithms such as SMC and MCMC make use of CPS to suspend executions. Most importantly, we illustrate the motivation and key ideas behind selective CPS for PPLs.

Consider the probabilistic program in Fig. 1a, written in a functional-style PPL. The program encodes an inference problem for estimating the probability distribution over the bias of a coin, conditioned on the outcome of four experimental coin flips: true, true, false, and true (true = heads and false = tails). At line 1, we use the PPL-specific assume construct to define our prior belief in the bias a<sub>1</sub> of the coin. We set this prior belief to a Beta(2, 2) probability distribution, illustrated in Fig. 1b. In the illustration, 0 indicates a coin that always results in false, 1 a coin that always results in true, and 0.5 a fair coin. We see that our prior belief is quite evenly spread out, but with more probability mass towards a fair coin. To condition this prior distribution on the observed coin flips, we conceptually execute the program in Fig. 1a infinitely many times, sampling values from the prior Beta distribution at assume (line 1) and, as a side effect, accumulating the product of the weights given as arguments to the PPL-specific weight construct (line 4). We make the four consecutive calls weight (fBernoulli a<sub>1</sub> true), weight (fBernoulli a<sub>1</sub> true), weight (fBernoulli a<sub>1</sub> false), and weight (fBernoulli a<sub>1</sub> true)<sup>7</sup>, using the recursive function iter. The function application fBernoulli a<sub>1</sub> o gives the probability of the outcome o given a bias a<sub>1</sub> for the coin. That is, fBernoulli a<sub>1</sub> true = a<sub>1</sub> and fBernoulli a<sub>1</sub> false = 1 − a<sub>1</sub>. So, for example, a sample a<sub>1</sub> = 0.4 gets the accumulated weight 0.4 · 0.4 · 0.6 · 0.4, and a<sub>1</sub> = 0.7 the accumulated weight 0.7 · 0.7 · 0.3 · 0.7. The end result is an infinite set of weighted samples of a<sub>1</sub> (the program returns a<sub>1</sub> at line 8) that approximate the posterior or target distribution of Fig. 1a, illustrated in Fig. 1c. Note that, because we observed three true outcomes and only one false, the weights shift the probability mass towards 1 and narrow it slightly, as we are now more sure about the bias of the coin. Increasing the number of experimental coin flips would make Fig. 1c more and more narrow.

<sup>7</sup> PPLs also commonly use a similar built-in function observe to update the weight. For example, observe (Bernoulli a<sub>1</sub>) true is equivalent to weight (fBernoulli a<sub>1</sub> true).
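Because the Beta prior is conjugate to the Bernoulli likelihood, this particular inference problem also has a closed-form solution, which is a useful sanity check for the weighted samples (this derivation is our addition, not part of the original example):

$$
p(a_1 \mid \text{data}) \propto f_{\mathrm{Beta}(2,2)}(a_1)\, a_1^3 (1 - a_1) \propto a_1^{4} (1 - a_1)^{2}, \qquad \text{i.e.,} \quad a_1 \mid \text{data} \sim \mathrm{Beta}(5, 3).
$$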

```
1 let a1 = assume (Beta 2 2) in
2 let rec iter = λobs.
3 if null obs then () else
4 weight (fBernoulli a1 (head obs));
5 iter (tail obs)
6 in
7 iter [true,true,false,true];
8 a1
```
#### (a) Program t<sub>example</sub>.

#### (b) Beta(2,2). (plot)

#### (c) Distribution of t<sub>example</sub>. (plot)

```
1 Suspension_assume(Beta 2 2, λa1.
2   let rec iter = λobs.
3     if null obs then () else
4       weight (fBernoulli a1 (head obs));
5       iter (tail obs)
6   in
7   iter [true,true,false,true];
8   a1)
```
#### (d) Suspension at assume.

```
1 let a1 = assume (Beta 2 2) in
2 let rec iter = λk. λobs.
3   if null obs then k ()
4   else
5     Suspension_weight(
6       fBernoulli a1 (head obs),
7       (λ_. iter k (tail obs)))
8 in
9 iter (λ_. a1) [true,true,false,true]
```
#### (e) Suspension at weight.

```
1 let k7 = λt6.
2   let k8 = λt7.
3     Suspension_assume(t7, λa1.
4       let rec iter = λk1. λobs.
5         let k2 = λt1.
6           if t1 then k1 () else
7             let k3 = λt2.
8               let k4 = λt3.
9                 let k5 = λt4.
10                  Suspension_weight(t4, λ_.
11                    let k6 = λt5. iter k1 t5 in
12                    tailCPS k6 obs)
13                in t2 k5 t3
14              in headCPS k4 obs
15            in fBernoulliCPS k3 a1
16          in nullCPS k2 obs
17      in iter (λ_. a1)
18        [true,true,false,true])
19  in t6 k8 2
20 in BetaCPS k7 2
```
#### (f) Full CPS.

Fig. 1: A probabilistic program t<sub>example</sub> modeling the bias of a coin. Fig. (a) gives the program. The function fBernoulli is the probability mass function of the Bernoulli distribution. Fig. (b) illustrates the distribution for a<sub>1</sub> at line 1 in (a). Fig. (c) shows the set of (weighted) samples resulting from conceptually running t<sub>example</sub> infinitely many times. Fig. (d) and Fig. (e) show the selective CPS transformations required for suspension at assume and weight, respectively. Fig. (f) gives t<sub>example</sub> in full CPS, with suspensions at assume and weight. The CPS subscript indicates CPS versions of intrinsic functions such as head and tail.

We can approximate the infinite number of samples by running the program a large (but finite) number of times. This basic inference algorithm is known as likelihood weighting. The problem with likelihood weighting is that it is only accurate enough for simple models. For complex models, it is common that only a few likelihood weighting samples (often only one) get much larger weights relative to the other samples, greatly reducing inference accuracy. Real-world models require more powerful inference algorithms based on, e.g., SMC or MCMC. A key requirement in both SMC and MCMC is the ability to suspend executions of probabilistic programs at calls to weight and/or assume. One way to enable suspensions is by writing programs in CPS. We first illustrate a simple use of CPS to suspend at assume in Fig. 1d. Here, the program immediately returns an object Suspension<sub>assume</sub>(Beta 2 2, k), indicating that execution stopped at an assume with the argument Beta 2 2 and a continuation k (i.e., the abstraction binding a<sub>1</sub>) that executes the remainder of the program. With likelihood weighting, we would simply sample a value a<sub>1</sub> from the Beta 2 2 distribution and resume execution by calling k a<sub>1</sub>. This call runs the program until termination and results in the actual return value of the program, which is a<sub>1</sub>. Many MCMC inference algorithms reuse samples from previous executions at Suspension<sub>assume</sub>, and the suspensions are thus useful to avoid unnecessary re-execution [41].
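The mechanics of suspension and resumption can be sketched in OCaml (a stand-in for the host language; the names and the float-only distribution type are our simplifications):

```ocaml
(* A suspended execution has either finished with a result or stopped
   at an assume, carrying the distribution and the continuation. *)
type dist = { sample : unit -> float }

type 'a exec =
  | Done of 'a
  | SuspendAssume of dist * (float -> 'a exec)

(* Likelihood weighting at assume: draw from the distribution and
   immediately resume the continuation until the program finishes. *)
let rec run_lw : 'a exec -> 'a = function
  | Done v -> v
  | SuspendAssume (d, k) -> run_lw (k (d.sample ()))
```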

As a second example, we illustrate suspension at weight for, e.g., SMC inference in Fig. 1e. Here, we require suspensions in the middle of the recursive calls to iter, and writing the program in CPS is more challenging. We rewrite the iter function to take a continuation k as argument, and call the continuation with the return value () at line 3 instead of directly returning () as in Fig. 1a at line 3. This continuation argument k is precisely what allows us to construct and return Suspension<sub>weight</sub> objects at line 5. To illustrate the suspensions, consider executing the program with likelihood weighting. First, the program returns the object Suspension<sub>weight</sub>(fBernoulli a<sub>1</sub> true, k′), where k′ is the continuation that line 7 constructs. Likelihood weighting now updates the weight for the execution with the value fBernoulli a<sub>1</sub> true and resumes execution by calling k′ (). Similarly, this next execution returns Suspension<sub>weight</sub>(fBernoulli a<sub>1</sub> true, k′′) for the second recursive call to iter, and we again update the weight and resume by calling k′′ (). We similarly encounter Suspension<sub>weight</sub>(fBernoulli a<sub>1</sub> false, k′′′) and Suspension<sub>weight</sub>(fBernoulli a<sub>1</sub> true, k′′′′) before the final call k′′′′ () runs the program to termination and produces the actual return value a<sub>1</sub>. In SMC, we run many executions concurrently and wait until they all have returned a Suspension<sub>weight</sub> object. At this point, we resample the executions according to their weights (the first value in Suspension<sub>weight</sub>), which discards executions with low weight and replicates executions with high weight. After resampling, we continue to the next suspension and resampling point by calling the continuations.
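An SMC driver can be sketched in the same style (again our own simplification, with deliberately naive multinomial resampling): run every particle to its next Suspension<sub>weight</sub>, resample according to the weights, and resume the survivors.

```ocaml
(* A particle runs to its next weight suspension or to completion. *)
type 'a exec =
  | Done of 'a
  | SuspendWeight of float * (unit -> 'a exec)

let weight_of = function Done _ -> 1.0 | SuspendWeight (w, _) -> w

(* One SMC step: stop all particles, then draw a new population with
   probability proportional to the weights. *)
let smc_step (particles : (unit -> 'a exec) list) : 'a exec list =
  let stopped = List.map (fun p -> p ()) particles in
  let total = List.fold_left (fun a e -> a +. weight_of e) 0.0 stopped in
  let pick () =
    let u = Random.float total in
    let rec go acc = function
      | [ e ] -> e
      | e :: es ->
          let acc = acc +. weight_of e in
          if u <= acc then e else go acc es
      | [] -> assert false
    in
    go 0.0 stopped
  in
  List.map (fun _ -> pick ()) stopped
```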

PPL implementations enable suspensions at assume and/or weight through automatic and full CPS transformations. Fig. 1f illustrates such a transformation for Fig. 1a. We indicate CPS versions of intrinsic functions with the CPS subscript. Note that the full CPS transformation results in many additional closure allocations compared to Fig. 1d and Fig. 1e. As a result, runtime overhead increases significantly. The contribution in this paper is a static analysis that allows an automatic and selective CPS transformation of programs, as in Fig. 1d and Fig. 1e. With a selective transformation, we avoid many unnecessary closure allocations, and can significantly reduce runtime overhead while still allowing suspensions as required for a given inference algorithm.

# 3 Syntax and Semantics

This section introduces the PPL calculus used to formalize the suspension analysis in Section 4 and selective CPS transformation in Section 5. Section 3.1 gives the abstract syntax and Section 3.2 a big-step operational semantics. Section 3.3 introduces A-normal form—a prerequisite for both the suspension analysis and the selective CPS transformation.

# 3.1 Syntax

We build upon the standard untyped lambda calculus, representative of functional universal PPLs such as Anglican, WebPPL, and Miking CorePPL. We define the abstract syntax below.

Definition 1 (Terms, values, and environments). We define terms t ∈ T and values v ∈ V as

$$
\begin{aligned}
\mathbf{t} &::= x \mid c \mid \lambda x.\ \mathbf{t} \mid \mathbf{t}\ \mathbf{t} \mid \texttt{let } x = \mathbf{t} \texttt{ in } \mathbf{t} \mid \texttt{if } \mathbf{t} \texttt{ then } \mathbf{t} \texttt{ else } \mathbf{t} \mid \texttt{assume } \mathbf{t} \mid \texttt{weight } \mathbf{t}\\
\mathbf{v} &::= c \mid \langle \lambda x.\ \mathbf{t}, \rho \rangle\\
x, y &\in X \qquad \rho \in P \qquad c \in C \qquad \{\texttt{false}, \texttt{true}, ()\} \cup \mathbb{R} \cup D \subseteq C.
\end{aligned}
\tag{1}
$$

The countable set X contains variable names, C intrinsic values and operations, and D ⊂ C intrinsic probability distributions. The set P contains evaluation environments, i.e., maps from variables in X to values in V .

Definition 2 (Target language terms). As a target language for the selective CPS transformation in Section 5, we additionally extend Definition 1 to target language terms t ∈ T<sup>+</sup> by

$$
\mathbf{t} ::= \ldots \mid \mathrm{Suspension}_{\mathrm{assume}}(\mathbf{t}, \mathbf{t}) \mid \mathrm{Suspension}_{\mathrm{weight}}(\mathbf{t}, \mathbf{t}). \tag{2}
$$

Fig. 1a gives an example of a term in T, and Fig. 1d and Fig. 1e give examples of terms in T<sup>+</sup>. However, note that the programs in Fig. 1 also use the list constructor [. . .] (not part of the above definitions) to make the example more interesting.

In addition to the standard variable, abstraction, and application terms of the untyped lambda calculus, we include explicit let expressions for convenience. Furthermore, we use the syntactic sugar let rec f = λx.t<sub>1</sub> in t<sub>2</sub> to define recursive functions (translating to an application of a call-by-value fixed-point combinator). We use t<sub>1</sub>; t<sub>2</sub> as a shorthand for (λ_. t<sub>2</sub>) t<sub>1</sub>, where _ indicates that we do not use the argument. That is, we evaluate t<sub>1</sub> for side effects only.

We include a set C of intrinsic operations and constants essential to inference problems encoded in PPLs. The set of intrinsics includes boolean truth values, the unit value, real numbers, and probability distributions. We can also add further operations and constants to C. For example, we can let + ∈ C to support addition of real numbers. To allow control flow to depend on intrinsic values, we include if expressions that use intrinsic booleans as conditions.

We saw examples of the assume and weight constructs in Section 2. The assume construct takes distributions in D ⊂ C as argument, and produces random variables distributed according to these distributions. For example, we can let N ∈ C be a function that constructs normal distributions. Then, assume (N 0 1), where N 0 1 ∈ D, defines a random variable with a standard normal distribution. Partially constructed distributions, e.g., N 0, are also in C, but not in D (they are not yet proper distributions). As we saw in Section 2, the weight construct updates the likelihood with the real number given as argument, and allows conditioning on data (e.g., the four coin flips in Fig. 1).

#### 3.2 Semantics

We construct a call-by-value big-step operational semantics, based on Lundén et al. [29], describing how to evaluate terms t ∈ T. Such a semantics is a key component when formally defining the probability distributions corresponding to terms t ∈ T (e.g., the distribution in Fig. 1c corresponding to the program in Fig. 1a) and also when proving various properties of PPLs and their inference algorithms (e.g., inference correctness). See, e.g., the work by Borgström et al. [8] and Lundén et al. [28] for full formal treatments.

We use the semantics to formally define suspension, and use this definition to state the soundness of the suspension analysis in Section 4 (Theorem 1). We use a big-step semantics, as we do not require the additional control provided by a small-step semantics. For example, we do not concern ourselves with details of termination, as the soundness of the analysis relates only to terminating executions. Fig. 2 presents the full semantics as a relation $\rho \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$ over P × T × S × {false, true} × R × V. S is a set of traces capturing the random draws at assume during evaluation. Intuitively, $\rho \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$ holds if t evaluates to v in the environment ρ with the trace s and the total probability density (i.e., the accumulated weight) w. We describe the suspension flag u later in this section.

Most of the rules are standard, and we focus on explaining key properties related to PPLs and suspension. We first consider the rule (Const-App), which uses the δ-function to evaluate intrinsic operations.


$$
\frac{}{\rho \vdash x \overset{[]}{\Downarrow}{}_{1}^{\mathrm{false}}\, \rho(x)}\ \text{(Var)}
\qquad
\frac{}{\rho \vdash c \overset{[]}{\Downarrow}{}_{1}^{\mathrm{false}}\, c}\ \text{(Const)}
\qquad
\frac{}{\rho \vdash \lambda x.\,\mathbf{t} \overset{[]}{\Downarrow}{}_{1}^{\mathrm{false}}\, \langle \lambda x.\,\mathbf{t}, \rho \rangle}\ \text{(Lam)}
$$

$$
\frac{\rho \vdash \mathbf{t}_1 \overset{s_1}{\Downarrow}{}_{w_1}^{u_1} \langle \lambda x.\,\mathbf{t}, \rho' \rangle \qquad \rho \vdash \mathbf{t}_2 \overset{s_2}{\Downarrow}{}_{w_2}^{u_2} \mathbf{v}_2 \qquad \rho', x \mapsto \mathbf{v}_2 \vdash \mathbf{t} \overset{s_3}{\Downarrow}{}_{w_3}^{u_3} \mathbf{v}}{\rho \vdash \mathbf{t}_1\, \mathbf{t}_2 \overset{s_1 \| s_2 \| s_3}{\Downarrow}{}_{w_1 \cdot w_2 \cdot w_3}^{u_1 \lor u_2 \lor u_3} \mathbf{v}}\ \text{(App)}
$$

$$
\frac{\rho \vdash \mathbf{t}_1 \overset{s_1}{\Downarrow}{}_{w_1}^{u_1} c \qquad \rho \vdash \mathbf{t}_2 \overset{s_2}{\Downarrow}{}_{w_2}^{u_2} c_1}{\rho \vdash \mathbf{t}_1\, \mathbf{t}_2 \overset{s_1 \| s_2}{\Downarrow}{}_{w_1 \cdot w_2}^{u_1 \lor u_2} \delta(c, c_1)}\ \text{(Const-App)}
\qquad
\frac{\rho \vdash \mathbf{t}_1 \overset{s_1}{\Downarrow}{}_{w_1}^{u_1} \mathbf{v}_1 \qquad \rho, x \mapsto \mathbf{v}_1 \vdash \mathbf{t}_2 \overset{s_2}{\Downarrow}{}_{w_2}^{u_2} \mathbf{v}_2}{\rho \vdash \texttt{let } x = \mathbf{t}_1 \texttt{ in } \mathbf{t}_2 \overset{s_1 \| s_2}{\Downarrow}{}_{w_1 \cdot w_2}^{u_1 \lor u_2} \mathbf{v}_2}\ \text{(Let)}
$$

$$
\frac{\rho \vdash \mathbf{t}_1 \overset{s_1}{\Downarrow}{}_{w_1}^{u_1} \mathrm{true} \qquad \rho \vdash \mathbf{t}_2 \overset{s_2}{\Downarrow}{}_{w_2}^{u_2} \mathbf{v}}{\rho \vdash \texttt{if } \mathbf{t}_1 \texttt{ then } \mathbf{t}_2 \texttt{ else } \mathbf{t}_3 \overset{s_1 \| s_2}{\Downarrow}{}_{w_1 \cdot w_2}^{u_1 \lor u_2} \mathbf{v}}\ \text{(If-True)}
$$

$$
\frac{\rho \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} d \qquad d \in D \qquad w' = f_d(c)}{\rho \vdash \texttt{assume } \mathbf{t} \overset{s \| [c]}{\Downarrow}{}_{w \cdot w'}^{u \lor \textit{suspend}_{\mathrm{assume}}} c}\ \text{(Assume)}
\qquad
\frac{\rho \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} c \qquad c \in \mathbb{R}}{\rho \vdash \texttt{weight } \mathbf{t} \overset{s}{\Downarrow}{}_{w \cdot c}^{u \lor \textit{suspend}_{\mathrm{weight}}} ()}\ \text{(Weight)}
$$

Fig. 2: A big-step operational semantics for t ∈ T. We omit the rule (If-False) for brevity; it is analogous to (If-True). The environment ρ, x ↦ v denotes ρ extended with a binding v for x. For each d ∈ D, the function f<sub>d</sub> is its probability density or probability mass function. E.g., $f_{\mathcal{N}(0,1)}(x) = e^{-x^2/2}/\sqrt{2\pi}$, the density function of the standard normal distribution. We use the following notation: ∥ for sequence concatenation, · for multiplication, and ∨ for logical disjunction.

Definition 3 (Intrinsic arities and the δ-function). For each c ∈ C, we let |c| ∈ N denote its arity. We also assume the existence of a partial function δ : C × C → C such that if δ(c, c<sub>1</sub>) = c<sub>2</sub>, then |c| > 0 and |c<sub>2</sub>| = |c| − 1.

For example, δ(δ(+, 1), 2) = 3. We use the arity property of intrinsics to formally define traces (Definition 4 below).
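A minimal OCaml sketch of Definition 3, assuming a small intrinsic universe with reals and curried addition (the constructor names are ours):

```ocaml
(* Intrinsics with arities; Add1 is + partially applied to one real. *)
type const = Real of float | Add | Add1 of float

let arity = function Real _ -> 0 | Add -> 2 | Add1 _ -> 1

(* delta consumes one argument and yields a constant whose arity is one
   less; the option type models the partiality of Definition 3. *)
let delta c c1 =
  match c, c1 with
  | Add, Real x -> Some (Add1 x)
  | Add1 x, Real y -> Some (Real (x +. y))
  | _ -> None

(* Mirrors delta(delta(+, 1), 2) = 3 from the text. *)
let () =
  match delta Add (Real 1.0) with
  | Some c -> assert (delta c (Real 2.0) = Some (Real 3.0))
  | None -> assert false
```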

Definition 4 (Traces). For all s ∈ S, s is a sequence of intrinsics with arity 0, called a trace. We write s = [c<sub>1</sub>, c<sub>2</sub>, . . . , c<sub>n</sub>] to denote a trace s with n elements.

The rule (Assume) formalizes random draws and consumes elements of the trace. Specifically, (Assume) updates the evaluation's total probability density w ∈ R with the density w′ of the first trace element with respect to the distribution given as argument to assume. The rule (Weight) furthermore directly modifies the total probability density according to the weight argument.

We now consider the special suspension flag u in the derivation $\rho \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$.

Definition 5 (Suspension requirement). A derivation $\rho \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$ requires suspension if the suspension flag u is true.

For example, a derivation via the rule (App) requires suspension if u<sub>1</sub> ∨ u<sub>2</sub> ∨ u<sub>3</sub> holds, i.e., if any subderivation requires suspension. To reflect the particular suspension requirements in SMC and MCMC inference, we limit the sources of suspension requirements to assume and weight. We turn the individual sources on and off through the

```
1 let t1 = 2 in
2 let t2 = 2 in
3 let t3 = Beta in
4 let t4 = t3 t1 in
5 let t5 = t4 t2 in
6 let a1 = assume t5 in
7 let rec iter = λobs.
8   let t6 = null in
9   let t7 = t6 obs in
10   let t8 =
11     if t7 then
12       let t9 = () in
13       t9
14     else
15       let t10 = fBernoulli in
16       let t11 = t10 a1 in
17       let t12 = head in
18       let t13 = t12 obs in
19       let t14 = t11 t13 in
20       let w1 = weight t14 in
21       let t15 = tail in
22       let t16 = t15 obs in
23       let t17 = iter t16 in
24       t17
25   in
26   t8
27 in
28 let t18 = true in
29 let t19 = false in
30 let t20 = true in
31 let t21 = true in
32 let t22 = [t21,t20,t19,t18] in
33 let t23 = iter t22 in
34 a1
```
Fig. 3: The running example t<sub>example</sub> from Fig. 1a transformed to ANF.

boolean variables suspend<sub>assume</sub> and suspend<sub>weight</sub> in Fig. 2. For the examples in the remainder of this paper, we let suspend<sub>weight</sub> = true and suspend<sub>assume</sub> = false (i.e., only weight requires suspension, as in SMC inference).

To illustrate the semantics, consider t<sub>example</sub> of Fig. 1a again. Because t<sub>example</sub> evaluates precisely one assume, the only valid traces for t<sub>example</sub> are singleton traces [a<sub>1</sub>], where a<sub>1</sub> ∈ [0, 1] due to the Beta prior for a<sub>1</sub>. By initially setting ρ to the empty environment ∅ and following the rules of Fig. 2, we derive $\emptyset \vdash \mathbf{t}_{\mathrm{example}} \overset{[a_1]}{\Downarrow}{}_{f_{\mathrm{Beta}(2,2)}(a_1)\,\cdot\,a_1^3(1-a_1)}^{\mathrm{true}} a_1$. Note that every evaluation of t<sub>example</sub> has u = true, as there are always four calls to weight during evaluation. That is, the derivation requires suspension. However, many subderivations of t<sub>example</sub> do not require suspension. For example, the subderivations for assume (Beta 2 2) and null obs do not (i.e., they have u = false). Section 4 presents a suspension analysis that conservatively approximates which subderivations require suspension. The analysis enables, e.g., the selective CPS transformation in Fig. 1e.

#### 3.3 A-Normal Form

We simplify the suspension analysis in Section 4 and the selective CPS transformation in Section 5 by requiring that terms are in A-normal form (ANF) [13].

Definition 6 (A-normal form). We define the A-normal form terms t<sub>ANF</sub> ∈ T<sub>ANF</sub> as follows.

$$
\begin{aligned}
\mathbf{t}_{\mathrm{ANF}} &::= x \mid \texttt{let } x = \mathbf{t}'_{\mathrm{ANF}} \texttt{ in } \mathbf{t}_{\mathrm{ANF}}\\
\mathbf{t}'_{\mathrm{ANF}} &::= x \mid c \mid \lambda x.\ \mathbf{t}_{\mathrm{ANF}} \mid x\ y \mid \texttt{if } x \texttt{ then } \mathbf{t}_{\mathrm{ANF}} \texttt{ else } \mathbf{t}_{\mathrm{ANF}} \mid \texttt{assume } x \mid \texttt{weight } x
\end{aligned}
\tag{3}
$$

It holds that T<sub>ANF</sub> ⊂ T. Furthermore, there exist standard transformations to convert terms in T to T<sub>ANF</sub>. Fig. 3 illustrates Fig. 1a transformed to ANF. We will use Fig. 3 as a running example in Section 4 and Section 5.

Restricting programs to ANF significantly simplifies the suspension analysis and the selective CPS transformation. From now on, we require that all variable bindings in programs are unique. Together with ANF, the result is that every expression in a program t ∈ T<sub>ANF</sub> is uniquely labeled by a variable name from a let expression. This property is essential for the treatment in Section 4.
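To make the transformation concrete, here is a small OCaml sketch of ANF conversion for a bare lambda calculus (our own; it covers only variables, abstractions, applications, and lets, not the full calculus of Definition 1):

```ocaml
type tm =
  | Var of string
  | Lam of string * tm
  | App of tm * tm
  | Let of string * tm * tm

(* Fresh-name supply for the let-bound intermediate results. *)
let fresh =
  let n = ref 0 in
  fun () -> incr n; Printf.sprintf "t%d" !n

(* anf t k: convert t, passing the variable naming its result to k. *)
let rec anf (t : tm) (k : string -> tm) : tm =
  match t with
  | Var x -> k x
  | Lam (x, b) ->
      let v = fresh () in
      Let (v, Lam (x, anf b (fun r -> Var r)), k v)
  | App (f, a) ->
      anf f (fun fv ->
        anf a (fun av ->
          let v = fresh () in
          Let (v, App (Var fv, Var av), k v)))
  | Let (x, t1, t2) ->
      anf t1 (fun r -> Let (x, Var r, anf t2 k))

let to_anf (t : tm) : tm = anf t (fun r -> Var r)

(* to_anf (App (Lam ("x", Var "x"), Var "y")) yields
   let t1 = λx. x in let t2 = t1 y in t2 *)
```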

# 4 Suspension Analysis

This section presents the main technical contribution: the suspension analysis. The goal of the analysis is to identify program expressions that may require suspension in the sense of Definition 5. Identifying such expressions enables the selective CPS transformation in Section 5, allowing transformations such as the one in Fig. 1e.

The suspension analysis builds upon the 0-CFA algorithm [46,39], and we formalize our algorithms based on Lundén et al. [29]. The main challenge we solve is how to model the propagation of suspension in the presence of higher-order functions. The 0 in 0-CFA stands for context insensitivity: the analysis considers every part of the program in one global context. Context insensitivity makes the analysis more conservative compared to context-sensitive approaches such as k-CFA, where k ∈ N indicates the level of context sensitivity [33]. We use 0-CFA for two reasons: (i) the worst-case time complexity of the analysis is polynomial, while it is exponential for k-CFA already at k = 1, and (ii) the limitations of 0-CFA rarely matter in practical PPL applications. For example, k-CFA provides no benefits over 0-CFA for the programs in Section 7.

We assume ⟨λx. t, ρ⟩ ∉ C (recall that C is the set of intrinsics). That is, we assume that closures are not part of the intrinsics. In particular, this disallows intrinsic operations (including the use of assume d, d ∈ D ⊂ C) producing closures, which would needlessly complicate the analysis without any benefit.

Consider the program in Fig. 3, and assume that weight requires suspension. Clearly, the expression labeled by w<sub>1</sub> at line 20 then requires suspension. Furthermore, w<sub>1</sub> evaluates as part of the larger expression labeled by t<sub>8</sub> at line 10. Consequently, the evaluation of t<sub>8</sub> also requires suspension. Also, t<sub>8</sub> evaluates as part of an application of the abstraction binding obs at line 7. In particular, the abstraction binding obs binds to iter, and we apply iter at lines 23 and 33. Thus, the expressions named by t<sub>17</sub> and t<sub>23</sub> require suspension. In summary, we have that w<sub>1</sub>, t<sub>8</sub>, t<sub>17</sub>, and t<sub>23</sub> require suspension, and we also note that all applications of the abstraction binding obs require suspension.

We proceed to the formalization and first introduce standard abstract values.

Definition 7 (Abstract values). We define the abstract values a ∈ A as a ::= λx.y | const<sub>x</sub> n for x, y ∈ X and n ∈ N.

The abstract value λx.y represents all closures originating at, e.g., a term λx. let y = 1 in y in a program at runtime (recall that we assume that all variable names are unique). Note that the y indicates the name returned by the body (formalized by the function name in Algorithm 1).

Algorithm 1: Constraint generation for the suspension analysis. The functional-style pseudocode operates on terms t ∈ T<sub>ANF</sub>.

```
function generateConstraints(t): TANF → P(R) = match t with
  | x → ∅
  | let x = t1 in t2 → generateConstraints(t2) ∪
    match t1 with
    | y → { S_y ⊆ S_x }
    | c → if |c| > 0 then { const_x |c| ∈ S_x } else ∅
    | λy. tb →
        generateConstraints(tb) ∪ { λy. name(tb) ∈ S_x }
        ∪ { suspend_n ⇒ suspend_y | n ∈ suspendNames(tb) }
    | lhs rhs → {
        ∀z ∀y  λz.y ∈ S_lhs ⇒ (S_rhs ⊆ S_z) ∧ (S_y ⊆ S_x),
        ∀y ∀n  const_y n ∈ S_lhs ∧ n > 1 ⇒ const_y (n − 1) ∈ S_x,
        ∀y  λy._ ∈ S_lhs ⇒ (suspend_y ⇒ suspend_x),
        ∀y  const_y _ ∈ S_lhs ⇒ (suspend_y ⇒ suspend_x),
        suspend_x ⇒ (∀y  λy._ ∈ S_lhs ⇒ suspend_y)
                  ∧ (∀y  const_y _ ∈ S_lhs ⇒ suspend_y)
      }
    | assume _ → if suspend_assume then { suspend_x } else ∅
    | weight _ → if suspend_weight then { suspend_x } else ∅
    | if y then tt else te →
        generateConstraints(tt) ∪ generateConstraints(te)
        ∪ { S_name(tt) ⊆ S_x, S_name(te) ⊆ S_x }
        ∪ { suspend_n ⇒ suspend_x
            | n ∈ suspendNames(tt) ∪ suspendNames(te) }

function name(t): TANF → X = match t with
  | x → x
  | let x = t1 in t2 → name(t2)

function suspendNames(t): TANF → P(X) = match t with
  | x → ∅
  | let x = t1 in t2 → suspendNames(t2) ∪
    match t1 with
    | lhs rhs → {x}
    | if y then tt else te → {x}
    | assume _ → if suspend_assume then {x} else ∅
    | weight _ → if suspend_weight then {x} else ∅
    | _ → ∅
```

The abstract value const<sub>x</sub> n represents all intrinsic functions of arity n originating at x. For example, const<sub>x</sub> 2 originates at, e.g., a term let x = + in t.

The central objects in the analysis are the sets S<sub>x</sub> ∈ P(A) and boolean values suspend<sub>x</sub> for all x ∈ X. The set S<sub>x</sub> contains all abstract values that may flow to the expression labeled by x, and suspend<sub>x</sub> indicates whether or not the expression requires suspension. A trivial but useless solution is S<sub>x</sub> = A and suspend<sub>x</sub> = true for all variables x in the program. To get more precise information regarding suspension, we wish to find smaller solutions for the S<sub>x</sub> and suspend<sub>x</sub>.

To formalize the set of sound solutions for S<sub>x</sub> and suspend<sub>x</sub>, we generate constraints c ∈ R for programs.† Algorithm 1 formalizes the necessary constraints for programs t ∈ T<sub>ANF</sub> with a function generateConstraints that recursively traverses the program t to generate a set of constraints. Due to ANF, there are only two cases in the top-level match: variables generate no constraints, and the important case is for let expressions. The algorithm makes use of an auxiliary function name that determines the name of an ANF expression, and a function suspendNames that determines the names of all top-level expressions within an expression that may suspend (namely, applications, if expressions, and assume and/or weight).


We next illustrate and motivate the generated constraints by considering the set of constraints generateConstraints(t<sub>example</sub>), where t<sub>example</sub> is the program in Fig. 3. Many constraints are standard, and we therefore focus on the new suspension constraints introduced as part of this paper. In particular, the challenge is to correctly capture the flow of suspension requirements across function applications and higher-order functions. We see that defining aliases generates constraints of the form S<sub>y</sub> ⊆ S<sub>x</sub>, that constants introduce const abstract values (e.g., const<sub>t6</sub> 1 ∈ S<sub>t6</sub>), and that assume and weight introduce suspension requirements, e.g., suspend<sub>w1</sub> (shorthand for suspend<sub>w1</sub> = true).

Consider first the constraints generated for λobs. (line 7 in Fig. 3) through the abstraction case in Algorithm 1. To keep the example simple, we treat the unexpanded let rec as an ordinary let in the analysis (for this particular example, the analysis result is unaffected). Omitting the recursively generated constraints for the abstraction body, the generated constraints are

$$
\{\lambda obs.\,t_8 \in S_{iter}\} \cup \{\textit{suspend}_n \Rightarrow \textit{suspend}_{obs} \mid n \in \{t_7, t_8\}\}. \tag{4}
$$

The first constraint is standard and states that the abstract value λobs. t<sub>8</sub> flows to S<sub>iter</sub>, as the variable naming the result of the abstraction body is t<sub>8</sub> (line 26 in Fig. 3). The remaining constraints are new and set up the flow of suspension requirements. Specifically, the abstraction obs itself requires suspension if any expression bound by a top-level let in its body requires suspension. For efficiency, we only set up dependencies for expressions that may suspend (formalized by suspendNames in Algorithm 1). Note here that we do not add the constraint suspend<sub>w1</sub> ⇒ suspend<sub>obs</sub>, as w<sub>1</sub> is not at top level in the body of obs. Instead, we later add the constraint suspend<sub>w1</sub> ⇒ suspend<sub>t8</sub>, and suspend<sub>w1</sub> ⇒ suspend<sub>obs</sub> follows by transitivity.

The constraints generated for the if expression bound to t<sub>8</sub> at line 10, through the if case in Algorithm 1, are (omitting recursively generated constraints)

$$
\{S_{t_9} \subseteq S_{t_8},\ S_{t_{17}} \subseteq S_{t_8}\} \cup \{\textit{suspend}_n \Rightarrow \textit{suspend}_{t_8} \mid n \in \{t_{11}, t_{13}, t_{14}, w_1, t_{16}, t_{17}\}\}. \tag{5}
$$

The first two constraints are standard and state that the abstract values in the results of both branches flow to the result S<sub>t8</sub>. The last set of constraints is new and similar to the abstraction suspension constraints. The constraints capture that all top-level expressions in both branches that require suspension also cause t<sub>8</sub> to require suspension.

Consider the application at line 23 in Fig. 3. The constraints generated through the application case in Algorithm 1 are

$$
\left\{
\begin{aligned}
&\forall z\,\forall y\ \lambda z.y \in S_{iter} \Rightarrow (S_{t_{16}} \subseteq S_z) \land (S_y \subseteq S_{t_{17}}),\\
&\forall y\,\forall n\ \mathsf{const}_y\,n \in S_{iter} \land n > 1 \Rightarrow \mathsf{const}_y\,(n-1) \in S_{t_{17}},\\
&\forall y\ \lambda y.\_ \in S_{iter} \Rightarrow (\textit{suspend}_y \Rightarrow \textit{suspend}_{t_{17}}),\\
&\forall y\ \mathsf{const}_y\,\_ \in S_{iter} \Rightarrow (\textit{suspend}_y \Rightarrow \textit{suspend}_{t_{17}}),\\
&\textit{suspend}_{t_{17}} \Rightarrow (\forall y\ \lambda y.\_ \in S_{iter} \Rightarrow \textit{suspend}_y) \land (\forall y\ \mathsf{const}_y\,\_ \in S_{iter} \Rightarrow \textit{suspend}_y)
\end{aligned}
\right\}
\tag{6}
$$

The first two constraints are standard and state how abstract values flow as a result of applications. The last three constraints are new and relate to suspension. The third and fourth constraints state that if an abstraction or intrinsic requiring suspension flows to iter, the result t<sub>17</sub> of the application also requires suspension. The fifth constraint states that if the result t<sub>17</sub> requires suspension, then all abstractions and constants flowing to iter require suspension. This last constraint is not strictly required to later prove the soundness of the analysis in Theorem 1, but, as we will see in Section 5, it is required for the selective CPS transformation.

We find a solution to the constraints through a fairly standard algorithm that propagates abstract values according to the constraints until fixpoint.† However, we extend the algorithm to support the new suspension constraints. The algorithm is a function analyzeSuspend: T<sub>ANF</sub> → ((X → P(A)) × P(X)). The function returns a map data : X → P(A) that assigns sets of abstract values to all S<sub>x</sub>, and a set suspend ∈ P(X) such that suspend<sub>x</sub> = true iff x ∈ suspend. Importantly, the assignments to S<sub>x</sub> and suspend<sub>x</sub> satisfy all generated constraints. To illustrate the algorithm, here are the analysis results analyzeSuspend(t<sub>example</sub>):

$$
\begin{aligned}
S_{iter} &= \{\lambda obs.\,t_8\} & S_{t_6} &= \{\mathsf{const}_{t_6}\,1\} & S_{t_{10}} &= \{\mathsf{const}_{t_{10}}\,2\}\\
S_{t_{11}} &= \{\mathsf{const}_{t_{10}}\,1\} & S_{t_{12}} &= \{\mathsf{const}_{t_{12}}\,1\} & S_{t_{15}} &= \{\mathsf{const}_{t_{15}}\,1\}\\
S_n &= \emptyset \ \text{for all other } n \in X
\end{aligned}
$$

$$
\begin{aligned}
\textit{suspend}_n &= \mathrm{true} \ \text{for } n \in \{obs, w_1, t_8, t_{17}, t_{23}\}\\
\textit{suspend}_n &= \mathrm{false} \ \text{for all other } n \in X.
\end{aligned}
\tag{7}
$$

The above results confirm our earlier reasoning: the expressions labeled by obs, w<sub>1</sub>, t<sub>8</sub>, t<sub>17</sub>, and t<sub>23</sub> may require suspension.
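Restricted to the suspension constraints alone, the fixpoint computation is a simple reachability problem. The following OCaml sketch (ours; the actual solver also propagates abstract values through the set constraints) iterates the implications for the running example until nothing changes:

```ocaml
module S = Set.Make (String)

(* Each pair (a, b) encodes the implication suspend_a => suspend_b;
   seeds are the names that require suspension directly. *)
let propagate (edges : (string * string) list) (seeds : string list) =
  let rec go susp =
    let susp' =
      List.fold_left
        (fun acc (a, b) -> if S.mem a acc then S.add b acc else acc)
        susp edges
    in
    if S.equal susp' susp then susp else go susp'
  in
  go (S.of_list seeds)

(* Implications for the running example: w1 is a direct source; it
   propagates to the if expression t8, from there to the parameter obs,
   and from obs to both applications of iter. *)
let () =
  let edges =
    [ ("w1", "t8"); ("t8", "obs"); ("obs", "t17"); ("obs", "t23") ]
  in
  let s = propagate edges [ "w1" ] in
  assert (S.mem "t23" s && not (S.mem "t7" s))
```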

We now consider the soundness of the analysis. First, the soundness of 0-CFA is well established (see, e.g., Nielson et al. [39]) and extends to our new constraints, and we take the following lemma to hold without proof.

Lemma 1 (0-CFA soundness). For every t ∈ T<sub>ANF</sub>, the solution given by analyzeSuspend(t) for S<sub>x</sub> and suspend<sub>x</sub>, x ∈ X, satisfies the constraints generateConstraints(t).

Next, we must show that the constraints themselves are sound. Consider the evaluation of an arbitrary term t ∈ T<sub>ANF</sub>. For each subderivation of t, labeled by a name x (due to ANF), it must hold that suspend<sub>x</sub> = true if the subderivation requires suspension. Otherwise, the analysis is unsound. Theorem 1 formally captures this soundness. Note that the analysis is conservative (i.e., incomplete), because it may find suspend<sub>x</sub> = true even if the subderivation for x does not require suspension.

Theorem 1 (Suspension analysis soundness). Let t ∈ T<sub>ANF</sub>, s ∈ S, u ∈ {false, true}, w ∈ R, and v ∈ V such that $\emptyset \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$. Furthermore, let S<sub>x</sub> and suspend<sub>x</sub> for x ∈ X be given by analyzeSuspend(t). For every subderivation $\rho \vdash \texttt{let } x = \mathbf{t}_1 \texttt{ in } \mathbf{t}_2 \overset{s_1 \| s_2}{\Downarrow}{}_{w_1 \cdot w_2}^{u_1 \lor u_2} \mathbf{v}'$ of $\emptyset \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$, u<sub>1</sub> = true implies suspend<sub>x</sub> = true.

The proof uses Lemma 1 and structural induction over the derivation $\emptyset \vdash \mathbf{t} \overset{s}{\Downarrow}{}_{w}^{u} \mathbf{v}$.†

Algorithm 2: Selective continuation-passing style transformation. We define t_id = λx.x. The term c_CPS is the CPS version of c.

```
function cps(vars, t): P(X) × TANF → T+ =
  cps′(t_id, t)

function cps′(cont, t): T × TANF → T+ =
  match t with
  | x → if cont = t_id then t else cont t
  | let x = t1 in t2 →
    let t2′ = cps′(cont, t2) in
    match t1 with
    | y → let x = t1 in t2′
    | c → let x = (if x ∈ vars then c_CPS else c) in t2′
    | λy. tb →
        let t1′ =
          if y ∈ vars then λk. λy. cps′(k, tb)
          else λy. cps′(t_id, tb)
        in let x = t1′ in t2′
    | lhs rhs →
        if x ∈ vars then
          if tailCall(t) then lhs cont rhs
          else lhs (λx. t2′) rhs
        else let x = t1 in t2′
    | if y then tt else te →
        if x ∈ vars then
          if tailCall(t) then
            if y then cps′(cont, tt) else cps′(cont, te)
          else
            let k = λx. t2′ in
            if y then cps′(k, tt) else cps′(k, te)
        else
          let x = (if y then cps′(t_id, tt) else cps′(t_id, te)) in t2′
    | assume y →
        if x ∈ vars then
          if tailCall(t) then Suspension_assume(y, cont)
          else Suspension_assume(y, λx. cps′(cont, t2))
        else let x = t1 in t2′
    | weight y →
        if x ∈ vars then
          if tailCall(t) then Suspension_weight(y, cont)
          else Suspension_weight(y, λx. cps′(cont, t2))
        else let x = t1 in t2′

function tailCall(t): TANF → {false, true} =
  match t with
  | let x = _ in x → true
  | _ → false
```


Next, we use the suspension analysis to selectively CPS transform programs.

# 5 Selective CPS Transformation

This section presents the second technical contribution: the selective CPS transformation. The transformations themselves are standard, and the challenge is to correctly use the suspension analysis results for a selective transformation.

Algorithm 2 is the full algorithm. Using terms in ANF as input significantly helps reduce the algorithm's complexity. The main function cps takes as input a set vars ∈ P(X), indicating which expressions to CPS transform, and a program t ∈ T<sub>ANF</sub> to transform. It is the new vars argument that separates the transformation from a standard CPS transformation. For the purposes of this paper, we always use vars = {x | suspend<sub>x</sub> = true}, where the suspend<sub>x</sub> come from analyzeSuspend(t). One could also use vars = X for a standard full CPS transformation (e.g., Fig. 1f), or some other set vars for other application domains. The value returned from the cps function is a (non-ANF) term of type T<sup>+</sup>.

```
1 let t1 = 2 in
2 let t2 = 2 in
3 let t3 = Beta in
4 let t4 = t3 t1 in
5 let t5 = t4 t2 in
6 let a1 = assume t5 in
7 let rec iter = λk. λobs.
8   let t6 = null in
9   let t7 = t6 obs in
10   if t7 then
11     let t9 = () in
12     k t9
13   else
14     let t10 = fBernoulli in
15     let t11 = t10 a1 in
16     let t12 = head in
17     let t13 = t12 obs in
18     let t14 = t11 t13 in
19     Suspension_weight(t14,
20       λ_.
21         let t15 = tail in
22         let t16 = t15 obs in
23         iter k t16)
24 in
25 let t18 = true in
26 let t19 = false in
27 let t20 = true in
28 let t21 = true in
29 let t22 = [t21,t20,t19,t18] in
30 let k′ = λ_. a1 in
31 iter k′ t22
```
Fig. 4: The running example from Fig. 3 after selective CPS transformation. The program is semantically equivalent to Fig. 1e.

The helper function cps′ takes as input a continuation term cont, indicating the continuation to apply in tail position. Initially, this continuation term is t_id, which indicates no continuation. Similarly to Algorithm 1, the top-level match has two cases: a simple case for variables and a complex case for let expressions. To enable the optimization of tail calls, the auxiliary function tailCall indicates whether or not an ANF expression is a tail call (i.e., of the form let x = t′ in x).

We now illustrate Algorithm 2 by computing cps(vars<sub>example</sub>, t<sub>example</sub>), where vars<sub>example</sub> = {obs, w<sub>1</sub>, t<sub>8</sub>, t<sub>17</sub>, t<sub>23</sub>} is from (7), and t<sub>example</sub> is from Fig. 3. Fig. 4 presents the final result. First, we note that the transformation does not change expressions not labeled by a name in vars<sub>example</sub>, as they do not require suspension. In the following, we therefore focus only on the transformed expressions. First, consider the abstraction obs defined at line 7 in Fig. 3, handled by the abstraction case in Algorithm 2. As obs ∈ vars<sub>example</sub>, we apply the standard CPS transformation for abstractions: add a continuation parameter to the abstraction and recursively transform the body with this continuation. Next, consider the transformation of the weight expression w<sub>1</sub> at line 20 in Fig. 3, handled by the weight case in Algorithm 2. The expression is not in tail position, so we build a new continuation containing the subsequent let expressions, recursively transform the body of the continuation, and then wrap the end result in a Suspension object. The if expression t<sub>8</sub> at line 10 in Fig. 3, handled by the if case in Algorithm 2, is in tail position (it is directly followed by returning t<sub>8</sub>). Consequently, we transform both branches recursively. Finally, we have the applications t<sub>17</sub> and t<sub>23</sub> at lines 23 and 33 in Fig. 3, handled by the application case in Algorithm 2. The application t<sub>17</sub> is in tail position, and we transform it by adding the current continuation as an argument. The application t<sub>23</sub> is not in tail position, so we construct a continuation k′ that returns the final value a<sub>1</sub> (line 34 in Fig. 3), and then add it as an argument to the application.

Fig. 5: Overview of the Miking CorePPL compiler implementation. We divide the overall compiler into two parts: (i) suspension analysis and selective CPS (Section 6.1) and (ii) inference problem extraction (Section 6.2). The figure depicts artifacts as gray rectangular boxes, and transformation units and libraries as blue rounded boxes. Note how the inference extractor transformation separates the program into two different paths that are combined again after the inference-specific compilation. The white inheritance arrows (pointing to the suspension analysis and selective CPS transformation libraries) mean that these libraries are used within the inference-specific compiler transformations.

Algorithm 2 does not produce a correct result for arbitrary choices of vars. Specifically, for all applications lhs rhs, we must ensure that (i) if we CPS transform the application, we also CPS transform all abstractions that can occur at lhs, and (ii) if we do not CPS transform the application, we do not CPS transform any abstraction that can occur at lhs. We control this through the argument vars. In particular, assigning vars according to the suspension analysis produces a correct result. To see this, consider the application constraints in Algorithm 1 again, and note that if any abstraction or intrinsic operation that requires suspension occurs at lhs, then suspend<sub>x</sub> = true. Furthermore, the last application constraint ensures that if suspend<sub>x</sub> = true, then all abstractions and intrinsic operations that occur at lhs require suspension. Consequently, for all λy._ and const<sub>y</sub> _ that flow to lhs, either all suspend<sub>y</sub> = true or all suspend<sub>y</sub> = false.
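The all-or-none requirement is visible directly in the host language's types. In the following OCaml sketch (ours), a CPS-transformed function and a direct-style function cannot flow to the same application site:

```ocaml
(* CPS calling convention: the continuation is an explicit argument. *)
let g_cps x k = k (x + 1)

(* Direct calling convention. *)
let h_direct x = x + 1

(* A CPS-transformed call site expects the CPS convention. Mixing the
   conventions, e.g. (if c then g_cps else h_direct), is a type error,
   mirroring what the fifth application constraint rules out. *)
let apply_cps f x k = f x k

let () =
  assert (h_direct 41 = 42);
  apply_cps g_cps 41 (fun r -> assert (r = 42))
```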

# 6 Implementation

We implement the suspension analysis and selective CPS transformation in Miking CorePPL [30], a core PPL implemented in the domain-specific language construction framework Miking [9]. We choose Miking CorePPL for the implementation over other CPS-based PPLs, as the language implementation contains an existing 0-CFA implementation, which simplifies implementing the suspension analysis. Fig. 5 presents the organization of the CorePPL compiler. The input is a CorePPL program that may contain many inference problems and applications of inference algorithms, similar to WebPPL and Anglican. The output is an executable produced by one of the Miking backend compilers. Section 6.1 gives the details of the suspension analysis and selective CPS implementations, and in particular the differences compared to the core calculus in Section 3. Section 6.2 presents the inference extractor and its operation combined with selective CPS. The suspension analysis, selective CPS transformation, and inference extraction implementations consist of roughly 1500 lines of code (a contribution of this paper). The code is available on GitHub [2].

#### 6.1 Suspension Analysis and Selective CPS

Miking CorePPL extends the abstract syntax in Definition 1 with standard functional data structures and features such as algebraic data types (records, tuples, and variants), lists, and pattern matching. The suspension analysis and selective CPS implementations in Miking CorePPL extend Algorithm 1 and Algorithm 2 to support these language features. Furthermore, compared to the fixed suspend<sub>weight</sub> and suspend<sub>assume</sub> in Fig. 2, the implementation allows arbitrary configuration of the suspension sources. In particular, the implementation uses this configurability together with the alignment analysis by Lundén et al. [29]. This combination allows selectively CPS transforming programs to suspend at a subset of assumes or weights for aligned versions of SMC and MCMC inference algorithms.
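A minimal sketch of such a configuration, in OCaml and with names of our own choosing (the actual Miking CorePPL interface differs):

```ocaml
(* Each inference algorithm declares which constructs must suspend; the
   suspension analysis seeds its constraints from this configuration. *)
type suspension_config = { suspend_assume : bool; suspend_weight : bool }

let smc_config = { suspend_assume = false; suspend_weight = true }
let mcmc_config = { suspend_assume = true; suspend_weight = false }
```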

Miking CorePPL also includes a framework for implementing inference algorithms. Specifically, to implement a new inference algorithm, users implement an inference-specific compiler and an inference-specific runtime. Fig. 5 illustrates the different compilers and runtimes. Each inference-specific compiler applies the suspension analysis and selective CPS transformation to suit the inference algorithm's particular suspension requirements.

Next, we show how Miking CorePPL handles programs containing many inference problems solved with different inference algorithms.

#### 6.2 Inference Problem Extraction

Fig. 5 includes the inference extraction compiler procedure. First, the compiler applies an inference extractor to the input program. The result is a set of inference problems and a main program containing remaining glue code. Second, the compiler applies inference-specifc compilers to each inference problem. Finally, the compiler combines the main program and the compiled inference problems with inference-specifc runtimes and supplies the result to a backend compiler.

Consider the example in Fig. 6a. We define a function m that constructs a minimal inference problem on lines 7–10, using a single call to assume and a single call to observe (which modifies the execution weight, similar to weight). The function takes an initial probability distribution d and a data point y as input. We apply aligned lightweight MCMC inference to the inference problem through the infer construct on lines 12–16. The first argument to infer gives the inference algorithm configuration, and the second argument the inference problem. Inference problems are thunks (i.e., functions with a dummy unit argument). We construct the inference problem thunk by an application of m with a uniform initial distribution and data point 1.0. The inference result d0 is another probability distribution, and we use it as the first initial distribution in the recursive repeat function (lines 19–24).

```
1 mexpr
2 let data = [
3   24.0, 42.2, 96.7, 9.2, 85.8,
4   34.2, 41.7, 53.4, 85.6, 45.4
5 ] in
6
7 let m = lam d. lam y. lam.
8   let x = assume d in
9   observe y (Gaussian x 0.1);
10   x in
11
12 let d0 =
13   infer (LightweightMCMC
14     { iterations = 100,
15       aligned = true })
16     (m (Uniform 0.0 4.0) 1.0) in
17
18 recursive let repeat =
19   lam data. lam d.
20   match data with [y] ++ data then
21     let posterior =
22       infer (BPF {particles = 100})
23         (m d y) in
24     repeat data posterior
25   else d
26 let d1 = repeat data d0 in
27 match distEmpiricalSamples d1
28 with (samples, weights) in
29 iter
30   (lam s.
31     print
32       (concat (float2string s) "\n"))
33   samples
```
(a) Miking CorePPL program.


(b) Extracted inference problem at line 13 in (a). (code not reproduced)

(c) Extracted inference problem at line 22 in (a). (code not reproduced)

Fig. 6: Example Miking CorePPL program in (a) with two non-trivial uses of infer. Figures (b) and (c) show the extracted and selectively CPS-transformed inference problems at lines 13 and 22 in (a), respectively. The compiler handles the free variables d and y in (c) in a later stage.

This function repeatedly performs inference using the SMC bootstrap particle filter (lines 21–23), again using the function m to construct the sequence of inference problems. Each infer application uses the result distribution from the previous iteration as the initial distribution and consumes data points from the data sequence. We extract and print the samples from the final result distribution d1 at lines 29–33. A limitation of the current extraction approach is that we do not yet support nested infers.

A key challenge in the compiler design is how to handle different inference algorithms within one probabilistic program. In particular, inference algorithms require different selective CPS transformations, applied to different parts of the code. To allow the separate handling of inference algorithms, we apply the extraction approach by Hummelgren et al. [22] to the infer applications, producing a separate inference problem for each occurrence of infer. Although the compiler design mostly concerns rather comprehensive engineering work, special care must be taken to handle the non-trivial problem of name bindings when transforming and combining different code entities. For instance, the compiler must selectively CPS transform Fig. 6b to suspend at assume (required by MCMC) and selectively CPS transform Fig. 6c to suspend at observe (required by SMC). We design a robust and modular solution, where it is possible to easily add new inference algorithms without worrying about name conflicts.

# 7 Evaluation

This section presents the evaluation of the suspension analysis and selective CPS implementations. Our main claims are (i) that the approach of selective CPS significantly improves performance compared to traditional full CPS, and (ii) that this holds for a significant set of inference algorithms, evaluated on realistic inference problems. We use four PPL models and corresponding data sets from the Miking benchmarks repository, available on GitHub [1]. The models are: constant rate birth-death (CRBD) in Section 7.1, cladogenetic diversification rate shift (ClaDS) in Section 7.2, latent Dirichlet allocation (LDA) in Section 7.3, and vector-borne disease (VBD) in Section 7.4. All models are significant and actively used in different research areas: CRBD and ClaDS in evolutionary biology and phylogenetics [37,43,32], LDA in topic modeling [7], and VBD in epidemiology [14,34]. In addition to the Miking CorePPL models from the Miking benchmarks, we also implement CRBD in WebPPL and Anglican.

We add a number of popular inference algorithms to Miking CorePPL with support for selective CPS. The first is standard likelihood weighting (LW), as introduced in Section 2. LW does not strictly require CPS, but we implement it with suspensions at weight to highlight the difference between no CPS, selective CPS, and full CPS. LW gives a good direct measure of CPS overhead, as the algorithm simply executes programs many times. Suspending at weight can also be useful in LW to stop executions with weight 0 (i.e., useless samples) early. However, we do not use early stopping, in order to isolate the effect CPS has on execution time. Next, we add the bootstrap particle filter (BPF) and the alive particle filter (APF). Both are SMC algorithms that suspend at weight to resample executions. BPF is a standard algorithm often used in PPLs, and APF is a related algorithm introduced in a PPL context by Kudlicka et al. [24]. The final two inference algorithms we add are aligned lightweight MCMC (just MCMC for short) and particle-independent Metropolis–Hastings (PIMH). Aligned lightweight MCMC [29] is an extension of the standard PPL Metropolis–Hastings approach introduced by Wingate et al. [49], and suspends at a subset of calls to assume. PIMH is an MCMC algorithm that repeatedly uses the BPF (suspending at weight) within a Metropolis–Hastings MCMC algorithm [40]. We limit the scope to single-core CPU inference.
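As a point of reference for how simple LW itself is, here is a minimal OCaml sketch of the algorithm as described above; `run` is a hypothetical thunk that executes the program once and returns a sample together with its weight.

```
(* Likelihood weighting: execute the program n times independently and
   normalize the collected weights. [run] is a hypothetical model thunk;
   no early stopping, matching the setup of the experiments. *)
let likelihood_weighting ~(n : int) (run : unit -> 'a * float) =
  let samples = List.init n (fun _ -> run ()) in
  let total = List.fold_left (fun acc (_, w) -> acc +. w) 0.0 samples in
  List.map (fun (x, w) -> (x, w /. total)) samples
```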

In addition to the inference algorithms in Miking CorePPL, we also use three other state-of-the-art PPL implementations for CRBD: Anglican, WebPPL, and the special high-performance RootPPL compiler for Miking CorePPL [30]. For Anglican, we apply LW, BPF, and PIMH inference. For WebPPL, we use BPF and (non-aligned) lightweight MCMC. For the RootPPL version of Miking CorePPL, we use BPF inference (the only supported inference algorithm).

We consider two configurations for each model: 1 000 and 10 000 samples. An exception is for CRBD and ClaDS, where we adjust APF to use 500 and 5 000

Fig. 7: Mean execution times for the CRBD model. The error bars show 95% confidence intervals (using the option ('ci', 95) in Seaborn's barplot). The table shows standard deviations.

samples to make the inference accuracy comparable to the related BPF. We run each experiment 300 times (with one warmup run) and measure execution time (excluding compile time). To justify the efficiency of the suspension analysis and selective CPS transformation that are part of the compiler, we note here that, combined, they run in only 1–5 ms for all models.

The experiments do not compare the performance of different inference algorithms. To do this, one would also need to consider how accurate the inference results are for a given amount of execution time. Accuracy varies dramatically between different combinations of inference algorithms and models. We evaluate the execution time of selective and full CPS in isolation for individual inference algorithms. Selective CPS is solely an execution time optimization—the algorithms themselves and their accuracy remain unchanged.†

For Miking CorePPL, we used OCaml 4.12.0 as backend compiler for the implementation in Section 6 and GCC 7.5.0 for the separate RootPPL compiler. We used Anglican 1.1.0 (OpenJDK 11.0.19) and WebPPL 0.9.15 (Node.js 16.18.0). We ran the experiments on an Intel Xeon Gold 6148 CPU with 64 GB of memory using Ubuntu 18.04.6.

#### 7.1 Constant Rate Birth-Death

CRBD is a diversification model, used by evolutionary biologists to infer distributions over birth and death rates for observed evolutionary trees of groups of species, called phylogenies. For the CRBD experiment, we use the Alcedinidae phylogeny (Kingfisher birds, 54 extant species) [43,23]. We compare CRBD in Miking CorePPL (55 lines of code)†, Anglican (129 lines of code)†, and WebPPL (66 lines of code)†. The total experiment execution time was 9 hours.

Fig. 8: Mean execution times for the ClaDS model. The error bars show 95% confidence intervals (using the option ('ci', 95) in Seaborn's barplot).

Fig. 7 presents the results. We note that selective CPS is faster than full CPS in all cases. Unlike full CPS, the overhead of selective CPS compared to no CPS is marginal for LW. The execution time for early MCMC samples is sensitive to initial conditions, and we therefore see more variance for MCMC compared to the other algorithms. When we increase the number of samples to 10 000, the variance is reduced. With the exception of MCMC in WebPPL, the execution times for Anglican and WebPPL are one order of magnitude slower than the equivalent algorithms in Miking CorePPL. However, note that the comparison is only for reference and not entirely fair, as Anglican and WebPPL use different execution environments compared to Miking CorePPL. Lastly, we note that the Miking CorePPL BPF implementation with selective CPS is not much slower than compiling Miking CorePPL to RootPPL BPF—a compiler designed specifically for efficiency (but with other limitations, such as the lack of garbage collection). RootPPL does not use CPS, and instead enables suspension through a low-level transformation using the concept of PPL control-flow graphs [30].

#### 7.2 Cladogenetic Diversification Rate Shift

ClaDS is another diversification model used in evolutionary biology [32,43]. Unlike CRBD, it allows birth and death rates to change over time. We again use the Alcedinidae phylogeny. The source code consists of 72 lines of code.† The total experiment execution time was 3 hours. Fig. 8 presents the results. We note that selective CPS is faster than full CPS in all cases.

#### 7.3 Latent Dirichlet Allocation

LDA [7] is a model from natural language processing used to categorize documents into topics. We use a synthetic data set with size comparable to the data set in Ritchie et al. [41]: a vocabulary of 100 words, 10 topics, and 25 observed documents (30 words in each). We do not apply any optimization techniques such as collapsed Gibbs sampling [21]. Solving the inference problem using a PPL is

Fig. 9: Mean execution times for the LDA model. The error bars show 95% confidence intervals (using the option ('ci', 95) in Seaborn's barplot).

Fig. 10: Mean execution times for the VBD model. The error bars show 95% confidence intervals (using the option ('ci', 95) in Seaborn's barplot).

therefore challenging even for small data sets. The source code consists of 26 lines of code.† The total experiment execution time was 12 hours.

Fig. 9 presents the results. We note that selective CPS is faster than full CPS in all cases. Interestingly, the reduction in overhead compared to full CPS for LW is not as significant here. The reason is that suspension at weight for this model requires that we CPS transform its most computationally expensive recursion.

#### 7.4 Vector-Borne Disease

We use the VBD model from Funk et al. [14] and later Murray et al. [34]. The background is a dengue outbreak in Micronesia and the spread of disease between mosquitoes and humans. The inference problem is to find the true numbers of susceptible, exposed, infectious, and recovered (SEIR) individuals each day, given daily reported numbers of new cases at health centers. The source code consists of 140 lines of code.† The total execution time was 8 hours.

Fig. 10 presents the results. Again, we note that selective CPS is faster than full CPS in all cases, except seemingly for APF and 1 000 samples. This is very likely a statistical anomaly, as the variance for APF is quite severe for the case with 1 000 samples. Compared to the BPF, APF uses a resampling approach for

which the execution time varies considerably if the number of samples is too low [24]. The plots clearly show this: compared to 1 000 samples, the variance is reduced to BPF-comparable levels for 10 000 samples. In summary, the evaluation demonstrates the clear benefits of selective CPS over full CPS for universal PPLs.

# 8 Related Work

There are a number of universal PPLs that require non-trivial suspension. One such language is Anglican [50], which solves the suspension problem using CPS. Anglican performs a full CPS transformation with one exception: certain statically known functions called primitive procedures, which include a subset of the regular Clojure (the host language of Anglican) functions, are guaranteed not to execute PPL code, and Anglican does not CPS transform them [47]. However, higher-order functions in Clojure libraries cannot be primitive procedures, and Anglican must manually reimplement such functions (e.g., map and fold). Anglican does not consider a selective CPS transformation of PPL code and always fully CPS transforms the PPL part of Anglican programs.

WebPPL [18] and the approach by Ritchie et al. [41] also make use of CPS transformations to implement PPL inference. They do not, however, consider selective CPS transformations. Ścibior et al. [44] present an architectural design for a probabilistic functional programming library based on monads and monad transformers (and corresponding theory in Ścibior et al. [45]). In particular, they use a coroutine monad transformer to suspend SMC inference. This approach is similar to ours in that it makes use of high-level functional language features to enable suspension. They do not, however, consider a selective transformation.

The PPLs Pyro [6], Stan [10,5], Gen [11,27], and Edward [48] either implement inference algorithms that do not require suspension (e.g., Hamiltonian Monte Carlo), or restrict the language in such a way that suspension is explicit and trivially handled by the language implementation. For example, SMC in Pyro<sup>8</sup> and newer versions of Birch require that users explicitly write programs as a step function that the SMC implementation calls iteratively. Resampling only occurs in between calls to step, and suspension is therefore trivial.

Work on general-purpose selective CPS transformations includes Nielsen [38], Asai and Uehara [4], Rompf et al. [42], and Leijen [26]. They consider typed languages, unlike the untyped language in this paper. The early work by Nielsen [38] considers the efficient implementation of call/cc through a selective CPS transformation. The transformation requires manual user annotations, unlike the fully automatic approach in this paper. A more recent approach is due to Asai and Uehara [4], who consider an efficient implementation of delimited continuations using shift and reset through a selective CPS transformation. Similarly to us, they automatically determine where to selectively CPS transform programs. They use an approach based on type inference, while our approach builds upon 0-CFA. Rompf et al. [42] follow a similar approach to Asai and Uehara [4], but for

<sup>8</sup> Note that the main inference algorithm in Pyro is stochastic variational inference, which does not require suspension.

Scala, and additionally require user annotations. Leijen [26] uses a type-directed selective CPS transformation to compile algebraic effect handlers.

There are low-level alternatives to CPS for suspension in PPLs. In particular, various languages and approaches directly implement support for non-preemptive multitasking (e.g., coroutines). Turing [15] and older versions of Birch [36,35] implement coroutines to enable arbitrary suspension, but do not discuss the implementations in detail. Lundén et al. [30] introduce and use the concept of PPL control-flow graphs to compile Miking CorePPL to the low-level C++ framework RootPPL. The compiler explicitly introduces code that maintains special execution call stacks, distinct from the implicit C++ call stacks. The implementation results in excellent performance, but supports neither garbage collection nor higher-order functions. Another low-level approach is due to Paige and Wood [40], who exploit mutual exclusion locks and the fork system call to suspend and resample SMC executions. In theory, many of the above low-level alternatives to CPS can, if implemented efficiently, result in the least possible overhead due to more fine-grained low-level control. The approaches do, however, require significantly more implementation effort compared to a CPS transformation. Comparatively, the selective CPS transformation is a surprisingly simple, high-level, and easy-to-implement alternative that brings the overhead of CPS closer to that of more low-level approaches.

# 9 Conclusion

This paper introduces a selective CPS transformation for the purpose of execution suspension in PPLs. To enable the transformation, we develop a static suspension analysis that determines the parts of programs that require a CPS transformation as a consequence of inference algorithm suspension requirements. We implement the suspension analysis, selective CPS transformation, and an inference problem extraction procedure (required as a result of the selective CPS transformation) in Miking CorePPL. Furthermore, we evaluate the implementation on real-world models from phylogenetics, topic modeling, and epidemiology. The results demonstrate significant speedups compared to the standard full CPS suspension approach for a large number of Monte Carlo inference algorithms.

Acknowledgments. This project was financially supported by the Swedish Foundation for Strategic Research (FFL15-0032 and RIT15-0012), partially supported by the Swedish Research Council (Grant No. 2018-04329), and by Digital Futures (the DLL project). The research has also been carried out as part of the Vinnova Competence Center for Trustworthy Edge Computing Systems and Applications at KTH Royal Institute of Technology. We thank Gizem Çaylak for her LDA implementation and Viktor Senderov for his ClaDS implementation.

Data-Availability Statement. The paper has an accompanying artifact that supports the evaluation: https://zenodo.org/doi/10.5281/zenodo.10454311.

# References


50. Wood, F., van de Meent, J.W., Mansinghka, V.: A new approach to probabilistic programming inference. In: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics. vol. 33, pp. 1024–1032. Proceedings of Machine Learning Research (2014)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Higher-Order LCTRSs and Their Termination

Liye Guo and Cynthia Kop

Radboud University, Nijmegen, The Netherlands {l.guo,c.kop}@cs.ru.nl

Abstract Logically constrained term rewriting systems (LCTRSs) are a formalism for program analysis with support for data types that are not (co)inductively defined. Only imperative programs have been considered through the lens of LCTRSs so far, since LCTRSs were introduced as a first-order formalism. In this paper, we propose logically constrained simply-typed term rewriting systems (LCSTRSs), a higher-order generalization of LCTRSs, which suits the needs of representing and analyzing functional programs. We also study the termination problem of LCSTRSs and define a variant of the higher-order recursive path ordering (HORPO) for the newly proposed formalism.

Keywords: Higher-order term rewriting · Constraints · Recursive path ordering.

# 1 Introduction

It is hardly a surprising idea that term rewriting can serve as a vehicle for reasoning about programs. During the last decade or so, the term rewriting community has seen a line of work that translates real-world problems from program analysis into questions about term rewriting systems, which include, for example, termination (see, e.g., [8,10,15,37]) and equivalence (see, e.g., [13,36,9]). Such applications take place across programming paradigms due to the versatile nature of term rewriting, and often materialize into automatable solutions.

Data types are a central building block of programs and must be properly handled in program analysis. While it is rarely a problem for term rewriting systems to represent (co)inductively defined data types, others such as integers and arrays traditionally require encoding; think of neg (suc (suc (suc zero))) encoding −3. This usually causes more obfuscation than clarification for the methods applied and the results obtained. An alternative is to incorporate primitive data types into the formalism, which contributes to the proliferation of subtly different formalisms that are generally incompatible with each other, and it is often difficult to transfer techniques between such formalisms.

Logically constrained term rewriting systems (LCTRSs) [27,12] emerged from this proliferation as a unifying formalism seeking to be general in both the selection of primitive data types (little is presumed) and the applicability of varied methods (many are extensible). LCTRSs thus allow us to benefit from the broad term rewriting arsenal in a wide range of scenarios for program analysis (see, e.g., [32,24,23]). In particular, rewriting induction on LCTRSs [12,30] offers a powerful tool for program verification.

As a first-order formalism, LCTRSs only naturally accommodate imperative programs. This paper aims to generalize the formalism to a higher-order setting.

Motivation. Below is a first-order LCTRS implementing the factorial function:

$$\mathsf{fact}\ n \rightarrow 1 \quad [n \leq 0] \qquad \mathsf{fact}\ n \rightarrow n * \mathsf{fact}\ (n-1) \quad [n > 0]$$

where n ≤ 0 and n > 0 are logical constraints, which the integer n must satisfy when the corresponding rewrite rule is applied. Suppose we have access to higher-order functions, a defining feature of functional programming; then we have the following alternative implementation of fact:

$$\begin{aligned} \mathsf{fact}\ n &\rightarrow \mathsf{fold}\ (*)\ 1\ (\mathsf{genlist}\ n) \\ \mathsf{genlist}\ n &\rightarrow \mathsf{nil} \quad [n \le 0] \qquad \mathsf{genlist}\ n \rightarrow \mathsf{cons}\ n\ (\mathsf{genlist}\ (n-1)) \quad [n > 0] \\ \mathsf{fold}\ f\ y\ \mathsf{nil} &\rightarrow y \qquad \mathsf{fold}\ f\ y\ (\mathsf{cons}\ x\ l) \rightarrow f\ x\ (\mathsf{fold}\ f\ y\ l) \end{aligned}$$

Here fold takes an argument f, which itself represents a function. Higher-order functions such as fold do not fit into first-order LCTRSs, which leads to the first reason to generalize this formalism: to overcome the limitation of its expressivity.

There is another reason for higher-order LCTRSs. The latter implementation of fact reflects a pattern of functional programming: the combination of "standard" higher-order building blocks such as fold and map with functions that are specific to the problem at hand. We note that a higher-order formalism can reveal more modularity in programs. It would be valuable to exploit such modularity in analyses as well.

With higher-order LCTRSs, we would like to explore automatable solutions to the termination problem of functional programs in the same fashion as the first-order case [27,25], or even better, to the finding of their complexity by term rewriting. Moreover, given two programs supposedly implementing the same function, a method that derives whether they are indeed equivalent is also desirable. For example, a proof that the above two implementations of fact are equivalent may serve as a correctness proof of the latter, less intuitive implementation (which in general might be an outcome of code refactoring). Such methods have been explored in a first-order setting [12,7].

Higher-order LCTRSs will broaden the horizons of both LCTRSs and higher-order term rewriting. The eventual goal is to have a formalism that can be deployed to analyze both imperative and functional programs, so that, through this formalism, the abundant techniques based on term rewriting may be applied to automatic program analysis. This paper is a step toward that goal.

Contributions. The presentation begins with our perspective on higher-order term rewriting (without logical constraints) in Section 2. The contributions of this paper follow in subsequent sections:

- We propose logically constrained simply-typed term rewriting systems (LCSTRSs), a higher-order generalization of LCTRSs (Section 3).
- We define a variant of the higher-order recursive path ordering (HORPO) for LCSTRSs and show how it yields a termination method through rule removal (Section 4).
- We discuss the use of constrained HORPO within the dependency pair framework (Section 5) and its automation (Section 6).
# 2 Preliminaries

One of the first problems that a student of higher-order term rewriting faces is the absence of a standard formalism on which the literature agrees. This variety reflects the diverse interests and needs of different authors.

In this section, we present simply-typed term rewriting systems (STRSs) [29] as the unconstrained basis of our formalism. This is one of the simplest higher-order formalisms, and it closely resembles simple functional programs. We choose this formalism as our starting point because it is already powerful, while avoiding many of the complications that may be interesting for equational reasoning purposes but are not needed in program analysis, such as reduction modulo β.

Types and Terms. Types rule out undesired terms. We consider simple types: given a non-empty set S of sorts (or base types), the set T of simple types over S is generated by the grammar T ::= S | (T → T ). Right-associativity is assigned to → so that we can omit some parentheses. The order of a type A, denoted by ord(A), is defined as follows: ord(A) = 0 for A ∈ S and ord(A → B) = max(ord(A) + 1, ord(B)).
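The grammar and the order function translate directly into code; a minimal OCaml sketch (the constructor names are ours):

```
(* Simple types over a set S of sorts: T ::= S | (T -> T). *)
type ty = Sort of string | Arrow of ty * ty

(* ord(A) = 0 for a sort A; ord(A -> B) = max (ord A + 1) (ord B). *)
let rec ord = function
  | Sort _ -> 0
  | Arrow (a, b) -> max (ord a + 1) (ord b)

(* For example, ord ((nat -> nat) -> nat) = 2. *)
let _ = assert (ord (Arrow (Arrow (Sort "nat", Sort "nat"), Sort "nat")) = 2)
```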

Given disjoint sets F and V, whose elements we call function symbols and variables, respectively, the set T of pre-terms over F and V is generated by the grammar T ::= F | V | (T T). Left-associativity is assigned to the juxtaposition operation, called application, so t0 t1 t2 stands for ((t0 t1) t2), for example.

We assume that every function symbol and variable is assigned a unique type. Typing works as expected: if pre-terms t0 and t1 have types A → B and A, respectively, t0 t1 has type B. The set of terms over F and V, denoted by T(F, V), is the subset of T consisting of pre-terms with a type. We write t : A if a term t has type A. The set of variables occurring in a term t ∈ T(F, V), denoted by Var(t), is defined as follows: Var(f) = ∅ for f ∈ F, Var(x) = { x } for x ∈ V and Var(t0 t1) = Var(t0) ∪ Var(t1). A term t is called ground if Var(t) = ∅. The set of ground terms over F is denoted by T(F, ∅).

Substitutions and Contexts. Variables occurring in a term can be seen as placeholders: the occurrences of a variable may be replaced with terms which have the same type as the variable does. Type-preserving mappings from V to T(F, V) are called substitutions. Every substitution σ extends to a type-preserving mapping σ̄ from T(F, V) to T(F, V). We write tσ for σ̄(t) and define it as follows: fσ = f for f ∈ F, xσ = σ(x) for x ∈ V and (t0 t1)σ = (t0σ) (t1σ).
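Pre-terms, Var and substitution application are again directly executable; an OCaml sketch (our representation, with types elided for brevity):

```
(* Pre-terms over F and V; application is binary and left-associative. *)
type term = Fun of string | Var of string | App of term * term

(* Var(t): the list of variables occurring in t. *)
let rec vars = function
  | Fun _ -> []
  | Var x -> [ x ]
  | App (t0, t1) -> vars t0 @ vars t1

(* tσ: a substitution is a partial map from variables to terms; variables
   outside its domain are left untouched. *)
let rec subst (sigma : string -> term option) (t : term) : term =
  match t with
  | Fun _ -> t
  | Var x -> (match sigma x with Some u -> u | None -> t)
  | App (t0, t1) -> App (subst sigma t0, subst sigma t1)
```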

Term formation gives rise to the concept of a context: a term containing a hole. Formally, let □ be a special terminal symbol denoting the hole; then the grammar C ::= □ | (C T) | (T C), with the above rule for T, generates pre-terms containing exactly one occurrence of the hole. Given a type for the hole, a context is an element of C which is typed as a term is. Let C[]A denote a context in which the hole has type A; filling the hole with a term t : A produces the term C[t]A defined as follows: □[t]A = t, (C0[]A t1)[t]A = C0[t]A t1 and (t0 C1[]A)[t]A = t0 C1[t]A. We usually omit types in the above notation, and in C[t], t is understood as a term which has the same type as the hole does.
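Hole filling is likewise a small recursion; a sketch reusing the term type from the previous snippet:

```
(* One-hole contexts, C ::= hole | (C t) | (t C), and hole filling C[t]. *)
type ctx = Hole | AppL of ctx * term | AppR of term * ctx

let rec fill (c : ctx) (t : term) : term =
  match c with
  | Hole -> t
  | AppL (c0, t1) -> App (fill c0 t, t1)
  | AppR (t0, c1) -> App (t0, fill c1 t)
```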

Rules and Rewriting. Now we have all the ingredients in our recipe for higher-order term rewriting. A rewrite rule ℓ → r is an ordered pair of terms where ℓ and r have the same type, Var(ℓ) ⊇ Var(r) and ℓ assumes the form f t1 ··· tn for some function symbol f. Formally, a simply-typed term rewriting system (STRS) is a quadruple (S, F, V, R) where every element of F ∪ V is assigned a simple type over S and R ⊆ T(F, V) × T(F, V) is a set of rewrite rules. We usually let R alone stand for the system and keep the details of term formation implicit.

The set R of rewrite rules induces the rewrite relation →R ⊆ T(F, V) × T(F, V): t →R t′ if and only if there exist a rewrite rule ℓ → r ∈ R, a substitution σ and a context C[] such that t = C[ℓσ] and t′ = C[rσ]. When there is no ambiguity about the system in question, we may simply write → for →R.

Given a relation ≻ ⊆ X × X, an element x of X is called terminating with respect to ≻ if there is no infinite sequence x = x0 ≻ x1 ≻ ···, and ≻ is called well-founded if all the elements of X are terminating with respect to ≻. An STRS R is called terminating if →R is well-founded.

Example 1. The following rewrite rules constitute a terminating system:

take zero l → nil    take n nil → nil    take (suc n) (cons x l) → cons x (take n l)

where zero : nat, suc : nat → nat, nil : natlist, cons : nat → natlist → natlist and take : nat → natlist → natlist are function symbols, and l : natlist, n : nat and x : nat are variables.

Example 2. The following rewrite rule constitutes a non-terminating system:

$$\text{iterate } f \ x \to \text{cons } x \text{ (iterate } f \ (f \ x)).$$

where cons : nat → natlist → natlist and iterate : (nat → nat) → nat → natlist are function symbols, and f : nat → nat and x : nat are variables.

Limitations. The above formalism does not offer product types, polymorphism or λ-abstractions. What it does offer is an already expressive syntax enabling us, in a higher-order setting, to generalize LCTRSs and to discover what challenges one may face when extending existing unconstrained techniques. We expect that, once preliminary higher-order results are developed, we will adopt more features from other higher-order formalisms in future extensions.

The exclusion of λ-abstractions does not rid us of first-class functions, thanks to curried notation. For example, the occurrence of suc in iterate suc zero is partially (in this case, not at all) applied and still forms a term, which can be passed as an argument. Also, a term such as iterate (λx. suc (suc x)) zero can be simulated at the cost of an extra rewrite rule (in this case, add2 x → suc (suc x)). There are also straightforward ways of encoding product types.

Notions of Termination. If we combine the two systems from Examples 1 and 2, the outcome is surely non-terminating: take zero (iterate suc zero) is not terminating, for example. From a Haskell programmer's perspective, however, this term is "terminating" due to the non-strictness of Haskell. In general, every functional language uses a certain evaluation strategy to choose a specific redex, if any, to rewrite within a term, whereas the rewrite relation we define in this section corresponds to full rewriting: the redex is chosen non-deterministically.

Furthermore, programmers usually care only about the termination of terms that are reachable from the entry point of a program and seldom consider full termination: the termination of all terms, i.e., the well-foundedness of the rewrite relation. We study full termination with respect to full rewriting in this paper, as it implies all other termination properties, and full termination is often a prerequisite for determining properties such as confluence and equivalence.

# 3 Logically Constrained STRSs

Term rewriting systems do not have primitive data types built in; with some function symbols constructing (introducing) values of a certain type and pattern matching rules deconstructing (eliminating) those values, a term rewriting system relies on (co)inductively defined data types. While (co)inductive reasoning is straightforward this way, data types such as integers and arrays require encoding, which can be convoluted; think of the space-consuming unary representation of a number, or a binary representation which takes less space but shifts the burden to the rewrite rules defining arithmetic, while negative numbers bring up even more complications. Besides, such encoding neglects advances in modern SMT solvers.

In this section, we extend unconstrained STRSs with logical constraints so that data types that are not (co)inductively defined can be represented directly, and analysis tools can take advantage of existing SMT solvers. We follow the ideas of first-order LCTRSs [27,12]. Specifically, we will consider systems over arbitrary first-order theories, i.e., we are not bound to, say, systems over integers, while avoiding higher-order logical constraints. In the unconstrained part of such

a system (outside theories), however, higher-order arguments and results are still completely usable.

#### 3.1 Terms Modulo Theories

Following Section 2, we postulate a set S of sorts, a set F of function symbols and a set V of variables where every element of F ∪ V is assigned a simple type over S. First, we assume that there is a distinguished subset Sϑ of S, called the set of theory sorts. The grammar Tϑ ::= Sϑ | (Sϑ → Tϑ) generates the set Tϑ of theory types over Sϑ. Note that the order of a theory type is never greater than one. Next, we assume that there is a distinguished subset Fϑ of F, called the set of theory symbols, and that the type of every theory symbol is in Tϑ, which means that the type of any argument passed to a theory symbol is a theory sort. Elements of T(Fϑ, V) are called theory terms. Last, for technical reasons, we assume that there are infinitely many variables of each type.

Theory symbols are interpreted in an underlying theory: given an Sϑ-indexed family of sets (XA)A∈Sϑ, we extend it to a Tϑ-indexed family by letting XA→B be the set of mappings from XA to XB; an interpretation of theory symbols is a Tϑ-indexed family of mappings ([[·]]A)A∈Tϑ where [[·]]A assigns to each theory symbol of type A an element of XA and is bijective<sup>1</sup> if A ∈ Sϑ. Theory symbols whose type is a theory sort are called values. Given an interpretation of theory symbols ([[·]]A)A∈Tϑ, we extend each indexed mapping [[·]]B to one that assigns to each ground theory term of type B an element of XB by letting [[t0 t1]]B be [[t0]]A→B([[t1]]A). We usually write just [[·]] when the type can be deduced.

Example 3. Let Sϑ be { int }. Then int → int → int is a theory type over Sϑ while (int → int) → int is not. Let Fϑ be { sub } ∪ { n̄ | n ∈ Z } where sub : int → int → int and n̄ : int. The values are the elements of { n̄ | n ∈ Z }. Let Xint be Z, [[·]]int be the mapping n̄ ↦ n and [[sub]] be the mapping λm. λn. m − n. The interpretation of sub 1̄ is the mapping λn. 1 − n.
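The homomorphic extension [[t0 t1]] = [[t0]]([[t1]]) is a one-line recursion on ground theory terms; an OCaml sketch for the signature of Example 3 (our own encoding):

```
(* Ground theory terms over Fϑ = { sub } ∪ { n̄ | n ∈ Z }. *)
type tterm = Lit of int | Sub | TApp of tterm * tterm

(* [[·]] on ground theory terms of sort int: only a literal or a fully
   applied sub denotes an integer. *)
let rec eval_int = function
  | Lit n -> n
  | TApp (TApp (Sub, t0), t1) -> eval_int t0 - eval_int t1
  | _ -> invalid_arg "not a ground theory term of sort int"

(* [[sub 1̄ 3̄]] = 1 - 3 = -2. *)
let _ = assert (eval_int (TApp (TApp (Sub, Lit 1), Lit 3)) = -2)
```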

We are not limited to the theory of integers:

Example 4. To reason about integer arrays, we could either represent them as lists and simulate random access through more costly list traversal (which affects the complexity), or consider a theory of bounded arrays as follows. Let Sϑ be { int, intarray } and Fϑ be the union of { size, select, store }, { n̄ | n ∈ Z } and { ⟨n̄0, . . . , n̄k−1⟩ | k ∈ N and ∀i. ni ∈ Z } where size : intarray → int, select : intarray → int → int, store : intarray → int → int → intarray, n̄ : int and ⟨n̄0, . . . , n̄k−1⟩ : intarray. Let Xint and Xintarray be Z and Z*, respectively. Let [[·]]int be the mapping n̄ ↦ n and [[·]]intarray be the mapping ⟨n̄0, . . . , n̄k−1⟩ ↦ n0 . . . nk−1. Let [[size]](n0 . . . nk−1) be k. Let [[select]](n0 . . . nk−1, i) be ni if 0 ≤ i < k, and 0 otherwise. Let [[store]](n0 . . . nk−1, i, m) be n0 . . . ni−1 m ni+1 . . . nk−1 if 0 ≤ i < k, and n0 . . . nk−1 otherwise. Note that the values include theory symbols ⟨n̄0, . . . , n̄k−1⟩ : intarray as well as n̄ : int.

<sup>1</sup> The bijectivity is assumed so that values (see below) are isomorphic to (and therefore a representation of) elements of (XA)A∈Sϑ.
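This array theory also has an evident executable reading; a sketch over OCaml's int array (with a copying store, since theory values are immutable):

```
(* [[size]], [[select]] and [[store]] from Example 4: out-of-bounds select
   defaults to 0, and out-of-bounds store returns the array unchanged. *)
let size (a : int array) : int = Array.length a

let select (a : int array) (i : int) : int =
  if 0 <= i && i < Array.length a then a.(i) else 0

let store (a : int array) (i : int) (m : int) : int array =
  if 0 <= i && i < Array.length a then begin
    let b = Array.copy a in
    b.(i) <- m;
    b
  end
  else a
```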

In this paper, we largely consider the theory of integers from Example 3 when giving examples because it is easy to understand. This particular theory does not play a special role in the formalism we will shortly present; in fact, the theory of bit vectors may be more appropriate for real-world programs using integers, and our formalism is not biased toward any choice of theories. In particular, we do not have to choose predefined theories from SMT-LIB [3]. The theory of bounded arrays in Example 4 is an instance of such a "non-standard" theory (which can nevertheless be encoded within the theory of functional arrays). On the other hand, theories supported by SMT solvers are preferable in light of automation.

#### 3.2 Constrained Rewriting

Constrained rewriting requires the theory sort bool: we henceforth assume that bool ∈ Sϑ, { f, t } ⊆ Fϑ, Xbool = { 0, 1 }, [[f]]bool = 0 and [[t]]bool = 1. A logical constraint is a theory term φ such that φ has type bool and the type of each variable in Var(φ) is a theory sort. A (constrained) rewrite rule is a triple ℓ → r [φ] where ℓ and r are terms which have the same type, φ is a logical constraint, the type of each variable in Var(r) \ Var(ℓ) is a theory sort and ℓ is a term that assumes the form f t1 ··· tn for some function symbol f and contains at least one function symbol in F \ Fϑ.<sup>2</sup>

This definition can be obscure at first glance, especially when compared with its unconstrained counterpart in Section 2: variables which do not occur in ℓ are allowed to occur in r, not to mention the logical constraint φ as a brand-new component. Given a rewrite rule ℓ → r [φ], the idea is that variables occurring in φ are to be instantiated to values which make φ true, and other variables which occur in r but not in ℓ are to be instantiated to arbitrary values—note that the type of each of these variables is a theory sort. Formally, given an interpretation of theory symbols [[·]], a substitution σ is said to respect a rewrite rule ℓ → r [φ] if σ(x) is a value for all x ∈ Var(φ) ∪ (Var(r) \ Var(ℓ)) and [[φσ]] = 1.

We summarize all the above ingredients in the following definition:

Definition 1. A logically constrained STRS (LCSTRS) consists of S, Sϑ, F, Fϑ, V, (XA), [[·]] and R where

- S is a set of sorts and Sϑ ⊆ S is the set of theory sorts,
- F is a set of function symbols and Fϑ ⊆ F is the set of theory symbols, every element of F being assigned a simple type over S and every theory symbol a theory type over Sϑ,
- V is a set of variables, disjoint from F, with infinitely many variables of each type,
- (XA) is an Sϑ-indexed family of sets with an interpretation of theory symbols [[·]], and
- R is a set of (constrained) rewrite rules.
<sup>2</sup> We do not require f to be in F \ Fϑ (that is, f can be a theory symbol) because a theory symbol may occur at the head position of a rewrite rule's left-hand side in rewriting induction, and this general definition is in line with first-order LCTRSs.


We usually let R alone stand for the system.

The following definition concludes the elaboration of constrained rewriting:

Definition 2. Given an LCSTRS R, the set of rewrite rules induces the rewrite relation →R ⊆ T(F, V) × T(F, V) such that t →R t′ if and only if one of the following conditions is true:

- there exist a rewrite rule ℓ → r [φ] ∈ R, a substitution σ which respects ℓ → r [φ] and a context C[] such that t = C[ℓσ] and t′ = C[rσ], or
- there exist a theory symbol f, values v1, . . . , vn (n ≥ 1) and a context C[] such that f v1 ··· vn has a type in Sϑ, t = C[f v1 ··· vn] and t′ = C[v] for the value v with [[v]] = [[f v1 ··· vn]].
Note that the above conditions are mutually exclusive for any given context C[]: f v1 ··· vn is a theory term, whereas ℓ in any rewrite rule ℓ → r [φ] is not. If t →R t′ due to the second condition, we also write t →κ t′ and call the step a step of calculation. When no ambiguity arises, we may simply write → for →R.

Example 5. We can rework Example 1 into an LCSTRS:

take n l → nil [n ≤ 0]    take n nil → nil    take n (cons x l) → cons x (take (n − 1) l) [n > 0]

where S = Sϑ ∪ { intlist }, Sϑ = { bool, int }, F = Fϑ ∪ { nil, cons, take }, Fϑ = { ≤, >, −, f, t } ∪ Z, V ⊇ { l, n, x }, ≤ : int → int → bool, > : int → int → bool, − : int → int → int, f : bool, t : bool, v : int for all v ∈ Z, nil : intlist, cons : int → intlist → intlist, take : int → intlist → intlist, l : intlist, n : int and x : int.

Here and henceforth we let integer literals and operators, e.g., 0, 1, ≤, > and −, denote both the corresponding theory symbols and their respective images under the interpretation—in contrast to Examples 3 and 4, where we pedantically make a distinction between, say, 1̄ and 1. We also use infix notation for some binary operators to improve readability, and omit the logical constraint of a rewrite rule when it is t. Below is a rewrite sequence:

$$\begin{aligned} \mathsf{take}\ 1\ (\mathsf{cons}\ x\ (\mathsf{cons}\ y\ l)) &\rightarrow \mathsf{cons}\ x\ (\mathsf{take}\ (1-1)\ (\mathsf{cons}\ y\ l)) \\ &\rightarrow_{\kappa} \mathsf{cons}\ x\ (\mathsf{take}\ 0\ (\mathsf{cons}\ y\ l)) \rightarrow \mathsf{cons}\ x\ \mathsf{nil} \end{aligned}$$

Example 6. In Section 1, the rewrite rules implementing the factorial function by fold constitute an LCSTRS. Below is a rewrite sequence:

$$\begin{aligned} \mathsf{fact}\ 1 &\rightarrow \mathsf{fold}\ (*)\ 1\ (\mathsf{genlist}\ 1) \rightarrow \mathsf{fold}\ (*)\ 1\ (\mathsf{cons}\ 1\ (\mathsf{genlist}\ (1-1))) \\ &\rightarrow_{\kappa} \mathsf{fold}\ (*)\ 1\ (\mathsf{cons}\ 1\ (\mathsf{genlist}\ 0)) \rightarrow \mathsf{fold}\ (*)\ 1\ (\mathsf{cons}\ 1\ \mathsf{nil}) \\ &\rightarrow (*)\ 1\ (\mathsf{fold}\ (*)\ 1\ \mathsf{nil}) \rightarrow (*)\ 1\ 1 \rightarrow_{\kappa} 1 \end{aligned}$$

Example 7. Consider the rewrite rule readint → n, in which the variable n : int occurs on the right-hand side of → but not on the left. Unconstrained STRSs do not permit such a rewrite rule, but LCSTRSs do. It looks as if we might rewrite readint to a variable, but that is not the case: all the substitutions which respect this rewrite rule must map n to a value. Indeed, readint is always rewritten to a value of type int. We may have, say, readint → 42. Such variables can be used to model user input.

Example 8. Getting input by means of the rewrite rule from Example 7 has one flaw: in case of multiple integers to be read, the order in which they are read is non-deterministic. Even in the presence of an evaluation strategy, the order may not be the desired one. We can use continuation-passing style to choose an order:

```
readint k → k n
comp g f x → g (f x)
sub → readint (comp readint (−))
```
where comp : ((int → int) → int) → (int → int → int) → int → int. If the first and the second integers to be read were 1 and 2, respectively, the following rewrite sequence would be the only one starting from sub:

$$\begin{aligned} \mathsf{sub} &\rightarrow \mathsf{readint}\ (\mathsf{comp}\ \mathsf{readint}\ (-)) \rightarrow \mathsf{comp}\ \mathsf{readint}\ (-)\ 1 \\ &\rightarrow \mathsf{readint}\ ((-)\ 1) \rightarrow (-)\ 1\ 2 \rightarrow_{\kappa} -1 \end{aligned}$$

Since there is no way to specify the actual input within an LCSTRS, rewrite sequences such as the one above cannot be derived deterministically. Nevertheless, this example demonstrates that the newly proposed formalism can represent relatively sophisticated control mechanisms utilized by functional programs.
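Example 8's continuation trick carries over directly to a language with actual input; a small OCaml sketch of the same three definitions, with OCaml's read_int standing in for the nondeterministic readint:

```
(* readint k → k n, with n supplied by the environment. Threading the
   continuation fixes the order: the outer readint reads the first integer. *)
let readint (k : int -> 'a) : 'a = k (read_int ())
let comp g f x = g (f x)
let sub () = readint (comp readint (fun m n -> m - n))
(* Reading 1 and then 2 yields 1 - 2 = -1, as in the rewrite sequence above. *)
```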

Remarks. We reflect on some of the concepts presented in this section. Notably, logical constraints are first-order theory terms, so a higher-order variable such as the predicate f below may not occur in them; the following definition of the filter function is therefore not a valid LCSTRS:

$$\begin{aligned} \mathsf{filter}\ f\ (\mathsf{cons}\ x\ l) &\rightarrow \mathsf{cons}\ x\ (\mathsf{filter}\ f\ l) \quad [f\ x] \qquad \mathsf{filter}\ f\ \mathsf{nil} \rightarrow \mathsf{nil} \\ \mathsf{filter}\ f\ (\mathsf{cons}\ x\ l) &\rightarrow \mathsf{filter}\ f\ l \quad [\neg\,(f\ x)] \end{aligned}$$

The filter function can actually be implemented in an LCSTRS as follows:

filter f (cons x l) → if (f x) (cons x (filter f l)) (filter f l)    filter f nil → nil
if t l l′ → l    if f l l′ → l′


In the former implementation, the problem is not the higher-order variable f itself but its occurrence in logical constraints. In this case, because the filter function is usually meant to be used in combination with "user-defined" predicates—which are function symbols defined by rewrite rules and therefore do not belong to the theories—it makes sense to disallow f from occurring in logical constraints. In general, we may encounter use cases for higher-order constraints; until then, we focus on first-order constraints, which are very common in functional programs.

# 4 Constrained Higher-Order Recursive Path Ordering

Recall that an important part of our goal is to allow the abundant term rewriting techniques to be applied to program analysis. We have defined a formalism for constrained higher-order term rewriting; now it remains to be seen that—or how—existing techniques can be extended to it.

In the rest of this paper, we consider termination, an important aspect of program analysis and a topic that has been studied by the term rewriting community for decades. Not only is termination itself critical to the correctness of certain programs, but it also facilitates other analyses by admitting well-founded induction on terms.

In this section, we adapt HORPO [21] to our formalism. This is one of the oldest, yet still effective techniques for higher-order termination. HORPO can be used either as a stand-alone method or in a higher-order version of the dependency pair framework [1,39,11,25]. Hence, this adaptation offers a solid basis for an analysis tool's termination module. We will discuss the use of HORPO within the dependency pair framework in Section 5, and its automation in Section 6.

Constrained RPO for first-order LCTRSs was proposed in [27]. We take inspiration from its approach to theory terms, formalize the ideas, and add support for (higher) types as well as partial application.

#### 4.1 HORPO, Unconstrained and Uncurried

We first recall HORPO in its original form. Note that the original definition is based on an unconstrained and uncurried format, and a thorough discussion of it is beyond the scope of this paper. The following presentation is mostly informal and only serves the purposes of comparison and inspiration.

We begin with two standard definitions:

Definition 3. Given relations ≿ and ≻ over X, the generalized lexicographic ordering ≻l ⊆ X* × X* is induced as follows: x1 . . . xm ≻l y1 . . . yn if and only if there exists k ≤ min(m, n) such that xi ≿ yi for all i < k and xk ≻ yk.

Definition 4. Given relations ≿ and ≻ over X, the generalized multiset ordering ≻m ⊆ X* × X* is induced as follows: x1 . . . xm ≻m y1 . . . yn if and only if there exist a non-empty subset I of { 1, . . . , m } and a mapping π from { 1, . . . , n } to { 1, . . . , m } such that

$$\begin{aligned} &1.\ \forall i \in I.\ \forall j \in \pi^{-1}(i).\ x_i \succ y_j, \\ &2.\ \forall i \in \{1, \ldots, m\} \setminus I.\ \forall j \in \pi^{-1}(i).\ x_i \succsim y_j, \text{ and} \\ &3.\ \forall i \in \{1, \ldots, m\} \setminus I.\ |\pi^{-1}(i)| = 1. \end{aligned}$$

Here the generalized multiset ordering is formulated in terms of lists because we will compare argument lists by this ordering and this formulation facilitates implementation. In the following definition of HORPO, when we refer to the above definitions, ≿ is the equality over terms and ≻ is HORPO itself.
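Since these list-based formulations are meant to be implemented directly, the following OCaml sketch spells them out; ge and gt play the roles of ≿ and ≻, and the multiset check is a brute-force search over the choice of I and π (exponential, which is acceptable for the short argument lists that arise in practice).

```
(* Definition 3: some prefix agrees under ge and the next position is
   strictly decreasing under gt. *)
let rec gen_lex ~ge ~gt xs ys =
  match (xs, ys) with
  | x :: xs', y :: ys' -> gt x y || (ge x y && gen_lex ~ge ~gt xs' ys')
  | _ -> false

(* Definition 4: enumerate the non-empty set I as a bitmask over positions
   of xs. Positions outside I each take exactly one y under ge (backtracking
   over the choice); every leftover y must be strictly below some position
   in I under gt. *)
let gen_multiset ~ge ~gt xs ys =
  let xa = Array.of_list xs in
  let m = Array.length xa in
  let positions = List.init m Fun.id in
  let try_mask mask =
    let strict = List.filter (fun i -> mask land (1 lsl i) <> 0) positions in
    let non_strict = List.filter (fun i -> mask land (1 lsl i) = 0) positions in
    let rec assign ns ys_left =
      match ns with
      | [] ->
        List.for_all
          (fun y -> List.exists (fun i -> gt xa.(i) y) strict)
          ys_left
      | i :: rest ->
        let rec choose before = function
          | [] -> false
          | y :: after ->
            (ge xa.(i) y && assign rest (List.rev_append before after))
            || choose (y :: before) after
        in
        choose [] ys_left
    in
    assign non_strict ys
  in
  let rec search mask = mask < 1 lsl m && (try_mask mask || search (mask + 1)) in
  search 1
```

For instance, gen_multiset ~ge:(>=) ~gt:(>) [1; 3] [2; 2] holds: take I to be both positions and let π map both right-hand elements to 3.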

Roughly, HORPO extends a given ordering over function symbols, and when considering terms headed by the same function symbol, compares the arguments by either of the above orderings. Given a well-founded ordering ▶ ⊆ F × F, called the precedence, and a mapping s : F → { l, m }, called the status, HORPO is a type-preserving relation ≻ such that s ≻ t if and only if one of the following conditions is true:

1. s = f(s1, . . . , sm) and sk ⪰ t for some k.
2. s = f(s1, . . . , sm), t = g(t1, . . . , tn), f ▶ g and, for each i, s ≻ ti or sk ⪰ ti for some k.
3. s = f(s1, . . . , sm), t = f(t1, . . . , tn), s(f) = l, s1 . . . sm ≻l t1 . . . tn and, for each i, s ≻ ti or sk ⪰ ti for some k.
4. s = f(s1, . . . , sm), t = f(t1, . . . , tn), s(f) = m and s1 . . . sm ≻m t1 . . . tn.
5. t = @(t0, t1) and, for each i, s ≻ ti or sk ⪰ ti for some k.
6. s = @(s0, s1), t = @(t0, t1) and s0 s1 ≻m t0 t1.

7. s = λx. s0, t = λx. t0 and s0 ≻ t0.

Here ⪰ denotes the reflexive closure of ≻.

We call this format uncurried because every function symbol has an arity, i.e., the number of arguments guaranteed for each occurrence of the function symbol in a term. This is indicated by the functional notation f(s1, . . . , sm) as opposed to f s1 ··· sm. If f has arity m, its occurrence in a term must take m arguments, so f(s1, . . . , sm−1) is not a well-formed term, for example. A function symbol's type (or more technically, its type declaration) can permit more arguments than its arity guarantees. Such an extra argument is supplied through the syntactic form @(·, ·). For example, if the same function symbol f is given an extra argument sm+1, we write @(f(s1, . . . , sm), sm+1). This syntactic form is also used to pass arguments to variables and λ-abstractions.

The difference between an uncurried and a curried format is more than a notational issue, and it poses technical challenges to our extension of HORPO. Another source of challenges is, as one would expect, constrained rewriting.

#### 4.2 Rule Removal

HORPO is defined as a reduction ordering ≻, which is a type-preserving, stable (i.e., t ≻ t′ implies tσ ≻ t′σ), monotonic (i.e., t ≻ t′ implies C[t] ≻ C[t′]) and well-founded relation. Note that despite its name, HORPO is not necessarily transitive. If such a relation orients all the rewrite rules in R (i.e., ℓ ≻ r for all ℓ → r ∈ R), we can conclude that the rewrite relation →R is well-founded.

A similar strategy for LCSTRSs requires a few tweaks. First, stability should be tightly coupled with rule orientation because every rewrite rule is now equipped with a logical constraint, which decides what substitutions are expected when the rewrite rule is applied. Second, the monotonicity requirement can be weakened because ℓ is never a theory term in a rewrite rule ℓ → r [φ]. We define as follows:

Definition 5. A type-preserving relation ⇒ ⊆ T(F, V) × T(F, V) is said

- to stably orient a rewrite rule ℓ → r [φ] if ℓσ ⇒ rσ for each substitution σ which respects ℓ → r [φ], and
- to be rule-monotonic if s ⇒ t implies C[s] ⇒ C[t] for each context C[] whenever s is not a theory term.
Besides having rewrite rules stably oriented, we need to deal with calculation. It turns out to be unnecessary to search for a well-founded relation which includes →κ, given the following observation:

Lemma 1. →<sup>κ</sup> is well-founded.

Proof. The term size strictly decreases through every step of calculation. ⊓⊔

We rather look for a type-preserving and well-founded relation ≻ which stably orients every rewrite rule, is rule-monotonic, and is compatible with →κ, i.e., →κ ; ≻ ⊆ ≻+ or ≻ ; →κ ⊆ ≻+. This strategy is an instance of rule removal:

Theorem 1. Given an LCSTRS R, the rewrite relation →R is well-founded if and only if there exist sets R1 and R2 such that →R1 is well-founded and R1 ∪ R2 = R, and type-preserving, rule-monotonic relations ⇒ and ≻ such that

- ⇒ includes →κ and stably orients every rewrite rule in R1, and
- ≻ is well-founded, stably orients every rewrite rule in R2 and is compatible with ⇒, i.e., ⇒ ; ≻ ⊆ ≻+ or ≻ ; ⇒ ⊆ ≻+.
Here →R1 assumes the same term formation and interpretation as →R does.

Proof. If →R is well-founded, take R1 = ∅, R2 = R, ⇒ = →κ and ≻ = →R. Note that →∅ = →κ by definition.

Now assume given R1, R2, ⇒ and ≻. Since ⇒ is rule-monotonic, includes →κ and stably orients every rewrite rule in R1, we have →R1 ⊆ ⇒. So the compatibility of ≻ with ⇒ implies its compatibility with →R1, which in turn implies the well-foundedness of →R1 ∪ ≻, given that both →R1 and ≻ are well-founded. Since R1 ∪ R2 = R and ≻ is a rule-monotonic relation which stably orients every rewrite rule in R2, →R ⊆ →R1 ∪ ≻. Hence, →R is well-founded. ⊓⊔

In a termination proof of R, Theorem 1 allows us to remove the rewrite rules in R2 from R. If no rewrite rules are left after iterated rule removal, the termination of the original system can be concluded by Lemma 1.

#### 4.3 Constrained HORPO for LCSTRSs

Before adapting HORPO for LCSTRSs, we discuss how the theories may be handled. Let us consider the following system:

$$\mathsf{rec}\ n\ x\ f \rightarrow x \quad [n \le 0] \qquad \mathsf{rec}\ n\ x\ f \rightarrow f\ (n-1)\ (\mathsf{rec}\ (n-1)\ x\ f) \quad [n > 0]$$

where rec : int → int → (int → int → int) → int. In the second rewrite rule, the left-hand side of → is rec n x f while the right-hand side has a subterm rec (n − 1) x f. It is natural to expect n ≻ n − 1 in the construction of HORPO. Note that this is impossible with respect to any recursive path ordering for unconstrained rewriting because n is a variable occurring in n − 1; in an unconstrained setting, we actually have n − 1 ≻ n. Hence, we must somehow take the logical constraint n > 0 into account. To this end, we largely follow the ideas of constrained RPO for first-order LCTRSs [27].

The occurrence of n in the logical constraint ensures that n is instantiated to a value, say 42, when the rewrite rule is applied, and it is sensible to have 42 ≻ 42 − 1. Also, n > 0 guarantees that all sequences of such descents are finite, i.e., the ordering λm. λn. m > 0 ∧ m > n, denoted by ⊐, is well-founded. Let φ |= φ′ denote, on the assumption that φ and φ′ are logical constraints such that Var(φ) ⊇ Var(φ′), that [[φσ]] = 1 implies [[φ′σ]] = 1 for each substitution σ which maps variables in Var(φ) to values. Then we have n > 0 |= n ⊐ n − 1. We thus would like to have s ≻ t if φ |= s ⊐ t.

However, with the same ordering ⊐, we have both m > 0 ∧ m > n |= m ⊐ n and n > 0 ∧ n > m |= n ⊐ m, whereas we cannot have both m ≻ n and n ≻ m without breaking the well-foundedness of ≻. To resolve this issue, we split ≻ into a family of relations (≻φ) indexed by logical constraints, and let s ≻φ t be true if φ |= s ⊐ t. We also introduce a separate family of relations (≿φ) such that s ≿φ t if φ |= s ⊒ t, where ⊒ is the reflexive closure of ⊐. Hence, ≿φ is not necessarily the reflexive closure of ≻φ; if it were, even n ≿n≥1 1 would not be obtainable.

Now we have a family of pairs (≿φ, ≻φ), which does not seem to suit rule removal; after all, the essential requirement is a fixed relation which is type-preserving, rule-monotonic, well-founded and at least compatible with →κ. When the definition of constrained HORPO is fully presented, we will show that ≻t—the irreflexive relation indexed by the boolean t—is such a relation and stably orients a rewrite rule ℓ → r [φ] if ℓ ≻φ r.

The annotation φ of HORPO does not capture variables in Var(r) \ Var(ℓ), which also play a part in deciding what substitutions are expected when ℓ → r [φ] is applied. We could use a new annotation to accommodate these variables, but there is a simpler trick (also present in [38]): a variable in Var(r) \ Var(ℓ) can be harmlessly appended to φ, syntactically and without tampering with any interpretation. We henceforth assume that Var(r) \ Var(ℓ) ⊆ Var(φ) for each rewrite rule ℓ → r [φ]. We also say that a substitution σ respects a logical constraint φ if σ(x) is a value for all x ∈ Var(φ) and [[φσ]] = 1.

Before presenting constrained HORPO, we recall that in [21] all sorts collapse into one; for example, int → int → int and int → intlist → intlist are considered equal. The idea is that the original rewrite relation can be embedded in the single-sorted one, and if the latter is well-founded, so is the former. We follow this convention and henceforth compare types by their →-structure only.

Below, ≻lφ and ≻mφ are induced by ≿φ and ≻φ:

Definition 6. Constrained HORPO depends on the following parameters:

- an interpretation [[⊐A]] for each theory sort A such that the induced relation ⊐ on ground theory terms is well-founded,
- a precedence: a well-founded ordering ▶ over F, and
- a status: a mapping s which assigns l or mk (for some k ≥ 2) to each function symbol.
The higher-order recursive path ordering (HORPO) is a family of pairs of type-preserving relations (≿φ, ≻φ) indexed by logical constraints and defined by the following conditions:

1. s ≿φ t if and only if one of the following conditions is true:

	- (a) s and t are theory terms whose type is a sort, Var(s) ∪ Var(t) ⊆ Var(φ) and φ |= s ⊒ t.
	- (b) s ≻φ t.
	- (c) s ↓κ t.
	- (d) s is not a theory term, s = s0 s1, t = t0 t1, s0 ≿φ t0 and s1 ≿φ t1.
2. s ≻φ t if and only if one of the following conditions is true:
	- (a) s and t are theory terms whose type is a sort, Var(s) ∪ Var(t) ⊆ Var(φ) and φ |= s ⊐ t.
	- (b) s and t have equal types and s ▷φ t.
	- (c) s is not a theory term, s = f s1 ··· sn for some f ∈ F, t = f t1 ··· tn, ∀i. si ≿φ ti and ∃k. sk ≻φ tk.
	- (d) s is not a theory term, s = x s1 ··· sn for some x ∈ V, t = x t1 ··· tn, ∀i. si ≿φ ti and ∃k. sk ≻φ tk.

3. s ▷φ t if and only if s is not a theory term, s = f s1 ··· sm for some f ∈ F and one of the following conditions is true:
	- (a) ∃k. sk ≿φ t.
	- (b) t = t0 t1 ··· tn and ∀i. s ▷φ ti.
	- (c) t = g t1 ··· tn for some g ∈ F such that f ▶ g, and ∀i. s ▷φ ti.
	- (d) t = f t1 ··· tn, s(f) = l, s1 ··· sm ≻lφ t1 ··· tn and ∀i. s ▷φ ti.

Here s ↓κ t if and only if there exists a term r such that s →∗κ r and t →∗κ r.

Comparison to the Original HORPO. Conditions 1d, 2c and 2d are included in the definition so that ≿φ and ≻φ are rule-monotonic. We stress that it is mandatory to use the weakened rule-monotonicity requirement instead of the traditional monotonicity requirement: if ≻t were monotonic, 1 ≻t 0 would imply 1 − 1 ≻t 1 − 0, but t |= (1 − 0) ⊐ (1 − 1), i.e., ≻t could not possibly be well-founded.

From curried notation, another issue related to rule-monotonicity arises, which leads to the above definition of ▷φ. If we had naively mirrored the original HORPO, the definition of ≻φ would include a condition which corresponds to condition 3b and reads: "s ≻φ t if s is not a theory term, s = f s1 ··· sm for some f ∈ F, t = t0 t1 ··· tn and ∀i. s ≻φ ti ∨ ∃k. sk ≿φ ti". Assume given such terms s and t, and that, say, s ≻φ t1. Now if there is a term r to which s can be applied, we have a problem with proving s r ≻φ t r = t0 t1 ··· tn r because s r ≻φ t1 is not obtainable due to the type restriction. Note that ≿φ and ≻φ are by definition type-preserving, whereas ▷φ is not.

This limitation is overcome by means of ▷φ, which actually makes the overall definition more powerful, and is reminiscent of the distinction between ≻ and ≻TS in later versions of HORPO (e.g., [5]). Other extensions from these works, however, are not yet included in the above definition, and except for the type restriction and uncurried notation, the conditions of ▷φ largely match those of the original HORPO.

Another subtle difference is the use of generalized lexicographic and multiset orderings: in the original HORPO, ≿ is the reflexive closure of ≻, and therefore it suffices to use the more traditional definitions of lexicographic and multiset orderings. Here, as observed above, this would be unnecessarily restrictive.

The split of a single multiset status label into m2, m3, . . . is due to curried notation, in particular the possibility of partial application. If we had only a single multiset status label, which would admit, for example, both f 2 2 ▷t f 1 and f 1 3 ≻t f 2 2, it would be possible that ≻t is not well-founded: note that g (f 1) ≻t f 1 3 due to, among others, conditions 2b and 3b, and if f ▶ g, we would then have f 2 2 ≻t g (f 1) due to, among others, conditions 2b and 3c. This change adds some power to constrained HORPO: we can prove, for example, the termination of the single-rule system f x a y → f b x (g y) by choosing s(f) = m2, which is not possible if all arguments must be considered, as the original HORPO requires. We do not need m1 because that case is already covered by choosing l.

Given an LCSTRS R, if we can divide the set of rules into two subsets R<sup>1</sup> and R<sup>2</sup>, and find a combination of [[⊐]], ▶ and s that guarantees ℓ ≿<sup>φ</sup> r for all ℓ → r [φ] ∈ R<sup>1</sup> and ℓ ≻<sup>φ</sup> r for all ℓ → r [φ] ∈ R<sup>2</sup>, then the termination of R is reduced to that of R<sup>1</sup>. Before proving soundness, we consider some examples; a code sketch of the resulting rule-removal procedure follows Example 10.

Example 9. We continue the analysis of the motivating example rec. Let [[⊐int]] be λm. λn. m > 0 ∧ m > n as above. There is only one function symbol in F \ Fϑ, and it turns out that ▶ can be any precedence. Let s be a mapping such that s(rec) = l. The first rewrite rule can be removed due to conditions 2b and 3a. The second rewrite rule can be removed as follows:

1. rec n x f ≻<sub>n>0</sub> f (n − 1) (rec (n − 1) x f) by 2b, 2.
2. rec n x f ▷<sub>n>0</sub> f (n − 1) (rec (n − 1) x f) by 3b, 3, 4, 5.
3. rec n x f ▷<sub>n>0</sub> f by 3a, 6.
4. rec n x f ▷<sub>n>0</sub> n − 1 by 3a, 7.
5. rec n x f ▷<sub>n>0</sub> rec (n − 1) x f by 3d, 8, 4, 9, 3.
6. f ≿<sub>n>0</sub> f by 1c.
7. n ≿<sub>n>0</sub> n − 1 by 1a.
8. n ≻<sub>n>0</sub> n − 1 by 2a.
9. rec n x f ▷<sub>n>0</sub> x by 3a, 10.
10. x ≿<sub>n>0</sub> x by 1c.

Example 10. Consider Example 5. Let [[⊐int]] be λm. λn. m > 0 ∧ m > n. Let ▶ be a precedence such that take ▶ nil and take ▶ cons. Let s be a mapping such that s(take) = l. Then we can remove all of the rewrite rules. Note that to establish take n (cons x l) ≻<sub>n>0</sub> cons x (take (n − 1) l), we need cons x l ≿<sub>n>0</sub> x, which is obtainable because intlist is not distinguished from int.
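To make the rule-removal scheme above concrete, the following is a minimal Scala sketch of the resulting procedure. The types Term, Rule and Horpo and the two orientation oracles are hypothetical stand-ins, not Cora's API; a real implementation would enumerate candidate combinations of [[⊐]], ▶ and s via the SMT encoding described in Section 6.

```
// Hypothetical stand-ins for constrained terms and rules.
type Term = String
final case class Rule(lhs: Term, rhs: Term, constraint: Term)

// One candidate HORPO instance (a choice of [[⊐]], ▶ and s).
trait Horpo:
  def orientsWeakly(r: Rule): Boolean   // decides l ≿_φ r
  def orientsStrictly(r: Rule): Boolean // decides l ≻_φ r

// Remove strictly oriented rules (R2) as long as every rule is at least
// weakly oriented; termination of R then reduces to that of R1.
// Assumes a finite candidate list.
def removeRules(rules: Set[Rule], candidates: LazyList[Horpo]): Set[Rule] =
  val simplified = candidates.collectFirst {
    case h if rules.forall(r => h.orientsWeakly(r) || h.orientsStrictly(r))
           && rules.exists(h.orientsStrictly) =>
      rules.filterNot(h.orientsStrictly)
  }
  simplified match
    case Some(rest) => removeRules(rest, candidates)
    case None       => rules // no further progress
```

If the returned set is empty, the whole system has been proved terminating; otherwise, termination of the remaining rules must be established by other means.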

### 4.4 Properties of Constrained HORPO

The soundness of constrained HORPO as a technique for rule removal relies on the following properties, which we now prove.

Rule Orientation. The goal consists of two parts: ≿<sup>t</sup> stably orients a rewrite rule ℓ → r [φ] if ℓ ≿<sup>φ</sup> r, and ≻<sup>t</sup> stably orients a rewrite rule ℓ → r [φ] if ℓ ≻<sup>φ</sup> r. The core of the argument is to prove the following lemma:

Lemma 2. Given logical constraints φ and φ ′ such that Var(φ) ⊇ Var(φ ′ ) and φ |= φ ′ , t |= φ ′σ holds for each substitution σ which respects φ.

Proof. Since σ respects φ, it follows from φ |= φ′ that [[φ′σ]] = 1. Note that Var(φ′σ) = ∅, and therefore φ′σσ′ = φ′σ for all σ′. Hence, t |= φ′σ. ⊓⊔

And the rest is routine:

Theorem 2. Given a logical constraint φ and terms s and t, the following statements are true for each substitution σ which respects φ:

1. s ≿<sup>φ</sup> t implies sσ ≿<sup>t</sup> tσ.
2. s ≻<sup>φ</sup> t implies sσ ≻<sup>t</sup> tσ.
3. s ▷<sup>φ</sup> t implies sσ ▷<sup>t</sup> tσ.

Proof. By mutual induction on the derivation. Note that →<sup>κ</sup> is stable. ⊓⊔

Rule-Monotonicity. Both ≿<sup>φ</sup> and ≻<sup>φ</sup> are rule-monotonic for all φ. The former is trivial to prove, and the key to proving the latter is the following lemma:

Lemma 3. f s<sup>1</sup> · · · s<sup>m</sup> r ▷<sup>φ</sup> t if f s<sup>1</sup> · · · s<sup>m</sup> ▷<sup>φ</sup> t.

Proof. By induction on the derivation. ⊓⊔

Now we can prove the rule-monotonicity:

Theorem 3. ≻<sup>φ</sup> is rule-monotonic.

Proof. By induction on the context C[]. Essentially, we ought to prove that, given terms s and t which have equal types, if s is not a theory term and s ≻<sup>φ</sup> t, then s r ≻<sup>φ</sup> t r for all r, and r s ≻<sup>φ</sup> r t for all r. We prove the former by case analysis on the derivation of s ≻<sup>φ</sup> t, and the latter by case analysis on r: r = f r<sup>1</sup> · · · r<sup>n</sup> for some f ∈ F or r = x r<sup>1</sup> · · · r<sup>n</sup> for some x ∈ V. ⊓⊔

Compatibility. The strict relation ≻<sup>t</sup> is compatible with its non-strict counterpart ≿<sup>t</sup> ; we prove that ≿<sup>t</sup> ; ≻<sup>t</sup> ⊆ ≻<sup>t</sup> ∪ (≻<sup>t</sup> ; ≻t), given the following observation:

Theorem 4. ≿<sup>t</sup> = ≻<sup>t</sup> ∪ ↓κ.

Proof. By definition, ≿<sup>t</sup> ⊇ ≻<sup>t</sup> ∪ ↓κ. We prove ≿<sup>t</sup> ⊆ ≻<sup>t</sup> ∪ ↓<sup>κ</sup> by induction on the derivation of s ≿<sup>t</sup> t. Only two cases are non-trivial. If s and t are ground theory terms whose type is a sort and [[s ⊒ t]] = 1, we have either [[s ⊐ t]] = 1 or [[s]] = [[t]]; the former implies s ≻<sup>t</sup> t while the latter implies s ↓<sup>κ</sup> t. On the other hand, if s is not a theory term, s = s<sup>0</sup> s<sup>1</sup>, t = t<sup>0</sup> t<sup>1</sup>, s<sup>0</sup> ≿<sup>t</sup> t<sup>0</sup> and s<sup>1</sup> ≿<sup>t</sup> t<sup>1</sup>, then by induction: if s<sup>0</sup> ≻<sup>t</sup> t<sup>0</sup> or s<sup>1</sup> ≻<sup>t</sup> t<sup>1</sup>, we can prove s ≻<sup>t</sup> t in the same manner as we prove the rule-monotonicity of ≻<sup>t</sup>; otherwise s<sup>0</sup> ↓<sup>κ</sup> t<sup>0</sup> and s<sup>1</sup> ↓<sup>κ</sup> t<sup>1</sup>, and then s ↓<sup>κ</sup> t. ⊓⊔

Theorem 4 plays an important role in the well-foundedness proof of ≻<sup>t</sup> as well.

For the compatibility of ≻<sup>t</sup> with ≿<sup>t</sup> , it remains to prove that ↓<sup>κ</sup> ; ≻<sup>t</sup> ⊆ ≻<sup>t</sup> , which is implied by the following lemma:

Lemma 4. Given terms s and s ′ such that s →<sup>κ</sup> s ′ , the following statements are true for all t:

1. s ≿<sup>t</sup> t if and only if s′ ≿<sup>t</sup> t.
2. s ≻<sup>t</sup> t if and only if s′ ≻<sup>t</sup> t.
3. s ▷<sup>t</sup> t if and only if s′ ▷<sup>t</sup> t.

Proof. By mutual induction on the derivation, for "if" and "only if" separately. Note that →<sup>κ</sup> is confluent. ⊓⊔

The compatibility follows as a corollary:

Corollary 1. ≿<sup>t</sup> ; ≻<sup>t</sup> ⊆ ≻<sup>t</sup> ∪ (≻<sup>t</sup> ; ≻t).

Well-Foundedness. Following [21], we base the well-foundedness proof of ≻<sup>t</sup> on the predicate of computability [40,17]. There are, however, two major differences, which pose new technical challenges: ≿<sup>t</sup> is no longer the reflexive closure of ≻<sup>t</sup>, and curried notation is used instead of uncurried notation.

In Definition 6, ≻<sup>l</sup><sup>φ</sup> and ≻<sup>m</sup><sup>φ</sup> are induced by ≿<sup>φ</sup> and ≻<sup>φ</sup>. We need certain properties of ≻<sup>l</sup><sup>t</sup> and ≻<sup>m</sup><sup>t</sup> to prove that ≻<sup>t</sup> is well-founded. Because ≿<sup>t</sup> is neither the equality over terms nor the reflexive closure of ≻<sup>t</sup>, those properties are less standard and deserve inspection. The property of ≻<sup>l</sup><sup>t</sup> is relatively easy to prove:

Theorem 5. Given relations ≿ and ≻ over X such that ≻ is well-founded and ≿ ; ≻ ⊆ ≻<sup>+</sup>, ≻<sup>l</sup> is well-founded over X<sup>n</sup> for all n.

Proof. The standard method used when ≿ is the equality still applies. ⊓⊔

We refer to [41] for the proof of the following property of ≻<sup>m</sup><sup>t</sup>:

Theorem 6. Given relations ≿ and ≻ over X such that ≿ is a quasi-ordering, ≻ is well-founded and ≿ ; ≻ ⊆ ≻, ≻<sup>m</sup> is well-founded over X<sup>∗</sup> .

Proof. See Theorem 3.7 in [41]. ⊓⊔

In comparison to [41], we waive the transitivity requirement for ≻ above, but we cannot get around the requirement that ≿ is a quasi-ordering without significantly changing the proof. This seems problematic because ≿<sup>t</sup> is not necessarily transitive due to its inclusion of ≻<sup>t</sup>. Fortunately, one observation resolves this issue: ≻<sup>m</sup><sup>t</sup> can equivalently be seen as induced by ↓<sup>κ</sup> and ≻<sup>t</sup> due to Theorem 4. In the same spirit, we can prove the following property:

Theorem 7. ↓<sup>m</sup><sup>κ</sup> ; ≻<sup>m</sup><sup>t</sup> ⊆ ≻<sup>m</sup><sup>t</sup>, where s<sup>1</sup> . . . s<sup>n</sup> ↓<sup>m</sup><sup>κ</sup> t<sup>1</sup> . . . t<sup>n</sup> if and only if there exists a permutation π over { 1, . . . , n } such that s<sub>π(i)</sub> ↓<sup>κ</sup> t<sup>i</sup> for all i.

Proof. See Lemma 3.2 in [41]. ⊓⊔

Our definition of computability (or reducibility [17]) is standard:

Definition 7. A term t<sup>0</sup> is called computable if either


In [21], a term is called neutral if it is not a λ-abstraction. Due to the exclusion of λ-abstractions, one might consider all LCSTRS terms neutral. This naive definition, however, does not capture the essence of neutrality: if a term t<sup>0</sup> is neutral, a one-step reduct (with respect to ≻<sup>t</sup>) of t<sup>0</sup> t<sup>1</sup> can only be t′<sup>0</sup> t′<sup>1</sup> where t′<sup>0</sup> and t′<sup>1</sup> are reducts of t<sup>0</sup> and t<sup>1</sup>, respectively. Because of curried notation, neutral LCSTRS terms should be defined as follows:

Definition 8. A term is called neutral if it assumes the form x t<sup>1</sup> · · ·t<sup>n</sup> for some variable x.

And we recall the following results:

Theorem 8. Computable terms have the following properties:


Proof. The standard proof still works despite the seemingly different definition of neutrality. ⊓⊔

In addition, we prove the following lemma:

Lemma 5. Given terms s and t such that s ↓<sup>κ</sup> t, if s is computable, so is t.

Proof. By induction on the type of s and t. ⊓⊔

And we have the following corollary due to Theorem 4:

Corollary 2. Given terms s and t such that s ≿<sup>t</sup> t, if s is computable, so is t.

The goal is to prove that all terms are computable. To do so, the key is to prove that f s<sup>1</sup> · · · s<sup>m</sup> is computable, where f is a function symbol, if s<sup>i</sup> is computable for all i. In [21], this is done on the basis that f s<sup>1</sup> · · · s<sup>m</sup> is neutral, which is not true in our case. We do it differently and start with a definition:

Definition 9. Given f : A<sup>1</sup> → · · · → A<sup>n</sup> → B where f ∈ F and B ∈ S, let ar(f) be n. We introduce a special symbol ⊤ and extend our previous definitions so that ⊤ ≻<sup>t</sup> t for all t ∈ T(F, V) and ⊤ ↓<sup>κ</sup> ⊤. This way ⊤ ≿<sup>t</sup> t if t ∈ T(F, V) or t = ⊤. Given terms t̄ = t<sup>1</sup> . . . t<sup>n</sup>, let (t̄)<sub>k</sub> be t<sup>k</sup> if k ≤ n, and ⊤ if k > n. Given terms s = f s<sup>1</sup> · · · s<sup>m</sup> and t = g t<sup>1</sup> · · ·t<sup>n</sup> where f ∈ F, g ∈ F, and all s<sup>i</sup> and t<sup>i</sup> are computable, we define ≻<sup>c</sup> such that s ≻<sup>c</sup> t if and only if f ▶ g, or f = g and

$$\begin{aligned} - \ &\mathsf{s}(f) = \mathsf{l} \text{ and } (\bar{s})_{1} \dots (\bar{s})_{\mathrm{ar}(f)} \succ^{\mathsf{l}}_{\mathsf{t}} (\bar{t})_{1} \dots (\bar{t})_{\mathrm{ar}(f)} \text{, or} \\ - \ &\mathsf{s}(f) = \mathsf{m}_{k} \text{ and} \\ &\circ \ (\bar{s})_{1} \dots (\bar{s})_{k} \succ^{\mathsf{m}}_{\mathsf{t}} (\bar{t})_{1} \dots (\bar{t})_{k} \text{, or} \\ &\circ \ (\bar{s})_{1} \dots (\bar{s})_{k} \downarrow^{\mathsf{m}}_{\kappa} (\bar{t})_{1} \dots (\bar{t})_{k},\ \forall i > k.\ (\bar{s})_{i} \succsim_{\mathsf{t}} (\bar{t})_{i} \text{ and } \exists i > k.\ (\bar{s})_{i} \succ_{\mathsf{t}} (\bar{t})_{i}. \end{aligned}$$

This gives us a well-founded relation:

Lemma 6. ≻<sup>c</sup> is well-founded.

Proof. Since all computable terms are terminating with respect to ≻<sup>t</sup>, ≻<sup>t</sup> is well-founded over computable terms. The introduction of ⊤ clearly does not break this well-foundedness. The outermost layer of ≻<sup>c</sup> regards ▶, which is well-founded by definition. We need only to fix the function symbol f and to go deeper. If s(f) = l, we know that ≻<sup>l</sup><sup>t</sup> is well-founded over lists of length ar(f) because of Theorem 5. If s(f) = m<sub>k</sub>, ≻<sup>c</sup> splits each list of arguments in two and performs a lexicographic comparison. We can go past the first component because of Theorems 6 and 7. And the rest, a pointwise comparison, is also well-founded. So we can conclude that ≻<sup>c</sup> is well-founded. ⊓⊔

Now we prove the aforementioned statement:

Lemma 7. Given a term s = f s<sup>1</sup> · · · s<sup>m</sup> where f is a function symbol, if s<sup>i</sup> is computable for all i, so is s.

Proof. By well-founded induction on ≻c. We consider the type of s:


	- 1. If ∃k. s<sup>k</sup> ≿<sup>t</sup> t, t is computable due to Corollary 2.
	- 2. If t = t<sup>0</sup> t<sup>1</sup> · · ·t<sup>n</sup> and ∀i. s ▷<sup>t</sup> t<sup>i</sup>, t<sup>i</sup> is computable for all i by inner induction. By definition, t is computable.
	- 3. If t = g t<sup>1</sup> · · ·tn, f ▶ g and ∀i. s ▷<sup>t</sup> t<sup>i</sup> , t<sup>i</sup> is computable for all i by inner induction. It follows from f ▶ g that s ≻<sup>c</sup> t, and t is computable by outer induction.
	- 4. If t = f t<sup>1</sup> · · ·t<sup>n</sup>, s(f) = l, s<sup>1</sup> . . . s<sup>m</sup> ≻<sup>l</sup><sup>t</sup> t<sup>1</sup> . . . t<sup>n</sup> and ∀i. s ▷<sup>t</sup> t<sup>i</sup>, t<sup>i</sup> is computable for all i by inner induction. Likewise, s ≻<sup>c</sup> t.
	- 5. If t = f t<sup>1</sup> · · ·t<sup>n</sup>, s(f) = m<sub>k</sub>, k ≤ n, s<sup>1</sup> . . . s<sup>min(m,k)</sup> ≻<sup>m</sup><sup>t</sup> t<sup>1</sup> . . . t<sup>k</sup> and ∀i. s ▷<sup>t</sup> t<sup>i</sup>, t<sup>i</sup> is computable for all i by inner induction. Likewise, s ≻<sup>c</sup> t.
	- 6. If t is a value, t is terminating with respect to ≻<sup>t</sup> and its type is a sort.

We conclude that s is computable. ⊓⊔

Now the well-foundedness of ≻<sup>t</sup> follows immediately:

Theorem 9. ≻<sup>t</sup> is well-founded.

Proof. We prove that every term t is computable by induction on t. Given Lemma 7, we need only to prove that variables are computable, which is the case because variables are neutral and in normal form with respect to ≻<sup>t</sup> . ⊓⊔

# 5 Discussion: HORPO and Dependency Pairs

In Section 4, we discussed rule removal, and presented a reduction ordering to prove termination. However, in practice it is not so common to directly use reduction orderings as a termination method. Rather, the norm in the literature nowadays is to use dependency pairs.

The dependency pair framework [1,16] allows a single term rewriting system to be split into multiple "DP problems", each of which can then be analyzed independently. The framework operates by iteratively simplifying DP problems until none remain, in which case the original system is proved terminating. There are variants for many styles of term rewriting, including first-order LCTRSs [25] and unconstrained higher-order TRSs [39,25,11].

Importantly, many existing techniques can be reformulated as "processors" (DP problem simplifiers) in the dependency pair framework. Such techniques include reduction orderings, which are at the heart of the dependency pair framework. This combination is far more powerful than using reduction orderings directly, because the monotonicity requirement is replaced by weak monotonicity, and we do not have to orient the entire system in one go.

Consider the following first-order LCTRS:

$$\begin{aligned} \text{u } x \ y &\rightarrow \text{u } (x+1) \ (y \ast 2) & \quad [x < 100] & \quad \text{v } y &\rightarrow \text{v } (y-1) & \quad [y > 0] \\ \text{u } 100 \ y &\rightarrow \text{v } y \end{aligned}$$

This system cannot be handled by HORPO directly: the ordering [[⊐int]] needs to be fixed globally, so we can either orient the rewrite rule at the top-left corner or the one at the top-right corner, but not both at the same time. We could address this dilemma by using a more elaborate definition of HORPO (for example, by giving every function symbol an additional status that indicates the theory ordering to be used for each of its arguments), but this seems redundant: in practice, such a system would be handled by the dependency pair framework. Following the definition in [25], the above system would be split into two separate DP problems corresponding to the two loops:

$$\left\{ \begin{array}{ll} \mathbf{u}^{\sharp} \ x \ y \rightarrow \mathbf{u}^{\sharp} \ (x+1) \ (y \ast 2) \end{array} \begin{array}{ll} \left[x < 100 \right] \end{array} \right\} \qquad \left\{ \begin{array}{ll} \mathbf{v}^{\sharp} \ y \rightarrow \mathbf{v}^{\sharp} \ (y-1) \end{array} \begin{array}{ll} \left[y > 0 \right] \end{array} \right\}$$

which could then be handled independently.

While dependency pairs for LCSTRSs are not yet defined (and beyond the scope of this paper), we postulate that the definitions for curried higher-order rewriting in [11] and first-order constrained rewriting in [25] can be combined in a natural way. In this setting, HORPO would naturally be combined with argument filterings [1,11]. That is, since we only require weak monotonicity, some arguments can be removed. For example, the first DP problem above can be handled by showing the following inequalities:

$$\mathsf{u}^{\sharp}\ x \succ_{x<100} \mathsf{u}^{\sharp}\ (x+1) \qquad \mathsf{u} \succsim_{x<100} \mathsf{u} \qquad \mathsf{v} \succsim_{y>0} \mathsf{v} \qquad \mathsf{u} \succsim_{\mathsf{t}} \mathsf{v}$$

This is the case with u ▶ v.
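Although DP problems for LCSTRSs remain future work, the iterative structure of the framework sketched above can be illustrated in Scala. Everything below (DPProblem, Processor, solve) is a hypothetical illustration of the general framework shape, not an existing implementation:

```
// A rule is left abstract; a DP problem pairs dependency pairs with
// the rewrite rules they may be interleaved with.
type Rule = String
final case class DPProblem(pairs: Set[Rule], rules: Set[Rule])

// A processor simplifies one DP problem into zero or more subproblems
// (None = this processor does not apply). Reduction orderings such as
// HORPO act as one processor among many.
trait Processor:
  def apply(p: DPProblem): Option[Set[DPProblem]]

// Apply processors until no DP problems remain; success means the
// original system is terminating.
def solve(problems: Set[DPProblem], procs: List[Processor]): Boolean =
  problems.headOption match
    case None => true
    case Some(p) =>
      procs.iterator.flatMap(_(p)).nextOption() match
        case Some(subs) => solve(problems - p ++ subs, procs)
        case None       => false // no processor applies
```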

# 6 Implementation

A preliminary implementation of LCSTRSs is available in Cora through the link:

https://github.com/hezzel/cora

Cora is an open-source analyzer for constrained rewriting, which can be used both as a stand-alone tool and as a library. Note that Cora is still in active development, and its functionalities, as well as its interface, are subject to change. Nevertheless, Cora is already used in several student projects. Cora supports only the theories of integers and booleans so far, but is intended to eventually support any theory, provided that an SMT solver is able to handle it. Example input files are supplied in the above repository. The version of this paper is available in [28].

Automating Constrained HORPO. Cora includes an implementation of constrained HORPO. Following existing termination tools such as AProVE [14], NaTT [42] and Wanda [26], we use an SMT encoding such that a satisfying assignment to variables in the SMT problem corresponds to a combination of the precedence ▶, the status s and the ordering [[⊐int]] that proves the termination of the encoded system by constrained HORPO. As for booleans, we simply choose the ordering [[⊐bool]] such that [[t ⊐bool f]] = 1.

To encode the precedence and the status, we introduce integer variables prec<sup>f</sup> and stat<sup>f</sup> for each function symbol f that is not a value. We require that prec<sup>f</sup> < 0 if f is a theory symbol, and that prec<sup>f</sup> ≥ 0 otherwise—so that prec<sup>f</sup> > prec<sup>g</sup> corresponds to f ▶ g. The value k of stat<sup>f</sup> indicates s(f) = l if k = 1, and s(f) = m<sup>k</sup> if k > 1. We let down be a boolean variable which indicates the choice between two possibilities for [[⊐int]]: λm. λn. m > −M ∧ m > n and λm. λn. m < M ∧ m < n (the choice of M is discussed below).
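The following Scala sketch illustrates how such an encoding could be set up; the constraint AST and the names (Smt, precVar, statVar) are invented for illustration and are not Cora's internal representation:

```
// Tiny SMT constraint AST, invented for illustration.
enum Smt:
  case IntVar(name: String)
  case Const(n: Int)
  case Lt(l: Smt, r: Smt) // l < r
  case Not(e: Smt)
import Smt.*

final case class Sym(name: String, isTheory: Boolean)

// prec_f < 0 for theory symbols, prec_f >= 0 otherwise;
// prec_f > prec_g then corresponds to f ▶ g.
def precVar(f: Sym): Smt = IntVar(s"prec_${f.name}")
def precConstraint(f: Sym): Smt =
  if f.isTheory then Lt(precVar(f), Const(0))
  else Not(Lt(precVar(f), Const(0)))

// stat_f = 1 encodes s(f) = l; stat_f = k > 1 encodes s(f) = m_k.
def statVar(f: Sym): Smt = IntVar(s"stat_${f.name}")
def statConstraint(f: Sym): Smt = Not(Lt(statVar(f), Const(1)))
```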

In the derivation of s ≻<sup>φ</sup> t, all assertions assume the form s′ R<sup>φ</sup> t′ where s′ and t′ are subterms of s and t, respectively (see Example 9). Hence, given a finite set of rewrite rules, there are only finitely many possible assertions to be analyzed. By inspecting the definition of constrained HORPO, we also note that there are no cyclic dependencies. For all ℓ → r [φ], respective subterms s and t of ℓ and r, and R ∈ { ≿, ≻, ▷, 1a, 1b, . . . , 3f }, we thus introduce a variable ⟨s R<sup>φ</sup> t⟩ with its defining constraint. Without going into detail for all the cases, we provide a few key examples:

	- If either of s and t is not a theory term, or their respective types are not the same theory sort, or Var(s) ∪ Var(t) ⊈ Var(φ), we add ¬⟨s 2a<sup>φ</sup> t⟩.
	- Otherwise, we consider the type of s and t:
		- ∗ The type is int. We respectively check if φ =⇒ s > −M ∧ s > t and φ =⇒ s < M ∧ s < t are valid. If the former is not valid, we add ⟨s 2a<sup>φ</sup> t⟩ =⇒ ¬down; if the latter is not valid, we add ⟨s 2a<sup>φ</sup> t⟩ =⇒ down. That is, if both of the validity checks fail, both of the constraints are added, which is equivalent to adding ¬⟨s 2a<sup>φ</sup> t⟩.
		- ∗ The type is bool. We add ¬⟨s 2a<sup>φ</sup> t⟩ if φ =⇒ s ∧ ¬t is not valid; if it is valid, nothing is added and the SMT solver is free to set the variable ⟨s 2a<sup>φ</sup> t⟩ to true.

Here M is twice the largest absolute value of the integers occurring in the rewrite rules, or just 1000 if that is too large—this cap is chosen arbitrarily. Note that the validity checks are not included as part of the SMT problem: if they were included, the satisfiability problem would contain universal quantification, which is typically hard to solve. We rather pose a separate question to the SMT solver every time we encounter a theory comparison and, for integers, consider whether the pair can be oriented downward with λm. λn. m > −M ∧ m > n, upward with λm. λn. m < M ∧ m < n, or not at all. Hence, we must fix the bound M beforehand.
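For instance, the bound M just described could be computed along these lines (a sketch; the helper name is hypothetical and the cap of 1000 follows the arbitrary choice stated above):

```
// Twice the largest absolute value of the integer constants occurring
// in the rules, capped at 1000 (the arbitrary fallback from the text).
def boundM(constants: Iterable[BigInt]): BigInt =
  val largest = constants.map(_.abs).maxOption.getOrElse(BigInt(0))
  (largest * 2).min(BigInt(1000))
```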

For condition 3e (the multiset case with s(f) = m<sub>k</sub>), we additionally use boolean variables strict<sup>i</sup> (encoding i ∈ I) and integer variables π(j) (encoding the mapping of the generalized multiset ordering), with the following defining constraints:

	- ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ =⇒ 2 ≤ stat<sup>f</sup> ≤ n.
	- ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ =⇒ ⋀<sub>j</sub> ⟨f s<sup>1</sup> · · · s<sup>m</sup> ▷<sup>φ</sup> t<sup>j</sup>⟩.
	- ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ =⇒ ⋁<sub>i</sub> strict<sup>i</sup>.
	- For all i ∈ { 1, . . . , m }, ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ ∧ strict<sup>i</sup> =⇒ i ≤ stat<sup>f</sup>. That is, I ⊆ { 1, . . . , k } if s(f) = m<sub>k</sub>.
	- For all j ∈ { 1, . . . , n }, ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ ∧ j ≤ stat<sup>f</sup> =⇒ 1 ≤ π(j) ∧ π(j) ≤ m ∧ π(j) ≤ stat<sup>f</sup>. That is, 1 ≤ π(j) ≤ min(m, k) for all j ∈ { 1, . . . , k } if s(f) = m<sub>k</sub>.
	- For all i ∈ { 1, . . . , m }, j ∈ { 1, . . . , n − 1 } and j′ ∈ { j + 1, . . . , n }, ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ =⇒ strict<sup>i</sup> ∨ π(j) ̸= i ∨ π(j′) ̸= i. That is, |π<sup>−1</sup>(i)| ≤ 1 for all i ∈ { 1, . . . , m } \ I—which suffices because we can add to I all i ∈ { 1, . . . , min(m, k) } \ I such that |π<sup>−1</sup>(i)| = 0 without changing the generalized multiset ordering if s(f) = m<sub>k</sub>.
	- For all i ∈ { 1, . . . , m } and j ∈ { 1, . . . , n }, ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ ∧ π(j) = i ∧ strict<sup>i</sup> =⇒ ⟨s<sup>i</sup> ≻<sup>φ</sup> t<sup>j</sup>⟩.
	- For all i ∈ { 1, . . . , m } and j ∈ { 1, . . . , n }, ⟨f s<sup>1</sup> · · · s<sup>m</sup> 3e<sup>φ</sup> f t<sup>1</sup> · · ·t<sup>n</sup>⟩ ∧ π(j) = i ∧ ¬strict<sup>i</sup> =⇒ ⟨s<sup>i</sup> ≿<sup>φ</sup> t<sup>j</sup>⟩.

Cora succeeds in proving that all the examples in this paper are terminating, except Example 2, which is non-terminating.

# 7 Related Work

In this section, we assess the newly proposed formalism and the prospects for its application by comparing and relating it to the literature.

Term Rewriting. The closest related work is LCTRSs [27,12], the first-order formalism for constrained rewriting upon which the present work is built. Similarly, there are numerous formalisms for higher-order term rewriting, but without built-in logical constraints, e.g., [21,22,31]. It seems likely that the methods for analyzing those can be extended with support for SMT, as is done for HORPO in this paper.

Also worth mentioning is the K Framework [35], which, like our formalism, can be used as an intermediate language for program analysis and is based on a form of first-order rewriting. The K tool provides techniques based on reachability logic, rather than methods like HORPO.

There are several works that analyze functional programs using term rewriting, e.g., [2,15]. However, they typically use translations to first-order systems. Hence, some of the structure of the initial problem is lost, which weakens their power.

HORPO. Our definition of constrained HORPO is based on the first-order constrained RPO for LCTRSs [27] and the first definition of higher-order RPO [21]. There have been other HORPO extensions since, e.g., [5,6], and we believe that the ideas behind these extensions can also be applied to constrained HORPO. We have not done so because the purpose of this paper is to show that and how techniques for analyzing higher-order systems extend, not to introduce the most powerful (and consequently more elaborate) ones.

Also worth mentioning is [4], a higher-order RPO for λ-free systems. This variant is defined for the purpose of superposition rather than termination analysis, and is ground-total but generally not monotonic.

Functional Programming. There are many works performing direct analyses of functional programs, including termination analysis, although they typically concern specific programming languages such as Haskell (e.g., [19]) and OCaml (e.g., [20]). A variety of techniques have been proposed, such as sized types [33] and decreasing measures on data [18], but as far as we can find, there is no real parallel of many rewriting techniques such as RPO. We hope that, through LCSTRSs, we can help make the techniques of term rewriting available to the functional programming community.

# 8 Conclusion and Future Work

In summary, we have defined a higher-order extension of logically constrained term rewriting systems, which can represent realistic higher-order programs in a natural way. To illustrate how such systems may be analyzed, we have adapted HORPO, one of the oldest higher-order termination techniques, to handle logical constraints. Despite being a very basic method, it is already powerful enough to handle the examples in this paper. Both LCSTRSs and constrained HORPO are implemented in our new analysis tool Cora.

In the future, we intend to extend more techniques, both first-order and higher-order, to this formalism, and to implement them in a fully automatic tool. We hope that this will make the methods of the term rewriting community available to other communities, both by providing a powerful backend tool, and by showing how existing techniques can be adapted—so they may also be natively adopted in program analysis.

A natural starting point is to increase our power in termination analysis by extending dependency pairs [1,39,11,25] and various supporting methods like the subterm criterion and usable rules. In addition, methods for analyzing complexity, reachability and equivalence (e.g., through rewriting induction [34,12]), which have been defined for first-order LCTRSs, are natural directions for higher-order extension as well.

Acknowledgments. The authors are supported by NWO VI.Vidi.193.075, project "CHORPE". We thank Deivid Vale for his work on Cora and his assistance in the preparation of the artifact, Carsten Fuhs for his comments on an early draft of this paper, and the anonymous reviewers for their helpful feedback.

Disclosure of Interests. The authors have no competing interests to declare that are relevant to the content of this paper.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Abstract Interpretation**

# A Modular Soundness Theory for the Blackboard Analysis Architecture

Sven Keidel<sup>1</sup>(B), Dominik Helm<sup>1,2</sup>, Tobias Roth<sup>1,2</sup>, and Mira Mezini<sup>1,2,3</sup>

<sup>1</sup> Technische Universität Darmstadt, Darmstadt, Germany {keidel,helm,roth,mezini}@cs.tu-darmstadt.de

<sup>2</sup> National Research Center for Applied Cybersecurity (ATHENE), Darmstadt, Germany

<sup>3</sup> Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany

Abstract. Sound static analyses are an important ingredient for compiler optimizations and program verification tools. However, mathematically proving that a static analysis is sound is a difficult task due to two problems. First, soundness proofs relate two complicated program semantics (the static and the dynamic semantics) which are hard to reason about. Second, the more the static and dynamic semantics differ, the more work a soundness proof needs to do to bridge the impedance mismatch. These problems increase the effort and complexity of soundness proofs. Existing soundness theories address these problems by deriving both the dynamic and static semantics from the same artifact, often called a generic interpreter. A generic interpreter provides a common structure along which a soundness proof can be composed, which avoids having to reason about the analysis as a whole. However, a generic interpreter restricts which analyses can be derived, as all derived analyses must roughly follow the program execution order.

To lift this restriction, we develop a soundness theory for the blackboard analysis architecture, which is capable of describing backward, demand-driven, and summary-based analyses. The architecture describes static analyses with small independent modules, which communicate via a central store. Soundness of a compound analysis follows from soundness of all of its modules. Furthermore, modules can be proven sound independently, even though modules depend on each other. We evaluate our theory by proving soundness of four analyses: a pointer and call-graph analysis, a reflection analysis, an immutability analysis, and a demand-driven reaching definitions analysis.

# 1 Introduction

Developing static analyses is a laborious and complicated task due to the complexity of modern programming languages. A significant part of the complication pertains to ensuring that static analyses are sound, i.e., over-approximate the runtime behavior of analyzed programs. Unfortunately, even well-established static analyses have been shown to be unsound; e.g., since 2010, more than 80 soundness bugs have been found in different analyses used in the LLVM compiler [46].

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-57267-8 14.

Testing helps to find soundness bugs but cannot prove their absence, leaving the trustworthiness of these analyses in question.

Mathematical soundness proofs ensure the absence of soundness bugs. However, such proofs are difficult for two reasons. First, soundness proofs relate two program semantics, the static semantics and the dynamic semantics [12], each of which can individually be complex. Especially modern programming language features such as reflection [30], concurrency [29], or native code [1] are notoriously difficult to analyze and hard to reason about. Second, the style of static and dynamic semantics can differ significantly; e.g., the static semantics of Doop [7], which is described in Datalog, differs significantly from dynamic semantics described with small-step rules [6]. This impedance mismatch makes soundness proofs monolithic, i.e., it is difficult to determine which parts of the static semantics relate to which parts of the dynamic semantics, requiring the soundness proofs to reason about both semantics as a whole. These problems complicate soundness proofs to the point that only leading experts with multiple years of experience can conduct them [13, 26].

To deal with the complexity of soundness proofs, existing works modularize static and dynamic semantics [5,14,28]. This modularization allows composing a soundness proof for the entire analysis from soundness lemmas about small parts of the analysis, and hence reasoning about small parts of the analysis one at a time. These existing works require that both the static and dynamic semantics are derived from the same artifact, often called a generic interpreter. A generic interpreter describes the operational semantics of a language without referring to details of the dynamic or static semantics, and provides a common structure along which a soundness proof can be composed. However, generic interpreters restrict what types of analyses can be derived. In particular, generic interpreters derive analyses that follow the program execution order, specifically, forward whole-program abstract interpreters. But it is unclear how other types of analyses can be derived that do not follow the program execution order, such as backward, demand-driven/lazy, or summary-based analyses.

The work presented in this paper lifts this restriction by developing a soundness theory for the blackboard analysis architecture. The architecture is the foundation of the OPAL framework [21], which has been used to develop different kinds of analyses, including backward analyses [17], on-demand/lazy analyses [19,41], and summary-based analyses [21]. In the architecture, complex static analyses are modularly composed from smaller, simpler static modules that handle individual language features, e.g., reflection, or program properties, e.g., immutability. These modules are decoupled—they are not allowed to call each other directly; instead, they communicate with each other by exchanging information via a central data store called blackboard [39], orchestrated by a fixpoint solver.

To develop a soundness theory for the blackboard analysis architecture, we define a dynamic semantics which follows the same style as the static semantics and thus avoids the impedance mismatch problem. Specifically, the dynamic semantics is composed of dynamic modules that communicate with each other via a store. Our soundness theory is compositional, which means that each static module can be proven sound individually, and soundness of the compound analysis follows from a meta theorem. Furthermore, we extend the theory to make soundness proofs of existing static modules reusable across different analyses. In particular, we prove that the soundness proof of a static module remains valid, even if (a) the compound analysis processes source code elements unknown to the module and (b) the store contains other types of analysis information unknown to the module. Furthermore, our proofs are polymorphic in the lattices on which static modules operate, i.e., the lattices can be changed without affecting soundness. For instance, we can reuse a pointer-static module, which typically depends on an allocation-site lattice, in a reflection analysis to propagate string information by extending this lattice, without invalidating the pointer-static module's soundness proof.

We demonstrate the applicability of our theory by implementing four different analyses and their dynamic semantics in the blackboard analysis architecture: (1) a pointer and call-graph analysis, (2) an analysis for reflection, (3) an immutability analysis, and (4) a demand-driven reaching-definitions analysis. Our choice of analyses is inspired by existing state-of-the-art analyses for Java implemented in the OPAL framework [21, 41]. We implemented and tested each analysis and dynamic semantics in Scala to ensure they are executable. Furthermore, we used our theory to prove each analysis sound, where each analysis exercises a different aspect of our theory: (1) static modules can be proven sound independently despite mutually depending on each other, (2) soundness of modules remains valid even though the lattice changes, (3) soundness of a module remains valid even though different source code elements are analyzed, and (4) our theory applies to analyses which do not follow the program execution order.

In summary, we make the following contributions:


All proofs of theorems, lemmas, and case studies are provided in the paper's supplementary material.

# 2 Blackboard Analysis Architecture

In this section, we introduce and formalize the static and dynamic semantics of the blackboard analysis architecture used in the OPAL framework [21].

### 2.1 Static Semantics

Static analyses in the blackboard analysis architecture consist of multiple static modules exchanging information via a central data store called blackboard [39]. This avoids coupling between modules as they are not allowed to call each other directly: Modules store analysis results in the blackboard using keys. These keys allow other modules to retrieve results without needing to know their producer.
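A minimal Scala sketch of such a keyed blackboard might look as follows; the names Key, Blackboard, and Module are illustrative and do not mirror OPAL's actual API:

```
// The blackboard maps (entity, kind) keys to properties; modules
// communicate only through get/put, never by calling each other.
final case class Key[E, K](entity: E, kind: K)

final class Blackboard[E, K, P](cells: Map[Key[E, K], P]):
  def get(e: E, k: K): Option[P] = cells.get(Key(e, k))
  def put(e: E, k: K, p: P): Blackboard[E, K, P] =
    Blackboard(cells + (Key(e, k) -> p))

// A module turns an entity and the current store into an updated store;
// a fixpoint solver applies modules until the store stabilizes.
type Module[E, K, P] = (E, Blackboard[E, K, P]) => Blackboard[E, K, P]
```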

Definition 1 (Static Semantics). We define basic notions and datatypes of the static semantics of the blackboard analysis architecture:


<sup>4</sup> We use a hat symbol (as in σ̂) to disambiguate static definitions from dynamic definitions with the same name but without hat.

<sup>5</sup> The syntax A ⇀ B denotes a partial function from A to B. Furthermore, dom(f) is the set of all inputs for which a partial function f is defined.


The types Entitŷ, Kind, and Propertŷ are defined by analysis developers, whereas the other types and functions are fixed by this definition. ⊓⊔

We illustrate Definition 1 with the example of a textbook reaching-definitions analysis [38] for an imperative language with labeled assignments and expressions:

```
Entitŷ = Stmt
Propertŷ[κControlFlowPred] = P(Stmt)
Propertŷ[κReachingDefs] = Var ⇀ P(Assign)
Storê = [Stmt × κControlFlowPred ⇀ P(Stmt)]
      ∪ [Stmt × κReachingDefs ⇀ (Var ⇀ P(Assign))]

reachingDefŝ(stmt: Entitŷ, σ̂: Storê): Storê =
  predecessors = σ̂(stmt, κControlFlowPred)
  in̂ = ⨆ p∈predecessors σ̂(p, κReachingDefs)
  oût = stmt match
    case Assign(x,_,_) => in̂[x ↦ {stmt}]
    case _ => in̂
  σ̂ ⊔ [stmt, κReachingDefs ↦ oût]
```
The static module reachingDefŝ is implemented in Scala-like pseudocode. Module reachingDefŝ computes, for every statement of the program, which variable definitions reach it. Therefore, entities are statements and the module's property is a mapping from variables to the assignments that may have defined them. Module reachingDefŝ joins the reaching definitions of all control-flow predecessors and then updates them on variable assignments. Note that module reachingDefŝ neither computes the control-flow predecessors directly nor calls another module which computes this information. Instead, it retrieves this information from the store σ̂. This decoupling avoids dependencies between static modules and enables compositional soundness proofs.

### 2.2 Dynamic Semantics

Static analyses in the blackboard analysis architecture are proven sound with respect to a dynamic semantics in the same style, which we define formally in this subsection:

Definition 2 (Dynamic Semantics). We define the dynamic semantics used to prove soundness of analyses in the blackboard analysis architecture:



3. Static analyses are proven sound with respect to a dynamic reachability semantics (reachable : P(Module) × Store → P(Store)). The reachability semantics returns the set of all stores reachable by iteratively applying a set of dynamic modules. More specifically, the set reachable(F, σ<sup>1</sup>) contains store σ<sup>1</sup>, and for all f ∈ F, reachable stores σ, and entities e ∈ dom(σ), the set contains f(e, σ), if it is defined. ⊓⊔
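A direct, if naive, Scala rendering of this reachability semantics could look like the following sketch; the helper entities (standing for dom(σ)) and the use of Option for partiality are assumptions made for illustration:

```
// Dynamic modules are partial: None models "f(e, σ) is undefined".
type DynModule[E, Store] = (E, Store) => Option[Store]

// Iterate all modules over all reachable stores until no new store
// appears ('entities' enumerates dom(σ) of a given store); this only
// terminates when finitely many stores are reachable.
def reachable[E, Store](
    modules: Set[DynModule[E, Store]],
    init: Store,
    entities: Store => Set[E]): Set[Store] =
  def step(known: Set[Store]): Set[Store] =
    val next = known ++ (for
      sigma <- known
      f     <- modules
      e     <- entities(sigma)
      out   <- f(e, sigma)
    yield out)
    if next == known then known else step(next)
  step(Set(init))
```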

We illustrate these definitions again with the example of the reaching-definitions analysis introduced in the previous subsection:

```
Entity = Stmt | Unit
Property[κControlFlowPred] = Stmt
Property[κReachingDefs] = Var ⇀ Assign
Property[κState] = ProgramState
Store = [Stmt × κControlFlowPred ⇀ Stmt]
      ∪ [Stmt × κReachingDefs ⇀ (Var ⇀ Assign)]
      ∪ [Unit × κState ⇀ ProgramState]

reachingDefs(stmt: Entity, σ: Store): Store =
  predecessor = σ(stmt, κControlFlowPred)
  in = σ(predecessor, κReachingDefs)
  out = stmt match
    case Assign(x,_,_) => in[x ↦ stmt]
    case _ => in
  σ[stmt, κReachingDefs ↦ out]

controlFlow(stmt1: Entity, σ: Store): Store =
  state1 = σ(Unit, κState)
  (stmt2, state2) = step(stmt1, state1)
  σ[stmt2, κControlFlowPred ↦ stmt1][Unit, κState ↦ state2]
```
Dynamic module reachingDefs is analogous to its static counterpart reachingDefŝ, but computes the most recent definition of a variable instead of all possible definitions. The dynamic module depends on the control-flow predecessor, which is the most recently executed statement. The control-flow predecessors are computed by module controlFlow, which is based on a small-step operational semantics step : Stmt × ProgramState ⇀ Stmt × ProgramState. Module controlFlow demonstrates that the blackboard architecture is capable of integrating existing dynamic operational semantics, such as those for Java [6] or WebAssembly [18].
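For intuition, such a step function might be sketched in Scala as follows; the toy Stmt datatype, the program representation as a vector, and the program-counter state are all assumptions for illustration, not the paper's formalization:

```
// Toy statements, and a program state consisting of a program counter
// plus an environment mapping variables to integer values.
enum Stmt:
  case Assign(x: String, rhs: Int)
  case Skip
import Stmt.*

final case class ProgramState(pc: Int, env: Map[String, Int])

// step : Stmt × ProgramState ⇀ Stmt × ProgramState, partial because
// execution stops once the program counter runs past the program.
// Assumes s.pc indexes the current statement.
def step(prog: Vector[Stmt], s: ProgramState): Option[(Stmt, ProgramState)] =
  prog.lift(s.pc + 1).map { next =>
    val env2 = prog(s.pc) match
      case Assign(x, v) => s.env + (x -> v)
      case Skip         => s.env
    (next, ProgramState(s.pc + 1, env2))
  }
```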

The blackboard analysis architecture not only modularizes the static semantics but also the dynamic semantics, which is crucial for enabling compositional and reusable soundness proofs. In particular, each static module is proven sound with respect to exactly one dynamic module, which limits the proof scope and guarantees proof independence. Furthermore, for analyses that approximate nonstandard dynamic semantics, the standard dynamic semantics can be modularly extended with further modules (e.g., Section 5.1).

To summarize, in this section we formally defined the blackboard analysis architecture, which allows static analyses to be implemented modularly. Furthermore, we defined a dynamic semantics in the same style, against which analyses are proven sound.

# 3 Compositional Soundness Proofs

In this section, we develop a theory of compositional soundness proofs for analyses in the blackboard style: soundness of a compound analysis follows directly from soundness of the individual static modules. This soundness theory simplifies the soundness proof, because it allows analysis developers to focus on soundness of individual static modules, instead of having to reason about the interaction of all static modules with each other. Furthermore, the soundness theory makes the proofs more maintainable, as a change to a module only affects the proof of that module and nothing else.

We start the section by defining soundness of static modules and then work up to soundness of whole analyses. The definitions of soundness are standard and build upon the theory of abstract interpretation [12]:

Definition 3 (Soundness of Static Modules). A static module f̂ ∈ Modulê is sound if it overapproximates its dynamic counterpart f ∈ Module:

$$\begin{aligned} \mathsf{sound}(f,\widehat{f}) \; \mathsf{if} \; &\forall \widehat{e} \in \widehat{\mathsf{Entity}}, \widehat{\sigma} \in \widehat{\mathsf{Store}}, e \in \gamma\_{\mathsf{Entity}}(\widehat{e}), \sigma \in \gamma\_{\mathsf{Store}}(\widehat{\sigma}). \\ &f(e,\sigma) \in \gamma\_{\mathsf{Store}}(\widehat{f}(\widehat{e},\widehat{\sigma})) \end{aligned}$$

The expression x ∈ γ(ŷ) reads as "element ŷ soundly overapproximates the concrete element x." Function γ : L̂ → P(L) is a monotone function from an abstract domain L̂ to a powerset of a concrete domain L and is called the concretization function. We require neither that an abstraction function α : P(L) → L̂ in the opposite direction exists, nor that γ and α form a Galois connection; neither is necessary for soundness proofs.

The soundness definition above requires that analysis developers define concretizations for entities (γEntity : Entitŷ → P(Entity)) and properties (γProperty : Propertŷ[κ] → P(Property[κ])). Often the abstract and concrete entities are of the same type (Entitŷ = Entity). In this case, the concretization functions map to singleton sets (γEntity(e) = {e}). Based on concretization functions for entities, kinds, and properties, we define a point-wise concretization on stores. The definition can be found in the supplementary material.
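For the reaching-definitions example, these concretization functions could be sketched as follows (assuming, as the text notes, that abstract and concrete entities coincide; the type aliases are placeholders):

```
type Stmt = String; type Var = String; type Assign = String

// Abstract and concrete entities coincide, so concretization yields
// singleton sets.
def gammaEntity(e: Stmt): Set[Stmt] = Set(e)

// An abstract property maps each variable to the set of assignments
// that MAY define it; it concretizes to all maps choosing one of them.
def gammaReachingDefs(p: Map[Var, Set[Assign]]): Set[Map[Var, Assign]] =
  p.foldLeft(Set(Map.empty[Var, Assign])) { case (acc, (x, defs)) =>
    for m <- acc; d <- defs yield m + (x -> d)
  }
```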

In the following, we define soundness of compound analyses.

Definition 4 (Soundness of a Compound Analysis). Let Φ ⊆ Module × Modulê be a set of static modules paired with corresponding dynamic modules. A compound analysis is sound if the fixpoint of all of its static modules overapproximates the reachability semantics of the corresponding dynamic modules:

$$\begin{aligned} \mathsf{sound}(\Phi) \ \mathit{iff} \ &\forall \widehat{\sigma} \in \widehat{\mathsf{Store}}.\ \mathsf{reachable}(F, \gamma_{\mathsf{Store}}(\widehat{\sigma})) \subseteq \gamma_{\mathsf{Store}}(\mathsf{fix}(\widehat{F}, \widehat{\sigma})) \\ \text{where } &F = \{ f \mid (f, \_) \in \Phi \} \text{ and } \widehat{F} = \{ \widehat{f} \mid (\_, \widehat{f}) \in \Phi \}. \end{aligned}$$

The compound analysis approximates the dynamic reachability semantics (Definition 2.3), which collects the set of all stores reachable by applying dynamic modules. The dynamic reachability semantics is a collecting semantics, commonly used to prove soundness of abstract interpreters [12].

We are now ready to state the main theorem of this work:

Theorem 1 (Soundness Composition). Let Φ ⊆ Module × Modulê be a set of static modules paired with corresponding dynamic modules. Soundness of a compound analysis follows from soundness of all of its static modules:

If sound(f, f̂) for all (f, f̂) ∈ Φ, then sound(Φ).

Proof. We show reachable(F, γStore(σ̂<sup>1</sup>)) ⊆ γStore(fix(F̂, σ̂<sup>1</sup>)) by well-founded induction on X ⪯ reachable(F, X).


We illustrate this theorem by applying it to the reaching-definitions analysis from Section 2.1. Specifically, soundness of the compound analysis follows from soundness of modules reachingDefŝ and controlFloŵ by Theorem 1:

$$\frac{\mathsf{sound}(\mathsf{reachingDefs}, \widehat{\mathsf{reachingDefs}}) \qquad \mathsf{sound}(\mathsf{controlFlow}, \widehat{\mathsf{controlFlow}})}{\mathsf{sound}(\{(\mathsf{reachingDefs}, \widehat{\mathsf{reachingDefs}}), (\mathsf{controlFlow}, \widehat{\mathsf{controlFlow}})\})}$$

This means reachingDefŝ can be proven sound independently of controlFloŵ, even though the modules interact with each other in the compound analysis. This proof independence is possible because neither module reachingDefŝ nor reachingDefs calls the control-flow modules directly. Instead, both the static and the dynamic module read the control-flow information from the stores, which are guaranteed to be a sound overapproximation initially (assumption σ ∈ γStore(σ̂)). Furthermore, only properties that the reaching-definitions modules themselves wrote to the store need to be sound overapproximations. Properties that other modules wrote to the store are not subject to the soundness proof of the reaching-definitions modules. The soundness proof of module reachingDefŝ can be found in the supplementary material.

To summarize, in this section we developed a theory of compositional soundness proofs for analyses described in the blackboard architectural style. Each static module can be proven sound independently from other modules. Furthermore, soundness of a whole analysis follows directly from soundness of each module. In particular, no reasoning about the analysis as a whole is required.

# 4 Reusable Soundness Proofs

As of now, static modules refer to a specific type of entities, kinds, properties, and stores. However, adding new modules to an analysis may require extending

these types. This invalidates the soundness proofs of existing modules and they need to be re-established. In this section, we extend our theory to make static modules and their soundness proofs reusable.

#### 4.1 Extending the Type of Entities and Kinds

We start by explaining how entities and kinds can be extended without invalidating existing soundness proofs.

For example, if we were to add a taint static module to an existing analysis over types Entitŷ, Kind, and Storê, we would need to extend these types to hold the new analysis information:

$$\widehat{\mathsf{Entity}}' = \widehat{\mathsf{Entity}} \mid \mathsf{Var} \qquad\qquad \mathsf{Kind}' = \mathsf{Kind} \mid \kappa_{\mathsf{Taint}}$$

But this invalidates the proofs of existing modules that depend on the subsets Entitŷ and Kind. To solve this problem, we first parameterize the type of modules to make explicit which types of entities and kinds they depend on:

Definition 5 (Parameterized Modules (Preliminary)). We define a type of module that is parameterized by the types of entities E, kinds K, and stores S:

$$f \in \mathsf{Module}[E, K] = \forall S : \mathsf{Store}[E, K].\ E \times S \to S \qquad \square$$

Interface Store[E, K] defines read and write operations for an abstract store type S and restricts access to entities of type E and kinds of type K. The store interface allows us to call parameterized modules with stores containing supersets of the types of entities and kinds.
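In Scala, this interface might be rendered as follows; the trait names and the use of Any for properties are illustrative simplifications, not OPAL's actual types:

```
// Read/write access restricted to entities E and kinds K; properties
// are simplified to Any here (Definition 7 later makes them precise).
trait Store[E, K, S]:
  def read(s: S, e: E, k: K): Option[Any]
  def write(s: S, e: E, k: K, p: Any): S

// A parameterized module works for every store type S that implements
// Store[E, K, S], so it can also run against larger stores.
trait PModule[E, K]:
  def apply[S](e: E, sigma: S)(using st: Store[E, K, S]): S
```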

For these parameterized modules, we define a sound lifting to supersets of entities and kinds:

```
lift : ∀E′, K′, E ⊆ E′, K ⊆ K′. Module[E, K] → Module[E′, K′]
lift(f)(e′, σ) = e′ match
  case e : E => f(e, σ)
  case _     => σ
```
The lifting calls module f on all entities of type E on which f is defined and simply ignores all other entities, returning the store unchanged. For example, the lifted reaching-definitions module lift[Stmt | Var, κReachingDefs | κControlFlowPred | κTaint](reachingDefŝ) operates on the entities Stmt and the kinds κReachingDefs | κControlFlowPred, but ignores entities Var and kinds κTaint.

The lifting preserves soundness of the lifted modules for disjoint extensions of entities.

Definition 6 (Disjoint Extension). Entities Ê′ ⊇ Ê and E′ ⊇ E are a disjoint extension if γEntity(Ê) ⊆ E and γEntity(Ê′ \ Ê) ⊆ E′ \ E. ⊓⊔

In other words, the concretization function γEntity does not mix up entities in Ê and Ê′ \ Ê.

Lemma 1 (Lifting Preserves Soundness). Let f̂ ∈ Modulê[Ê, K] and f ∈ Module[E, K] be a parameterized static module and dynamic module, Ê′ ⊇ Ê and E′ ⊇ E be a disjoint extension of entities, and K′ ⊇ K a superset of kinds.

If sound(f, f̂), then sound(lift[E′, K′](f), lift[Ê′, K′](f̂)).

Proof. Let f̂ : Modulê[Ê, K] and f : Module[E, K] be a static and a dynamic module. Furthermore, let ê : Ê′ and e ∈ γEntity(ê) be entities, and σ̂ : Storê[Ê′, K′] and σ ∈ γStore(σ̂) be an abstract and a concrete store.


This lemma means that we can prove the soundness of static modules once, for specific types of entities and kinds. Later, we can reuse the modules in a compound analysis with extended entities and kinds without having to prove soundness again.

#### 4.2 Changing the Type of Properties

Next, we extend our theory to allow changing the type of properties without invalidating the soundness proofs of existing modules that use them.

For example, consider that we already have a pointer-static module that propagates object allocation information Propertŷ[κVal] = Obĵ. We may want to track string information as well. This could be done with an independent string-tracking static module with its own lattice. However, since tracking strings is mostly identical to tracking pointer information, such an additional module would duplicate significant amounts of code and require a new proof from scratch.

Instead, we can reuse the same pointer-static module to propagate string information Str̂ by changing its lattice to Propertŷ′[κVal] = Obĵ × Str̂. However, this invalidates the soundness proof of the pointer-static module, as it depends on the type Propertŷ[κVal].

To solve this problem, we generalize the type of static modules again to be polymorphic over the type Propertŷ:

Definition 7 (Parameterized Modules (Final)). We define a type of module that is parameterized by the types of entities E, kinds K, properties P, and stores S:

$$f \in \mathsf{Module}[E, K, I] = \forall P : I,\ S : \mathsf{Store}[E, K, P].\ E \times S \to S \qquad \square$$

Interface Store[E, K, P] restricts access to entities of type E and kinds of type K, and contains properties of type P. Interface I defines operations on properties P.

For example, a pointer-static module may depend on the Scala-like interface Objects in Listing 1.1. Interface Objects depends on a type variable Value, which refers to possible values of variables. Function newObj creates a new object of a

```
trait Objects[Value]:
  newObj(class: Class, ctx: Context): Value
  forObj[S](Value, S)(f: (Class, Context, S) => S): S

object AllocationSitê extends Objects[Obĵ]:
  newObj(class, ctx) = {(class, ctx)}
  forObj[S](Obĵ(objs), σ̂)(f) = ⨆ (class,ctx)∈objs f(class, ctx, σ̂)

object AllocationSiteAndStringŝ extends Objects[Obĵ × Str̂]:
  newObj(class, ctx) = ({(class, ctx)}, ⊥)
  forObj[S](value, σ̂)(f) = value match
    case (objs,_) => ⨆ (class,ctx)∈objs f(class, ctx, σ̂)
```
Listing 1.1: Interface for different Object Abstractions

certain class and context. Function forObj iterates over all such objects, applying continuation f. Continuation f takes a class name, context, and store, and returns a modified store. Interface Objects can be instantiated to support different value abstractions. For example, instance AllocationSitê implements the interface with an allocation-site abstraction Obĵ = Obĵ(P(Class × Context)), which abstracts object allocations by their class names and a call string to their allocation site. Instance AllocationSiteAndStringŝ implements a reduced product [9] of objects Obĵ and strings Str̂ = Constant[String], which abstracts the value of strings with a constant abstraction. This allows us to reuse the same pointer-static module to propagate string information.
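As a usage sketch, a module written against Objects manipulates values only through newObj and forObj, so either instance can be plugged in. The fragment below, including handleNew and the write parameter, is a hypothetical illustration rather than code from the case studies:

```
type Class = String
type Context = List[String]

trait Objects[Value]:
  def newObj(cls: Class, ctx: Context): Value
  def forObj[S](v: Value, s: S)(f: (Class, Context, S) => S): S

// A pointer-module fragment for 'x = new C': it never inspects the
// representation of Value, so AllocationSite and
// AllocationSiteAndStrings both work unchanged.
def handleNew[Value, S](obj: Objects[Value], cls: Class, ctx: Context,
                        write: (S, Value) => S, sigma: S): S =
  write(sigma, obj.newObj(cls, ctx))
```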

Note that certain interfaces may restrict what instances can be implemented. For example, an abstract domain that only approximates strings but not objects cannot soundly implement operation forObj of interface Objects. In this case, interfaces need to be generalized to allow a wider range of instances.

#### 4.3 Soundness of Parameterized Modules

In this subsection, we define soundness of parameterized static modules and prove a generalized soundness composition theorem.

Definition 8 (Soundness of Parameterized Static Modules). A parameterized static module f̂ : Modulê[Ê, K, Î] is sound w.r.t. a parameterized dynamic module f : Module[E, K, I] if all their instances are sound:

$$\begin{aligned} \mathsf{sound}(f,\widehat{f}) \ \mathit{iff} \ &\forall P : I,\ \widehat{P} : \widehat{I},\ S : \mathsf{Store}[E,K,P],\ \widehat{S} : \mathsf{Store}[\widehat{E},K,\widehat{P}].\\ &\mathsf{sound}(P,\widehat{P}) \implies \mathsf{sound}(f[P,S],\widehat{f}[\widehat{P},\widehat{S}]). \end{aligned}$$

Parameterized static modules are proven sound for all sound instances of the property interface I. A static instance P̂ : Î is sound w.r.t. a dynamic instance P : I if all of its operations are sound. Soundness for dynamic and static instances of interface Objects in Listing 1.1 is defined as follows:

$$\begin{aligned} \mathsf{sound}(\mathsf{newObj}, \widehat{\mathsf{newObj}}) \ &\mathit{if}\ \forall c, \widehat{h}, h \in \gamma(\widehat{h}).\ \mathsf{newObj}(c, h) \in \gamma_{\mathsf{Obj}}(\widehat{\mathsf{newObj}}(c, \widehat{h})) \\ \mathsf{sound}(\mathsf{forObj}, \widehat{\mathsf{forObj}}) \ &\mathit{if}\ \forall f, \widehat{f}.\ \mathsf{sound}(f, \widehat{f}) \implies \mathsf{sound}(\mathsf{forObj}(f), \widehat{\mathsf{forObj}}(\widehat{f})) \end{aligned}$$

Soundness of first-order operations like newObĵ is similar to that of static modules (Definition 3). Soundness of higher-order operations like forObĵ is proven w.r.t. all sound functions f̂.

Finally, we generalize the soundness composition Theorem 1 to parameterized static modules. In particular, an analysis composed of parameterized static modules is sound if all of its modules are sound and the instance of its property interface is sound.

Theorem 2 (Soundness Composition for Parameterized Static Modules). Let Φ be a set of parameterized static modules paired with corresponding dynamic modules over families of entities $\widehat{E}' = \bigcup_i \widehat{E}_i$, $E' = \bigcup_i E_i$, kinds $K' = \bigcup_i K_i$, and properties $\widehat{P}$, $P$.

If $\mathsf{sound}(f, \widehat{f})$ for all $(f, \widehat{f}) \in \Phi$ and $\mathsf{sound}(P, \widehat{P})$, then $\mathsf{sound}(\Phi')$, where $\Phi' = \{(\mathsf{lift}[E', K'](f),\ \mathsf{lift}[\widehat{E}', K'](\widehat{f})) \mid (f, \widehat{f}) \in \Phi\}$.

Proof. We instantiate the polymorphic modules $f, \widehat{f}$ with the compound types to obtain $\mathsf{sound}(\mathsf{lift}[E', K'](f),\ \mathsf{lift}[\widehat{E}', K'](\widehat{f}))$. Then the soundness composition Theorem 1 for monomorphic modules applies. ⊓⊔

To summarize, in this section we explained how the type of entities, kinds, and properties can be changed without invalidating the soundness proofs of existing modules. To this end, we generalized the type of modules to be parametric over the type of entities, kinds, and properties. The parameterized modules access properties via an interface. The instances of this interface are specific to certain types of properties and require a soundness proof.

# 5 Applicability of the Theory

In this section, we demonstrate the applicability of our theory by first developing four analyses in the blackboard architecture and then proving them sound compositionally.

### 5.1 Case Studies

We developed four different analyses in the blackboard architecture (Section 2) together with their dynamic semantics (Section 2.2). We proved each analysis sound and discuss the proofs in Section 5.2. Each analysis exercises a specific part of our soundness theory:

– A pointer analysis which mutually depends on a call-graph analysis (exercises the part of our theory presented in Section 3).

– A reflection analysis which extends the value lattice used by the pointer analysis (exercises Section 4.2).

– A field and object immutability analysis which adds new types of entities and kinds to the store (exercises Section 4.1).

– A demand-driven reaching-definitions analysis which does not follow the program execution order (exercises the generality of our theory over generic-interpreter approaches, Section 6.1).


Our choice of analyses was inspired by similar but more complex analyses for JVM-bytecode implemented in OPAL, which scale to real-world applications [21, 41]. Our analyses operate on a simpler object-oriented language with the following abstract syntax:

```
Class  = Class(ClassName, ClassName, Field∗, Method∗)
Method = SourceMethod(MethodName, Var∗, Stmt∗) | NativeMethod(MethodName)
Ref    = VarRef(Var) | FieldRef(Ref, Field)
```
The language features inheritance, mutable memory, class fields, virtual method calls, and Java-like reflection [35]. Reflection is modeled as virtual calls to native methods. We also deliberately added features such as control-flow constructs and boolean operations. These are ignored by the analyses, but need to be modeled by the dynamic semantics, complicating the soundness proofs of the analyses.

We implemented and tested each analysis in Scala to ensure they are executable. Furthermore, we implemented and tested the corresponding dynamic semantics to ensure they are sensible. The code of analyses and dynamic semantics can be found in the supplementary material accompanying this paper. In the following, we discuss the implementation of each analysis in more detail.

Pointer and Call-Graph Analysis A pointer analysis for an object-oriented language computes which objects a variable or field may point to. A call-graph analysis determines which methods may be called at specific call sites. Pointer and call-graph analyses are the foundation upon which many other analyses build.

The analyses are composed from four static modules, whose dependencies are visualized in Figure 1. An arrow from a store entry to a module represents a read; an arrow in the other direction represents a write. Even though all modules implicitly depend on each other, they can be proven sound independently of each other (Section 3). This is possible because they do not call other modules directly; instead, all communication happens via the store.
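To make this concrete, the following is a minimal Scala sketch of the store-mediated module shape; the names Store, Module, and the join-based write are our own illustration, not the paper's definitions.

```scala
// Illustrative shape of the blackboard store and of static modules
// (hypothetical names). The store maps (entity, kind) pairs to lattice
// values; a module maps a triggered entity and the current store to an
// updated store. Modules never call each other; all communication
// happens through the store.
final case class Store[E, K, P](entries: Map[(E, K), P], bottom: P, join: (P, P) => P):
  def apply(e: E, k: K): P = entries.getOrElse((e, k), bottom)
  def write(e: E, k: K, p: P): Store[E, K, P] =
    copy(entries = entries.updated((e, k), join(this.apply(e, k), p)))

type Module[E, K, P] = (E, Store[E, K, P]) => Store[E, K, P]
```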

Module $\widehat{\mathsf{method}}$ registers each statement of a method in the store to trigger other modules. It disregards control flow, as the analysis is flow-insensitive, and hence also registers statements that can never be executed. Flow-insensitive analyses can be more performant than flow-sensitive ones, but traditional approaches using generic abstract interpreters do not allow for flow-insensitive analyses. Module $\widehat{\mathsf{pointsTo}}$ analyzes New expressions and assignments of variable and field references. Module $\widehat{\mathsf{virtualCall}}$ resolves target methods of virtual calls based on the receiver object. Once a call is resolved, module $\widehat{\mathsf{invokeReturn}}$ extends the call context and assigns the method parameters and return value. Finally, it registers the called method as an entity in the store, triggering module $\widehat{\mathsf{method}}$.

The entities of the analyses are fields, statements, expressions, methods, and calls:

$$\begin{aligned} \widehat{\mathsf{Entity}} &= (\mathsf{Field} \times \mathsf{HeapCtx}) \mid (\mathsf{Stmt} \times \widehat{\mathsf{CallCtx}}) \mid (\mathsf{Expr} \times \widehat{\mathsf{CallCtx}})\\ &\quad \mid (\mathsf{Method} \times \widehat{\mathsf{CallCtx}}) \mid (\mathsf{Call} \times \widehat{\mathsf{CallCtx}})\\ \widehat{\mathsf{Property}}[\kappa_{\mathsf{Val}}] &= \bot \mid \widehat{\mathsf{Obj}}\\ \widehat{\mathsf{Property}}[\kappa_{\mathsf{CallTarget}}] &= \widehat{\mathsf{CallTarget}}\\ \widehat{\mathsf{Obj}} &= \mathsf{Obj}(\mathcal{P}(\mathsf{Class} \times \mathsf{HeapCtx}))\\ \widehat{\mathsf{CallTarget}} &= \mathsf{CallTarget}(\mathcal{P}(\mathsf{Class} \times \mathsf{HeapCtx} \times \mathsf{Method} \times \mathsf{Expr}^*)) \end{aligned}$$

Each entity is paired with a call context or heap context, which allows tuning the precision of the analysis. The static modules communicate via two kinds: Kind κVal refers to the possible values of expressions and fields and to the return values of methods. Values are abstract objects containing information about where objects were allocated. Kind κCallTarget refers to the possible targets of method calls. Call targets are sets of receiver objects paired with the target method and their arguments.
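One possible Scala encoding of these entities and kinds is sketched below; representing class names, statements, and expressions as strings and contexts as label lists is an assumption made for illustration.

```scala
// Hypothetical encoding of the entity and kind types of the pointer and
// call-graph analysis. Contexts are call strings, modelled as label lists.
type CallCtx = List[Int]
type HeapCtx = List[Int]

enum Entity:
  case FieldE(field: String, h: HeapCtx)
  case StmtE(stmt: String, c: CallCtx)
  case ExprE(expr: String, c: CallCtx)
  case MethodE(m: String, c: CallCtx)
  case CallE(site: String, c: CallCtx)

enum Kind:
  case Val, CallTarget
```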

To illustrate the analysis, Listing 1.2 shows the code of modules $\widehat{\mathsf{virtualCall}}$ and $\widehat{\mathsf{invokeReturn}}$. They implicitly communicate with each other via the store but do not call each other directly. Module $\widehat{\mathsf{virtualCall}}$ resolves virtual method calls by first fetching the points-to set of the receiver reference from the store. Afterwards, it iterates over all possible receivers and fetches the possible target methods from the class table. Finally, it writes the new call target to the store. Storing the receiver object and argument expressions as part of the call target allows module $\widehat{\mathsf{invokeReturn}}$ to be reused for different types of calls. If the entity is a Call expression, module $\widehat{\mathsf{invokeReturn}}$ first fetches the targets of the call from the store. Then, it iterates over all targets, extends the call context with function

```
virtualCall(e, σ̂) = e match
  case (call@Call(receiver, methodName, args), callCtx) =>
    receiverVal = σ̂((receiver, callCtx), κVal)
    forObj(receiverVal, σ̂) { (class, heapCtx, σ̂′) =>
      method = classTable(class, methodName)
      σ̂′ ⊔ [(call, callCtx), κCallTarget ↦ newCallTarget(class, heapCtx, method, args)]
    }
  case _ => σ̂

invokeReturn(e, σ̂) = e match
  case (call@Call(_, _, _), callCtx) =>
    targets = σ̂((call, callCtx), κCallTarget)
    forCallTarget(targets, σ̂) { (class, heapCtx, method, args, σ̂′) => method match
      case SourceMethod(_, params, _) =>
        newCallCtx = extendCtx(call.label, heapCtx, callCtx)
        σ̂′ ⊔ [(call, callCtx), κVal ↦ σ̂′((method, newCallCtx), κVal)]
           ⊔ [(p, newCallCtx), κVal ↦ σ̂′((a, callCtx), κVal) | (p, a) ∈ zip(params, args)]
           ⊔ [(VarRef("this"), newCallCtx), κVal ↦ newObj(class, heapCtx)]
           ⊔ [(method, callCtx), κVal ↦ nullPointer()]
           ⊔ [(call, callCtx), κVal ↦ σ̂((method, newCallCtx), κVal)]
      case NativeMethod(_) => σ̂′
    }
  case (Return(method, expr), callCtx) =>
    σ̂ ⊔ [(method, callCtx), κVal ↦ σ̂((expr, callCtx), κVal)]
  case _ => σ̂
```
Listing 1.2: Static modules for invoking calls and resolving virtual calls.

$\widehat{\mathsf{extendCtx}}$, binds the parameters to the values of the arguments and the variable this to the receiver object. Furthermore, it registers the called method as an entity in the store, which in turn triggers module $\widehat{\mathsf{method}}$ to process the statements of the called method. Lastly, module $\widehat{\mathsf{invokeReturn}}$ writes the return value of a method to the method entity in the store and copies it to the call entities of this method.

The modules depend on interface Objects shown in Listing 1.1 and on an analogous interface for call targets. Operations $\widehat{\mathsf{newObj}}$ and $\widehat{\mathsf{newCallTarget}}$ create new abstract objects and call targets. Operations $\widehat{\mathsf{forObj}}$ and $\widehat{\mathsf{forCallTarget}}$ iterate over all objects and call targets. Interface Objects also includes an operation $\widehat{\mathsf{nullPointer}}$ not shown in the listing, which returns an empty set of object allocation sites ($\widehat{\mathsf{Obj}}(\emptyset)$). The dynamic instances are analogous except that they operate on singleton types.

The dynamic modules compute a program's heap and describe its changes during execution. They are analogous to their static counterparts except that they operate on singleton types Obj(Class × HeapCtx) and CallTarget(Class × HeapCtx × Method × Expr∗).

All dynamic modules combined do not cover the entire language. In particular, there are no dynamic modules that cover reflective calls. This means that, as of now, the dynamic semantics of reflection is undefined, and the soundness proof only covers programs without reflective calls. We address this point with the following case study.

Reflection Analysis Reflection is a language feature that allows querying information about classes and methods at runtime [35]. Our language supports three reflective methods: Methods Class.forName and Class.getMethod retrieve classes and methods by a string, respectively. Method.invoke invokes a method, where the target method is determined at runtime. Reflection is notoriously difficult to statically analyze soundly and precisely [30]: analyses need to approximate the content of the string passed into a reflective call. If the analysis cannot determine the string precisely, it needs to overapproximate or risk unsoundness. In this case study, we choose the former, to be able to prove the analysis sound.

This case study demonstrates two important features of our formalization: First, the reflection analysis reuses all pointer and call-graph modules of the previous section ($\widehat{\mathsf{pointsTo}}$, $\widehat{\mathsf{method}}$, $\widehat{\mathsf{virtualCall}}$, and $\widehat{\mathsf{invokeReturn}}$). It extends the value lattice to propagate new types of analysis information about strings. Even though the pointer analysis propagates new information, it does not require any changes and its soundness proof remains valid (Section 4.2). Second, the reflection analysis cooperates with the call-graph static module $\widehat{\mathsf{virtualCall}}$, as reflective calls are regular virtual calls. For example, a call m.invoke(...) where variable m is of type Method is first resolved by virtual call resolution, and its target Method.invoke is then resolved by reflective call resolution. Thus, both analyses add elements to the same set of call targets but can be proven sound independently of each other (Section 3).

The reflection analysis extends the $\widehat{\mathsf{Obj}}$ values of the pointer analysis with three new types of values—$\widehat{\mathsf{Str}}$, $\widehat{\mathsf{Class}}$, and $\widehat{\mathsf{Method}}$—as a reduced product [9]:

$$\begin{aligned} \widehat{\mathsf{Property}}[\kappa_{\mathsf{Val}}] &= \bot \mid (\widehat{\mathsf{Obj}} \times \widehat{\mathsf{Str}} \times \widehat{\mathsf{Class}} \times \widehat{\mathsf{Method}})\\ \widehat{\mathsf{Str}} &= \bot \mid \mathsf{String} \mid \top\\ \widehat{\mathsf{Class}} &= \mathcal{P}(\mathsf{Class}) \mid \top\\ \widehat{\mathsf{Method}} &= \mathcal{P}(\mathsf{Method}) \mid \top \end{aligned}$$

String values are approximated with a constant lattice. Class and method values are approximated with a finite set of classes/methods or ⊤. We reuse the modules of the pointer and call-graph analysis by implementing a new instance of interface Objects in Listing 1.1 for the new values. The new instance is similar to $\widehat{\mathsf{AllocationSiteAndStrings}}$ and iterates over all allocation-site information in strings, class/method values, and other objects.
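For illustration, a sketch of such an instance over a simplified value type follows; the names ProductObjects, Obj, Str, and Val are hypothetical.

```scala
// Hypothetical Objects instance over a reduced product of allocation sites
// and a constant-string lattice, in the spirit of AllocationSiteAndStrings.
enum Str:
  case Bot
  case Const(s: String)
  case Top

type Obj = Set[(String, List[Int])] // (class name, heap context)
type Val = (Obj, Str)

object ProductObjects:
  def newObj(cls: String, ctx: List[Int]): Val = (Set((cls, ctx)), Str.Bot)
  // Iterate the continuation over every allocation site, threading the store.
  def forObj[S](v: Val, store: S)(f: (String, List[Int], S) => S): S =
    val (objs, _) = v
    objs.foldLeft(store) { case (s, (cls, ctx)) => f(cls, ctx, s) }
```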

The reflection analysis adds two new modules to the existing analysis in Figure 1. The new modules and their dependencies are visualized in Figure 2.

```
reflection(e, σ̂) = e match
  case (call@Call(receiver, method, _), callCtx) =>
    target = σ̂((call, callCtx), κCallTarget)
    forCallTarget(target, σ̂) { (_, heapCtx, method, args, σ̂′) =>
      method match
        case NativeMethod("invoke") => args match
          case (invokeReceiver :: invokeArgs) =>
            invokeRecVal = σ̂′((invokeReceiver, heapCtx), κVal)
            methodVal = σ̂′((receiver, callCtx), κVal)
            reflectiveTarget = methodInvoke(invokeRecVal, methodVal, invokeArgs)
            σ̂′ ⊔ [(call, callCtx), κCallTarget ↦ reflectiveTarget]
        ... }
  case _ => σ̂

methodInvoke(recv: Value, methodVal: Value, invokeArgs: Expr∗) = methodVal match
  case (_, _, _, methods) =>
    CallTarget({(c, h, m, invokeArgs) | (c, h) ∈ recv, m ∈ methods, m ∈ classTable(c)})
  case (_, _, _, ⊤) =>
    CallTarget({(c, h, m, invokeArgs) | (c, h) ∈ recv, m ∈ classTable(c)})
  case ⊥ => ⊥
```
Listing 1.3: Static modules and operations for reflection.

Module $\widehat{\mathsf{reflection}}$ analyzes reflective calls to Class.forName, Class.getMethod, and Method.invoke. Module $\widehat{\mathsf{string}}$ analyzes string literals and concatenation. Listing 1.3 shows an excerpt of module $\widehat{\mathsf{reflection}}$ for Method.invoke. Module $\widehat{\mathsf{reflection}}$ first fetches the targets of a call resolved by module $\widehat{\mathsf{virtualCall}}$. If the call target is the native method invoke, module $\widehat{\mathsf{reflection}}$ matches on the arguments of the virtual call to extract the receiver and arguments of the reflective call target. Finally, it calls operation $\widehat{\mathsf{methodInvoke}}$, which returns the set of call targets.

Operation $\widehat{\mathsf{methodInvoke}}$ is part of an interface for reflective calls. The interface contains two other operations for retrieving class names and methods. $\widehat{\mathsf{methodInvoke}}$ matches on the call receiver and the method value. If the method value contains a finite set of methods, the operation checks if the receiver class has these methods and adds them as call targets. If the method value contains ⊤, the operation adds all methods of the receiver class to the set of call targets. This over-approximates the dynamic module reflection, where only one method is added as a call target.

The dynamic reflection modules are analogous except that the different types of values are alternatives rather than a reduced product. In contrast to Section 5.1, the dynamic pointer and call-graph modules combined with the reflection and string modules now cover the entire language. Thus, the analysis is sound for all programs, even those using reflection.

Field and Object Immutability Analysis The analysis of this case study computes the immutability of objects and their fields, inspired by a class and field immutability analysis by Roth et al. [41]. This information is useful for assessing the thread safety of programs, where multiple threads have access to the same objects.

This case study highlights two important features of our formalization. First, the core dynamic semantics of our language does not describe the immutability property. Therefore, we need to prove the static immutability analysis sound with respect to a dynamic immutability analysis. The case study demonstrates that the immutability concern can be encapsulated with analysis and dynamic modules, added modularly to the existing analysis and dynamic semantics, and reasoned about independently (Section 3). It is unclear how this can be achieved with a non-modular, monolithic analysis implementation. Second, the immutability analysis adds new types of entities and kinds to the store and reuses all modules of the pointer, call-graph, and reflection analysis. Even though the reused modules can be called with the new entities and have access to new kinds in the store, their soundness proofs remain valid (Section 4.1).

The immutability analysis adds objects (Class × HeapCtx) to the types of entities and adds kinds κMut and κAssign for their immutability and the assignability of their fields:

$$\begin{aligned} \widehat{\mathsf{Entity}}' &= \widehat{\mathsf{Entity}} \mid (\mathsf{Class} \times \mathsf{HeapCtx})\\ \widehat{\mathsf{Property}}[\kappa_{\mathsf{Mut}}] &= \widehat{\mathsf{TransitivelyImmutable}} \mid \widehat{\mathsf{NonTransitivelyImmutable}} \mid \widehat{\mathsf{Mutable}}\\ \widehat{\mathsf{Property}}[\kappa_{\mathsf{Assign}}] &= \widehat{\mathsf{Assignable}} \mid \widehat{\mathsf{NonAssignable}} \end{aligned}$$

$\widehat{\mathsf{Mutable}}$ describes objects whose fields are reassigned. $\widehat{\mathsf{NonTransitivelyImmutable}}$ describes objects whose fields are not reassigned, but where some objects transitively reachable via fields are mutated. $\widehat{\mathsf{TransitivelyImmutable}}$ describes objects whose fields are not reassigned and for which no transitively reachable objects are mutated. Kind κAssign uses two elements, for reassigned and non-reassigned fields.
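For illustration, the lattice and its join admit a compact encoding; joinMut below is a hypothetical stand-in for the paper's $\widehat{\mathsf{joinMutability}}$ operation.

```scala
// Three-point lattice, ordered TransitivelyImmutable ⊑
// NonTransitivelyImmutable ⊑ Mutable. The join keeps the weaker
// (more mutable) of the two facts.
enum Mut:
  case TransitivelyImmutable, NonTransitivelyImmutable, Mutable

def joinMut(a: Mut, b: Mut): Mut =
  if a.ordinal >= b.ordinal then a else b
```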

The immutability analysis consists of three modules shown in Figure 3. Module $\widehat{\mathsf{fieldAssign}}$ sets fields f of objects o to $\widehat{\mathsf{Assignable}}$ for every assignment of the form x.f = e, where x may point to o. Module $\widehat{\mathsf{fieldMutability}}$ sets a field to $\widehat{\mathsf{Mutable}}$ if the field is assignable, to $\widehat{\mathsf{NonTransitivelyImmutable}}$ if it is non-assignable but one of the pointed-to objects is mutable, and to $\widehat{\mathsf{TransitivelyImmutable}}$ otherwise. Lastly, module $\widehat{\mathsf{objectMutability}}$ sets an object's immutability to the least upper bound of the immutability of all of its fields.

The dynamic modules are analogous except that they operate on concrete objects instead of abstract objects.

Demand-Driven Reaching-Definitions Analysis As a final case study, we developed a demand-driven intra-procedural reaching-definitions analysis for our object-oriented language. This case study demonstrates that our theory lifts a restriction of existing soundness theories for generic interpreters. In particular, our theory also applies to analyses that do not follow the program execution order.

The analysis computes which definitions of variables and fields reach a statement without being overwritten. The analysis is demand-driven, as it performs the minimum amount of work to compute the reaching definitions of a query statement: the analysis only computes the reaching definitions of the query statement and its predecessors. Also, the analysis does not compute the entire control-flow graph, but only the query statement's predecessors.

The analysis consists of two modules $\widehat{\mathsf{reachingDefs}}$ and $\widehat{\mathsf{controlFlow}}$, similar to those discussed in Section 2. Module $\widehat{\mathsf{controlFlow}}$ calculates the set of control-flow predecessors of a given statement by computing the set of control-flow exits of the preceding statement within the abstract syntax tree. For example, the control-flow exits of an if statement are the exits of the last statements of both branches. The dynamic module controlFlow computes the predecessor immediately executed before the given statement. To this end, the module remembers the most recently executed statement in a mutable variable and only updates it if the given statement is the control-flow successor.
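The exit computation can be sketched over a hypothetical statement AST as follows (empty branches and the remaining statement forms are elided):

```scala
// Control-flow exits computed directly on the AST: the exits of an if
// statement are the exits of the last statement of each branch; no full
// control-flow graph is ever constructed.
enum Stmt:
  case Assign(x: String, e: String)
  case If(cond: String, thn: List[Stmt], els: List[Stmt])

def exits(s: Stmt): Set[Stmt] = s match
  case a: Stmt.Assign => Set(a)
  case Stmt.If(_, thn, els) =>
    def lastExits(branch: List[Stmt]): Set[Stmt] =
      branch.lastOption.map(exits).getOrElse(Set.empty)
    lastExits(thn) union lastExits(els)
```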

The main challenge in this case study was to find a dynamic module controlFlow that closely corresponds to the static module and still computes the correct control-flow predecessor. With a suitable dynamic module, the soundness proof of the static module became easier. Furthermore, we validated the correctness of the dynamic module with several unit tests.

#### 5.2 Soundness Proofs of the Case Studies

We apply our theory to compositionally prove the analyses from the previous section sound. The proofs can be found in the supplementary material accompanying this paper. They are pen-and-paper proofs and do not make use of mechanization, but due to the modularization they are small and easy to verify.

Proving each analysis sound includes (a) proving each of its modules sound (Definition 8), (b) proving the instances of the property interface sound, and (c) verifying that Theorem 2 applies. To ensure the latter, we checked that there are no direct dependencies between modules and that all communication between them happens via the store (Definition 1). This can be easily checked by inspecting the code of the modules. Furthermore, we verified that modules do not make any assumptions about abstract domains and are polymorphic in the store (Definition 7). This can be easily checked by inspecting the polymorphic type of the modules.

To prove the individual modules of an analysis sound, step (a) in the overall soundness proof, we use two techniques. The first uses the observation that static modules and their corresponding dynamic modules are often very similar, except for the types of entities and properties. We can abstract over these differences with a generic module, from which we derive both a dynamic and a static module. Then, soundness follows immediately as a free theorem from parametricity [28]. In cases where abstracting with a generic module is not possible or desirable, we resort to a manual proof. We were able to use the first technique for all modules except $\widehat{\mathsf{method}}$, $\widehat{\mathsf{reachingDefs}}$, and $\widehat{\mathsf{controlFlow}}$. To illustrate cases where we need manual proofs, consider the flow-insensitive static module $\widehat{\mathsf{method}}$ of the pointer analysis and its corresponding dynamic module method. While we could potentially derive them from the same generic module, the derived static module would be less performant, because it would trigger the analysis of parts of the code, e.g., if conditions, which our current flow-insensitive module does not. This is an example where our approach leads to more freedom in the design of static analyses than the existing approach based on a generic interpreter (Section 6.1).
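To convey the flavour of this technique, the sketch below writes the analysis of an allocation once, parametric in an Objects-style interface; the dynamic and static modules are its two instantiations (all names are hypothetical).

```scala
// One generic definition, two instantiations: supplying the dynamic Objects
// instance yields the dynamic module, supplying the static instance yields
// the static module. Because the code path is identical, soundness of the
// resulting pair follows from parametricity as a free theorem.
trait Objs[Value]:
  def newObj(cls: String, ctx: List[Int]): Value

def genericNew[Value, S](O: Objs[Value])(cls: String, ctx: List[Int],
    store: S, write: (S, Value) => S): S =
  write(store, O.newObj(cls, ctx))
```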

The soundness proofs of the static modules are reusable across different analyses, because the modules can be soundly lifted to supersets of entities and kinds (Lemma 1). For example, the immutability analysis adds class entities, requiring us to lift the modules of the pointer and reflection analysis. Furthermore, the soundness proofs of static modules can be reused because the proofs are independent of the lattices used (Definition 8). For example, the reflection analysis reuses all modules of the pointer analysis, extending the value lattice with string, class, and method information. The soundness proofs of the pointer static modules remain valid because they do not depend on a specific value lattice. Instead, the proofs of the pointer modules depend on soundness lemmas for the newObj and forObj operations of the Objects interface.

Finally, we consider step (b) in the overall soundness proof: the soundness proof of the instances of the property interface. These instances need to be proven sound manually, as the proof cannot be decomposed any further. To prove them sound, we proved each of their operations sound. For the pointer analysis we needed to prove 7 operations sound, for the reflection analysis 6 operations, for the immutability analysis 6 operations, and for the reaching-definitions analysis 0 operations. Of these 19 operations, 13 could be proven sound trivially, requiring only a single proof step after unfolding the definitions. The remaining 6 operations required more elaborate proofs with multiple steps and case distinctions. These include $\widehat{\mathsf{forObj}}$ from the pointer analysis; $\widehat{\mathsf{classForName}}$, $\widehat{\mathsf{getMethod}}$, and $\widehat{\mathsf{methodInvoke}}$ from the reflection analysis; and $\widehat{\mathsf{getFieldMutability}}$ and $\widehat{\mathsf{joinMutability}}$ from the immutability analysis.

# 6 Related Work

In this section, we discuss work related to compositional and reusable soundness proofs as well as to modular analysis architectures.

#### 6.1 Theories for Compositional and Reusable Soundness Proofs

All works discussed in this subsection, including our own, build upon the theory of abstract interpretation. Abstract interpretation is a formal theory of sound static analyses, first conceived by Cousot et al. [12], that has since found widespread adoption in academia and industry [13, 16, 22, 25, 33, 44]. Abstract interpretation defines soundness of static analyses but does not explain how soundness can be proved. As we elaborate in the introduction, soundness proofs of practical analyses for real-world languages are difficult because they relate two complicated semantics, often described in different styles. Proof attempts for such analyses often fail due to high proof complexity and effort. Furthermore, existing proofs are prone to become invalid if the static or dynamic semantics change, and reestablishing the proofs is laborious and complicated.

Domain constructions, such as reduced products and reduced cardinal powers [12], combine multiple existing abstract domains to improve their precision. They can be used to compose the soundness proofs of operations on the abstract domain, e.g., primitive arithmetic, boolean, or string operations. However, they cannot be used to compose the soundness proofs of the analysis of statements, e.g., assignments, loops, or procedure calls. In contrast, the blackboard architecture can compose soundness proofs of both of these types of operations.

Darais et al. [14] developed a theory for soundness proofs in which the static and dynamic semantics are derived from a small-step generic interpreter that describes the operational semantics of the language without mentioning details of the static or dynamic semantics. The small-step generic interpreter is instantiated with reusable Galois transformers that capture aspects such as flow- or path-sensitivity and allow changing an existing analysis while preserving soundness. Galois transformers can be proven sound once and for all, and their soundness proofs are reusable across different analyses. However, the approach does not compose soundness proofs of static semantics derived from the generic interpreter.

Keidel et al. [28] developed a theory for big-step abstract interpreters, deriving both the static and dynamic semantics from a generic big-step interpreter. The theory enables soundness composition [28, Theorems 4 and 5] if the generic interpreter is implemented with arrows [23] or in a meta-language which enjoys parametricity. But there is no theory of how parts of soundness proofs can be reused between different analyses. Keidel et al. [27] later refined the theory by introducing reusable analysis components that capture different aspects of the language, such as values, mutable state, or exceptions, and are described with arrow transformers [23]. While components can be proven sound independently of each other, their composition requires glue code, which needs to be proven sound. Furthermore, the composition creates large arrow-transformer stacks that, unless optimized away by the compiler, may lead to inefficient analysis code. For example, a taint analysis for WebAssembly developed using this approach depends on a stack of 18 arrow transformers. Eliminating the overhead of an arrow-transformer stack of this size requires aggressive inlining and optimizations, causing binary bloat and excessive compile times.

Bodin et al. [5] developed a theory of compositional soundness proofs for a style of semantics called skeletal semantics, which consists of hooks (recursive calls to the interpreter), filters (tests whether variables satisfy a condition), and branches. The dynamic and static semantics are derived from the same skeleton. Also, soundness of the instantiated skeleton follows from soundness of the dynamic and static instances [5, Lemmas 3.4 and 3.5]. However, their work does not describe how proofs can be reused across different analyses.

To recap, in all theories above the static and dynamic semantics must be derived from the same generic interpreter. This restricts what types of analyses can be derived. In particular, the static analysis must closely follow the program execution order dictated by the generic interpreter, and it is unclear how static analyses can be derived that do not closely follow the program execution order. For example, backward analyses process programs in reverse order, flow-insensitive analyses may process statements in any order, and summary-based analyses construct summaries in bottom-up order. Our work lifts the restriction that static and dynamic semantics must be derived from the same artifact. Static modules and their corresponding dynamic modules must follow the blackboard architecture style, but otherwise do not need to share any commonalities. This gives greater freedom as to which types of analyses can be implemented. For example, the blackboard analysis architecture has been used in prior work to develop backward analyses [17], on-demand/lazy analyses [19, 41], and summary-based analyses [21]. We also demonstrated in Section 5.1 that our theory applies to a demand-driven reaching-definitions analysis. It is unclear how such an analysis could be derived from a generic interpreter.

#### 6.2 Modular Analysis Architectures

These architectures describe how to implement static analyses modularly. Modular analysis architectures are a prerequisite for theories of compositional and reusable soundness proofs. The theories give formal guarantees about proof independence, composition, and reuse.

Our work formally defines the blackboard analysis architecture used in the OPAL framework [15, 21]. In the past, OPAL has been used to implement state-of-the-art analyses for method purity [19], class and field immutability [41], and call graphs [40] for Java Virtual Machine bytecode. Furthermore, OPAL features escape analyses and a solver for IFDS analyses [21] as well as a fixpoint algorithm that parallelizes the analysis execution [20].

Prior to the work presented in this paper, no formalization of the blackboard analysis architecture and no theory for its soundness existed. Our formalization captures the core of the OPAL framework, while deliberately ignoring implementation details. For example, our formalization does not describe the fixpoint algorithm and the order in which it executes static modules to resolve their dependencies. Proving the fixpoint algorithm correct is a separate concern from proving analyses sound, which is the focus of our formalization. That said, our formalization covers a variety of OPAL's features described by Helm et al. [21]. For example, OPAL supports default and fallback properties for missing properties in the store. Fallback properties can be described in our formalization by adding them to the initial store passed to the fixpoint algorithm. We deliberately leave out default properties, which are an edge case in OPAL used to mark properties that were not computed, e.g., because of dead code. They could be added to our formalization by extending analyses with a second set of static modules to be executed after the fixpoint is reached. Furthermore, OPAL supports optimistic analyses, which ascend the lattice, and pessimistic analyses, which descend the lattice during fixpoint iteration. Both of these are covered by our formalization, which describes analyses as monotone functions that ascend or descend the lattice. However, we deliberately do not cover OPAL's mechanisms for allowing interaction between optimistic and pessimistic analyses, another edge case.

Configurable program analysis (CPA) [4] is a modular analysis architecture that describes analyses with a transfer relation between control-flow nodes. CPAs can be systematically composed with reduced products. Furthermore, soundness of a component-wise transfer relation follows directly from soundness of its constituents. However, it is unclear how soundness proofs of primitive CPAs can be composed or how proof parts can be reused across analyses.

Doop [7] is a framework which describes analyses with relations in Datalog. Each relation is defined as a set of rules. These rules can be modularly added or replaced without requiring changes to other rules. While individual analyses in Doop have been proven sound [43], the proofs are not compositional or reusable. In particular, if one rule changes, the proof becomes invalid and needs to be reestablished. This is because the proof reasons about the soundness of all rules at once instead of individual rules or relations. The IncA framework [45] also describes analyses in Datalog, but allows relations over lattices instead of only sets. However, no soundness theory for its analyses exists. Similar to IncA, the Flix framework [37] describes analyses with lattice-based Datalog relations and functions. Flix proves individual functions sound with an automated theorem prover [36]. While an automated theorem prover reduces the proof effort and increases proof trustworthiness, there is no guarantee that the automated theorem prover is able to conduct a proof. Furthermore, the automated theorem prover does not establish a soundness proof of the Datalog relations.

Verasco [26] is a modular analysis for C#minor [32], an intermediate language used by the CompCert C compiler [33]. Verasco is proven sound with the Coq proof assistant [3]. The soundness proof of the abstract C#minor semantics is independent of the abstract domain, which makes the proof reusable for other abstract domains. However, the abstract semantics is proven sound w.r.t. the standard concrete semantics. Thus, the proof cannot be reused for abstract semantics which approximate non-standard concrete semantics, such as information-flow analyses [2] or liveness analyses [11].

Several other modular analysis architectures [24, 31, 42] do not have formal theories for soundness.

#### 6.3 Monolithic Soundness Proofs

In this subsection, we compare compositional and reusable soundness proof theories to ad-hoc monolithic proofs and discuss their trade-offs.

Monolithic soundness proofs consider the entire analysis and dynamic semantics as a whole. This complicates the proof because there is no separation of concerns to manage the complexity of modern programming languages. Furthermore, monolithic soundness proofs are harder to maintain. In particular, whenever the analysis needs to be updated to support a new version of the language, or whenever the analysis is fine-tuned to improve precision and scalability, the soundness proof becomes invalid and needs to be reestablished. However, reestablishing the soundness proof is difficult because it is unclear which parts of the proof have become invalid and need to be updated. In contrast, compositional soundness proofs narrow the proof scope to individual modules, which decreases the proofs' complexity. Furthermore, compositional soundness proofs are easier to maintain, as changes to individual modules only invalidate their particular soundness proofs, while the proofs of other modules remain valid.

The main benefit of monolithic soundness proofs over compositional proofs is that analyses may be proven sound with respect to existing formal dynamic semantics. However, often no suitable formal dynamic semantics exists, and analyses still have to be proven sound with respect to custom-defined or modified dynamic semantics. For example, HornDroid [8] is proven sound with respect to a custom instrumented JVM small-step semantics, and Jaam⁶ is proven sound with respect to a custom JVM semantics in the form of an abstract machine [22]. Furthermore, analyses of properties not present in standard language semantics need to be proven sound with respect to instrumented dynamic semantics. For example, a static taint analysis needs to be proven sound with respect to a dynamic semantics instrumented with taint information. In contrast, compositional soundness proofs require a one-time cost of formalizing a modular dynamic semantics for a language. Once this is done, several analyses can be proven sound with respect to this dynamic semantics. Furthermore, the dynamic semantics can be modularly extended to describe new aspects such as taint information.

# 7 Future Work

In this section, we discuss limitations of our work and how they can be addressed in the future.

First, our soundness theory requires that static analyses and dynamic semantics are described in the blackboard analysis architecture. It is unclear how easily existing analyses and dynamic semantics can be adapted to the architecture. In Section 2.2, we showed how an existing small-step dynamic semantics can be described as a module, and Helm et al. [21] implemented a wide range of static analyses in the architecture. In the future, we want to investigate how other styles of static and dynamic semantics can be adapted to the architecture.

⁶ https://github.com/Ucombinator/jaam

Second, our soundness theory requires that all static modules are sound. However, in practice static analyses are deliberately unsound due to complicated language features [34]. In the future, we want to investigate how the blackboard analysis architecture can be used to localize unsoundness. Specifically, unsound analysis results could be tagged with the name of the module that produced them. All results derived from unsound results would then propagate the tags. This way, it is always clear which results are potentially unsound and which modules caused the unsoundness.

Lastly, our work has focused on soundness, i.e., analyses do not produce false-negative results. A complementary property to soundness is completeness, i.e., analyses do not produce false-positive results. The absence of false positives is especially important if analyses produce warnings that are to be inspected by developers. In the future, we want to investigate whether our theory can be extended to prove completeness of static analyses.

# 8 Conclusion

In this work, we developed a theory for compositional and reusable soundness proofs for static analyses in the blackboard analysis architecture. The blackboard analysis architecture modularizes the implementation of static analyses, with analyses composed of independent static modules. We proved that soundness of an analysis follows directly from independent soundness proofs of each of its modules. Furthermore, we extended our theory to enable the reuse of soundness proofs of existing modules across different analyses. We evaluated our approach by implementing four analyses and proving them sound: a combined pointer and call-graph analysis, a reflection analysis, an immutability analysis, and a demand-driven reaching-definitions analysis.

# 9 Data Availability

The implementation of the case studies and the proofs are provided as an artifact available at https://doi.org/10.5281/zenodo.10418484.

Acknowledgments. This research work has been funded by the German Federal Ministry of Education and Research and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.

# References

1. Afonso, V.M., de Geus, P.L., Bianchi, A., Fratantonio, Y., Kruegel, C., Vigna, G., Doupé, A., Polino, M.: Going native: Using a large-scale analysis of android apps to create a practical native-code sandboxing policy. In: 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016. The Internet Society (2016)



12. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, POPL 1979, San Antonio, Texas, USA, January 1979. pp. 269–282. ACM Press (1979). https://doi.org/10.1145/567752.567778


21. Helm, D., Kübler, F., Reif, M., Eichberg, M., Mezini, M.: Modular collaborative program analysis in OPAL. In: ESEC/FSE 2020: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. pp. 184–196. ACM (2020). https://doi.org/10.1145/3368089.3409765



46. Taneja, J., Liu, Z., Regehr, J.: Testing static analyses for precision and soundness. In: CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, San Diego, CA, USA, February 2020. pp. 81–93. ACM (2020). https://doi.org/10.1145/3368826.3377927

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Detection of Uncaught Exceptions in Functional Programs by Abstract Interpretation⋆

Pierre Lermusiaux and Benoît Montagu

Inria, Campus universitaire de Beaulieu, Rennes, France
pierre.lermusiaux@inria.fr, benoit.montagu@inria.fr

Abstract. Exception handling is a key feature in modern programming languages. Exceptions can be used to deal with errors, or as a means to control the flow of execution of a program. Since they might unexpectedly terminate a program, unhandled exceptions are a serious safety concern. We propose a static analysis to detect uncaught exceptions in functional programs, which is defined as an abstract interpreter. It computes a description of the values potentially returned by a program using a novel abstract domain that can express inductively defined sets of values. Simultaneously, the analysis infers the possibly raised exceptions, by computing in the abstract exception monad. This abstract interpreter has been implemented as an effective static analyser for a large subset of OCaml programs, which supports mutable data types, the OCaml module system, and dynamically extensible data types such as the exception type. The analyser has been evaluated on several hundreds of OCaml programs.

Keywords: Static Analysis · Exceptions · Higher-Order Programs · Abstract Interpretation · Abstract Domain for Trees

# 1 Introduction

Programs that run in critical environments need to comply with strong safety guarantees. The minimal guarantee one expects for critical software is the absence of runtime failures. Sound static analyses can provide such guarantees statically, for every possible execution of a program, and in a fully automatic manner.

The static typing discipline found in the ML family of languages is such a static analysis technique, which brought strong safety guarantees to programs at a very low cost: well-typed programs cannot "go wrong" [48]. This soundness theorem for well-typed ML programs, however, does not preclude programs from abruptly ending with uncaught exceptions. Several analyses for ML-like languages have been developed to detect such undesirable behaviours, either leveraging type and effect systems [38,54], or based on variants of control-flow analyses or set constraints [68,67,14,15,66]. The recent success of algebraic effects and their introduction in popular languages such as OCaml [37] has renewed the interest in the static detection of uncaught exceptions and effects.

⋆ This work was funded by the Salto grant, supported by Nomadic Labs and Inria.


Analysing uncaught exceptions in ML is a difficult problem, because data flow and control flow are interdependent. This is not only due to the first-class nature of functions, but also due to the first-class nature of exceptions themselves: e.g., they can be taken as parameters, recorded in data structures or in mutable references. Furthermore, exceptions can carry any value as argument—including functions—and new exceptions can be dynamically generated at runtime.

In this paper, we propose a static analysis for a higher-order language in which exceptions are first-class values. The analysis is based on the abstract interpretation framework [9]. It is a forward value analysis that infers which values any program point can compute, and which exceptions they might raise. For this purpose, we introduce a novel abstract domain that can represent recursively defined sets of values. We define a widening operator for this abstract domain, which is responsible for finding recursive generalisations of solutions.

Our analysis leverages this abstract domain to represent both possible values and exceptions, thanks to the abstract exception monad. This monad—which can also be used as an abstract domain—is an abstraction of the exception monad that collects all values and exceptions.

We define our analysis as a big-step monadic interpreter, written in the open recursive style that was emphasised in the "Abstracting Definitional Interpreters" approach [11]. Then, we obtain an effective analyser by applying a generic, dynamic fixpoint solver [6,63,59,24,12,30]. We prove that our analysis is sound, under the soundness assumption of the fixpoint solver.

We extend the analysis to handle a large subset of the OCaml language. In particular, it supports the dynamic creation of exceptions, mutable state, modules and functors. The analysis is so far limited to sequential programs that do not perform system calls, do not use the Gc or Obj modules, and do not employ recursive modules, general recursive definitions of values, objects, classes, arrays, or floats. We implemented an OCaml prototype for this analyser. It reports the possibly thrown exceptions and an over-approximation of the data they carry, along with an abstraction of the call trace that led to the program point where the exception was raised. We discuss some implementation choices, and evaluate the precision and performance of our analyser on 290 programs, which include examples from the literature and from the OCaml compiler's test suite.

# 2 Overview

Let us consider the classic example of the factorial function, as written below in continuation-passing style.

```
let rec fact_cont n i k =
  if i >= n then k i else
  fact_cont n (i + 1) (fun x -> k (x * i))
let fact n = fact_cont n 1 (fun x -> x)
let result = fact 5
```
The fact_cont function recursively calls itself with increasing values of its parameter i, until the value n is reached.

We are interested in finding which values (and exceptions) this program might return. To answer this question, we first need to find the possible continuations the function fact_cont can be called with, and, importantly, we need an abstract domain in which we can express this set, or an over-approximation thereof.

With the abstract domain that we introduce in §4, we can express such a set as the following abstract value:

$$\mu\alpha.\{\mathsf{funs} : \{(\lambda x.\,x) \mapsto \{\};\ (\lambda x.\,k\,(x * i)) \mapsto \{i \mapsto \{\mathsf{ints} : [1, +\infty]\};\ k \mapsto \alpha\}\}\}$$

This abstract value represents a recursively-defined set—as indicated by the μ constructor—that is locally named α. This set is composed of function closures that can be either the identity function, or the function λx. k (x ∗ i), considered in an environment where the variable i is bound to an integer that is greater than or equal to 1, and where the variable k is recursively bound to the local variable α, i.e., to a value of the set we are defining.

Our abstract domain can also express structural invariants on data, such as the one for red-black trees [52], which forbids red nodes from having red children:

$$\mu\alpha.\left\{\mathsf{constructs}:\left\{\begin{array}{l} E : ()\\[0.5ex] B : (\alpha,\ \{\mathsf{ints}:\top\},\ \alpha)\\[0.5ex] R : \left(\begin{array}{l}\{\mathsf{constructs}:\{E : (),\ B : (\alpha,\ \{\mathsf{ints}:\top\},\ \alpha)\}\},\\ \{\mathsf{ints}:\top\},\\ \{\mathsf{constructs}:\{E : (),\ B : (\alpha,\ \{\mathsf{ints}:\top\},\ \alpha)\}\}\end{array}\right) \end{array}\right\}\right\}$$

Our abstract domain bears a strong similarity to the theory of equi-recursive types [56], in the sense that recursion is a core aspect of our definition. However, it differs from recursive types, as function types are absent: sets of closures are used instead. Moreover, it is parameterised by a non-relational abstract domain used to represent integer values—which is not possible with simple type systems.

We leverage our abstract domain and define a static analysis for a call-by-value λ-calculus with pattern matching, exception handling, and first-class exceptions (§3). In this language, the order of evaluation is made explicit by let bindings, and pattern matching is exhaustive and non-ambiguous [8]. These requirements drastically simplify the semantics of programs and their analysis. The analysis is defined as an abstract interpreter that performs a forward value analysis (§5).

Based on this small abstract interpreter, we sketch (§6) several extensions that we implemented to obtain a static analyser for a subset of OCaml programs. The implementation uses an intermediate language that is close to the one of §3, into which we translate the OCaml typed abstract syntax tree. We evaluated the precision and performance of our analyser on 290 OCaml programs, written in a variety of styles (direct, CPS, monadic, etc.). We discuss these experimental results (§7), cover related work (§8), and finish with concluding remarks (§9).

# 3 A λ-calculus With Exceptions

We introduce as an intermediate language a λ-calculus with pattern matching and exception handling. Its syntax resembles the monadic normal form, where the order of evaluation is made explicit with let bindings.

Definition 1. Given a set C of constructor symbols, we give the following inductive definition of patterns p, q, and expressions t, u, r:

$$\begin{array}{rl} p, q \in \mathbb{P} ::= & x \ \mid\ n \ \mid\ c(p_1, \ldots, p_k) \ \mid\ p_1 + p_2 \ \mid\ p \setminus q\\ t, u, r \in \mathbb{T} ::= & x \ \mid\ n \ \mid\ x_1 \mathbin{op} x_2 \ \mid\ c(x_1, \ldots, x_k)\\ & \mid\ \mu f. \lambda x.\, t \ \mid\ f\ y \ \mid\ \mathtt{let}\ x = t\ \mathtt{in}\ u \ \mid\ \mathtt{raise}\ x\\ & \mid\ \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n\\ & \mid\ \mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x \Rightarrow u \mid \mathtt{exn}\ y \Rightarrow r \end{array}$$

where n is a constant integer, c is a constructor of C, op is a binary operation on integers, and where the pattern q cannot contain any complement $p_1 \setminus p_2$.

We consider a pattern syntax and formalism inspired by [8]. The pattern disjunction p + q matches any value matched by p or by q, and the pattern complement p \ q matches any value that is matched by p but not by q.

As in the OCaml typed AST, variables carry a type. We may write $x^{\tau}$ to denote that the variable x is of type τ. Patterns are linear, i.e., sub-patterns of constructor patterns cannot share variables. All functions are recursive by default. If f does not occur in the expression t, then we write λx. t instead of μf. λx. t.

The values of this language are integer constants, constructors applied to values, and function closures, which contain an environment of values:

$$\begin{array}{l} v \in \mathbb{V} ::= n \ \mid\ c(v_1, \ldots, v_k) \ \mid\ \langle E, \mu f. \lambda x.\, t\rangle \quad \text{where } \mathrm{dom}\, E = \mathrm{fv}(\mu f. \lambda x.\, t)\\ E \in \mathbb{E} ::= [] \ \mid\ E, x \mapsto v \end{array}$$

Patterns induce a matching relation over values, which is described, with regard to a given environment E, by recursion on patterns:

$$\begin{array}{lcl} x \prec\!\prec_E v & \iff & E(x) = v\\ c(p_1, \ldots, p_n) \prec\!\prec_E c(v_1, \ldots, v_n) & \iff & \bigwedge_{i=1}^{n} p_i \prec\!\prec_E v_i\\ p + q \prec\!\prec_E v & \iff & p \prec\!\prec_E v \ \lor\ q \prec\!\prec_E v\\ p \setminus q \prec\!\prec_E v & \iff & p \prec\!\prec_E v \ \land\ q \not\prec\!\prec v \end{array}$$

We say that a pattern p matches a value v, denoted p ≺≺ v, if there exists an environment E such that p ≺≺_E v. In that case, we write E⟨p ≺≺ v⟩ for the smallest environment such that $p \prec\!\prec_{E\langle p \prec\prec v\rangle} v$.
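A direct recursive matcher computes the smallest environment E⟨p ≺≺ v⟩ when it exists; the Scala sketch below is illustrative (pattern linearity lets the environments of sub-matches be merged blindly).

```scala
// Matching for the pattern language: variables bind, integers and
// constructors match structurally, disjunction tries both alternatives,
// and complement requires p to match while q matches under no environment.
enum Pat:
  case PVar(x: String)
  case PInt(n: Int)
  case PCon(c: String, args: List[Pat])
  case POr(p: Pat, q: Pat)
  case PDiff(p: Pat, q: Pat)

enum Value:
  case VInt(n: Int)
  case VCon(c: String, args: List[Value])

def matches(p: Pat, v: Value): Option[Map[String, Value]] = (p, v) match
  case (Pat.PVar(x), _) => Some(Map(x -> v))
  case (Pat.PInt(n), Value.VInt(m)) if n == m => Some(Map.empty)
  case (Pat.PCon(c, ps), Value.VCon(d, vs)) if c == d && ps.length == vs.length =>
    ps.zip(vs).foldLeft(Option(Map.empty[String, Value])) { (acc, pv) =>
      for e1 <- acc; e2 <- matches(pv._1, pv._2) yield e1 ++ e2 // linearity: no clashes
    }
  case (Pat.POr(p1, p2), _) => matches(p1, v).orElse(matches(p2, v))
  case (Pat.PDiff(p1, q), _) => matches(p1, v).filter(_ => matches(q, v).isEmpty)
  case _ => None
```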

Thanks to this pattern-matching formalism, we can focus on the class of programs where pattern matching is exhaustive and non-ambiguous, i.e.: in a term match t with p₁ ⇒ u₁ | · · · | pₙ ⇒ uₙ where t : τ, we require that for any value v : τ, there exists a unique 1 ≤ i ≤ n such that pᵢ ≺≺ v. The work presented in [8] shows how to disambiguate patterns, i.e., how to make any pattern match non-ambiguous. We restrict ourselves to non-ambiguous patterns, because it simplifies both the dynamic semantics and the analysis of programs.

We present in Figure 1 a call-by-value big-step semantics for our language. We write t ⇓val v to denote that the expression t reduces to the value v, and we write t ⇓exn v to denote that the reduction of t raises an exception that evaluates to v. In this language, any value can be raised as an exception. The evaluation rules are mostly standard. We briefly explain the rules for match and dispatch.

$$\begin{array}{c}
\dfrac{}{E \vdash x \Downarrow_{\mathrm{val}} E(x)}\ \textsc{Var}
\qquad
\dfrac{}{E \vdash n \Downarrow_{\mathrm{val}} n}\ \textsc{Int}
\qquad
\dfrac{}{E \vdash x_1 \mathbin{op} x_2 \Downarrow_{\mathrm{val}} E(x_1) \,\llbracket op \rrbracket\, E(x_2)}\ \textsc{Op}
\\[2.5ex]
\dfrac{}{E \vdash \mathtt{raise}\ x \Downarrow_{\mathrm{exn}} E(x)}\ \textsc{Raise}
\qquad
\dfrac{}{E \vdash c(x_1, \ldots, x_k) \Downarrow_{\mathrm{val}} c(E(x_1), \ldots, E(x_k))}\ \textsc{Const}
\\[2.5ex]
\dfrac{E' = E|_{\mathrm{fv}(\mu f. \lambda x.\, t)}}{E \vdash \mu f. \lambda x.\, t \Downarrow_{\mathrm{val}} \langle E', \mu f. \lambda x.\, t\rangle}\ \textsc{Lam}
\qquad
\dfrac{E(y) = \langle E', \mu f. \lambda x.\, t\rangle \qquad E', f \mapsto E(y), x \mapsto E(z) \vdash t \Downarrow_m v}{E \vdash y\ z \Downarrow_m v}\ \textsc{App}
\\[2.5ex]
\dfrac{E \vdash t_1 \Downarrow_{\mathrm{val}} v_1 \qquad E, x \mapsto v_1 \vdash t_2 \Downarrow_m v_2}{E \vdash \mathtt{let}\ x = t_1\ \mathtt{in}\ t_2 \Downarrow_m v_2}\ \textsc{Let}
\qquad
\dfrac{E \vdash t_1 \Downarrow_{\mathrm{exn}} v}{E \vdash \mathtt{let}\ x = t_1\ \mathtt{in}\ t_2 \Downarrow_{\mathrm{exn}} v}\ \textsc{LetRaise}
\\[2.5ex]
\dfrac{E \vdash t \Downarrow_{\mathrm{val}} v \qquad p_i \prec\!\prec v \qquad E, E\langle p_i \prec\!\prec v\rangle \vdash u_i \Downarrow_m v' \qquad 1 \le i \le n}{E \vdash \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n \Downarrow_m v'}\ \textsc{Match}
\\[2.5ex]
\dfrac{E \vdash t \Downarrow_{\mathrm{exn}} v}{E \vdash \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n \Downarrow_{\mathrm{exn}} v}\ \textsc{MatchRaise}
\\[2.5ex]
\dfrac{E \vdash t \Downarrow_m v \qquad E, x_m \mapsto v \vdash u_m \Downarrow_{m'} v'}{E \vdash \mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x_{\mathrm{val}} \Rightarrow u_{\mathrm{val}} \mid \mathtt{exn}\ x_{\mathrm{exn}} \Rightarrow u_{\mathrm{exn}} \Downarrow_{m'} v'}\ \textsc{Dispatch}
\end{array}$$

Fig. 1. Big-step semantics.

The non-ambiguous pattern matching simplifies the semantics of the term match t with p₁ ⇒ u₁ | · · · | pₙ ⇒ uₙ, as only one pattern can match the value of t, and thus only one branch is considered during the evaluation.

The rule Dispatch deals with exception handling: the evaluation of the term $\mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x_{\mathrm{val}} \Rightarrow u_{\mathrm{val}} \mid \mathtt{exn}\ x_{\mathrm{exn}} \Rightarrow u_{\mathrm{exn}}$ first evaluates t. If t reduces to a value, then the value branch $u_{\mathrm{val}}$ is evaluated. Otherwise, if t raises an exception, the exception branch $u_{\mathrm{exn}}$ is evaluated. In both cases, the value or the exception is added to the environment of the corresponding branch.
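In effect, dispatch is the eliminator of an exception monad. A hypothetical Scala rendering over Either makes this explicit (Left plays the role of a raised exception):

```scala
// Evaluation results: Left(v) is an exception carrying value v,
// Right(v) is a normal value.
type Res[V, A] = Either[V, A]

// dispatch t with val x => uval | exn y => uexn
def dispatch[V, A, B](t: Res[V, A])(onVal: A => Res[V, B], onExn: V => Res[V, B]): Res[V, B] =
  t.fold(onExn, onVal) // Either.fold applies its first argument to Left
```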

# 4 An Abstract Domain for Regular Sets of Values

In this section, we define an abstract domain that is able to represent inductively defined sets of values of our programming language. It is parameterised over a non-relational, numeric abstract domain I, which provides a concretisation function $\gamma_I : I \to \wp(\mathbb{Z})$, a test for the abstract inclusion pre-order, and operations for union, intersection and widening, with the standard soundness conditions. For instance, the soundness of abstract union is stated as $\gamma_I(\mathsf{I}_1) \cup \gamma_I(\mathsf{I}_2) \subseteq \gamma_I(\mathsf{I}_1 \sqcup_I \mathsf{I}_2)$.
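The interface of this numeric parameter can be summarised by a small signature; the trait below is a hypothetical Scala rendering of the operations the domain must provide, with the soundness condition recorded as a comment.

```scala
// A non-relational numeric domain: a concretisation, an inclusion test,
// and union, intersection, and widening operators. Soundness of union,
// for instance, demands γ(a) ∪ γ(b) ⊆ γ(union(a, b)).
trait NumDomain[I]:
  def gamma(a: I): BigInt => Boolean // membership view of γI : I → ℘(Z)
  def leq(a: I, b: I): Boolean       // abstract inclusion pre-order
  def union(a: I, b: I): I
  def inter(a: I, b: I): I
  def widen(a: I, b: I): I
```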

The definition of our abstract domain follows:

$$\begin{array}{rcl} \mathsf{A} & ::= & \alpha \ \mid\ \mu\alpha.\mathsf{A} \ \mid\ \{\mathsf{ints} : \mathsf{I};\ \mathsf{constructs} : \mathsf{C};\ \mathsf{funs} : \mathsf{F}\}\\ \mathsf{C} & ::= & \top \ \mid\ \{c_1 \mapsto (\mathsf{A}_1, \ldots, \mathsf{A}_{k_1});\ \ldots;\ c_n \mapsto (\mathsf{A}_1, \ldots, \mathsf{A}_{k_n})\}\\ \mathsf{F} & ::= & \top \ \mid\ \{\mu f. \lambda x.\, t_1 \mapsto \mathsf{E}_1;\ \ldots;\ \mu f. \lambda x.\, t_n \mapsto \mathsf{E}_n\}\\ \mathsf{E} & ::= & \{x_1 \mapsto \mathsf{A}_1;\ \ldots;\ x_n \mapsto \mathsf{A}_n\} \end{array}$$

An abstract value, written A, describes which integers it denotes (in the field ints), which values whose head is a constructor it denotes (in the field constructs), and which function closures it denotes (in the field funs). The integer values are described by a numeric abstract domain that is taken as a parameter.

The constructed values are described by a map whose keys are the possible head constructors of the values, and whose data are tuples of abstract values that denote the possible values of the arguments of each constructor. The constructed values might also be described by ⊤, which means that the head constructor could be any constructor, and the arguments may be any values.

Similarly, the possible function closures are described by a map that associates the possible codes of the functions with abstract environments. These environments map the free variables of the corresponding function code to abstract values, denoting the possible concrete values of these variables. The closures may also be described by ⊤, which represents any closure made of any function code with any environment.

Finally, we can construct recursive sets of values through the use of variables α, that are introduced by the µ fixpoint constructor.

The bottom value is {ints : ⊥; constructs : {}; funs : {}}, and the top value is {ints : ⊤; constructs : ⊤; funs : ⊤}. We may completely omit some of the fields (ints, constructs or funs) when they are associated with a bottom value.
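The shape of abstract values can be rendered as an OCaml datatype. The following is a hypothetical sketch that mirrors the grammar above, parameterised by the type 'i of the numeric domain; all the names in it are ours, not the authors'.

```ocaml
type constr = string  (* constructor names c *)
type var = string     (* recursion variables alpha, and variable names x *)

type 'i absval =
  | Fields of 'i fields
  | Var of var                     (* recursion variable alpha *)
  | Mu of var * 'i absval          (* mu alpha. A *)

and 'i fields = {
  ints : 'i;                       (* element of the numeric domain I *)
  constructs : 'i constructs;
  funs : 'i funs;
}

and 'i constructs =
  | CTop                                      (* any constructed value *)
  | CMap of (constr * 'i absval list) list    (* c -> (A1, ..., An) *)

and 'i funs =
  | FTop                                      (* any closure *)
  | FMap of (string * (var * 'i absval) list) list
    (* function code (kept abstract here) -> abstract environment *)

(* Bottom and top values, following the paper: *)
let bot (ibot : 'i) : 'i absval =
  Fields { ints = ibot; constructs = CMap []; funs = FMap [] }

let top (itop : 'i) : 'i absval =
  Fields { ints = itop; constructs = CTop; funs = FTop }
```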

This informal explanation is formalised in the concretisation function:

Definition 3 (Concretisation). Assume Γ is a finite mapping from variables to sets of values. The concretisation $\gamma_\Gamma : \mathsf{A} \to \wp(\mathcal{V})$ is defined by $\gamma_\Gamma(\{\mathsf{ints} : \mathsf{I};\ \mathsf{constructs} : \mathsf{C};\ \mathsf{funs} : \mathsf{F}\}) = \gamma(\mathsf{I}) \cup \gamma_\Gamma(\mathsf{C}) \cup \gamma_\Gamma(\mathsf{F})$, where:

$$\begin{aligned}
\gamma_\Gamma(\alpha) &= \Gamma(\alpha) \\
\gamma_\Gamma(\mu\alpha.\mathsf{A}) &= \mathrm{lfp}_{\subseteq}\,(\lambda S.\,\gamma_{\Gamma,\alpha:S}(\mathsf{A})) \\
\gamma(\mathsf{I}) &= \gamma_{\mathsf{I}}(\mathsf{I}) \\
\gamma_\Gamma(\mathsf{C}) &= \begin{cases}
  \{ c(v_1, \ldots, v_n) \mid c \in \mathcal{C} \land \forall 1 \le i \le n,\ v_i \in \mathcal{V} \} & \text{if } \mathsf{C} = \top \\
  \left\{ c(v_1, \ldots, v_n) \,\middle|\, \begin{array}{l} (c \mapsto (\mathsf{A}_1, \ldots, \mathsf{A}_n)) \in \mathsf{C} \\ \forall 1 \le i \le n,\ v_i \in \gamma_\Gamma(\mathsf{A}_i) \end{array} \right\} & \text{otherwise}
\end{cases} \\
\gamma_\Gamma(\mathsf{F}) &= \begin{cases}
  \{ \langle E, \mu f.\lambda x.t \rangle \mid E \in \mathcal{E} \land t \in \mathcal{T} \} & \text{if } \mathsf{F} = \top \\
  \{ \langle E, \mu f.\lambda x.t \rangle \mid (\mu f.\lambda x.t \mapsto \mathsf{E}) \in \mathsf{F} \land E \in \gamma_\Gamma(\mathsf{E}) \} & \text{otherwise}
\end{cases} \\
\gamma_\Gamma(\mathsf{E}) &= \{ E \mid \mathrm{dom}\,E = \mathrm{dom}\,\mathsf{E} \land \forall x \in \mathrm{dom}\,E,\ E(x) \in \gamma_\Gamma(\mathsf{E}(x)) \}
\end{aligned}$$

The definition is justified by the fact that the function $\lambda S.\,\gamma_{\Gamma,\alpha:S}(\mathsf{A})$ is monotonic, and thus has a least fixed point, thanks to the Knaster-Tarski theorem. This is formalised by the following lemma:

Lemma 1. Consider the inclusion order ⊆ on ℘(V), and its pointwise extension to environments Γ. For any abstract value A, the function $\lambda\Gamma.\,\gamma_\Gamma(\mathsf{A})$ is monotonic.

The fact that our abstract values may represent sets of values that might not all have the same type may seem surprising, since our goal is, ultimately, to analyse strongly typed programs. The crux of the explanation lies in the fact that our abstract domain can only represent regular sets of values. If we restricted our abstract values so that they represent homogeneously typed values, it would be difficult to represent sets of values that are induced by a non-regular recursive type (like the type of finger trees [23]) or by generalised algebraic data types (GADTs). Indeed, one would need to find an over-approximation of such sets, and we would often have to approximate with the ⊤ abstract value. The ability to describe regular sets of values that may not all have the same type gives us more freedom, and allows us to find more precise approximations. For instance, we can represent finger trees as a recursive set whose values are either trees or fingers, although trees and fingers have distinct types. In practice, the ⊤ value is never produced.

We write A1[α ← A2] to denote the capture-avoiding substitution of A2 for α in A1. We write γ(A) for γ[](A), i.e., the concretisation under the empty environment.

The unwinding of fixpoints preserves the concretisation of abstract values.

Lemma 2 (Unwinding). γ(µα.A) = γ(A[α ← µα.A])

To define several operations on abstract values, we restrict them to well-formed values, using the standard contractiveness property for recursive types [16]:

Definition 4 (Contractiveness). An abstract value $\mathsf{A} = \mu\beta_1.\cdots\mu\beta_n.\mathsf{A}'$ is α-contractive if n ≥ 0, A′ does not start with µ, and A′ is not the variable α.
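A minimal sketch of this check, on a stripped-down datatype where everything other than variables and µ-binders is collapsed into a single Node constructor; the names are ours.

```ocaml
type absval = Node | Var of string | Mu of string * absval

(* A = mu b1. ... mu bn. A' is alpha-contractive when, after stripping the
   leading mus, the body neither starts with mu nor is the variable alpha. *)
let rec contractive (alpha : string) : absval -> bool = function
  | Mu (_, body) -> contractive alpha body
  | Var beta -> beta <> alpha
  | Node -> true
```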

Well-formedness requires that fixpoints are contractive, that constructors are used with the correct arity, and that the environments in closures only define bindings for the free variables of their functions.

Definition 5 (Well-formedness). An abstract value A is well-formed when the following conditions are satisfied:

- every fixpoint µα.A′ occurring in A is α-contractive;
- every constructor is applied to a number of abstract values that matches its arity;
- the abstract environment of every closure entry µf.λx.t ↦ E defines bindings exactly for the free variables of µf.λx.t.
Well-formedness rules out the abstract value µα.α, whose concretisation is the empty set. Well-formedness is preserved by substitution, provided contractiveness for the substituted variable is satisfied. This ensures that unwinding fixpoints preserves well-formedness. In the rest of this article, we only consider closed, well-formed abstract values.

For any abstract value A, we can retrieve the subset of integer values (respectively, constructed values, or function closures) by unwinding the top-level µs if there are any, and then reading the ints field (respectively, the constructs or funs field). This is formalised in the following definition of the projection on integers:

Definition 6 (Projection on integers). The projection on integers of a well-formed abstract value A, written A.ints, is defined as follows:

$$\begin{aligned}
\{\mathsf{ints} : \mathsf{I};\ \mathsf{constructs} : \mathsf{C};\ \mathsf{funs} : \mathsf{F}\}.\mathsf{ints} &= \mathsf{I} \\
(\mu\alpha.\mathsf{A}).\mathsf{ints} &= (\mathsf{A}[\alpha \leftarrow \mu\alpha.\mathsf{A}]).\mathsf{ints}
\end{aligned}$$

The definition of projection is well founded, thanks to the contractiveness of µs: only a finite number of unwindings is necessary. The projections A.constructs and A.funs are defined in a similar way. Projection on integers is sound, as it over-approximates the set of integers an abstract value contains:

Lemma 3 (Soundness of projection on integers). $\gamma(\mathsf{A}) \cap \mathbb{Z} \subseteq \gamma(\mathsf{A}.\mathsf{ints})$

Projections for constructors and closures enjoy similar soundness properties.
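To illustrate Definition 6, here is a small OCaml sketch of the projection on integers, over a simplified value form in which the numeric domain is replaced by a list of integers; the type, subst, and all names are our own simplifications.

```ocaml
type absval =
  | Fields of { ints : int list }  (* int list stands in for the domain I *)
  | Var of string
  | Mu of string * absval

(* Substitution A[alpha <- b]; capture is not an issue here, because we
   only ever substitute closed values (the unwound mu itself). *)
let rec subst alpha b = function
  | Fields _ as n -> n                (* simplified: no sub-values in Fields *)
  | Var a -> if a = alpha then b else Var a
  | Mu (a, body) when a = alpha -> Mu (a, body)   (* alpha is shadowed *)
  | Mu (a, body) -> Mu (a, subst alpha b body)

(* Projection unwinds top-level mus, then reads the ints field;
   contractiveness guarantees that only finitely many unwindings happen. *)
let rec project_ints = function
  | Fields { ints } -> ints
  | Mu (alpha, body) as mu -> project_ints (subst alpha mu body)
  | Var _ -> invalid_arg "project_ints: open abstract value"
```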

#### 4.1 Inclusion, Union and Intersection

Following the methodology employed in the context of recursive subtyping, we define the inclusion relation between abstract values as a co-inductive relation.

Definition 7 (Abstract inclusion). The inclusion between abstract values, written $\mathsf{A}_1 \sqsubseteq \mathsf{A}_2$, is defined as a co-inductive relation by the following rules:


In this definition, the relation ⊑I is provided by the abstract domain on integers. The inclusion relation unfolds fixpoints when necessary, and otherwise compares each field (integers, constructed values, closures) separately, by treating the finite maps for constructed values and closures as disjunctions, i.e., by using the standard Hoare ordering. In practice, the inclusion test is implemented by transforming abstract values into graphs that resemble tree automata: each graph node corresponds to a sub-term of an abstract value, and µ-nodes create cycles. Then, it suffices to check whether one automaton simulates the other [1,31,16].

Lemma 4 (Inclusion is a pre-order). The inclusion between closed, well-formed abstract values is a pre-order, i.e., a reflexive and transitive relation.

Abstract union and intersection are defined in the companion research report [34] in a similar way, as co-inductive relations that unwind fixpoints when needed.

The abstract operations enjoy the expected soundness properties:

Lemma 5 (Soundness of abstract operations). For any closed, well-formed abstract values A1 and A2:

- if $\mathsf{A}_1 \sqsubseteq \mathsf{A}_2$, then $\gamma(\mathsf{A}_1) \subseteq \gamma(\mathsf{A}_2)$;
- $\gamma(\mathsf{A}_1) \cup \gamma(\mathsf{A}_2) \subseteq \gamma(\mathsf{A}_1 \sqcup \mathsf{A}_2)$;
- $\gamma(\mathsf{A}_1) \cap \gamma(\mathsf{A}_2) \subseteq \gamma(\mathsf{A}_1 \sqcap \mathsf{A}_2)$.


The proof of Lemma 5 crucially relies on Lemma 2, which shows that unwinding a recursive value preserves its concretisation.

Union and intersection are implemented by translating the values into graphs, on which union and intersection are easily computed. Then, we transform them back into trees with µ nodes. Our implementation exploits the locally nameless representation [5], where bound variables are encoded as de Bruijn indices. We leverage this canonical representation by hash-consing values and memoising the operations [13]. This has proved essential to obtain acceptable performance.
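Memoisation of a binary operation can be sketched as follows. The actual implementation memoises over hash-consed values, whose unique identifiers provide cheap hashing and equality; this sketch, with names of our choosing, falls back on OCaml's structural hashing for simplicity.

```ocaml
(* Wrap a binary function with a cache of previously computed results. *)
let memoize2 (f : 'a -> 'b -> 'c) : 'a -> 'b -> 'c =
  let table : ('a * 'b, 'c) Hashtbl.t = Hashtbl.create 1024 in
  fun x y ->
    match Hashtbl.find_opt table (x, y) with
    | Some r -> r
    | None ->
        let r = f x y in
        Hashtbl.add table (x, y) r;
        r
```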

#### 4.2 Widening

The widening, written A1∇A2, is a binary operator on abstract values that over-approximates the union of abstract values, and is used to approximate the Kleene fixpoint iterations. The role of the widening is central in abstract interpretation, as it serves two purposes. Firstly, the widening must find generalisations of abstract values, in order to find invariants. This part impacts the precision of the analysis, and relies on heuristics. Secondly, it must ensure the termination of the analysis, by enforcing a stability property: every widening chain must reach a limit in finite time. This part impacts the performance of the analyser.

In our abstract domain, the widening operator is responsible for finding regularities in abstract values and for creating µ nodes. A similar idea was used in the analysis of Prolog programs using type graphs [22], which are trees that contain cycles. Our widening draws inspiration from type graphs.

We now give the informal procedure to compute the widening of two abstract values A1 and A2. It operates in two phases on a candidate result Anew, the abstract union of A1 and A2. The first phase proceeds as follows:

1. If the height of Anew is not greater than the height of A1, return Anew;
2. If, for each constructor and each code of closures, the maximal number of occurrences in each tree path of Anew does not exceed the corresponding number of occurrences in A1, or a user-provided threshold, return Anew;
3. Otherwise, go to the shrinking phase.

Steps 2 and 3 allow the size of abstract values to grow enough before a shrinking phase starts. In practice, this is important to find precise invariants.

The shrinking phase, which takes inspiration from the widening operation of type graphs, tries to shrink Anew by introducing µ nodes at appropriate positions, so as to "fold the abstract value on itself". It proceeds by searching for a clashing node of Anew, i.e., a node that differs from its counterpart in A1 in one of two ways:

- Either the two nodes have different sets of head constructors or codes of functions: this means that the two nodes might differ semantically.


- Or the two nodes have different depths in the two trees: this means that some path was followed through a µ-unwinding.

Once a clashing node is found, we proceed as follows:

- We search for the closest ancestor of the clashing node that is semantically larger, in the sense of the pre-order. If there is such an ancestor, then we merge it with the clashing node, thus creating a cycle.
- If no such ancestor exists, we search for the closest ancestor that has at least the same head constructors and function codes as the clashing node, and we merge it with the clashing node too.
- If no such ancestor exists either, then we return Anew unchanged, which allows the abstract values to grow.

We repeat this operation until no clashing node remains, or until a maximal number of iterations is reached. In the latter case, we truncate Anew, i.e., we replace some nodes with ⊤, so that it has the same height as A1.

In practice, we could not find any case where the final truncation is needed, and we have observed that our widening operator finds precise generalisations.

# 5 An Abstract Interpreter to Detect Uncaught Exceptions

To design our abstract interpreter, we took inspiration from the "Abstracting Definitional Interpreters" approach [11]. This methodology prescribes deriving an abstract interpreter from a concrete big-step interpreter that computes in a monad, which is a parameter of the interpreter. Furthermore, the methodology fosters the use of the open recursive style: the interpreter should be a function that takes as an extra parameter the function to be called recursively.

The first aspect, being parameterised by a monad, is motivated by the fact that one could use a monad that computes over abstractions of values. In §5.1, we present a monad that is an abstraction of the exception monad. It is also an abstract domain, and is therefore well suited to define an abstract interpreter.

The second aspect, the use of the open recursive style, permits the use of dynamic fixpoint solvers [59,63,12,24,6,30]. Such solvers compute post-fixpoints, i.e., over-approximations of solutions of systems of equations over abstract values, where the set of equations might be discovered dynamically, while solving the equations. New equations can be discovered, for instance, when the control flow of a program depends on its data flow. This is the case for higher-order programs, where the function called at a given call site may itself result from a computation. We present in §5.2 our abstract interpreter, as a function that computes in the abstract exception monad and is defined in open recursive style.

#### 5.1 The Abstract Exception Monad

A big-step interpreter for a programming language with exceptions can be defined in an elegant manner using the exception monad, which we briefly recall. In the exception monad, a computation is either a success value, or an exception that carries some value (typically of type exception) from the object language.

```
type m β = Success β | Exception V

return :: β → m β
return x = Success x

(>>=) :: m β1 → (β1 → m β2) → m β2
(Success x) >>= f = f x
(Exception e) >>= f = Exception e
```
In this monad, the raise function expresses the action of throwing an exception, while the dispatch function corresponds to the dispatch construct of our prototype language (§3), and expresses the action of catching an exception.

```
raise :: V → m β
raise e = Exception e

dispatch :: m β1 → (β1 → m β2) → (V → m β2) → m β2
dispatch (Success x) f g = f x
dispatch (Exception e) f g = g e
```

The raise function simply injects its argument into the exception case, whereas the dispatch function takes two continuations, to handle, respectively, the success case, and the exception case, by performing a case analysis on the monadic value.

We can easily define a monad that mimics the behaviour of the exception monad, with the difference that it deals with abstractions of sets of (possibly exceptional) values, instead of mere exceptional values. The construction is based on the observation that ℘(m β) is isomorphic to ℘(β) × ℘(V), whose second component can be abstracted using our abstract domain for sets of values. Thus, we define the abstract exception monad, written m♯ β, as follows:

```
type m♯ β = β × A

return♯ :: β → m♯ β
return♯ B = (B, ⊥)

(>>=♯) :: m♯ β1 → (β1 → m♯ β2) → m♯ β2
(B, A) >>=♯ f = let (B′, A′) = f B in (B′, A ⊔ A′)
```

The return♯ operation records its argument as the set of possible values, and asserts that no exception is returned: the set of possible exceptions is ⊥. The >>=♯ operation retrieves the value part of its monadic argument and passes it to the continuation. The final value is composed of the value part that was produced by the continuation, and of the union of the exceptions that might have been raised by the monadic value and by the evaluation of the continuation. The functions return♯ and >>=♯ satisfy the monad laws if (⊥, ⊔) is a monoid.

The fact that m♯ β is a monad does not suffice to use it in an abstract interpreter, though. We also need m♯ β to be an abstract domain, i.e., one must decide when two monadic values are included in each other, and how to compute abstract unions, intersections, and widenings.

Interestingly, the monad m<sup>♯</sup> β acts as an abstract domain as soon as β is an abstract domain: this is the standard cartesian product of abstract domains, where operations are defned pointwise. In practice, we only need to consider the instance m<sup>♯</sup> A, i.e., the domain of exceptional abstract values.

The remaining pieces that are needed to use m♯ β in an abstract interpreter are the abstract versions of raise and dispatch. They are defined as follows:

```
raise♯ :: A → m♯ A
raise♯ A = (⊥, A)

dispatch♯ :: m♯ β → (β → m♯ A) → (A → m♯ A) → m♯ A
dispatch♯ (B, A) F G = F B ⊔ G A
```

The raise<sup>♯</sup> operation raises a set of possible exceptions, by recording the abstract value for exceptions in the set of possibly returned exceptions, and by returning the bottom value, since it can never return any value. It is the dual of return<sup>♯</sup> .

The dispatch<sup>♯</sup> function executes the value continuation on the set of possible values, and executes the exception continuation on the set of possible exceptions, and then returns their abstract union in the domain of exceptional values.
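Putting the pieces together, the abstract exception monad admits a direct OCaml rendering. The following functor is a sketch under our own naming (ABSTRACT_VALUE, raise_, etc.), assuming only a bottom element and a join operation from the abstract domain of §4.

```ocaml
module type ABSTRACT_VALUE = sig
  type t
  val bot : t
  val join : t -> t -> t
end

module AbsExn (A : ABSTRACT_VALUE) = struct
  (* A monadic value pairs the possible results with the possible exceptions. *)
  type 'b m = 'b * A.t

  let return (b : 'b) : 'b m = (b, A.bot)       (* no exception is raised *)

  let ( >>= ) ((b, exns) : 'b1 m) (f : 'b1 -> 'b2 m) : 'b2 m =
    let b', exns' = f b in
    (b', A.join exns exns')                     (* accumulate the exceptions *)

  let raise_ (a : A.t) : A.t m = (A.bot, a)     (* no value is returned *)

  let dispatch ((b, exns) : 'b m) (f : 'b -> A.t m) (g : A.t -> A.t m) : A.t m =
    let bv, ev = f b and be, ee = g exns in
    (A.join bv be, A.join ev ee)                (* pointwise abstract union *)
end
```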

We can easily show that the abstract operations compute over-approximations of their counterparts in the exception monad. Assume the type β is equipped with a concretisation function $\gamma_\beta : \beta \to \wp(\mathrm{B})$ for some set B. Then, we define the concretisation for the abstract monad:

$$\begin{aligned}
\gamma_{\mathsf{m}^\sharp\beta} &: \mathsf{m}^\sharp\,\beta \to \wp(\mathsf{m}\,\mathrm{B}) \\
\gamma_{\mathsf{m}^\sharp\beta}(B, \mathsf{A}) &= \{\mathsf{Success}\ b \mid b \in \gamma_\beta(B)\} \cup \{\mathsf{Exception}\ v \mid v \in \gamma(\mathsf{A})\}
\end{aligned}$$

The concretisation specifies that the first component of a monadic value describes the success values, and that the second component describes the possible exceptions.

The soundness results for the abstract operations show that they compute over-approximations of their concrete counterparts:

Lemma 6. The following inclusions are satisfied:

- $\{\mathsf{return}\ b \mid b \in \gamma_\beta(B)\} \subseteq \gamma_{\mathsf{m}^\sharp\beta}(\mathsf{return}^\sharp B)$
- $\{m \mathrel{>\!\!>\!=} f \mid m \in \gamma_{\mathsf{m}^\sharp\beta_1}(M),\ f \in \gamma_{\beta_1 \to \mathsf{m}^\sharp\beta_2}(F)\} \subseteq \gamma_{\mathsf{m}^\sharp\beta_2}(M \mathrel{>\!\!>\!=^\sharp} F)$
- $\{\mathsf{raise}\ v \mid v \in \gamma(\mathsf{A})\} \subseteq \gamma_{\mathsf{m}^\sharp\mathsf{A}}(\mathsf{raise}^\sharp \mathsf{A})$
- $\{\mathsf{dispatch}\ m\ f\ g \mid m \in \gamma_{\mathsf{m}^\sharp\beta_1}(M),\ f \in \gamma_{\beta_1 \to \mathsf{m}^\sharp\beta_2}(F),\ g \in \gamma_{\mathcal{V} \to \mathsf{m}^\sharp\beta_2}(G)\} \subseteq \gamma_{\mathsf{m}^\sharp\beta_2}(\mathsf{dispatch}^\sharp M\ F\ G)$

where $\gamma_{\beta_1 \to \beta_2}(F) = \{f \mid \forall X,\ \forall x \in \gamma_{\beta_1}(X),\ f\ x \in \gamma_{\beta_2}(F\ X)\}$.

#### 5.2 A Monadic Abstract Interpreter in Open Recursive Style

In this section, we describe our whole-program static analyser. It infers an over-approximation of the values that a program might compute, and of the exceptions that it might raise, with the possible values they carry. Although it analyses programs that can deal with first-class functions, it is not defined as a control-flow analyser [60], but rather as an abstract interpreter that performs a value analysis. The insight is the following: since functions are first-class citizens in the language, a value analysis also infers an approximation of the control flow. A value analysis will indeed compute which functions may be called at every call site.

Our analyser follows the open recursive style, and has the following type:

$$(\mathsf{T} \to \mathsf{E} \to \mathsf{m}^\sharp\,\mathsf{A}) \to (\mathsf{T} \to \mathsf{E} \to \mathsf{m}^\sharp\,\mathsf{A})$$

Assuming $\mathsf{eval} :: \mathsf{T} \to \mathsf{E} \to \mathsf{m}^\sharp\,\mathsf{A}$, we define $\llbracket\cdot\rrbracket^{\mathsf{eval}}_{\cdot} :: \mathsf{T} \to \mathsf{E} \to \mathsf{m}^\sharp\,\mathsf{A}$:

$$\begin{array}{l}
\llbracket x\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \mathsf{return}^\sharp\,\mathsf{E}(x) \\[0.4em]
\llbracket c(x_1, \ldots, x_n)\rrbracket^{\mathsf{eval}}_{\mathsf{E}} =
  \begin{cases}
    \mathsf{return}^\sharp\,\bot & \text{if } \mathsf{E}(x_i) = \bot \text{ for some } 1 \le i \le n \\
    \mathsf{return}^\sharp\,\{\mathsf{constructs} : \{c \mapsto (\mathsf{E}(x_1), \ldots, \mathsf{E}(x_n))\}\} & \text{otherwise}
  \end{cases} \\[0.8em]
\llbracket n\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \mathsf{return}^\sharp\,\{\mathsf{ints} : \{n\}\} \\[0.4em]
\llbracket x_1 \mathbin{op} x_2\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \mathsf{return}^\sharp\,\{\mathsf{ints} : \mathsf{E}(x_1).\mathsf{ints} \,\llbracket op\rrbracket\, \mathsf{E}(x_2).\mathsf{ints}\} \\[0.4em]
\llbracket \mu f.\lambda x.t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \mathsf{return}^\sharp\,\{\mathsf{funs} : \{\mu f.\lambda x.t \mapsto \mathsf{E}|_{\mathrm{fv}(\mu f.\lambda x.t)}\}\} \\[0.4em]
\llbracket x\ y\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \text{if } \mathsf{E}(y) = \bot \text{ then } \mathsf{return}^\sharp\,\bot \text{ else } \bigsqcup_{(\mu f.\lambda x.t \mapsto \mathsf{E}') \in \mathsf{E}(x).\mathsf{funs}} \mathsf{eval}\ t\ \mathsf{E}'' \\
\qquad \text{where } \mathsf{E}'' = \mathsf{E}', f \mapsto \mathsf{F}, x \mapsto \mathsf{E}(y) \text{ and } \mathsf{F} = \{\mathsf{funs} : \{\mu f.\lambda x.t \mapsto \mathsf{E}'\}\} \\[0.4em]
\llbracket \mathtt{let}\ x = t\ \mathtt{in}\ u\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} \mathrel{>\!\!>\!=^\sharp} \lambda v.\ \text{if } v = \bot \text{ then } \mathsf{return}^\sharp\,\bot \text{ else } \llbracket u\rrbracket^{\mathsf{eval}}_{\mathsf{E}, x : v} \\[0.4em]
\llbracket \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow t_1 \mid \cdots \mid p_n \Rightarrow t_n\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} \mathrel{>\!\!>\!=^\sharp} \lambda v.\ \text{if } v = \bot \text{ then } \mathsf{return}^\sharp\,\bot \\
\qquad \text{else } \bigsqcup_{1 \le i \le n} (p_i \prec\!\!\prec^\sharp v) \mathrel{>\!\!>\!=^\sharp} \lambda \mathsf{E}'.\ \text{if } \mathsf{E}' = \bot \text{ then } \mathsf{return}^\sharp\,\bot \text{ else } \llbracket t_i\rrbracket^{\mathsf{eval}}_{\mathsf{E}, \mathsf{E}'} \\[0.4em]
\llbracket \mathtt{raise}\ x\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \mathsf{raise}^\sharp\,\mathsf{E}(x) \\[0.4em]
\llbracket \mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x \Rightarrow u \mid \mathtt{exn}\ y \Rightarrow r\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \mathsf{dispatch}^\sharp\ \llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} \\
\qquad (\lambda v.\ \text{if } v = \bot \text{ then } \mathsf{return}^\sharp\,\bot \text{ else } \llbracket u\rrbracket^{\mathsf{eval}}_{\mathsf{E}, x : v})\ (\lambda e.\ \text{if } e = \bot \text{ then } \mathsf{return}^\sharp\,\bot \text{ else } \llbracket r\rrbracket^{\mathsf{eval}}_{\mathsf{E}, y : e})
\end{array}$$

Fig. 2. Definition of the abstract interpreter.

It takes as a parameter an analyser, that represents the information discovered so far on the program, and produces an analyser as output, that exploits the input analyser to produce more analysis results, that are possibly less precise. The role of the fixpoint solver is to find a post-fixpoint of this functional. Similar approaches, leveraging fixpoint solvers to define static analysers, have been successfully used in other work on static analysis [22,64,50,4].

Our abstract interpreter is defined in Figure 2, where $\llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}}$ denotes the abstract value of type m♯ A obtained by analysing the program t under the abstract environment E, using the analysis function eval for recursive calls. Importantly, the analyser does not call eval for every recursive call. Instead, eval is only used when the analyser cannot be called on a strict sub-term. In practice, this means that eval is only used to analyse function calls. In every other place, we have the guarantee that the analysis is demanded on a strict sub-term, and a standard recursive call is performed. This strategy saves time in practice, as it lightens the burden of the fixpoint solver, which only needs to find post-fixpoints for function calls rather than for every program point.
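The open recursive style can be illustrated on a tiny λ-calculus, ignoring exceptions and the monad for brevity. The sketch below is ours, not the authors' code: plain recursion is used on strict sub-terms, while the analysis of function bodies goes through the eval parameter, which a dynamic fixpoint solver would normally supply. Here we tie the knot naively, so this sketch may diverge on recursive programs, where a real solver would widen and detect post-fixpoints.

```ocaml
type term = Var of string | Lam of string * term | App of term * term

module Env = Map.Make (String)

type absval =
  | Bot
  | Clos of (string * term * absval Env.t) list  (* sets of closures *)
  | Top

let join a b = match a, b with
  | Bot, v | v, Bot -> v
  | Clos c1, Clos c2 -> Clos (c1 @ c2)
  | _, _ -> Top

(* eval_open is *not* recursive through function bodies: those calls go
   through the [eval] parameter supplied from the outside. *)
let eval_open (eval : term -> absval Env.t -> absval) :
    term -> absval Env.t -> absval =
  let rec go t env = match t with
    | Var x -> (try Env.find x env with Not_found -> Bot)
    | Lam (x, body) -> Clos [ (x, body, env) ]
    | App (f, a) ->
        let va = go a env in               (* strict sub-term: plain recursion *)
        (match go f env with
         | Clos closures ->
             List.fold_left
               (fun acc (x, body, cenv) ->
                 join acc (eval body (Env.add x va cenv)))  (* external call *)
               Bot closures
         | v -> v)
  in
  go

(* Tying the knot naively, in place of a real dynamic fixpoint solver: *)
let rec eval t env = eval_open eval t env
```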

To analyse a variable, we return the abstract value found in the environment.

To analyse a construct, we retrieve the abstract values of all the arguments, and return the corresponding abstract value for that constructor, or ⊥ if one of the arguments was ⊥, because of the eager semantics.


The analysis of an integer returns this integer injected into the integer domain. The analysis of binary operations on integers retrieves the integer parts of the abstract values of the two arguments, and returns the result of the transfer function of the integer domain for that binary operation.

The analysis of a function mimics the concrete semantics: it returns an abstract closure composed of the code of the function and its abstract environment.

The analysis of function calls is more interesting. If the abstract value of the argument is ⊥, then we return ⊥, because evaluation is eager. Otherwise, we retrieve all the possible closures for the value at the call position, and analyse their bodies by extending their environments with the abstract value of the argument, and with the abstract closure itself (we are dealing with recursive functions). The final result is the union, at the level of the abstract monad, of the analyses of all the possible function bodies. Because the bodies of the functions that are analysed are not strict sub-terms of the original term x y, we perform an external recursive call to the analyser, by using the eval parameter.

The analysis of let bindings chains the analyses of its two parts, and, because evaluation is eager, checks for emptiness before analysing the second sub-term.

The pattern matching construct is analysed by first analysing the scrutinee, and then analysing each branch of the match independently. For each branch, we retrieve the environment produced by matching the abstract value against the pattern (written p ≺≺♯ v), and then we analyse the code of that branch if the matching was possible. Then, we take the union, at the level of the abstract monad, of the analysis results from each branch. Notably, the exceptions that any branch might raise are reported in the final result. The definition of matching abstract values against patterns is available in the companion research report [34].

Analysing the raise construct is easy: a call to the raise♯ function suffices. Finally, the analysis of dispatch amounts to calling the dispatch♯ function from the abstract monad, on the analysis of the scrutinee, and on two continuations, that analyse the codes of the two branches, if they are given non-⊥ arguments.

#### 5.3 Soundness of the Abstract Interpreter

We show that the abstract interpreter of Figure 2 is sound, in the sense that it computes an over-approximation of the behaviour of programs.

Definition 8 (Behaviour of programs). Let S be a set of evaluation environments:
$$\mathrm{EVAL}_S\ t = \bigcup_{E \in S} \{\mathsf{Success}\ v \mid E \vdash t \Downarrow_{\mathsf{val}} v\} \cup \{\mathsf{Exception}\ e \mid E \vdash t \Downarrow_{\mathsf{exn}} e\}$$

The behaviour of a program t is thus given by the function EVAL, which takes a set of evaluation environments as input, and produces a set of values, each tagged to indicate whether it results from normal or from exceptional evaluation.

Then, the soundness of the abstract interpreter follows:

Theorem 1 (Soundness). Assume eval is a post-fixpoint, i.e., $\llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} \sqsubseteq \mathsf{eval}\ t\ \mathsf{E}$ for every t and E. Then, $\mathrm{EVAL}_{\gamma(\mathsf{E})}\ t \subseteq \gamma_{\mathsf{m}^\sharp \mathsf{A}}(\llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}})$.

Proof. We have to show that for every E ∈ γ(E), m ∈ {val, exn} and v ∈ V, if $E \vdash t \Downarrow_m v$, then $r \in \gamma_{\mathsf{m}^\sharp \mathsf{A}}(\llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}})$, where r = Success v when m = val, and r = Exception v when m = exn. The proof proceeds by induction on the evaluation judgement, generalising over m and E. The only interesting case is the one for function application, which exploits the induction hypothesis, the post-fixpoint property of eval, and the soundness of abstract inclusion ⊑. All other cases result from the soundness of the abstract operations and from the induction hypotheses. □

The soundness theorem assumes that eval is a post-fixpoint, i.e., $\llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} \sqsubseteq \mathsf{eval}\ t\ \mathsf{E}$. This property is ensured by the soundness of the fixpoint solver, which always returns a post-fixpoint. The function eval is, indeed, the result of the fixpoint solver called on the functional $\lambda \mathsf{eval}.\,\lambda t.\,\lambda \mathsf{E}.\,\llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E}}$.

# 6 An Abstract Interpreter for OCaml Programs

Based on the abstract interpreter of §5, we implemented a static analyser for OCaml programs (version 4.14), that returns a map from the top-level identifiers of the program to their abstract values. Our prototype and its test suite (see §7) are available as a companion artefact [35].

We have implemented several optimisations, that are crucial to obtain decent performance. For example, nodes of the analysed AST are indexed by program points, using unique integers as identifiers. This enables efficient comparison of sub-terms and allows using efficient data structures like Patricia trees [53]. Moreover, and this is of paramount importance for performance, we perform hash-consing of abstract values and memoise the operations on these abstract values.

We present in the next sections some key implementation details that we needed to analyse OCaml programs.

#### 6.1 Refnements With Respect to the Formal Presentation

The abstract interpreter we implemented follows the structure we have presented in §5.2, but implements three further refinements, which we purposely elided to keep the presentation simple. A thorough presentation of these refinements is beyond the scope of the current paper.

Context sensitivity. Our analyser is context sensitive: we implemented a form of call site sensitivity, that is akin to an abstraction of the call stack. Following [50], we retain full sensitivity until the list of call sites becomes maximal, i.e., when a program point appears more than once in that list, which may indicate a recursive call to some function. In addition, we always remember the last call site. In practice, the list of call sites is an additional parameter to the abstract interpreter. Following [50] again, we use this list of call sites to decide when widening on the environments should be performed: it is performed only when eval is called on a maximal list of call sites. The same list of call sites is also used to derive dynamic exception names and abstract pointers (see §6.4 and §6.5).
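As a rough illustration, the truncation of the call-site list might be implemented along the following lines. This is a loose sketch of the policy just described, under our own naming; the exact scheme of [50] may differ in its details.

```ocaml
(* Extend a list of call sites (identified by integers): the list grows
   until a site would occur twice; past that point, only the most recent
   call site is remembered. *)
let extend_call_sites (site : int) (sites : int list) : int list =
  if List.mem site sites then [ site ]  (* maximal: keep only the last site *)
  else site :: sites
```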

Flow sensitivity. Our abstract interpreter is able to exploit information that is learned when a branch in a match is taken, or when branching on an arithmetic test. For example, in the program match (x, y) with (None, _) ⇒ x | _ ⇒ t, our analyser is able to refine the possible environments, by taking into account that x = None in the first branch, and that this first branch necessarily returns the value None. This is done by performing a backward analysis of the scrutinee (x, y). This backward analysis infers an over-approximation of the environment, knowing that the scrutinee successfully matched against the pattern (None, _).

Dynamic partitioning. Finally, we have employed a form of dynamic partitioning to avoid conflating some analysis results, which could degrade precision. Based on a notion of similarity on the shapes of abstract values found in environments, we decide whether to conflate contexts or not. The technique is inspired by the silhouettes used in shape analysis [39].

#### 6.2 Transformation of Typed OCaml ASTs

The actual language that our interpreter takes as input is more complex than the one we presented in §3, but undoubtedly simpler than the OCaml AST. The main differences between our intermediate language and the OCaml AST are that we deal with only one construct for pattern matching and only one construct for exception handling, and that these two constructs implement orthogonal features in our language. This is in contrast with OCaml's try t1 with p -> t2 and match t with p1 -> t1 | exception p2 -> t2, which conflate pattern matching with exception handling. The transformation into our two constructs is mostly straightforward, and greatly simplifies the job of the static analyser.

Our intermediate language makes the evaluation order explicit using let bindings. While the evaluation order in OCaml is generally unspecified, we did our best to mimic the choices that the OCaml compiler makes.

We added specific application nodes for OCaml primitives. To ensure they are called with the correct arity, we inserted λ-abstractions when they were partially applied, or additional application nodes when they were given more arguments than expected. We also handle the short-circuiting boolean primitives && and || specifically, as they change the evaluation order.

We kept the n-ary application nodes of the OCaml AST (instead of the binary applications from §3), as this is important for the semantics of labelled and optional function arguments. Nevertheless, the transformation from the OCaml AST into our intermediate language needed a lot of care and effort. In particular, missing labelled arguments required the insertion of λ-abstractions, which can be particularly subtle when interacting with optional arguments.

#### 6.3 Pattern Disambiguation

The last major difference between OCaml and our intermediate language is the exhaustiveness and non-ambiguity requirements on pattern matching. These properties not only simplify the semantics of our intermediate language, but also facilitate the analysis of programs. Indeed, each branch of a pattern matching can be analysed independently of the other ones, whereas in OCaml, branches must be considered in order, until one pattern matches the inspected value. The OCaml type-checker nonetheless provides warnings to verify the utility of each branch and the exhaustiveness of the overall pattern matching.

Enforcing exhaustive and non-ambiguous pattern matchings in OCaml would require the use of cumbersome patterns, and, furthermore, it is not always possible to write such patterns in OCaml. It is, indeed, allowed to match on values whose types may have an infinite number of constructors, e.g., arrays, strings, or extensible variant types (see §6.4 for details). To meet these requirements, we extend the language of patterns with a complement p \ q [8]. A value v matches a pattern p \ q if and only if it matches p but not q. In an ordered pattern matching match t with p1 ⇒ u1 | ··· | pn ⇒ un, we can thus express, unambiguously, that the value v of the term t matches the i-th pattern: it suffices to state that v does not match any of the preceding patterns pj with j < i, i.e., that $p_i \setminus (\textstyle\sum_{j<i} p_j) \prec\!\!\prec v$.
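The semantics of complemented patterns can be sketched directly as a matching function; the pattern and value forms below are our own simplification of the paper's language.

```ocaml
type value = Int of int | Ctor of string * value list

type pat =
  | PWild                     (* a variable or wildcard pattern *)
  | PInt of int
  | PCtor of string * pat list
  | POr of pat * pat          (* disjunction p + q *)
  | PDiff of pat * pat        (* complement p \ q *)

(* A value matches p \ q iff it matches p and does not match q. *)
let rec matches (p : pat) (v : value) : bool =
  match p, v with
  | PWild, _ -> true
  | PInt n, Int m -> n = m
  | PCtor (c, ps), Ctor (d, vs) ->
      c = d
      && List.length ps = List.length vs
      && List.for_all2 matches ps vs
  | POr (p1, p2), v -> matches p1 v || matches p2 v
  | PDiff (p, q), v -> matches p v && not (matches q v)
  | _, _ -> false
```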

The method presented in [8] shows how to solve the disambiguation problem [32]. It relies on the notion of pattern semantics $\llbracket p\rrbracket$, i.e., the set of values matched by a pattern: $\llbracket p\rrbracket = \{v \in \mathcal{V} \mid p \prec\!\!\prec v\}$. The idea is to reduce any pattern p into a purely disjunctive pattern q, i.e., a pattern containing no complements \, while preserving its semantics: $\llbracket p\rrbracket = \llbracket q\rrbracket$. The reduction relies on rewriting rules that correspond to algebraic laws of set theory: a constructor c behaves like a labelled cartesian product, the disjunction + like set union, and the complement \ like set difference. Note that the pattern language proposed in §3 conflates the different forms of OCaml constructors (variant constructors, polymorphic variants, records, arrays and tuples), as they behave similarly w.r.t. their semantics.

In order to fully reduce a pattern, the method also relies on the observation that a variable $x_\tau$ of a variant type τ must be matched by a value whose head is a constructor of the type τ. Therefore, the semantics of this variable $x_\tau$ can be described as the union of the semantics of all constructor instances of τ: $\llbracket x_\tau\rrbracket = \bigcup_{c \in \mathcal{C}_\tau} \llbracket c(z_1, \ldots, z_n)\rrbracket$, where $\mathcal{C}_\tau$ is the finite set of constructors of co-domain τ. Similarly, the utility approach [40], implemented in the OCaml compiler, relies on the ability to enumerate all the constructors of a type to provide a non-ambiguous description of the useful patterns. For types that may not be finitely described, the semantic approach can still be used to partially reduce the complements [7]. We keep anti-patterns, i.e., patterns of the form x \ q where q contains no complements, when there exists a value v such that $x \setminus q \prec\!\!\prec v$.

Finally, to guarantee the exhaustiveness of pattern matching, it suffices to add a rule z \ (p1 + ··· + pn) ⇒ raise Match_failure when necessary. Again, generating such a non-ambiguous rule, for data types that may not be finitely described, is only possible thanks to pattern complements.

#### 6.4 Dynamic Exceptions

The exception type in OCaml is an extensible variant type: it can be dynamically extended with new variant constructors. This means that new exception constructors are dynamically generated during the execution of programs.

$$\begin{array}{rcl}
t & ::= & \ldots \mid \mathtt{let\ exception}\ \overline{c}\ \mathtt{of}\ \tau_1 * \cdots * \tau_n\ \mathtt{in}\ t \mid \mathtt{let\ exception}\ \overline{b} = \overline{c}\ \mathtt{in}\ t \\
v & ::= & \ldots \mid d \mid d(v_1, \ldots, v_k)
\end{array}$$

$$\begin{array}{c}
\dfrac{E(\overline{c}) = d}{S; E \vdash \overline{c}(x_1, \ldots, x_k) \Downarrow_{\mathsf{val}} S;\ d(E(x_1), \ldots, E(x_k))}
\\[1.2em]
\dfrac{d \notin S \qquad S \uplus \{d\};\ E, \overline{c} \mapsto d \vdash t \Downarrow_m S'; v}{S; E \vdash \mathtt{let\ exception}\ \overline{c}\ \mathtt{of}\ \tau_1 * \cdots * \tau_n\ \mathtt{in}\ t \Downarrow_m S'; v}
\qquad
\dfrac{E(\overline{c}) = d \qquad S;\ E, \overline{b} \mapsto d \vdash t \Downarrow_m S'; v}{S; E \vdash \mathtt{let\ exception}\ \overline{b} = \overline{c}\ \mathtt{in}\ t \Downarrow_m S'; v}
\end{array}$$

$$\begin{array}{rcll}
\mathsf{A} & ::= & \{\ldots;\ \mathsf{names} : \mathsf{N}\} \mid \alpha \mid \mu\alpha.\mathsf{A} & \text{(Abstract values)} \\
\mathsf{N} & ::= & \{(\overline{c}_1, \delta_1); \ldots; (\overline{c}_n, \delta_n)\} & \text{(Abstract names)}
\end{array}$$

$$\llbracket \mathtt{let\ exception}\ \overline{c}\ \mathtt{of}\ \tau_1 * \cdots * \tau_n\ \mathtt{in}\ t\rrbracket^{\mathsf{eval}}_{\mathsf{E}} = \llbracket t\rrbracket^{\mathsf{eval}}_{\mathsf{E},\ \overline{c} \mapsto \{\mathsf{names} : \{(\overline{c}, \delta)\}\}}$$

Fig. 3. Changes to support dynamic exception naming (excerpts).

Although this section focuses on the exception type, the techniques we present apply to any extensible variant type as well.

To model the dynamic behaviour of type extension, we introduce dynamic constructors, written $\overline{c}$, that, unlike static constructors c, are dynamically associated with a variant name d during the evaluation. We update the language of §3 and its semantics to support these dynamic constructors (Figure 3).

The let exception $\overline{c}$ of τ1 ∗ ··· ∗ τn in t construct defines the new exception constructor $\overline{c}$, that is dynamically bound to a fresh variant name in the sub-term t. The exception alias construct let exception $\overline{b}$ = $\overline{c}$ in t defines the exception constructor $\overline{b}$, that is bound in the sub-term t to the variant name of $\overline{c}$. Constructed values can now have a dynamic variant name as their head constructor.

To account for the generative aspect of dynamic constructors, the evaluation rules now carry an execution state S, that contains the set of the already generated variant names. These are akin to the time-stamps from the CFA literature [25,44], that are used to allocate data in memory locations. In the analysis, we use an over-approximation δ of the list of call sites, which we already used in §6.1 to control the widening strategy, to give abstract names $(\overline{c}, \delta)$ to dynamic constructors.

Finally, as the variant name of an exception constructor is resolved dynamically, the pattern matching relation depends on the evaluation environment E: $\overline{c}(p_1, \ldots, p_n) \prec\!\!\prec d(v_1, \ldots, v_n)$ if and only if $E(\overline{c}) = d$, and $p_i \prec\!\!\prec v_i$ for all i ∈ [1, n].

As the exception type is extensible, a finite number of constructor patterns never forms an exhaustive set of patterns for the exception type. Therefore, the utility approach on pattern matching [40] used in OCaml for exhaustiveness checking cannot provide an exhaustive list of non-ambiguous counter-examples: that list is not known statically. In contrast, the disambiguation approach from §6.3 is particularly well suited to such types, by leveraging anti-patterns [7]. Moreover,

$$\begin{array}{rcll}
t & ::= & \ldots \mid \{f_1 = x_1; \ldots; f_n = x_n\} \mid x.f \mid x.f \leftarrow y \\
v & ::= & \ldots \mid \ell \\
S & ::= & \{\ell_1 \mapsto r_1; \ldots; \ell_n \mapsto r_n\} & \text{(Memory heaps)} \\
r & ::= & \{f_1 \mapsto v_1; \ldots; f_n \mapsto v_n\} & \text{(Record blocks)}
\end{array}$$

$$\dfrac{\ell \notin \mathrm{dom}\,S \qquad S' = S, \ell \mapsto \{f_1 \mapsto E(x_1); \ldots; f_n \mapsto E(x_n)\}}{S; E \vdash \{f_1 = x_1; \ldots; f_n = x_n\} \Downarrow_{\mathsf{val}} S'; \ell}\ \textsc{Alloc}$$

$$\dfrac{E(x) = \ell \qquad S(\ell) = \{f_1 \mapsto v_1; \ldots; f_n \mapsto v_n\} \qquad 1 \le i \le n}{S; E \vdash x.f_i \Downarrow_{\mathsf{val}} S; v_i}\ \textsc{GetField}$$

$$\dfrac{\begin{array}{c} E(x) = \ell \qquad S(\ell) = \{f_1 \mapsto v_1; \ldots; f_n \mapsto v_n\} \\ 1 \le i \le n \qquad S' = S, \ell \mapsto \{f_1 \mapsto v_1; \ldots; f_i \mapsto E(y); \ldots; f_n \mapsto v_n\} \end{array}}{S; E \vdash x.f_i \leftarrow y \Downarrow_{\mathsf{val}} S'; \langle\rangle}\ \textsc{SetField}$$

$$\begin{array}{rcll}
\mathsf{A} & ::= & \{\ldots;\ \mathsf{locs} : \{\ell^\sharp_1, \ldots, \ell^\sharp_n\}\} \mid \alpha \mid \mu\alpha.\mathsf{A} & \text{(Abstract values)} \\
\mathsf{R} & ::= & \{f_1 \mapsto \mathsf{A}_1; \ldots; f_n \mapsto \mathsf{A}_n\} & \text{(Abstract record blocks)}
\end{array}$$

Fig. 4. Changes to support mutable records (excerpts).

the equality of two exception constructors $\overline{b}$ and $\overline{c}$ of the same arity can only be resolved dynamically. Therefore, there is no way to statically prove, or disprove, the utility of a pattern $\overline{b}(q_1, \ldots, q_n)$ against a pattern $\overline{c}(p_1, \ldots, p_n)$. On the other hand, in our pattern formalism, we can simply write $\overline{b}(q_1, \ldots, q_n) \setminus \overline{c}(p_1, \ldots, p_n)$ to guarantee the non-ambiguity between the two.

#### 6.5 Mutable Records and Global State

OCaml supports mutable records. While immutable records can be modelled in the programming language of §3 in the form of constructs—an immutable record is a variant with a single case—mutable records require extending the semantics with a global memory heap S (Figure 4).

Heaps are maps from memory locations ℓ to record blocks. Record blocks are structured memory blocks, that contain values for all the registered fields of the record. The standard notion of reference can be modelled as a mutable record with a single field. This is exactly how the type of references is defined in OCaml.

We adapt the big-step semantics in a standard way, so that it takes a heap as input and returns an updated heap as output. The evaluation rules for record creation, access, and update, either query or modify the memory heap as expected.

OCaml features pattern matching on mutable records. We adapt the rules for pattern matching, so that matching on a mutable record first queries the memory heap to retrieve the values of the fields of the record, before matching continues.

To analyse programs that involve mutable records, we add a new field to abstract values, that contains the possible abstract locations ℓ♯ a value might be equal to. Abstract locations denote sets of concrete locations. Similarly to the dynamic extension constructors of §6.4, fresh abstract locations are chosen by following a naming scheme that is based on the abstract call stack.

The abstract interpreter is easily adapted to support global state, by lifting the abstract exception monad to the state monad, where states are abstract heaps. Abstract heaps map abstract locations to abstract record blocks, that themselves map record fields to abstract values. The operations on abstract heaps and the transfer functions on records are standard, and elided from the presentation.
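One possible rendering of this lifting, as an OCaml sketch with our own names, pairs the exception-accumulating semantics of §5.1 with a threaded heap state 'h:

```ocaml
module StateAbsExn (A : sig
  type t
  val bot : t
  val join : t -> t -> t
end) = struct
  (* A computation takes a heap and returns a value paired with the
     possible exceptions, together with the updated heap. *)
  type ('b, 'h) m = 'h -> ('b * A.t) * 'h

  let return (b : 'b) : ('b, 'h) m = fun h -> ((b, A.bot), h)

  let ( >>= ) (m : ('b1, 'h) m) (f : 'b1 -> ('b2, 'h) m) : ('b2, 'h) m =
    fun h ->
      let (b, exns), h' = m h in
      let (b', exns'), h'' = f b h' in
      ((b', A.join exns exns'), h'')   (* thread the heap, join exceptions *)
end
```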

#### 6.6 Modules and Functors

The OCaml language includes an expressive module system [36], that supports hierarchical structures, higher-order functors, and first-class modules. In this section, we give the reader the main insights for the analysis of OCaml modules.

First, we consider an untyped semantics of modules, i.e., we do not propagate type information. In particular, we do not take type abstraction boundaries into account. We are careful, however, to keep track of module coercions: signature ascriptions may indeed have computational content, as they can remove some module fields. Coercions are automatically applied at functor applications to "reshape" the functor argument. Coercions distribute over functors, contravariantly on their formal arguments, and covariantly on their results.

Embracing further the untyped nature of our approach, we made the choice of having a single class of values, that comprises both the values of the core language and the values for module structures and functor closures. This simplifies both the concrete semantics (for example, transfers from the module language to the core language and back are no-ops), and the design of the abstract domain. As we sketched in the previous sections, it suffices to add new fields to abstract values to describe the possible structures and functor closures.

We represent structures as unordered records, i.e., maps from field names to values. Functor closures hold the functor code, an environment, and coercions for the argument and the result, that shall be applied when the functor is called.

Importantly, the treatment of dynamic exceptions (§6.4) was required to support functors, since an exception might be declared in a functor's body: this leads to the creation of a fresh exception every time the functor is instantiated.

The analysis functions for the core language and the module language, of types T → E → A and M → E → A, are mutually recursive. Still, the approach of using a fixpoint solver to define our abstract interpreter remains applicable. The two functions can be transformed into a single function of type (T + M) → E → A, given to the solver, and then split back into two functions. Our untyped approach was again crucial: we could keep a single type of abstract values, and a single type of abstract environments, which made this transformation possible.

# 7 Experiments

We tested our prototype analyser for OCaml programs on 290 programs, that range from small, manually written programs, to larger examples extracted from the literature or from the OCaml compiler's test suite. The test programs include some classic functions such as the factorial program from §2, Takeuchi's function, McCarthy's 91 function, fixpoint combinators, programs that compute over Church numerals, transformations of abstract syntax trees for arithmetic expressions or logical formulas, and the algorithm for Knuth-Bendix completion of rewriting systems. The test suite covers a large array of coding styles, e.g., direct style, continuation-passing style, monadic style, or imperative style, and exhibits different language features, e.g., assertions, exception-based control flow, GADTs and non-regular types, polymorphic recursion, second-order polymorphism, etc.

We present in Table 1 a selection of the test results on some key examples. The complete test results are reproducible via the companion artefact [35]. The experimental results are encouraging, both in terms of performance and precision.

In terms of precision, our analyser infers the best achievable abstract values for several programs: for McCarthy's 91 function mc91, the result is shown to be greater than or equal to 91; for the skolemisation of logical formulas skolemize, the analyser correctly infers the form of the returned terms, i.e., that they cannot contain existential quantifiers. For other programs, the analyser only infers an over-approximation: for the red_black_tree program, it correctly infers the general shape of trees, but cannot infer the structural invariant that no red node has red children.

The map_merge example calls the Map.Make functor of finite maps from the standard library, builds several maps, and calls the merge function, which merges those maps. The merge function has the following signature:

```
val merge : (key -> 'a option -> 'b option -> 'c option) ->
            'a M.t -> 'b M.t -> 'c M.t
```

Its first argument specifies what should be done when a key/value pair is found in one of the maps, or in both. This argument is never called for keys that are absent from both maps, i.e., the case where the second and third arguments are both equal to None is unreachable. OCaml programmers often write assert false in the corresponding pattern matching branch. The analyser infers that the Assert_failure exception is never raised, which means that this branch cannot be reached. The analyser cannot show, however, that every assertion present in Map.Make is satisfied: in the re-balancing function for pseudo-balanced trees, assertion failures are reported, because the analyser cannot infer that the heights recorded in the trees are strictly positive.

In terms of performance, most examples, even some large programs, are analysed in a couple of seconds, or in less than a second. In contrast, some examples like boyer need approximately one hour for the analysis to terminate. boyer is a tautology checker, that is run on a large formula (its definition takes about 1000 lines). This formula, which has a mutable type, requires the creation of several hundred abstract pointers, which makes abstract operations on abstract heaps very costly. If we reduce context sensitivity to "the last call site", fewer abstract pointers are created, and the analysis completes in 31 s. This suggests that context sensitivity choices for naming abstract pointers need further investigation.

Our experiments show that the minimisation of abstract values during widening and unions (§4.2) may impact performance positively or negatively. For instance, for AST transformations like skolemize and negative_normal_form, minimisation decreases the analysis time from about 45 min down to a few seconds. For boyer, however, minimisation incurs a heavy cost, as it doubles the analysis time. Further investigations are needed to reduce the cost of minimisation.

# 8 Related Work

The static detection of uncaught exceptions in ML programs has been the topic of much related work. We only discuss a selection of it, together with some results on the static analysis of functional programs that are also relevant to the current work.

Set Constraints. Several static analyses for functional programs were based on set constraints [21]. The principle is to transform a program into a constraint, that features unions, intersections, negations, and a form of conditional constraint. Then, the constraint is simplified and given to a solver, from which the analysis result is obtained. Fähndrich and his coauthors built an exception analysis tool that infers types and effects for SML programs [14,15] on top of the BANE constraint analysis engine, using a mix of set constraints and type constraints.

Type and Effect Systems. Pessaux and Leroy have developed ocamlexc [38,54,55], a tool that detects uncaught exceptions in OCaml programs. They use a type and effect system to analyse programs modularly. Their analyser extends unification-based type inference, and makes use of row variables [57] and polymorphism to produce precise types for functions. They type variants structurally using equi-recursive types. Recursion may also occur through the effect annotations on arrow types. They also describe an algorithm that improves the accuracy of their analysis, using polymorphic recursion for row variables. The programming language Koka [33] also leverages row variables to type algebraic effects. Recently, de Vilhena and Pottier [62] devised a type system based on row variables for a language that supports the dynamic creation of algebraic effects.

Control-Flow Analyses. An important family of analyses for higher-order programs are control-flow analyses (CFA) [60,65,51,45,19]. The goal of CFA is to determine which functions might be called at each call site, and with which arguments. CFA can be expressed as instances of abstract interpretation [46,44,47,50], and can easily be extended to analyse exceptions. Yi developed an abstract interpreter that detects uncaught exceptions in SML [68,67,66]. It implements an analysis that is close to a 0-CFA analysis extended to support exceptions.

Abstract Domains in CFA. Most previous works on CFA share a common representation for abstract values: although they need to represent some inductively defined sets, they refrain from using a native device to express fixpoints, such as our µ constructor. Instead, cyclic definitions are encoded using indirections through abstract pointers, that point to an abstract heap. For example, the inductive set of continuations from §2 is expressed as follows in CFA domains:

$$\begin{array}{l}
\{\mathsf{funs} : \{(\lambda x.\,x) \mapsto \{\};\ (\lambda x.\,k\ (x * i)) \mapsto \{i \mapsto p_i;\ k \mapsto p_k\}\}\} \\
\text{where:}\quad \hat{h}(p_i) = \{\mathsf{ints} : [1, +\infty]\} \\
\phantom{\text{where:}\quad} \hat{h}(p_k) = \{\mathsf{funs} : \{(\lambda x.\,x) \mapsto \{\};\ (\lambda x.\,k\ (x * i)) \mapsto \{i \mapsto p_i;\ k \mapsto p_k\}\}\}
\end{array}$$

In this abstract value, the closures' environments contain the pointers $p_i$ and $p_k$, that are defined in the abstract heap $\hat{h}$. This abstract heap contains a cycle, since $p_k$ is used in the definition of the abstract value pointed to by $p_k$. This is in contrast to our approach, where we make use of µ nodes to introduce cycles directly, without referring to a heap. We only use the abstract heap for mutable data. In CFA domains, all data (constructs, closures, etc.) are "abstractly allocated" in the abstract heap, regardless of whether they are mutable or not.

A benefit of the approach with heap indirections is that abstract values have a bounded height, and cycles need no special treatment: the equality of abstract pointers is used to compute on abstract values. While this makes the operations of CFA abstract domains easy to define, using pointer names drastically limits the detection of semantically equivalent values. We argue that our approach detects more semantic inclusions, therefore decreasing the number of iterations of the analyser, at the cost of more complex abstract domain operations.

Tree Grammars. Several analyses for functional languages have been defined using tree grammars. For example, Reynolds [58] defined an analysis for pure first-order LISP using data sets, i.e., tree grammars that denote the possible outputs of function symbols. Extended tree grammars, i.e., grammars with selectors of the form X → Y.hd, have been used by Jones and his coauthors to analyse full LISP [28], and, later, strict and lazy λ-calculi [26,27]. From a λ-term, they produce tree grammars with selectors, that denote the possible inputs and outputs of function symbols. Selectors can then be eliminated in order to simplify the grammars. Deterministic tree grammars have been identified as an abstract domain to recast analyses based on set constraints into the abstract interpretation framework [10].

Tree Automata. Generalising string automata, tree automata are an established formalism to represent sets of trees. They have been used to define static analysers for term-rewriting systems (TRSs) [3] and higher-order programs [20]. They have been extended to lattice tree automata, which support arbitrary non-relational abstract domains at their leaves [17,18] and improve the performance of analysers for TRSs. Recently, tree automata were combined with relational numeric abstract domains [29], to express relations between scalar data contained in trees. Recent works report on the design of relational domains for algebraic data types [2,61].

Cyclic Abstract Domains. Type graphs [22] are a form of deterministic tree grammars, represented as cyclic graphs with no sharing, i.e., trees with cycles. They have been used to analyse Prolog programs. We used a similar graph-based representation as an intermediate form to compute union, intersection and widening. We use, however, a term-based representation with binders as our main representation, as it allows easy and efficient hash-consing and memoisation [13]. Our widening operator (§4.2) is inspired by the one from type graphs.

Mauborgne [42,43,41] studied graph-based abstract domains for sets of trees, and defined ways to obtain minimal, canonical representations of such abstract values. Using Mauborgne's structures natively could improve our analyser's performance, as we could avoid translating back and forth between terms and graphs.

Finally, recursive types [56] were a strong inspiration for the abstract domain of §4. Recursive types have been thoroughly studied in the context of subtyping [16,31,1], where polynomial algorithms have been devised to decide inclusion. These algorithms proceed by translating types into variants of tree automata that can also deal with the contravariance of arrow types.

Fixpoint Solvers. To the best of our knowledge, Le Charlier and Van Hentenryck [6] were the first to exploit a dynamic fixpoint solver to define static analysers. They used the top-down solver to analyse Prolog programs. The same approach has been followed for the Goblint static analyser for C programs [64,59], and for the analysis of WebAssembly programs [4]. Recent work introduced combinators to define dynamic fixpoint solvers in a modular manner [30]. Several dynamic fixpoint solvers have been successfully formally verified [24,63].

# 9 Conclusive Remarks and Future Work

We have introduced a λ-calculus that features pattern-matching primitives and exception handling, in which exceptions are first-class citizens. We have presented a static analysis for this language, in the form of a monadic abstract interpreter, that can be used as an effective static analyser. This analyser detects uncaught exceptions, and provides a description of the values that a program may return. The abstract interpreter relies on a generic abstract domain, which is parameterised over a domain for scalars, and which can represent regular sets of values of our programming language. This is achieved by a fixpoint constructor in the syntax of abstract values, which denotes an inductive set of values.

The abstract interpreter is defined in an open recursive style, where the recursive knot is tied by calling a dynamic fixpoint solver. Importantly, the analyser does not call the solver for every recursive call: it performs standard recursive calls on strict sub-terms, but calls the solver to analyse function calls.

Based on this approach, we implemented a static analyser for OCaml programs. We presented some extensions of our formalism to support several core features of OCaml, including dynamic generation of exceptions, mutable records, and the module system. Our analyser starts by transforming the OCaml typed AST into a simpler language where evaluation order is explicit. This transformation required a lot of care and demanded a substantial implementation effort. One key aspect of this transformation is the disambiguation of pattern matching, as we chose to work with an exhaustive and non-ambiguous pattern-matching primitive in order to simplify the analysis of programs.

Our experiments on 290 OCaml programs show encouraging results, both in terms of performance and precision. Still, some improvements are needed for the analysis to be applicable to larger code bases. In particular, the minimisation of abstract values requires more study and fine-tuning: while it plays a crucial role in analysing some examples in a reasonable time, it can also severely undermine the analyser's performance in other cases.

At the moment, the analyser can deal with whole programs only. To analyse libraries more modularly, we plan to experiment with generating abstract values that over-approximate the inputs of a library's functions, based on their types. In the near future, we also plan to extend the analyser with OCaml features that are not yet supported (e.g., arrays, laziness, floats, objects, recursive modules, interactions with the operating system), most of which will require substantial formalisation and implementation efforts. Recently introduced features, such as algebraic effects and one-shot continuations, are also on our agenda, and are likely to raise interesting challenges.

Finally, we hope that our abstract interpreter can be extended to perform other kinds of static analyses for OCaml programs, such as a purity analysis, or the detection of whether the behaviour of a program might depend on the order of evaluation. We would also like our implementation to serve as a basis for experimenting with recent relational domains for trees and scalars [29,61,2], and with relational analyses of functional programs [49].

Data-Availability Statement. The companion artefact [35] is hosted on the Zenodo platform and referenced by the DOI 10.5281/zenodo.10457925.


# References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Formalizing Date Arithmetic and Statically Detecting Ambiguities for the Law

Raphaël Monat¹⋆, Aymeric Fromherz²⋆, and Denis Merigoux² (✉)

¹ Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
² Inria Paris, Paris, France
{raphael.monat,aymeric.fromherz,denis.merigoux}@inria.fr

Abstract. Legal expert systems routinely rely on date computations to determine the eligibility of a citizen to social benefits, or whether an application has been filed on time. Unfortunately, date arithmetic exhibits many corner cases, which are handled differently from one library to another, making faithfully transcribing the law into code error-prone, and possibly leading to heavy financial and legal consequences for users. In this work, we aim to provide a solid foundation for date arithmetic working on days, months and years. We first present a novel, formal semantics for date computations, and formally establish several semantic properties through a mechanization in the F⋆ proof assistant. Building upon this semantics, we then propose a static analysis by abstract interpretation to automatically detect ambiguities in date computations. We finally integrate our approach in the Catala language, a recent domain-specific language for formalizing computational law, and use it to analyze the Catala implementation of the French housing benefits, leading to the discovery of several date-related ambiguities.

Keywords: Verification, Semantics, Abstract Interpretation

# 1 Introduction

From filesystems to web servers, time representations are pervasive in modern computer systems. While several libraries and standards have been proposed throughout the years, current well-established approaches such as Unix time [53], used in the standard C library, or Windows' FILETIME [36], represent dates and time as a number of seconds or nanoseconds that have elapsed since an arbitrary date.

This approach is sufficient for many use cases, in particular when dates are only used for logging purposes, or for determining the chronology of two events. However, it does not permit more complex arithmetic, for instance the addition of months or years, which span a variable number of days. For these use cases, mainstream programming languages offer different libraries that adopt different conventions. For example, Python's datetime module [46] forbids the addition of months, while Java's java.time library [43] silently rounds invalid dates to the largest pre-existing date, hiding ambiguous computations from programmers.

⋆ R. Monat and A. Fromherz contributed equally.


Given the variety of libraries and behaviors across languages, programming with date arithmetic is thus highly error-prone, and developers' assumptions about how dates behave might vary from project to project. When developing systems whose correctness is critical and that heavily depend on date computations, such as legal expert systems that rule our social and financial lives, this issue becomes highly concerning. As an example, consider the following excerpt from Section 121 of the US Internal Revenue Code [25], which defines the "Exclusion of gain from sale of principal residence".

In the case of a sale or exchange of property by an unmarried individual whose spouse is deceased on the date of such sale, paragraph (1) shall be applied by substituting "\$500,000" for "\$250,000" if such sale occurs not later than 2 years after the date of death of such spouse and the requirements of paragraph (2)(A) were met immediately before such date of death.

This paragraph differentiates between two cases, depending on whether a sale occurred not later than 2 years after a given date. While applying this paragraph is straightforward in most real-world cases, corner cases raise interesting questions. In particular, when considering leap years, what should be the result of adding two years to February 29th? When manually computing taxes, lawyers would be able to detect the ambiguity, and to reach a decision based on legal precedents. If handled automatically by a computer, however, the computation may be done incorrectly: computing February 29, 2004 + 2 years in Java using java.time returns February 28, 2006, while performing the same computation using the date utility from Coreutils returns March 1, 2006.

Similar computations are pervasive in legal expert systems; the corresponding regulations rely on them to determine whether a citizen is eligible for social benefits, or is a resident for tax purposes. Errors in such systems can have dramatic consequences; case in point, the incorrect implementation of Louvois, the former French military payroll system, led to several families either receiving over-payments that they had to reimburse years later, or incomplete paychecks totaling a few cents [42]. For such critical software, it is therefore paramount to provide clear semantics for date computations, to avoid mistakes based on erroneous assumptions about a library's behavior. Additionally, such a semantics can form the basis for further analyses, paving the way for the automated detection of date-related ambiguities as part of the development process.

Unfortunately, while elegant in theory, a universal semantics for dates and date arithmetic would not be usable in practice; when possible ambiguities are identified in law texts, legislators oftentimes extend or modify the law itself to avoid them. For instance, article 641 of the French civil procedure code [30] specifies that, when adding a positive duration to a date to compute a deadline, the rounding, if needed, should go down. Such articles often have narrow application scopes; similar articles in other branches of the law might either leave rounding unspecified, or adopt a different convention. In the US, date computations when filing motions are heavily specified; however, the complexity and the number of corner cases led to no less than 27 subsequent notes and amendments to provide clarifications [14]. Other regulations instead attempt to escape ambiguities due to month or year additions by reducing such computations to a non-ambiguous number of days. Such regulations heavily vary depending on the country and the branch of law considered: acts from the Council of the European Communities consider that a month should be treated as 30 days [15], while the Indian Supreme Court took the opposite approach, enacting that the duration of a month for customs purposes is variable [4]. To enable their adoption in a variety of contexts, date libraries therefore require their semantics to be configurable by developers.

The lowest granularity of date arithmetic we focus on is the day level. Our literature review and communications with lawyers in different countries have indeed shown that this kind of date arithmetic is sufficient for the kind of tax and social benefits computations that are the core application target of Catala.

In this paper, we aim to provide a sound foundation for critical software relying on date computations, through the following contributions:

Formally Capturing Date Computations. We first present a formal semantics of date computations (Sec. 2). Our formalization relies on a base semantics, which is universal and does not specify a rounding mode, but instead provides facilities to round on demand. We leverage these facilities to derive a rounding-specific semantics for different rounding policies. We mechanize this semantics in the F⋆ proof assistant, and prove several theorems establishing necessary conditions for, e.g., the monotonicity or associativity of computations (Sec. 3). As part of this mechanization, we also identify seemingly intuitive properties that do not hold in practice, and exhibit counter-examples.

Automatically Detecting Date Ambiguities. Building on this semantics, we define a notion of rounding-insensitivity, which captures that the result of evaluating a program's expression does not depend on the chosen rounding policy (Sec. 4). Aiming to automatically identify possibly harmful ambiguities, we then propose a new static analysis based on abstract interpretation [16] targeting this 2-safety hyperproperty. We implement our analysis in the Mopsa static analyzer [28, 29]. We show that, with relational numerical abstract domains, our analysis enables precise reasoning. In addition, our implementation provides actionable counter-example hints that help users understand why a given expression is rounding-sensitive.

Contribution to Date Arithmetic Libraries. To enable the adoption of this work in existing projects, we implement an OCaml library abiding by our formal semantics, which exposes common rounding modes, as well as an option to abort when ambiguous computations are detected. Our library is standalone, open-source, and easily integrable in OCaml developments. We also survey the behavior of mainstream date arithmetic libraries (Sec. 6), and provide litmus tests that can be used to easily understand how a library behaves with respect to date rounding.

Case Study: Integration in the Catala Language. To demonstrate the applicability of our approach to real-world programs, we replace the previous

$$\begin{array}{rrcl}
\text{date unit} & \delta & ::= & \mathrm{y} \mid \mathrm{m} \mid \mathrm{d} \\
\text{rounding mode} & r & ::= & \uparrow\ \mid\ \downarrow\ \mid\ \bot \\
\text{values} & v & ::= & (y, m, d) \mid \bot \\
\text{expressions} & e & ::= & v \mid e +_\delta n \mid \mathrm{rnd}_r\ e \\
\text{period} & p & ::= & (n_\mathrm{d}, n_\mathrm{m}, n_\mathrm{y})
\end{array}$$

#### Fig. 1. Date expressions

handling of dates in the Catala language [34], a recent domain-specific language for formalizing computational law, with our library. We also extend the Mopsa [28, 29] static analyzer to support a subset of the Catala language, enabling us to analyze Catala programs for rounding-insensitivity. We evaluate our approach against an existing Catala implementation of the French housing benefits, and automatically identify several date-related ambiguities in the Catala model. This work is in the process of being upstreamed into the Catala compiler.

# 2 Formalizing Date Arithmetic

We start this section by presenting a base semantics for date computations, which does not explicitly specify a rounding policy for handling ambiguous dates. Date expressions are presented in Fig. 1. Date values are represented in the year-month-day format of the standard Gregorian calendar, where each component is represented as an integer. We also include a ⊥ element, which represents an error case. Date expressions consist of either date values, or the application of one of the date operators. Date expressions also contain variables; however, their treatment is straightforward and orthogonal to this work, so we omit them, as well as their associated environment, in our presentation. Operators are of two kinds: the addition +_δ of n years, months, or days, where n is an integer, and the rounding rnd_r of a date. Our semantics supports three types of rounding: rnd_↑ rounds up the current date to the nearest valid date; rnd_↓ rounds down; and rnd_⊥ raises an error if the current date is invalid. A period is a triple of relative integers, respectively representing numbers of days, months and years.
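A direct OCaml rendering of this grammar could look as follows (a sketch; the constructor and type names are ours):

```
type unit_ = Y | M | D                       (* date unit delta *)
type rounding = Up | Down | Err              (* r ::= up | down | bottom *)
type value = Date of int * int * int | Bot   (* (y, m, d) or the error case *)
type expr =
  | Val of value
  | Add of expr * unit_ * int                (* e +_delta n *)
  | Rnd of rounding * expr                   (* rnd_r e *)
type period = { years : int; months : int; days : int }
```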

We now define a formal semantics for evaluating expressions. We start by describing the semantics of date addition, presented in Fig. 2. To match standard date formats, we start counting at 1 for valid days and months; to simplify the presentation, we will often represent months using their name instead of their number (e.g., Jan instead of 1). Our semantics is designed to preserve the following invariant: assuming the date on the left is initially valid, any non-ambiguous computation will return a valid date. When the computation is ambiguous, the resulting date lies between the largest smaller and the smallest larger valid date.

Our semantics is defined recursively. Consider for instance the addition of a number of days n. If n is small enough to remain in the same month and year, we are in the terminal case and the rule Add-Days applies. The first premise of the rule ensures that the date is initially valid. It relies on an auxiliary function nb_days, omitted for brevity, which computes the number of days of a month in a given year (e.g., 31 for January, and 28 or 29 for February depending


Fig. 2. Semantics for date addition

on the year). Otherwise, we either add a month (rule Add-Days-Over) or remove a month (rule Add-Days-Under2) and perform a new addition with an updated number of days. When the initial date is invalid, we return ⊥, to avoid propagating large errors and to maintain important properties about the date semantics that we prove in Sec. 3. When composing additions, it might therefore be necessary to apply the rounding operators presented later in this section to avoid ⊥. One last point of interest in these semantics is the dissymmetry between the Add-Days-Over and Add-Days-Under-* rules. Since adding a number of days is never ambiguous, we wish to ensure that, assuming the initial date is valid, we never apply the Add-Days-Err1 or Add-Days-Err2 rules. To do so, when updating the month or year during day addition, we always go through an intermediate state corresponding to the first day of a month, which is always a valid day independently of the month and year. For brevity, we also omit several redundant error cases, where the current month does not belong to the interval [1; 12]; these cases return ⊥. Following standard notations, we denote the transitive closure of our small-step semantics by →*.
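The following OCaml sketch mirrors the day-addition rules just described, under a simplifying encoding of our own: dates are plain (y, m, d) triples, ⊥ is None, and nb_days is our guess at the omitted auxiliary function.

```
let is_leap y = (y mod 4 = 0 && y mod 100 <> 0) || y mod 400 = 0

let nb_days y m = match m with
  | 1 | 3 | 5 | 7 | 8 | 10 | 12 -> 31
  | 4 | 6 | 9 | 11 -> 30
  | 2 -> if is_leap y then 29 else 28
  | _ -> 0

let rec add_days (y, m, d) n =
  if m < 1 || m > 12 || d < 1 || d > nb_days y m then None   (* invalid start: ⊥ *)
  else if 1 <= d + n && d + n <= nb_days y m then
    Some (y, m, d + n)                                       (* Add-Days: terminal case *)
  else if d + n > nb_days y m then
    (* Add-Days-Over: go through the 1st of the next month, always a valid day *)
    let y', m' = if m = 12 then (y + 1, 1) else (y, m + 1) in
    add_days (y', m', 1) (n - (nb_days y m - d) - 1)
  else
    (* Add-Days-Under: go through the 1st of the previous month *)
    let y', m' = if m = 1 then (y - 1, 12) else (y, m - 1) in
    add_days (y', m', 1) (n + d - 1 + nb_days y' m')
```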

The next step is to define the semantics of rounding, shown in Fig. 3. Compared to addition, the rounding semantics is simpler: if the date is already valid, any rounding mode leaves the date unchanged (Round-Noop). Otherwise, rounding down (Round-Down) returns the last day of the current month, rounding up (Round-Up) returns the first day of the next month, while the

Fig. 3. Semantics for date rounding

strict rounding mode (Round-Err2) raises an error. In all cases, if the day is initially negative, rounding returns ⊥; we prove in Sec. 3 that this never happens when starting from a valid date.
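A corresponding sketch of the rounding rules, reusing nb_days and the rounding type from the sketches above:

```
let round r (y, m, d) =
  if d < 1 then None                               (* negative day: ⊥ *)
  else if d <= nb_days y m then Some (y, m, d)     (* Round-Noop: already valid *)
  else match r with
    | Down -> Some (y, m, nb_days y m)             (* last day of the current month *)
    | Up -> if m = 12 then Some (y + 1, 1, 1)      (* first day of the next month *)
            else Some (y, m + 1, 1)
    | Err -> None                                  (* strict mode: error *)
```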

Separating addition and rounding has several benefits. Different use cases might require different rounding modes, and different ways of adding days, months, and years. For instance, when adding a period such as 1 year and 10 months, some settings might specify that months should be added first, or that rounding must be performed after adding months, and again after adding years; our formal semantics enables this flexibility.

The last remaining step is to define addition not just for individual days, months, or years, but for composite time periods. Building upon our semantics, we can define it generically for a rounding mode r as follows, avoiding the need for users to manually call rounding operators.

$$e +_r (y, m, d) ::= \mathrm{rnd}_r((e +_\mathrm{y} y) +_\mathrm{m} m) +_\mathrm{d} d$$

One point of interest in our derived forms is that we only apply rounding after performing the addition of years and months. Indeed, adding a year should be equivalent to adding 12 months. However, if we performed rounding after each operation, adding 1 year and 1 month to February 29, 2020 with the rounding-up mode would return April 1, 2021 instead of March 29, 2021. We emphasize that, in cases where this behavior is expected, defining derived forms corresponding to this semantics would be straightforward using our base semantics.
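Composing the previous sketches, the derived form could be written as follows; here, our add_months deliberately returns a possibly day-overflowing date (e.g., Feb 30), which the single application of rounding then repairs.

```
let add_months (y, m, d) n =
  let t = m - 1 + n in
  let m' = 1 + ((t mod 12) + 12) mod 12 in
  let y' = y + (if t >= 0 then t / 12 else (t - 11) / 12) in
  (y', m', d)                                  (* possibly invalid, e.g. Feb 30 *)

let add_years date n = add_months date (12 * n)   (* cf. Lemma 4 in Sec. 3 *)

(* e +_r (y, m, d): add years, then months, round once, then add days *)
let add_period r date { years; months; days } =
  match round r (add_months (add_years date years) months) with
  | None -> None
  | Some d' -> add_days d' days
```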

Based on this semantics, we can now formally define the notion of an ambiguous date expression in Definition 1.

Definition 1 (Ambiguous expression). A date expression e is ambiguous if and only if rnd_⊥(e) →* ⊥.

Note that this intensional definition of ambiguity is equivalent to stating that an expression e is ambiguous if and only if rounding e in different modes yields different dates.
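On the sketches above, this definition specializes to a simple check for month additions: strict-mode rounding yields None exactly on the ambiguous cases.

```
let ambiguous_month_addition date n =
  round Err (add_months date n) = None
```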

While the semantics presented in this section focuses on the core, possibly ambiguous computations, our work also includes other, non-ambiguous operators (omitted for brevity), e.g., to retrieve the first or last day of a given month. This allows a variety of patterns to be encoded: for instance, the second-to-last day of a month, by combining date arithmetic with the "last day of month" operator; or relying on a preprocessing phase if months must be treated as 30 days [15]. Our semantics supports reasoning on computations mixing rounding modes.

# 3 Mechanizing Semantics

Building upon the semantics presented in the previous section, we now present several properties of interest related to date computations, which we will rely upon when designing a static analysis in Sec. 4. As part of our contributions, we mechanize our semantics, the related properties, and their proofs in the F⋆ proof assistant [52].

### 3.1 Semantic properties

As part of our proof development, we separate semantic properties into two categories: properties established on the base semantics, valid for all derived forms, and properties derived for specific rounding modes. In many cases, proofs on derived forms can be performed efficiently by composing lemmas on the base semantics, thus simplifying the proof effort. During development, we also encoded our OCaml implementation of date computations and the corresponding theorems into qcheck [54], a QuickCheck [13] inspired property-based testing framework for OCaml. We mostly used QuickCheck as a fast sanity check before spending time proving lemmas in F⋆. In particular, our initial intuition for several of the lemmas and theorems presented here was often unreliable, omitting corner cases; we used QuickCheck to gain more confidence in our intuition before moving to F⋆. This encoding allowed us to automatically find most of the counter-examples presented in Sec. 3.2.
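As an illustration, here is what such a test could look like with the qcheck library, applied to the toy add_period sketch from Sec. 2 (the generators and test body are ours, not the authors'); checking Non-Property 1 of Sec. 3.2 this way is expected to fail and print a counter-example.

```
let gen_date =
  QCheck.(triple (int_range 1900 2100) (int_range 1 12) (int_range 1 31))
let gen_period =
  QCheck.map (fun (y, m, d) -> { years = y; months = m; days = d })
    QCheck.(triple (int_range 0 3) (int_range 0 24) (int_range 0 40))

let commutativity =
  QCheck.Test.make ~count:10_000 ~name:"commutativity (expected to fail)"
    QCheck.(triple gen_date gen_period gen_period)
    (fun ((y, m, d), p1, p2) ->
       QCheck.assume (d <= nb_days y m);          (* start from a valid date *)
       let step p date = add_period Down date p in
       Option.bind (step p1 (y, m, d)) (step p2)
       = Option.bind (step p2 (y, m, d)) (step p1))

let () = QCheck_runner.run_tests_main [ commutativity ]
```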

We start by proving that expressions in our semantics always evaluate to a value (possibly ⊥), i.e., reduction is never stuck and always terminates.

Theorem 1 (Normalization). For any date d and any integer n, there exists a value v_δ such that d +_δ n →* v_δ.

In addition to normalization, a useful property about our semantics is a characterization of valid computations: when using either of the non-abort rounding modes, an addition starting from a valid date will always return a valid date; the definition of validity is straightforward, but omitted for brevity. To prove this, we need the following properties of the base semantics, which we prove by induction on the reductions.

Lemma 1 (Well-formedness of day addition). For any valid date d, any integer n, and any value v, d +_d n →* v ⇒ v ≠ ⊥.

Lemma 2 (Well-formedness of year/month addition). For any valid date d, any integer n, any value v, and δ ∈ {y, m}, we have d +_δ n →* v ⇒ v ≠ ⊥ ∧ day_of(v) ≥ 1.

Lemma 3 (Well-formedness of rounding). For any date d such that d ≠ ⊥, any value v, and r ∈ {↑, ↓}, we have rnd_r d →* v ⇒ valid(v).

We can now state the following theorem on the derived semantics.

Theorem 2 (Well-formedness). For any valid date d, any period p, any value v, and r ∈ {↓, ↑}, we have d +_r p →* v ⇒ valid(v).

We now present several theorems related to the monotonicity of addition in our semantics. Date comparison is defined in the standard way, as the lexicographic order on (y, m, d). To simplify the presentation, we lift the comparison operators to operate on date expressions, defined as the comparison of the values obtained by evaluating the expressions.

Theorem 3 (Monotonicity). For any dates d_1, d_2, any period p, and r ∈ {↓, ↑}: if d_1 < d_2, then d_1 +_r p ≤ d_2 +_r p.

A point of interest in this theorem is the discrepancy between the bounds: while the bound in the premise is strict, the bound in the conclusion is loose. Unfortunately, a stronger version with strict bounds on both sides does not hold; for instance, two additions involving rounding down of April 30 and April 31 respectively yield the same result. To prove this theorem, we again need several intermediate lemmas operating on the base semantics. First, we establish an equivalence between adding years and adding months. We then state and prove several monotonicity properties of the base semantics. The proof of Theorem 3 follows by direct application of these lemmas.

Lemma 4 (Equivalence of year and month addition). For any date d and any integer n, d +_y n = d +_m (12 ∗ n).

Lemma 5 (Monotonicity of year and month addition). For any dates d_1, d_2, any integer n, and δ ∈ {y, m}, d_1 < d_2 ⇒ d_1 +_δ n < d_2 +_δ n.

Lemma 6 (Monotonicity of day addition). For any valid dates d_1, d_2 and any integer n, d_1 < d_2 ⇒ d_1 +_d n < d_2 +_d n.

Lemma 7 (Monotonicity of rounding). For any dates d_1, d_2 and r ∈ {↓, ↑}, d_1 < d_2 ⇒ rnd_r(d_1) ≤ rnd_r(d_2).

Finally, we state the following theorem, which guarantees that rounding down always returns a smaller date than rounding up. Additionally, when the addition is not ambiguous, the two rounding modes return the same result.

Theorem 4 (Rounding).


We finally characterize the ambiguity of month addition, a property that we will need in order to prove the soundness of the static analysis presented in Sec. 4.

Theorem 5 (Characterization of ambiguous month additions). For any valid date d, any integer n, and any value v such that d +_m n →* v, we have nb_days(year_of(v), month_of(v)) < day_of(v) ⇔ rnd_⊥(v) →* ⊥.

#### 3.2 Non-properties and counter-examples

We now present several seemingly intuitive and ideally useful properties about date semantics that do not hold in practice.

Non-Property 1 (Commutativity of addition) For any date d, any periods p_1, p_2, and r ∈ {↓, ↑}, we have (d +_r p_1) +_r p_2 = (d +_r p_2) +_r p_1.

Consider the case where d = March 31, p_1 = 1 day, and p_2 = 1 month. When adding p_1 first and rounding down, the addition returns April 30, while the result when adding p_2 first is May 1. Similar examples exist when rounding up, for instance by setting d = January 29, 2023, p_1 = 30 days, and p_2 = 1 month.

Non-Property 2 (Associativity of addition) For any date d, any periods p_1, p_2, and r ∈ {↓, ↑}, we have (d +_r p_1) +_r p_2 = d +_r (p_1 + p_2).

Consider the case where d = March 31, p_1 = 1 month, and p_2 = 1 month. In all rounding modes, adding p_1 followed by p_2 requires rounding, ultimately yielding May 30 or June 1, while directly adding p_1 + p_2 returns May 31.

As associativity and commutativity of addition are common for most datatypes, we emphasize that their invalidity for dates can be a source of confusion for programmers; common optimizations or rewritings of date computations in a seemingly equivalent way (e.g., replacing 1 month + 1 month by 2 months) can lead to different outcomes. However, these disparities are exclusively due to occurrences of rounding in computations. We thus aim to help programmers handle date computations by proposing a static analysis that automatically detects when rounding might impact the evaluation of expressions.

# 4 A Static Analysis For Rounding-Insensitivity

In this section, we leverage our formal semantics to define a sound static analysis that automatically verifies date computations in programs. Our goal is to statically detect ambiguous computations, whose result depends on the chosen rounding mode. Indeed, when writing programs whose specification is the law, choosing the rounding mode arbitrarily is not an option; this would amount to a legal interpretation, which exposes the administration operating the program to being challenged in court if the rounding mode is unfavorable to a user. The cost, for administration personnel, of bearing the responsibility for such technical regulatory choices has been documented by Torny [55].

A naive approach would be to flag any program that contains an ambiguous addition. However, this solution can be overly restrictive: computations can be ambiguous while having no impact on the final outcome of the program. Consider for example the expression d + 1 month <= March 15 2023. If no rounding happens when adding 1 month to d, the expression is obviously safe. Otherwise, we notice that rounding may only yield the last day of a month, or the first day of the following month. In both cases, comparing

```
1 date current = random_date();
2 date birthday = random_date();
3 date intermediate = birthday + [2 years, 0 months, 0 days];
4 date limit = first_day_of(intermediate);
5 assert(sync(current < limit));
```
Fig. 4. Example extracted from Catala code modeling the French housing benefits

this result with a date in the middle of a month is thus safe. Instead, we consider a more interesting property, called rounding-insensitivity, capturing that the evaluation of an expression is the same for both rounding modes.

At a high level, our analysis works by tracking constraints over the day, month, and year of a date, through the YMD domain (Sec. 4.1). The YMD domain is fully parametric in a numerical abstract domain, and works by translating date constraints into numerical constraints. We discuss the choice of numerical abstract domains in Sec. 4.2, in order to obtain the best precision in the presence of linear constraints and unconstrained dates. We analyze the computations with both rounding modes and compare the results to decide rounding-insensitivity, which is a 2-safety hyperproperty. We explain how we lift the YMD domain to these double computations in Sec. 4.3. We implemented our analysis within the Mopsa static analysis platform [28, 29], as described in Sec. 4.4. We have taken special care to ensure that actionable counter-examples can be generated (Sec. 4.5), paving the way for use by non-experts.

We think that abstract interpretation hits a sweet spot for performing this analysis. Its full automation makes it usable by non-specialists, especially with the provided counter-example hints. It allows tailored approximations to be derived, thanks to Th. 5. The current definition of date addition is recursive and involves non-linear arithmetic constraints, which do not work well with SMT solvers.

We use as a motivating example the program shown in Fig. 4. This program has been extracted from a Catala code snippet used to formalize the French housing benefits [33, Sec. 3.1]. We will provide more details on Catala and the extraction to date programs in Sec. 5. In this program, we pick two arbitrary, unconstrained dates, perform a date-duration addition of two years, and project the resulting date onto the first day of its month. The assertion at line 5 expresses the rounding-insensitivity of the comparison between an arbitrary, unconstrained date and the computed date.³ The sync predicate, formally defined in Sec. 4.3, holds if and only if the evaluation of its expression in both rounding modes yields the same result, meaning that the expression is rounding-insensitive.

The programs we consider in this section are written in a standard, toy imperative language.

#### 4.1 The YMD domain combinator

The YMD domain translates constraints on the year, month and day of a date into numerical constraints over three integer variables. These numerical constraints are handled by a numerical abstract domain, described in Definition 2.

³ Here, sync(current < limit) could be reduced to sync(limit). However, our analysis does not need this reduction, and it allows counter-example hints also targeting the values of current, improving the readability of the output.

$$\begin{array}{rl}
\mathrm{dates\_dom} : & \begin{cases} (\mathcal{V} \to \mathbb{Z}) \to \mathcal{P}(\mathcal{V}) \\ \rho \mapsto \{\, v \mid \mathrm{year}(v), \mathrm{month}(v), \mathrm{day}(v) \in \mathrm{dom}(\rho) \,\} \end{cases} \\[2ex]
\gamma_{\mathrm{YMD}} : & \begin{cases} \mathcal{N}^{\#} \to \mathcal{P}(\mathcal{V} \to \mathcal{D}) \\ n^{\#} \mapsto \bigcup_{\rho \in \gamma_{\mathcal{N}}(n^{\#})} \{\, e \mid \mathrm{dom}(e) = \mathrm{dates\_dom}(\rho) \wedge \forall v \in \mathrm{dom}(e),\ e(v) = (y, m, d) \\ \qquad\qquad \wedge\ \mathrm{valid}(y, m, d) \wedge y = \rho(\mathrm{year}(v)) \wedge m = \rho(\mathrm{month}(v)) \wedge d = \rho(\mathrm{day}(v)) \,\} \end{cases}
\end{array}$$

#### Fig. 5. Concretization of the YMD domain

The YMD domain can be seen as a domain combinator, or a functor relying on a numerical abstract domain; we will discuss the chosen instantiation in Sec. 4.2. This domain works at a fixed rounding mode.

Definition 2 (Numerical abstract domain). In the following, a numerical abstract domain is a lattice N# on which the following operations are defined:


This domain is further defined by a concretization function γ_N : N# → P(V → Z), mapping numerical abstract environments to the set of concrete integer environments they represent. We assume the numerical abstract domain is sound.

Given a date variable v, the YMD domain creates new auxiliary (or ghost) variables year(v), month(v), day(v), which do not exist in the original program but simplify reasoning. This is an approach we borrow from the deductive verification community, and which has been used in static analyses both in the work of Chevalier and Feret [12] and in Mopsa.

We provide a formal definition of the concretization, which defines the meaning of the YMD domain, and illustrate it on an example.

Definition 3 (Concretization of the YMD domain). The concretization of the YMD domain is formally defined in Fig. 5. It explains how an abstract numerical environment n# ∈ N# can be interpreted as a set of date environments e ∈ V → D mapping variables to dates. To construct these date environments, we first pick an integer environment ρ ∈ V → Z from the concretization of the numerical abstract domain, γ_N(n#). The date environments have as domain of definition the date domain of the function ρ, dates_dom(ρ), which is the set of variables whose auxiliary year, month and day variables are defined in ρ. For each of those variables v ∈ dates_dom(ρ), e(v) corresponds to the date defined by the auxiliary variables in ρ, provided that this date is valid.

Example 1 (Concretization). Let us assume our numerical domain is a map from variables to intervals, and consists of the following state: n# = day(d) ∈ [1, 31] ∧ month(d) ∈ [1, 12] ∧ year(d) = 2023. In that case, the concretization is the set of date environments e defined on variable d such that e(d) can be any valid date of 2023. Thus, there is a date environment e ∈ γ_YMD(n#) such that e(d) = (2023, 1, 31). However, there is no date environment such that e(d) = (2023, 2, 29) and e ∈ γ_YMD(n#), because that date is invalid (2023 is not a leap year).

The YMD domain handles the following transfer functions:


Transfer function for month addition. We provide a simplified OCaml implementation of the month-addition transfer function in Fig. 6. The transfer function takes as parameters a date, represented as a variable; a concrete number of months; an input abstract state; and a rounding mode chosen for date computations. It returns a case disjunction⁴ of type cases: a list of case, each consisting of an expression and an abstract state. We start by defining day, month, year, which are expressions representing the day, month and year number of date through auxiliary variables. The resulting month and year are computed through non-linear expressions. As in the semantics, we encode months as integers to perform arithmetic operations, and start our numbering at 1 for January. The transfer function performs a case disjunction to detect whether date rounding will happen, following the characterization of ambiguous month additions (Th. 5). This case disjunction checks whether the day of the date is compatible with the number of days in the resulting month (and year, as February has one more day during leap years). This disjunction is encoded using the switch utility, which takes as input an abstract state and a list of pairs of expressions and continuations. Given a pair (cond, k), the input abstract state is filtered to satisfy the expression cond (by delegation to the numerical

⁴ These disjunctions can be seen as a partitioning of the abstract state. In this section, we consider everything to be partitioned, to improve precision. Our implementation supports limiting the number of partitions.

abstract domain). The resulting abstract state is fed to the continuation, which yields a case. The cases we encounter during the addition are:

Rounding to 29 Feb. of a leap year. If the resulting month is February of a leap year, and the current day number is greater than 29, we have to perform date rounding. We do so using the auxiliary round function. Depending on the rounding mode, it chooses either the provided date, or the first day of the following month. This date is then returned in its corresponding abstract state using mk_date, whose implementation is not detailed.

Rounding to 28 Feb. of a non-leap year. Similar case, omitted for brevity.

Rounding to a 30-day month. If the current day number is 31 but the resulting month has 30 days (i.e., it is April, June, September or November), we also have to perform a rounding, either to the 30th of the resulting month, or to the 1st of the month after.

Other cases. No rounding happens; the day number remains the same.

Note that add_months, round and is_leap define syntactic expressions, which are delegated through assign and assume to the numerical abstract domain. The expressions at lines 6, 13, 14, 21–22, 26, 28, 30 are not directly evaluated: they are interpreted by the assume of the numerical abstract domain during the evaluation of the switch function. The definition of the transfer function for month addition assumes that the number of months to add is known as a concrete integer. This is not restrictive in practice: all programs we extracted from Catala in Sec. 5 only perform date-month additions with a concrete number of months.

The proof of soundness of the abstract month addition is not formalized in F⋆ and is omitted for brevity. It is, however, a direct application of the characterization of ambiguous month additions established in Th. 5, which is proved formally in F⋆.

The analysis may refine constraints on a day, month or year auxiliary variable. These constraints can then entail new constraints on the other auxiliary variables of the same date, so as to represent only valid dates. This propagation phase is performed by the strengthening operator described below, which is sound as it only removes invalid dates, which are not taken into account by the concretization.

Strengthening operator. The strengthening operator enforces the following:


Comparison transfer function. The transfer function for date comparisons is dates_lt in Fig. 6; it encodes a lexicographic comparison.

#### 4.2 Instantiating YMD with a combination of numerical domains

The YMD domain is fully generic in the numerical abstract domain it relies on to translate date constraints into constraints over integers. We describe how we

```
1 type case = expr * state
2 type cases = case list
3
4 let switch abs = List.map (fun ((cond : expr), (k : state -> case)) -> k (assume cond abs))
5
6 let is_leap (y : expr) : expr = (y % 4 = 0 && y % 100 <> 0) || (y % 400 = 0)
7
8 let round (r : rounding) (d m y : expr) (abs : state) : case =
9 match r with
10 | RoundDown ->
11 mk_date d m y abs
12 | RoundUp ->
13 let succ_m = 1 + m % 12 in (* first day of the month after m *)
14 let succ_y = y + m / 12 in
15 mk_date 1 succ_m succ_y abs
16
17 let add_months (r : rounding) (date : var) (nb_m : int) (abs : state) : cases =
18 let day = day_of date in
19 let month = month_of date in
20 let year = year_of date in
21 let res_month = 1 + (month - 1 + nb_m) % 12 in
22 let res_year = year + (month - 1 + nb_m) / 12 in
23 switch abs
24 [
25 (* Rounding to 29 Feb. of a leap year *)
26 day > 29 && res_month = Feb && is_leap res_year, round r 29 res_month res_year;
27 (* Rounding to 28 Feb. of a non-leap year *)
28 day > 28 && res_month = Feb && not (is_leap res_year), round r 28 res_month res_year;
29 (* Rounding to a 30-day month *)
30 day > 30 && is_one_of res_month [Apr;Jun;Sep;Nov], round r 30 res_month res_year;
31 (* No rounding *)
32 mk_true, mk_date day res_month res_year
33 ]
34
35 let dates_lt (d1 d2 : var) (abs : state) : cases =
36 switch abs
37 [
38 (year_of d1) < (year_of d2), mk_true;
39 (year_of d1) > (year_of d2), mk_false;
40 (year_of d1) = (year_of d2) && (month_of d1 < month_of d2), mk_true;
41 (year_of d1) = (year_of d2) && (month_of d1 > month_of d2), mk_false;
42 (year_of d1) = (year_of d2) && (month_of d1 = month_of d2)
43 && (day_of d1 < day_of d2), mk_true;
44 (year_of d1) = (year_of d2) && (month_of d1 = month_of d2)
45 && (day_of d1 >= day_of d2), mk_false;
46 ]
```
Fig. 6. Abstract transfer functions for month addition and date comparison

chose a combination of numerical abstract domains to obtain the best possible precision in the presence of non-linearity and unconstrained dates.

We initially started with intervals and congruences for our first tests. Due to its convexity, the interval domain was unable to precisely represent months where the day number may be rounded to 30 days during the date-month addition (line 30 of Fig. 6). Thus, we added a domain of powersets of integers (of size at most 4) to be precise enough for this use case. When month is not a constant, the congruence domain is unable to precisely represent the resulting month (line 21 of Fig. 6), and to refine the potential values of month given constraints on res_month. This situation happens often in our evaluation; it shows up in our motivating example. We resolved this precision issue by switching from the congruence domain to the relational, linear congruence domain [5]. We also added the polyhedra domain [17] to keep track of equalities between different day,

month and year variables, which arise during analyses of programs with unconstrained dates, as we will show in the upcoming examples.

Our current numerical abstract domain is a reduced product of grids, polyhedra, intervals, and a bounded powerset of integers. The relational domains rely on the Apron library [27]. The approximation of non-linear computations is performed through linearization techniques [37].

Example 2. Let us consider the program below, which picks an arbitrary, unconstrained date d and then adds one month to d. We illustrate the different cases of the transfer function add_months on it, assuming we round down.

```
date d = random_date(); date d2 = d + [0 years, 1 months, 0 days];
```
Rounding to 29 Feb. of a leap year. In the first case of the transfer function, the numerical domain is able to deduce from the expression day > 29 && res_month = Feb that the day of d is either 30 or 31, and the month is January. In the rounding-down mode, d2 is thus February 29th. The relational domain additionally expresses that year(d) = year(d2).

Rounding to 28 Feb. of a non-leap year. Similar case, omitted for brevity.

Rounding to a 30-day month. The numerical abstract domain infers that d represents the 31st of March, May, August or October, tracked thanks to the bounded-set-of-integers domain. As we round down, we deduce that the day of d2 is 30, and month(d2) ∈ {Apr, Jun, Sep, Nov}. In that case, the relational domain can also infer that year(d) = year(d2), as m / 12 will always be zero.

Other cases. In the last case, the intervals and powerset domains cannot express interesting constraints on d and d2. The relational domains are, however, able to capture key relations:


$$12\,\mathrm{year}(d) + \mathrm{month}(d) \le 12\,\mathrm{year}(d2) + 11 \ \wedge\ 12\,\mathrm{year}(d2) \le 12\,\mathrm{year}(d) + \mathrm{month}(d) + 1$$

Example 3 (Addition and strengthening). We use our running example from Fig. 4, and show what the date addition and the strengthening operator yield for the dates birthday and intermediate. In this example, we assume dates are rounded up. As we add two years to birthday, two of the four cases of the month addition presented previously do not apply; we omit them below.

Rounding to 28 Feb. of a non-leap year. In that case, birthday is a Feb. 29th, and intermediate rounds up to March 1st. We additionally know that year(birthday) + 2 = year(intermediate). The strengthening ensures that year(birthday) is divisible by 4.

No rounding. The day and month numbers of birthday and intermediate are equal. The year condition is similar to the one provided in Ex. 2.

Example 4 (Comparison). Let us continue with our running example, assuming we are focusing on the partition where intermediate has been rounded up to March 1st (as shown in Ex. 3). In that case, limit is equal to intermediate. Assuming the comparison current < limit holds, we have three different cases, identified by their line numbers in Fig. 6. Line 38 yields year(current) < year(limit). Line 40 enforces year(current) = year(limit) and month(current) < month(limit), so month(current) ∈ {Jan, Feb}. Line 42 yields that the year and month numbers of current and limit are the same and day(current) < day(limit). This last case is impossible, given that 1 ≤ day(current) ≤ 31 and day(limit) = 1.

#### 4.3 Lifting to both rounding modes

The YMD domain operates at a given, fixed rounding mode. In this section, we leverage the YMD domain to perform date computations in both rounding modes, and thus prove rounding-insensitivity. This lifting is inspired by Delmas et al. [21], who analyze product programs to prove the endianness portability of C programs. Here, we keep the product of programs implicit: only the rounding mode changes between the two executions we consider.

We start by explaining how the concrete semantics is lifted from a single rounding mode to both. We assume we have a semantics of expressions (respectively, statements) E_r⟦expr⟧ (resp. S_r⟦stmt⟧), parameterized by a date rounding mode r ∈ {↑, ↓}. They take as input sets of environments (E = V → Val) mapping variables to values (which are either integers or dates), and produce values (resp. environments).

$$\mathbb{E}_r\llbracket expr \rrbracket : \mathcal{P}(\mathcal{E}) \to \mathcal{P}(\mathrm{Val}) \qquad\qquad \mathbb{S}_r\llbracket stmt \rrbracket : \mathcal{P}(\mathcal{E}) \to \mathcal{P}(\mathcal{E})$$

We define in Fig. 7 the concrete semantics evaluating expressions and statements over both rounding modes, written E↕⟦expr⟧ and S↕⟦stmt⟧ respectively. We do not delve into the details of product programs, which are defined in the work of Delmas et al. [21]. In this semantics, the state is duplicated: D = E × E. We ensure that random operations return the same value in both rounding modes, to avoid spurious desynchronizations. The sync predicate returns true if and only if the expression evaluates to the same values in both rounding modes, capturing the rounding-insensitivity of the contained expression. We use it in the programs we analyze to target the expressions we want to check, as we have already seen in Fig. 4. The evaluation of other expressions is performed pointwise on both rounding modes, and similarly for assignments.

Definition 4. An expression e is rounding-insensitive in a state d if and only if E↕⟦sync(e)⟧({d}) = {(true, true)}. This property is encoded in programs by the statement assert(sync(e)).

$$\begin{aligned}
\mathbb{E}_{\updownarrow}\llbracket expr \rrbracket &: \mathcal{P}(\mathcal{D}) \to \mathcal{P}(\mathrm{Val}^2) \\
\mathbb{E}_{\updownarrow}\llbracket \mathrm{random\_date}() \rrbracket(D) &= \{\, (d, d) \mid d \in \mathbb{Z}^3,\ \mathrm{valid}(d) \,\} \\
\mathbb{E}_{\updownarrow}\llbracket \mathrm{sync}(e) \rrbracket(D) &= \bigcup_{(\rho_\uparrow, \rho_\downarrow) \in D} \{\, (b_\uparrow = b_\downarrow,\ b_\uparrow = b_\downarrow) \mid (b_\uparrow, b_\downarrow) \in \mathbb{E}_{\updownarrow}\llbracket e \rrbracket\{(\rho_\uparrow, \rho_\downarrow)\} \,\} \\
\mathbb{E}_{\updownarrow}\llbracket expr \rrbracket(D) &= \bigcup_{(\rho_\uparrow, \rho_\downarrow) \in D} \{\, (v_\uparrow, v_\downarrow) \mid v_\uparrow = \mathbb{E}_{\uparrow}\llbracket expr \rrbracket \rho_\uparrow,\ v_\downarrow = \mathbb{E}_{\downarrow}\llbracket expr \rrbracket \rho_\downarrow \,\} \\
\mathbb{S}_{\updownarrow}\llbracket stmt \rrbracket &: \mathcal{P}(\mathcal{D}) \to \mathcal{P}(\mathcal{D}) \\
\mathbb{S}_{\updownarrow}\llbracket v = e \rrbracket(D) &= \bigcup_{(\rho_\uparrow, \rho_\downarrow) \in D} \{\, (\mathbb{S}_{\uparrow}\llbracket v = v_\uparrow \rrbracket \rho_\uparrow,\ \mathbb{S}_{\downarrow}\llbracket v = v_\downarrow \rrbracket \rho_\downarrow) \mid (v_\uparrow, v_\downarrow) \in \mathbb{E}_{\updownarrow}\llbracket e \rrbracket\{(\rho_\uparrow, \rho_\downarrow)\} \,\}
\end{aligned}$$

#### Fig. 7. Concrete semantics over double evaluation of rounding modes

The abstract semantics mimics the concrete behavior, but works on a single abstract state instead of a set of concrete double states. The double state is represented by duplicating variables according to their rounding mode in the numerical abstract domain. A variable x is thus written ↑x (resp. ↓x) to represent the variable when the upper (resp. lower) rounding mode is used. This duplication is performed in a shallow fashion, to improve usability: when performing an assignment x = e, if e evaluates to the same value in both rounding modes, the variable x is not duplicated in the numerical abstract domain.
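A small sketch of this shallow duplication (the types and names are ours, not those of the implementation):

```
type 'v binding =
  | Shared of 'v                 (* same value under both rounding modes *)
  | Split of 'v * 'v             (* values for ↑x and ↓x respectively *)

let assign env x v_up v_down =
  let b = if v_up = v_down then Shared v_up else Split (v_up, v_down) in
  (x, b) :: List.remove_assoc x env
```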

Example 5 (Rounding-sensitivity of the comparison). Back to our running example: we have shown so far how the YMD domain analyzes the program when rounding up (Ex. 4). Continuing with the same relational abstract domain, we show part of the abstract state in the partition focusing on rounding to Feb. 28 of a non-leap year in Eq. (1). In the rounding-down mode, intermediate rounds to Feb. 28, and thus limit rounds down to Feb. 1st.

$$\begin{gathered}
\mathrm{day}(\mathtt{current}) \in [1, 31],\ \mathrm{month}(\mathtt{current}) \in [1, 12],\ \mathrm{year}(\mathtt{current}) \in [-\infty, +\infty] \\
\mathrm{day}(\mathtt{birthday}) = 29,\ \mathrm{month}(\mathtt{birthday}) = \mathrm{Feb},\ \mathrm{year}(\mathtt{birthday}) \equiv_4 0 \\
\uparrow\mathrm{day}(\mathtt{intermediate}) = 1,\ \uparrow\mathrm{month}(\mathtt{intermediate}) = \mathrm{Mar} \\
\downarrow\mathrm{day}(\mathtt{intermediate}) = 28,\ \downarrow\mathrm{month}(\mathtt{intermediate}) = \mathrm{Feb} \\
\uparrow\mathrm{year}(\mathtt{intermediate}) = \downarrow\mathrm{year}(\mathtt{intermediate}) = \mathrm{year}(\mathtt{birthday}) + 2 \\
\uparrow\mathrm{day}(\mathtt{limit}) = 1,\ \uparrow\mathrm{month}(\mathtt{limit}) = \mathrm{Mar},\ \downarrow\mathrm{day}(\mathtt{limit}) = 1,\ \downarrow\mathrm{month}(\mathtt{limit}) = \mathrm{Feb} \\
\uparrow\mathrm{year}(\mathtt{limit}) = \downarrow\mathrm{year}(\mathtt{limit}) = \mathrm{year}(\mathtt{birthday}) + 2
\end{gathered} \tag{1}$$

We exhibit an abstract state where we cannot prove that the expression current < limit is rounding-insensitive. The static analysis considers all cases in the comparison and the evaluation in both rounding modes. For the sake of presentation, we only highlight one case here. The date comparison between current and the rounded-up version of limit yields a partition where the years are the same and the month number is smaller. This partition refines the abstract state above with the following constraints:

$$\mathrm{year}(\mathtt{current}) = \uparrow\mathrm{year}(\mathtt{limit}) \ \wedge\ \mathrm{month}(\mathtt{current}) < \uparrow\mathrm{month}(\mathtt{limit}) = \mathrm{Mar} \tag{2}$$

Let us now consider the case where the comparison with the rounded-down version of limit does not hold, when the years and months are the same but the days are not. We get the following additional constraints:

$$\mathrm{year}(\mathtt{current}) = \downarrow\mathrm{year}(\mathtt{limit}) \ \wedge\ \mathrm{month}(\mathtt{current}) = \downarrow\mathrm{month}(\mathtt{limit}) = \mathrm{Feb} \ \wedge\ \mathrm{day}(\mathtt{current}) \ge \downarrow\mathrm{day}(\mathtt{limit}) = 1 \tag{3}$$
Combining the constraints from Eqs. (2) and (3) with the abstract state from Eq. (1) gives the following result on current:

$$\text{year}(\textbf{current}) = \text{year}(\textbf{birthday}) + 2 \land \text{month}(\textbf{current}) = \textbf{Feb} \tag{4}$$

To summarize, our analysis has been unable to prove the rounding-insensitivity of the expression current < limit, in particular in the case of the abstract state presented in Eq. (1), refined with the constraints from Eqs. (2) and (3). Thanks to partitioning and relational abstract domains, we know that the proof fails when birthday is a Feb. 29th (of a year y divisible by 4, which is a sound but incomplete way of expressing that y is a leap year). In that case, intermediate will be either Feb. 28th or March 1st of y + 2. This entails that limit will be either Feb. 1st or March 1st of y + 2. In the cases where current is a day of February of y + 2 (Eq. (4)), the comparison will effectively be rounding-sensitive.

The original program did not contain any constraints on birthday or current. Note that if we add to the program the constraint that the day of birthday is less than 28, our analysis is able to automatically prove the program rounding-insensitive.

#### 4.4 Implementation

We implemented our approach in the Mopsa static analysis platform [28, 29]. Mopsa is able to analyze C, Python and multi-language Python/C programs [40, 41, 44], to prove the absence of runtime errors, and to perform portability analysis of C programs [21]. We modified the front-end of a toy imperative language also available in Mopsa to analyze programs performing date arithmetic. We chose to extend this language for our analysis as we do not require advanced features of C or Python. Thanks to Mopsa's modular architecture, we have been able to reuse iterators for intraprocedural analysis with few code changes.

Fig. 8. Date analysis configuration

The configuration used by Mopsa for our analysis is illustrated in Fig. 8. The "D.bidates" domain corresponds to the abstract domain and transfer functions described in Sec. 4.3. The "U.ymd" domain is the YMD domain (Sec. 4.1). The last part, enclosed in a gray box, corresponds to the numerical abstract domain on top of which the YMD domain is built (Sec. 4.2).

```
5: assert(sync(current < limit));
               ^^^^^^^^^^^^^^^
Desynchronization detected: (current < limit). Hints:
↑month(limit) = 3, ↑day(limit) = 1, ↓month(limit) = 2, ↓day(limit) = 1,
↑month(intermediate) = 3, ↑day(intermediate) = 1,
↓month(intermediate) = 2, ↓day(intermediate) = 28,
month(birthday) = 2, day(birthday) = 29, month(current) = 2, day(current) = [1,29],
year(birthday) =[4] 0, year(current) = ↑year(intermediate) = ↑year(limit)
= ↓year(intermediate) = ↓year(limit) = year(birthday) + 2
```
Fig. 9. Mopsa's output on the running example

#### 4.5 Generating counter-example hints

We have extended our implementation to provide counter-example hints when a synchronization assertion cannot be proved safe. Given our use case, it is paramount to provide meaningful feedback to users translating law articles into Catala code, so that they understand why their date computations might be ambiguous (Sec. 5). These hints are precise constraints on the considered program that may lead an expression to be rounding-sensitive. They are especially helpful for providing more precise date ranges for unconstrained dates that may affect rounding sensitivity. As our approach is incomplete, these hints may be spurious; we did not, however, encounter this issue in our case study on Catala programs.

This generation of counter-example hints is atypical for static analyses by abstract interpretation. It is made possible here by a simplified setting (variables are assigned once, and the abstract state is partitioned to ensure high precision) and by the use of powerful relational abstract domains. In a general setting with multiple variable assignments, joins and widenings, most approaches need to perform backward analyses [1, 38, 49].

This generation of hints works in two steps. It first heuristically selects the best partition of the abstract state: the YMD domain may partition the abstract state in order to keep the best precision, and our heuristic selects the partition with the highest number of desynchronized variables (meaning significant rounding has happened) and the highest number of auxiliary day and month variables that are constants. The second step extracts the relevant constraints from the considered abstract state. This extraction starts by collecting all date variables defined in the program. For these variables, we evaluate the auxiliary day, month and year variables into intervals, and keep only the intervals providing meaningful information (i.e., intervals strictly included in [1, 31] for day variables, strictly included in [1, 12] for month variables, and bounded intervals for year variables). We then project the relational abstract domain onto the set of auxiliary variables for which no meaningful interval was extracted, to provide linear relations for those. We show in Fig. 9 the exact, unedited output of the hints generated by Mopsa on our running example and highlight their readability. They correspond exactly to the constraints previously described in Ex. 5.
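To make the second step concrete, the following OCaml sketch shows the interval filter just described; the interval type, field classification and function names are our own illustrative assumptions, not Mopsa's actual API:

```
(* Illustrative sketch of the "meaningful interval" filter; the types
   and names here are assumptions, not Mopsa's actual API. *)
type interval = { lo : int; hi : int }
type field = Day | Month | Year

(* An interval is kept as a hint only if it is strictly more precise
   than the trivial range of its field. *)
let is_meaningful field itv =
  match field with
  | Day -> itv.lo > 1 || itv.hi < 31
  | Month -> itv.lo > 1 || itv.hi < 12
  | Year -> itv.lo > min_int && itv.hi < max_int  (* bounded both ways *)

(* Variables with a meaningful interval are reported directly; the
   others are handed to the relational domain, which is projected onto
   them to extract linear relations. *)
let split_hints vars =
  List.partition (fun (_, field, itv) -> is_meaningful field itv) vars
```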

# 5 Case Study: Application to Catala

This section highlights how the results and methods established in the previous section can be applied in the setting of legal expert systems, and more specifically within Catala [34], a recent domain-specific programming language designed to be understandable by lawyers and close to the structure of legal texts, with formal semantics that clearly define its behavior to reduce discrepancies between legal texts and their implementation.

We start by describing rulings and implementations of the law where precise and well-defined date arithmetic is paramount to ensure expected results. Then, we describe how Catala's implementation of date rounding has recently evolved: from the issues we noticed in Catala's previous off-the-shelf implementation, to the port to our date calculation library and the introduction of a function-local rounding definition for cases where legal references or interpretations are known, reducing the number of cases where the rounding mode is unspecified. We finish by explaining the most recently implemented feature, which lets the Catala compiler extract date computations and rely on Mopsa to (dis)prove rounding-insensitivity.

#### 5.1 Date arithmetic and the law

Critical software relying on date computations is commonly used by companies or government agencies to automatically enforce legal dispositions, e.g., to check if an application has been filed within the correct time period, to compute age-related conditions, or to aggregate periods between dates and compare the result to a fixed duration for eligibility calculation.

In all these cases, there can be heavy financial and legal consequences when a date computation goes wrong or is subject to diverging interpretations. In Bowles v. Russell, 551 U.S. 205 (2007), cited by Bailey [7], the court gave Bowles a 17-day notice to file an appeal, but this notice was incorrectly computed from Rule 4(a)(6) and paragraph 2107(c): it should have been 14 days. When Bowles filed his appeal on the 17th day, the court system dismissed it on the grounds that Bowles should have filed by the 14th day and not trusted the notice the court had given him earlier. In more mundane cases, an incorrect date computation can deprive someone of their social benefits, or impose a higher late fee than it should.

These doubts about date computation in software applying the law are all the more concerning given that previous research into code open-sourced by French government agencies did not show a great deal of transparency or trustworthy practices on this particular matter. For instance, the custom programming language M, used by the French tax authority to compute income tax [35], encodes dates as mere floating-point numbers, where the date is just a decimal number in the format DDMMYYYY. The French unemployment agency, whose IT system is mostly implemented in Java, uses a custom date library for its computations (fr.unedic.util.temps.Damj), but its implementation is omitted from their only open-source release [47].

#### 5.2 Catala's policy about date rounding

Recently, the Catala project [24, 34] has aimed to bring more accountability and transparency to programs computing taxes or social benefits inside government agencies. The Catala language is specifically designed to allow the easy translation of computational law into code; in particular, it is based on prioritized default logic [10], which enables programmers to closely follow the base case/exception pattern that permeates the law. To increase confidence in and explainability of its programs, Catala also comes with a formal semantics, formalized in the F<sup>⋆</sup> proof assistant. This formal semantics mostly focuses on Catala's default calculus, the encoding of prioritized default logic as a programming language, and does not specify all Catala expressions, date computations among them.

Initially, the semantics of date computations was defined by the behavior of the calendar OCaml library [50] used inside its interpreter. However, this library follows the POSIX behavior, which is not always monotonic and may appear quirky (for instance, it computes Jan 31st + 1 month as March 3rd in non-leap years), despite its very complete set of features. These unusual behaviors prompted a deeper investigation of the corner cases of date computations and led to the implementation of the library presented in this paper. While now integrated in the Catala interpreter, our library is standalone and freely available under an open-source license. As the Catala compiler is implemented in OCaml, so is our library<sup>5</sup>, currently packaged with opam; by relying on our semantics, its implementation is straightforward. We do not foresee any difficulty porting it to other languages, and plan to do so to support more of the Catala backends, including Python and JavaScript.

The default behavior of our date computation library inside the Catala interpreter is to raise a runtime exception whenever a date rounding is needed during a computation. This conservative choice was made because the decision to round date computations up or down in software enforcing legal rules is itself a legal rule that has to be specified, as described in the introduction of this paper. To avoid runtime exceptions, rounding rules can be specified at the scope level (a precise definition of Catala's scopes is beyond this paper; a scope can be thought of as a kind of function in Catala) and should be justified, for example by a legal reference or interpretation.
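To make this behavior concrete, here is a minimal OCaml sketch of such an interface; the type and function names are hypothetical, and the actual library's API may differ:

```
(* A minimal sketch of a date library with explicit rounding modes;
   names and types are illustrative assumptions, not the actual API. *)
type date = { year : int; month : int; day : int }
type rounding = Round_up | Round_down | Abort

exception Ambiguous_computation

let is_leap y = (y mod 4 = 0 && y mod 100 <> 0) || y mod 400 = 0

let days_in_month y m =
  match m with
  | 1 | 3 | 5 | 7 | 8 | 10 | 12 -> 31
  | 4 | 6 | 9 | 11 -> 30
  | 2 -> if is_leap y then 29 else 28
  | _ -> invalid_arg "days_in_month"

(* Add [n] months; when the tentative day does not exist in the target
   month (e.g. Jan 31st + 1 month), apply the rounding policy. *)
let add_months ~rounding d n =
  let m = d.month - 1 + n in
  (* Division adjusted so that [month] always lands in 1..12. *)
  let year = d.year + (if m >= 0 then m / 12 else (m - 11) / 12) in
  let month = ((m mod 12) + 12) mod 12 + 1 in
  if d.day <= days_in_month year month then { year; month; day = d.day }
  else
    match rounding with
    | Round_down ->
      (* Last valid day of the target month. *)
      { year; month; day = days_in_month year month }
    | Round_up ->
      (* First day of the following month. *)
      if month = 12 then { year = year + 1; month = 1; day = 1 }
      else { year; month = month + 1; day = 1 }
    | Abort -> raise Ambiguous_computation
```

Under such an interface, the interpreter's default corresponds to Abort, while a scope-level rounding declaration selects Round_up or Round_down.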

We applied this methodology to fix the code of the biggest Catala program so far, which computes the French housing benefits [32]. Articles L822-4 and R823-4 of the construction and housing code, as well as article L512-3 of the social security code, all feature a comparison of the user's age against an age constant. However, as the input to the Catala program is not the age of the user but their birth date, such a comparison can be ambiguous if the user was born on February 29th of a leap year and the current date is March 1st. In those situations, we took the decision to round the date addition up, as shown in Fig. 10, using the date rounding increasing declaration. We are currently trying to contact the relevant government agencies operating the system for clarifications on how this issue should be handled.

<sup>5</sup> Our F<sup>⋆</sup> formalization can be extracted to executable but non-idiomatic OCaml code. In practice, we thus manually reimplement our library in OCaml to use features such as named arguments or exceptions to provide a more idiomatic API.

```
declaration scope CheckingAgeInferiorEqual:
  input birth_date content date
  input current_date content date
  input target_age content duration # always a number of years
  output age_is_inferior_or_equal_target content boolean

scope CheckingAgeInferiorEqual:
  definition age_is_inferior_or_equal_target equals
    birth_date + target_age <= current_date
  date rounding increasing
```
Fig. 10. Catala scope comparing a user's age against a target age, with an explicit rounding mode

Fig. 11. Catala date ambiguity analysis pipeline

To best benefit the recipient and be in line with the general principle underpinning legal interpretations of social security law in France, a better solution would be to perform the computation twice, rounding up and down, and select the outcome most favorable to the user in case of disagreement. The flexibility offered by our library allows us to do that, and we intend to explore this avenue in future work. Being able to control precisely where and how the rounding is done is key for developers and maintainers of such programs, as they are responsible for the legal effect of the program itself [22].

#### 5.3 Detecting potentially ambiguous computations

Choosing the rounding mode for each date computation allows us to precisely control the outcome of ambiguous computations. However, given the pervasiveness of such computations in legal texts, it is also extremely tedious, and figuring out the cases where an ambiguous computation could happen is complex. For these reasons, we expect some developers to delay this step and wait for incidents to figure out the policy of the institution operating the program on the matter. But figuring out this policy might itself be tricky because of the automation frontier [33] strictly separating the developers from the decision-makers in charge of legal policy decisions.

To help developers reach out to the legal services of their institution with concrete examples of where things can go wrong before production incidents occur, we integrated the semantics and abstract domains presented in this paper into the ongoing initiative to provide a proof platform for Catala programs [20]. By connecting the Catala compiler to the Mopsa static analyzer, we are able to check whether a date computation can be ambiguous in the context of the program, and often to exhibit a counter-example when it is. We present our analysis pipeline in Fig. 11. It consists of three main phases: program slicing, verification condition crafting, and analysis, which may generate counter-examples.

First, we scan the Catala program in one of its intermediate representations and look for Catala expressions that may raise a runtime exception because of an ambiguous date computation. We use classic program-slicing techniques for this step: we select the target sub-expression, then recursively add the definitions of the variables it uses, extracting a small, self-contained program with sufficient information to be analyzed. This simplifies Mopsa's counter-example hint generation, which outputs constraints on variables rather than on subexpressions of a computation.
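As an illustration, here is a hedged OCaml sketch of this backward slicing over a single-assignment core language; the statement and expression types are simplified stand-ins for Catala's richer intermediate representation:

```
(* Sketch of the slicing step on a single-assignment core language. *)
type expr =
  | Var of string
  | Binop of string * expr * expr
  | Const of int

type stmt = Assign of string * expr

let rec free_vars = function
  | Var x -> [x]
  | Binop (_, e1, e2) -> free_vars e1 @ free_vars e2
  | Const _ -> []

(* Keep only the assignments (transitively) needed by the target
   expression, preserving the original statement order. *)
let slice (program : stmt list) (target : expr) =
  (* Walk the program backwards; keep an assignment iff it defines a
     variable needed so far, adding its own uses to the needed set. *)
  let kept, _ =
    List.fold_right
      (fun (Assign (x, e) as s) (kept, vars) ->
        if List.mem x vars then (s :: kept, free_vars e @ vars)
        else (kept, vars))
      program ([], free_vars target)
  in
  kept
```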

Second, we augment the sliced program with the assertions and other information about its variables declared in the original Catala program, to further constrain the search space. So far, our analysis is intraprocedural, but we plan to implement an inlining pass to make it inter-procedural. We then translate the sliced program into Mopsa's toy language (using the .u extension), which can then be fed to the static analyzer.

Finally, we run Mopsa on the generated program. As mentioned in Sec. 4.5, Mopsa is able to exhibit potential counter-example hints. While these hints are approximate due to the incompleteness of the analysis, they are often sufficient to yield real, actionable counter-examples on the Catala programs we analyzed. We extract relevant intervals and linear constraints and display them to the user, in the format illustrated by Fig. 9. While the intervals and constraints presented are descriptive, and sufficient for a programmer to identify concrete counter-examples, they can be difficult to grasp for non-experts. Formatting these constraints in a more readable way is an interesting question, requiring further interaction with lawyers; we leave it as future work.

The implementation of housing benefits in Catala currently consists of about 20,000 lines (including the text of the law directly specifying it), written prior to this work. While automatically analyzing this implementation using our verification pipeline, we found issues in two date computations (one of them being our running example). In both cases, Mopsa was able to provide actionable counter-example hints. Several other computations were age computations, which are now handled by a custom scope with a legally interpreted date rounding mode, as shown in Fig. 10. The remaining computations rely on durations defined outside of the analyzed scope, which requires an inter-scope analysis for Catala that is currently being implemented. In the meantime, we performed a manual duration extraction in these cases and detected 16 new unsafe (rounding-sensitive) date comparisons, which are real issues. In all cases, the provided counter-example hints are actionable. In 10 cases, the issues can only happen with a current date before 2023; by constraining the year to be greater than or equal to 2023, these 10 cases are proved safe. All the date arithmetic programs we have extracted or written so far are small, and each is analyzed within three seconds.

As the number of Catala programs grows, we hope to apply our analyzer at a larger scale, possibly suggesting future avenues for improvement.

# 6 Related Work

We start by surveying the behavior of mainstream implementations of date arithmetic. We created a suite of litmus tests involving date-duration additions, together with the expected result under each rounding mode. We wrote test drivers for each library and ran these tests to determine which rounding mode each implementation applies.
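As an illustration, one such litmus test can be expressed in OCaml against the sketched add_months function from Sec. 5.2 (the actual test drivers target each library's own API):

```
(* One litmus test: Jan 31st + 1 month distinguishes the two rounding
   modes; reuses date, rounding and add_months from the Sec. 5.2 sketch. *)
let () =
  let jan31 = { year = 2023; month = 1; day = 31 } in
  assert (add_months ~rounding:Round_down jan31 1
          = { year = 2023; month = 2; day = 28 });
  assert (add_months ~rounding:Round_up jan31 1
          = { year = 2023; month = 3; day = 1 })
```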

The java.time library [43] provides a LocalDate class for dates and a Period class to express durations. In our tests, the addition is performed by rounding down. This behavior is explicitly described in the documentation [26]. To the best of our knowledge, there is no option to use another rounding mode, or to fail on ambiguous computations. In the Python standard library, the datetime module [46] provides a date class and timedelta to express durations; however, these durations cannot be defined in terms of months, only in terms of days. A third-party library called dateutil [45] provides a replacement feature, relativedelta, able to express durations in months and years. This library seems widely used, ranking within the top 20 most downloaded Python packages. On our tests, this library rounds down. This seems to be confirmed by the documentation, which states that "adding one month will never cross the month boundary." As in Java, this rounding behavior is not configurable. The boost C++ [9] and luxon [31] JavaScript libraries exhibit similar behaviors.

The coreutils implementation of date arithmetic follows a different principle, which is not expressible in our semantics. When adding months, this implementation first computes an adjusted date which might not be valid. This adjusted date is then normalized using POSIX's mktime function. For example, adding one month to 2023-03-31 yields the adjusted date 2023-04-31, which does not exist and is normalized into 2023-05-01. In this case, the behavior is the same as rounding up. There are however cases where the behavior differs: adding one month to 2023-01-31 yields the adjusted date 2023-02-31, which is normalized into 2023-03-03. This breaks the monotonicity of the addition in the date argument (2023-02-01 + 1 month is 2023-03-01). On ambiguous computations, the debug mode of the date utility outputs a warning with the message "when adding relative months/years, it is recommended to specify the 15th of the months", which is a sufficient condition to avoid any ambiguity. This semantics is also followed by the calendar [50] OCaml library.
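This normalization can be sketched in OCaml as follows, reusing the date type and days_in_month from the Sec. 5.2 sketch; this is our own reading of the behavior for positive month additions, not coreutils' actual code:

```
(* Sketch of coreutils-style month addition: build a possibly invalid
   adjusted date, then push overflowing days into the following months,
   mimicking mktime's normalization. *)
let add_months_coreutils d n =
  let m = d.month - 1 + n in
  let year = d.year + (if m >= 0 then m / 12 else (m - 11) / 12) in
  let month = ((m mod 12) + 12) mod 12 + 1 in
  let rec normalize y m day =
    let dim = days_in_month y m in
    if day <= dim then { year = y; month = m; day }
    else if m = 12 then normalize (y + 1) 1 (day - dim)
    else normalize y (m + 1) (day - dim)
  in
  normalize year month d.day

(* add_months_coreutils { year = 2023; month = 1; day = 31 } 1
   evaluates to { year = 2023; month = 3; day = 3 }:
   the adjusted date Feb 31st normalizes to Mar 3rd. *)
```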

We finish this survey with the case of spreadsheet editors (such as Google Sheets), highlighting an inconsistent behavior we found in them. The EDATE function adds a given number of months to a date. In our experiments, this function silently rounds down: adding 18 years (that is, 216 months) to 2004-02-29 yields the date 2022-02-28. These spreadsheet applications also offer the DATEDIF function, which can compute the duration in years between two dates. In that case, DATEDIF(2004-02-29, 2022-02-28) yields 17 years (18 years are only reached when the second date is 2022-03-01). This behavior is inconsistent with EDATE. Cheng and Rival [11] focus on performing a type analysis of spreadsheet applications, given that a runtime type cast may silently happen and produce unwanted results (similarly to what JavaScript does). This analysis supports a variety of types, including dates, but as it focuses on type information, it does not address the value semantics of operations on dates.

The book of Reingold and Dershowitz [48] can be seen as the hacker's delight of calendar computations, with many efficient formulas for day additions and a wide range of different calendars. Their work does not mention or address the issue of month addition and the potential date rounding it entails, which is at the core of our work. Although we have not needed it so far, we could leverage their approach to optimize the recursive computations of our library. Similarly, ISO 8601 defines the representation of dates in the Gregorian calendar, but does not address date-duration additions involving years or months.

The Formal Vindications start-up developed a mechanized, formally verified implementation of a time management library [2, 3] in Coq, computing over dates and times, including specific technical points (timezones, leap seconds). Their duration of a month is defined as 30 days. Some recent changes allow rounding dates down. A similar effort was developed in Lean 4 by Bailey [6], but this library only supports the addition of days to a date. As a reminder, the Catala project currently targets laws that do not need to go beyond day-level precision in terms of time management. Formal Vindications also developed formally verified, high-precision tachograph software for enforcing truck drivers' scheduling laws [19].

We finish this related work section by highlighting similarities between floating-point and date arithmetic. Floating-point arithmetic is more complex and more widely used, but both settings have rounding operators with different modes available. This similarity guided our search for the properties that hold and the counter-examples presented in Sec. 3. The static analysis proving non-ambiguity of date computations presented in Sec. 4 can be seen as the abstract execution of the computation under both rounding modes, comparing the results. To the best of our knowledge, no static analysis for floating-point programs tries to bound the difference between computations under two rounding modes. Tools such as Daisy [8, 18], Fluctuat [23] and FPTaylor [51] usually aim at upper-bounding errors between ideal computations over the reals and machine computations using floating-point.
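For intuition, a concrete (non-abstract) analogue of this double execution can be phrased over the Sec. 5.2 sketch; the program below is a simplified variant of the running example, with illustrative names only:

```
(* Concrete analogue of the double-rounding execution from Sec. 4,
   reusing date, rounding and add_months from the Sec. 5.2 sketch:
   a boolean expression is rounding-sensitive if evaluating it under
   the two rounding modes yields different results. *)
let rounding_sensitive (eval : rounding -> bool) =
  eval Round_down <> eval Round_up

let () =
  (* Simplified running-example instance: birthday on Feb 29th, 2004. *)
  let birthday = { year = 2004; month = 2; day = 29 } in
  let current = { year = 2006; month = 2; day = 28 } in
  let cmp rounding =
    let limit = add_months ~rounding birthday 24 in  (* + 2 years *)
    (current.year, current.month, current.day)
    < (limit.year, limit.month, limit.day)
  in
  assert (rounding_sensitive cmp)
```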

# 7 Conclusion and Future Work

Legal expert systems rely on date computations, which are ambiguously defined in some corner cases. There are different ways of solving these ambiguities through different rounding operators, and no operator prevails over the others. We have thus defined a semantics for date computations that takes these ambiguities into account, and either raises errors or rounds the result (up or down). This semantics has been implemented in a publicly available OCaml library. We have studied this semantics, formally proved several properties it satisfies, and exhibited counter-examples to usual properties it does not satisfy. We have defined and implemented an analysis that is able to prove an expression rounding-insensitive in a given program. This analysis relies on partitioning and relational abstract domains to maintain the best possible precision, and can generate understandable counter-example hints. Both our library and the rounding-sensitivity analysis have been integrated within the Catala language, which focuses on implementing computational law. Through our analysis, we found rounding-sensitivity issues in the implementation of the French housing benefits in Catala. We surveyed the behavior of mainstream date arithmetic libraries, and developed litmus tests that can be used to test new libraries.

Our static analysis has limitations: its soundness has not been proved mechanically, although the proofs simply lift theorems that have been formally verified. The currently analyzed language is a core imperative language, which was sufficient for our case studies. Having an inter-scope analysis within the Catala-to-Mopsa translation would improve precision in the case study. We also plan to craft human-readable error messages from Mopsa's output: we believe the relevant constraints are already properly extracted by Mopsa, and the remaining work is engineering, inverting the translation from Catala date computations to Mopsa programs.

In spite of these limitations, we believe this paper is a crucial step toward clarifying and improving the robustness of the many computer programs implementing "business logic", often overlooked by formal methods. The widespread use of date arithmetic in programs used by companies or government agencies to operate massive financial transfers should have prompted a formal analysis of date rounding long ago, but the existing literature only indicates a recent interest from the formal methods community in the matter.

This work was triggered by the problems we found during interdisciplinary investigations of the French housing benefits using the Catala programming language. From these investigations surfaced the need for various formal analyses, which we have thus started integrating into the programming language. We hope to further develop the integration of static analysis into the Catala proof platform, thus benefiting both legal and computer science users by bringing formal methods advances into the development processes of Catala programs.

Artifact Availability Statement. All our development is under open-source licenses, either public or in the process of being upstreamed into a public repository. To foster reproducibility of our results, we provide an artifact [39] containing the formal proofs written in F<sup>⋆</sup>, our date calculation library, and our ambiguity detection analysis, as well as supporting evidence for our case study.

Acknowledgements. We thank the anonymous reviewers for their constructive feedback and support of our work. We are obliged to Abdelraouf Ouadjaout for making his implementation of partitioning within Mopsa available to us. We are grateful to David Delmas for the discussions around double semantics, Liane Huttner & Sarah Lawsky for the interesting discussions around the properties this work targets, and Louis Gesbert for his technical help around the Catala compiler. We appreciated the many discussions and valuable feedback about this work we got from the whole Catala team.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Author Index**

#### **A**

Ågren Thuné, Anders II-59
Avanzini, Martin II-31

#### **B**

Balcer, Piotr II-150, II-180
Broman, David II-302

#### **C**

Caires, Luís I-206
Chen, Liang-Ting I-115
Cohen, Cyril I-239, I-269
Colledan, Andrea II-3
Crance, Enzo I-239, I-269

#### **D**

D'Souza, Deepak II-245
Dal Lago, Ugo II-3
Dongol, Brijesh II-150, II-180
Dvir, Yotam II-121

#### **E**

Eriksson, Oscar II-302

#### **F**

Fedyukovich, Grigory II-245
Felicissimo, Thiago I-143, I-171
Fromherz, Aymeric II-421
Fujinami, Hiroya II-90

#### **G**

Gavazzo, Francesco I-22
Guo, Liye II-331

#### **H**

Hasuo, Ichiro II-90
Helm, Dominik II-361
Hu, Jason Z. S. I-52
Hughes, Jack I-83
Hummelgren, Lars II-302

#### **I**

Itzhaky, Shachar II-212

#### **K**

Kammar, Ohad II-121
Kanabar, Hrutvik II-275
Keidel, Sven II-361
Ko, Hsiang-Shang I-115
Kop, Cynthia II-331
Korban, Kacper II-275
Kudlicka, Jan II-302

#### **L**

Lahav, Ori II-121, II-150, II-180
Laurent, Théo I-302, I-332
Lennon-Bertrand, Meven I-302, I-332
Lermusiaux, Pierre II-391
Li, Elaine I-176
Lindley, Sam I-3
Lundén, Daniel II-302

#### **M**

Mahboubi, Assia I-239, I-269
Maillard, Kenji I-302, I-332
Matache, Cristina I-3
Matsuda, Kazutaka II-59
Merigoux, Denis II-421
Mezini, Mira II-361
Monat, Raphaël II-421
Montagu, Benoît II-391
Moser, Georg II-31
Moss, Sean I-3
Myreen, Magnus O. II-275

#### **O**

Orchard, Dominic I-83

#### **P**

Péchoux, Romain II-31
Perdrix, Simon II-31
Pientka, Brigitte I-52
Pujet, Loïc I-275


#### **R**

Raad, Azalea II-150, II-180, II-185
Roth, Tobias II-361

#### **S**

S, Sumanth Prabhu II-245
Shoham, Sharon II-212
Staton, Sam I-3
Stefanesco, Léo II-185
Stutz, Felix I-176

#### **T**

Tabareau, Nicolas I-275
Toninho, Bernardo I-206
Treglia, Riccardo I-22

#### **V**

Vafeiadis, Viktor II-185
Vanoni, Gabriele I-22
Vizel, Yakir II-212

#### **W**

Wang, Meng II-59
Wickerson, John II-150, II-180
Wies, Thomas I-176
Wu, Nicolas I-3

#### **Y**

Yang, Zhixuan I-3