**Constantin Enea Akash Lal (Eds.)**

# **Computer Aided Verification**

**35th International Conference, CAV 2023 Paris, France, July 17–22, 2023 Proceedings, Part III**

## **Lecture Notes in Computer Science 13966**

Founding Editors

Gerhard Goos Juris Hartmanis

### Editorial Board Members

Elisa Bertino, *Purdue University, West Lafayette, IN, USA*
Wen Gao, *Peking University, Beijing, China*
Bernhard Steffen, *TU Dortmund University, Dortmund, Germany*
Moti Yung, *Columbia University, New York, NY, USA*

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education.

LNCS enjoys close cooperation with the computer science R & D community; the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.


*Editors*
Constantin Enea, LIX, Ecole Polytechnique, CNRS and Institut Polytechnique de Paris, Palaiseau, France
Akash Lal, Microsoft Research, Bangalore, India

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-031-37708-2, ISBN 978-3-031-37709-9 (eBook)
https://doi.org/10.1007/978-3-031-37709-9

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## **Preface**

It was our privilege to serve as the program chairs for CAV 2023, the 35th International Conference on Computer-Aided Verification. CAV 2023 was held during July 19–22, 2023, with the pre-conference workshops during July 17–18, 2023. CAV 2023 was an in-person event in Paris, France.

CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. The primary focus of CAV is to extend the frontiers of verification techniques by expanding to new domains such as security, quantum computing, and machine learning. This puts CAV at the cutting edge of formal methods research, and this year's program is a reflection of this commitment.

CAV 2023 received a large number of submissions (261). We accepted 15 tool papers, 3 case-study papers, and 49 regular papers, which amounts to an acceptance rate of roughly 26%. The accepted papers cover a wide spectrum of topics, from theoretical results to applications of formal methods. These papers apply or extend formal methods to a wide range of domains such as concurrency, machine learning and neural networks, quantum systems, as well as hybrid and stochastic systems. The program featured keynote talks by Ruzica Piskac (Yale University), Sumit Gulwani (Microsoft), and Caroline Trippel (Stanford University). In addition to the contributed talks, CAV also hosted the CAV Award ceremony, and a report from the Synthesis Competition (SYNTCOMP) chairs.

In addition to the main conference, CAV 2023 hosted the following workshops: Meeting on String Constraints and Applications (MOSCA), Verification Witnesses and Their Validation (VeWit), Verification of Probabilistic Programs (VeriProP), Open Problems in Learning and Verification of Neural Networks (WOLVERINE), Deep Learning-aided Verification (DAV), Hyperproperties: Advances in Theory and Practice (HYPER), Synthesis (SYNT), Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), and the Verification Mentoring Workshop (VMW). CAV 2023 also hosted a workshop dedicated to Thomas A. Henzinger for his 60th birthday.

Organizing a flagship conference like CAV requires a great deal of effort from the community. The Program Committee for CAV 2023 consisted of 76 members—a committee of this size ensures that each member has to review only a reasonable number of papers in the allotted time. In all, the committee members wrote over 730 reviews while investing significant effort to maintain and ensure the high quality of the conference program. We are grateful to the CAV 2023 Program Committee for their outstanding efforts in evaluating the submissions and making sure that each paper got a fair chance. As in recent years at CAV, we made artifact evaluation mandatory for tool paper submissions, but optional for the rest of the accepted papers. This year we received 48 artifact submissions, out of which 47 submissions received at least one badge. The Artifact Evaluation Committee consisted of 119 members who put in significant effort to evaluate each artifact. The goal of this process was to provide constructive feedback to tool developers and help make the research published in CAV more reproducible. We are also very grateful to the Artifact Evaluation Committee for their hard work and dedication in evaluating the submitted artifacts.

CAV 2023 would not have been possible without the tremendous help we received from several individuals, and we would like to thank everyone who helped make CAV 2023 a success. We would like to thank Alessandro Cimatti, Isil Dillig, Javier Esparza, Azadeh Farzan, Joost-Pieter Katoen and Corina Pasareanu for serving as area chairs. We also thank Bernhard Kragl and Daniel Dietsch for chairing the Artifact Evaluation Committee. We also thank Mohamed Faouzi Atig for chairing the workshop organization as well as leading publicity efforts, Eric Koskinen as the fellowship chair, Sebastian Bardin and Ruzica Piskac as sponsorship chairs, and Srinidhi Nagendra as the website chair. Srinidhi, along with Enrique Román Calvo, helped prepare the proceedings. We also thank Ankush Desai, Eric Koskinen, Burcu Kulahcioglu Ozkan, Marijana Lazic, and Matteo Sammartino for chairing the mentoring workshop. Last but not least, we would like to thank the members of the CAV Steering Committee (Kenneth McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important aspects of organizing CAV 2023.

We hope that you will find the proceedings of CAV 2023 scientifically interesting and thought-provoking!

June 2023

Constantin Enea
Akash Lal

## **Organization**

## **Conference Co-chairs**

Constantin Enea LIX, Ecole Polytechnique, CNRS and Institut Polytechnique de Paris, France
Akash Lal Microsoft Research, India

## **Artifact Co-chairs**

Bernhard Kragl
Daniel Dietsch

## **Workshop Chair**


Mohamed Faouzi Atig Uppsala University, Sweden

## **Verification Mentoring Workshop Organizing Committee**

Ankush Desai
Eric Koskinen
Burcu Kulahcioglu Ozkan
Marijana Lazic
Matteo Sammartino

## **Fellowship Chair**

Eric Koskinen

## **Website Chair**

Srinidhi Nagendra

## **Sponsorship Co-chairs**

Sebastian Bardin CEA, LIST, Université Paris Saclay, France
Ruzica Piskac Yale University, USA

## **Proceedings Chairs**

Srinidhi Nagendra
Enrique Román Calvo

## **Program Committee**

Aarti Gupta Princeton University, USA
Abhishek Bichhawat IIT Gandhinagar, India
Aditya V. Thakur University of California, USA
Ahmed Bouajjani University of Paris, France
Aina Niemetz Stanford University, USA
Akash Lal Microsoft Research, India
Alan J. Hu University of British Columbia, Canada
Alessandro Cimatti Fondazione Bruno Kessler, Italy
Alexander Nadel Intel, Israel
Anastasia Mavridou KBR, NASA Ames Research Center, USA
Andreas Podelski University of Freiburg, Germany
Ankush Desai Amazon Web Services, USA
Anna Slobodova Intel, USA
Anthony Widjaja Lin TU Kaiserslautern and Max-Planck Institute for Software Systems, Germany
Arie Gurfinkel University of Waterloo, Canada
Arjun Radhakrishna Microsoft, India
Aws Albarghouthi University of Wisconsin-Madison, USA
Azadeh Farzan University of Toronto, Canada
Bernd Finkbeiner CISPA Helmholtz Center for Information Security, Germany
Bettina Koenighofer Graz University of Technology, Austria
Bor-Yuh Evan Chang University of Colorado Boulder and Amazon, USA
Burcu Kulahcioglu Ozkan Delft University of Technology, The Netherlands
Caterina Urban Inria and École Normale Supérieure, France
Cezara Dragoi Amazon Web Services, USA
Christoph Matheja Technical University of Denmark, Denmark
Claudia Cauli Amazon Web Services, UK
Constantin Enea LIX, CNRS, Ecole Polytechnique, France
Corina Pasareanu CMU, USA
Cristina David University of Bristol, UK
Dirk Beyer LMU Munich, Germany
Elizabeth Polgreen University of Edinburgh, UK
Elvira Albert Complutense University, Spain
Eunsuk Kang Carnegie Mellon University, USA
Gennaro Parlato University of Molise, Italy
Hossein Hojjat Tehran University and Tehran Institute of Advanced Studies, Iran
Ichiro Hasuo National Institute of Informatics, Japan
Isil Dillig University of Texas, Austin, USA
Javier Esparza Technische Universität München, Germany
Joost-Pieter Katoen RWTH-Aachen University, Germany
Juneyoung Lee AWS, USA
Jyotirmoy Deshmukh University of Southern California, USA
Kenneth L. McMillan University of Texas at Austin, USA
Kristin Yvonne Rozier Iowa State University, USA
Kshitij Bansal Google, USA
Kuldeep Meel National University of Singapore, Singapore
Kyungmin Bae POSTECH, South Korea
Marcell Vazquez-Chanlatte Alliance Innovation Lab (Nissan-Renault-Mitsubishi), USA
Marieke Huisman University of Twente, The Netherlands
Markus Rabe Google, USA
Marta Kwiatkowska University of Oxford, UK
Matthias Heizmann University of Freiburg, Germany
Michael Emmi AWS, USA
Mihaela Sighireanu University Paris Saclay, ENS Paris-Saclay and CNRS, France
Mohamed Faouzi Atig Uppsala University, Sweden
Naijun Zhan Institute of Software, Chinese Academy of Sciences, China
Nikolaj Bjorner Microsoft Research, USA
Nina Narodytska VMware Research, USA
Pavithra Prabhakar Kansas State University, USA
Pierre Ganty IMDEA Software Institute, Spain
Rupak Majumdar Max Planck Institute for Software Systems, Germany
Ruzica Piskac Yale University, USA
Sebastian Junges Radboud University, The Netherlands
Sébastien Bardin CEA, LIST, Université Paris Saclay, France
Serdar Tasiran Amazon, USA
Sharon Shoham Tel Aviv University, Israel
Shaz Qadeer Meta, USA
Shuvendu Lahiri Microsoft Research, USA
Subhajit Roy Indian Institute of Technology, Kanpur, India
Suguman Bansal Georgia Institute of Technology, USA
Swarat Chaudhuri UT Austin, USA
Sylvie Putot École Polytechnique, France
Thomas Wahl GrammaTech, USA
Tomáš Vojnar Brno University of Technology, FIT, Czech Republic
Yakir Vizel Technion - Israel Institute of Technology, Israel
Yu-Fang Chen Academia Sinica, Taiwan
Zhilin Wu State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China

## **Artifact Evaluation Committee**

Alejandro Hernández-Cerezo Complutense University of Madrid, Spain
Alvin George IISc Bangalore, India
Aman Goel Amazon Web Services, USA
Amit Samanta University of Utah, USA
Anan Kabaha Technion, Israel
Andres Noetzli Cubist, Inc., USA
Anna Becchi Fondazione Bruno Kessler, Italy
Arnab Sharma University of Oldenburg, Germany
Avraham Raviv Bar Ilan University, Israel
Ayrat Khalimov TU Clausthal, Germany
Baoluo Meng General Electric Research, USA
Benjamin Jones Amazon Web Services, USA
Bohua Zhan Institute of Software, Chinese Academy of Sciences, China
Cayden Codel Carnegie Mellon University, USA
Charles Babu M. CEA LIST, France
Chungha Sung Amazon Web Services, USA
Clara Rodriguez-Núñez Universidad Complutense de Madrid, Spain
Cyrus Liu Stevens Institute of Technology, USA
Daniel Hausmann University of Gothenburg, Sweden
Daniela Kaufmann TU Wien, Austria
Debasmita Lohar MPI SWS, Germany
Deivid Vale Radboud University Nijmegen, Netherlands
Denis Mazzucato Inria, France
Đorđe Žikelić Institute of Science and Technology Austria, Austria
Ekanshdeep Gupta New York University, USA
Enrico Magnago Amazon Web Services, USA
Ferhat Erata Yale University, USA
Filip Cordoba Graz University of Technology, Austria
Filipe Arruda UFPE, Brazil
Florian Dorfhuber Technical University of Munich, Germany
Florian Sextl TU Wien, Austria
Francesco Parolini Sorbonne University, France
Frédéric Recoules CEA LIST, France
Goktug Saatcioglu Cornell, USA
Goran Piskachev Amazon Web Services, USA
Grégoire Menguy CEA LIST, France
Guy Amir Hebrew University of Jerusalem, Israel
Habeeb P. Indian Institute of Science, Bangalore, India
Hadrien Renaud UCL, UK
Haoze Wu Stanford University, USA
Hari Krishnan University of Waterloo, Canada
Hünkar Tunç Aarhus University, Denmark
Idan Refaeli Hebrew University of Jerusalem, Israel
Ignacio D. Lopez-Miguel TU Wien, Austria
Ilina Stoilkovska Amazon Web Services, USA
Ira Fesefeldt RWTH Aachen University, Germany
Jahid Choton Kansas State University, USA
Jie An National Institute of Informatics, Japan
John Kolesar Yale University, USA
Joseph Scott University of Waterloo, Canada
Kevin Lotz Kiel University, Germany
Kirby Linvill CU Boulder, USA
Kush Grover Technical University of Munich, Germany
Levente Bajczi Budapest University of Technology and Economics, Hungary
Liangcheng Yu University of Pennsylvania, USA
Luke Geeson UCL, UK
Lutz Klinkenberg RWTH Aachen University, Germany
Marek Chalupa Institute of Science and Technology Austria, Austria
Mario Bucev EPFL, Switzerland
Mário Pereira NOVA LINCS—Nova School of Science and Technology, Portugal
Marius Mikucionis Aalborg University, Denmark
Martin Jonáš Masaryk University, Czech Republic
Mathias Fleury University of Freiburg, Germany
Matthias Hetzenberger TU Wien, Austria
Maximilian Heisinger Johannes Kepler University Linz, Austria
Mertcan Temel Intel Corporation, USA
Michele Chiari TU Wien, Austria
Miguel Isabel Universidad Complutense de Madrid, Spain
Mihai Nicola Stevens Institute of Technology, USA
Mihály Dobos-Kovács Budapest University of Technology and Economics, Hungary
Mikael Mayer Amazon Web Services, USA
Mitja Kulczynski Kiel University, Germany
Muhammad Mansur Amazon Web Services, USA
Muqsit Azeem Technical University of Munich, Germany
Neelanjana Pal Vanderbilt University, USA
Nicolas Koh Princeton University, USA
Niklas Metzger CISPA Helmholtz Center for Information Security, Germany
Omkar Tuppe IIT Bombay, India
Pablo Gordillo Complutense University of Madrid, Spain
Pankaj Kalita Indian Institute of Technology, Kanpur, India
Parisa Fathololumi Stevens Institute of Technology, USA
Pavel Hudec HKUST, Hong Kong, China
Peixin Wang University of Oxford, UK
Philippe Heim CISPA Helmholtz Center for Information Security, Germany
Pritam Gharat Microsoft Research, India
Priyanka Darke TCS Research, India
Ranadeep Biswas Informal Systems, Canada
Robert Rubbens University of Twente, Netherlands
Rubén Rubio Universidad Complutense de Madrid, Spain
Samuel Judson Yale University, USA
Samuel Pastva Institute of Science and Technology Austria, Austria
Sankalp Gambhir EPFL, Switzerland
Sarbojit Das Uppsala University, Sweden
Sascha Klüppelholz Technische Universität Dresden, Germany
Sean Kauffman Aalborg University, Denmark


## **Additional Reviewers**

Azzopardi, Shaun
Baier, Daniel
Belardinelli, Francesco
Bergstraesser, Pascal
Boker, Udi
Ceska, Milan
Chien, Po-Chun
Coglio, Alessandro
Correas, Jesús
Doveri, Kyveli
Drachsler Cohen, Dana
Durand, Serge
Fried, Dror
Genaim, Samir
Ghosh, Bishwamittra
Gordillo, Pablo
Guillermo, Roman Diez
Gómez-Zamalloa, Miguel
Hernández-Cerezo, Alejandro
Holík, Lukáš
Isabel, Miguel
Ivrii, Alexander
Izza, Yacine
Jothimurugan, Kishor
Kaivola, Roope
Kaminski, Benjamin Lucien
Kettl, Matthias
Kretinsky, Jan
Lengal, Ondrej
Losa, Giuliano
Luo, Ning
Malik, Viktor
Markgraf, Oliver
Martin-Martin, Enrique
Meller, Yael
Perez, Mateo
Petri, Gustavo
Pote, Yash
Preiner, Mathias
Rakamaric, Zvonimir
Rastogi, Aseem
Razavi, Niloofar
Rogalewicz, Adam
Sangnier, Arnaud
Sarkar, Uddalok
Schoepe, Daniel
Sergey, Ilya
Stoilkovska, Ilina
Stucki, Sandro
Tsai, Wei-Lun
Turrini, Andrea
Vafeiadis, Viktor
Valiron, Benoît
Wachowitz, Henrik
Wang, Chao
Wang, Yuepeng
Wies, Thomas
Yang, Jiong
Yen, Di-De
Zhu, Shufang
Žikelić, Đorđe
Zohar, Yoni

## **Contents – Part III**

#### **Probabilistic Systems**





**Author Index** 499

## **Probabilistic Systems**

## A Flexible Toolchain for Symbolic Rabin Games under Fair and Stochastic Uncertainties

Rupak Majumdar<sup>1</sup>, Kaushik Mallik<sup>2(B)</sup>, Mateusz Rychlicki<sup>3</sup>, Anne-Kathrin Schmuck<sup>1</sup>, and Sadegh Soudjani<sup>4</sup>

<sup>1</sup> MPI-SWS, Kaiserslautern, Germany
{rupak,akschmuck}@mpi-sws.org
<sup>2</sup> ISTA, Klosterneuburg, Austria
kaushik.mallik@ist.ac.at
<sup>3</sup> School of Computing, University of Leeds, Leeds, UK
scmkry@leeds.ac.uk
<sup>4</sup> Newcastle University, Newcastle upon Tyne, UK
Sadegh.Soudjani@newcastle.ac.uk

Abstract. We present a flexible and efficient toolchain to *symbolically* solve (standard) Rabin games, fair-adversarial Rabin games, and 2½-player Rabin games. To the best of our knowledge, our tools are the first ones able to solve these problems. Furthermore, using these flexible game solvers as a back-end, we implemented a tool for computing correct-by-construction controllers for stochastic dynamical systems under LTL specifications. Our implementations use the recent theoretical result that all of these games can be solved using the same symbolic fixpoint algorithm, but with different, domain-specific calculations of the involved predecessor operators. The main feature of our toolchain is the utilization of two programming abstractions: one to separate the symbolic fixpoint computations from the predecessor calculations, and another one to allow the integration of different BDD libraries as back-ends. In particular, we employ a multi-threaded execution of the fixpoint algorithm by using the multi-threaded BDD library Sylvan, which leads to enormous computational savings.

## 1 Introduction

Piterman and Pnueli [17] derived the currently best known symbolic algorithm for solving two-player Rabin games over finite graphs, with a theoretical complexity of $O(n^{k+1}\,k!)$ in time and space, where $n$ is the number of states and $k$ is the number of pairs in the winning condition. This work did not provide an implementation.

Authors ordered alphabetically. R. Majumdar and A.-K. Schmuck are partially supported by DFG project 389792660 TRR 248-CPEC. A.-K. Schmuck is additionally funded through DFG project SCHM 3541/1-1. K. Mallik is supported by the ERC project ERC-2020-AdG 101020093. M. Rychlicki is supported by the EPSRC project EP/V00252X/1. S. Soudjani is supported by the following projects: EPSRC EP/V043676/1, EIC 101070802, and ERC 101089047.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 3–15, 2023. https://doi.org/10.1007/978-3-031-37709-9\_1

In a series of papers [3,4,15,16], Mallik et al. showed that this symbolic algorithm can be extended to solve different automated design questions for reactive hardware, software, and cyber-physical systems under fair or stochastic uncertainties. The main contribution of their work is to show that these extensions only require a very mild syntactic change of the Piterman-Pnueli fixed-point algorithm (with very little effect on its overall complexity) and domain-specific realizations of two types of predecessor operators used therein.

Using this insight, we present a *toolchain* for the *efficient symbolic solution of different extensions of Rabin games*. We have created three inter-connected libraries that address different parts of the problem at different levels of abstraction. The first library, called Genie, offers a set of virtual classes to implement the fixpoint algorithm abstractly, leaving open (i.e., virtual) the predecessor computation. Alongside, we created two other libraries, called FairSyn and Mascot-SDS: FairSyn solves fair-adversarial [4] and 2½-player Rabin games [3], while Mascot-SDS solves abstraction-based control problems [15,16]. Both use the optimized fixpoint computation provided by Genie, with domain-specific implementations of the predecessor operations.

The flexibility of our toolchain comes from two different programming abstractions in Genie. First, Genie offers multiple high-level optimizations for solving the Rabin fixpoint, such as parallel execution (which requires a thread-safe BDD library like Sylvan) and an acceleration technique [13], while abstracting away from the low-level implementations of the predecessor functions. As a result, any synthesis problem using the core Rabin fixpoint of Genie can use these optimizations without any extra implementation effort. We used these optimizations from FairSyn and Mascot-SDS and achieved remarkable computational savings. Second, Genie makes it easy to port code from one BDD library to another, which is important because different BDD libraries have different pros and cons, and the best choice depends on the application. We empirically show how switching between the two BDD libraries Sylvan and CUDD affects the performance of FairSyn and Mascot-SDS: overall, the Sylvan-based experiments were significantly faster, whereas the CUDD-based experiments consumed considerably less memory. Using the combined power of multi-threaded BDD operations in Sylvan and the optimizations offered by Genie, Mascot-SDS was between one and three orders of magnitude faster than the state-of-the-art tool in our experiments.
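The first abstraction can be pictured with a minimal Python sketch (the actual toolchain is C++ over BDDs, so all class and function names here are illustrative, not Genie's API). For brevity the solver shown is a Büchi fixpoint rather than the full Rabin recursion, but it drives the predecessor interface in exactly the same way:

```python
from abc import ABC, abstractmethod

class PredecessorOracle(ABC):
    """Interface separating the fixpoint solver from domain-specific predecessors."""

    @abstractmethod
    def cpre(self, S: frozenset) -> frozenset: ...

    @abstractmethod
    def apre(self, S: frozenset, T: frozenset) -> frozenset: ...

class ExplicitGame(PredecessorOracle):
    """Plain 2-player realization over explicit successor maps (vertex -> set)."""

    def __init__(self, succ0, succ1):
        self.succ0, self.succ1 = succ0, succ1

    def cpre(self, S):
        # Player-0 vertices with SOME successor in S; player-1 with ALL successors in S.
        win0 = {v for v, ws in self.succ0.items() if ws & S}
        win1 = {v for v, ws in self.succ1.items() if ws <= S}
        return frozenset(win0 | win1)

    def apre(self, S, T):
        return self.cpre(T)  # in plain games Apre ignores S

def buchi_winning(oracle, V, Q):
    """nu Y. mu X. (Q & Cpre(Y)) | Cpre(X): vertices from which player 0
    can visit Q infinitely often."""
    Y = frozenset(V)
    while True:
        X = frozenset()
        while True:
            Xn = (Q & oracle.cpre(Y)) | oracle.cpre(X)
            if Xn == X:
                break
            X = Xn
        if X == Y:
            return Y
        Y = X
```

A BDD-backed oracle would implement the same two methods over symbolic sets, which is how the fixpoint code stays independent of the chosen BDD library.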

Comparison with Existing Tools: We are not aware of any available tool that directly solves (normal or stochastic) Rabin games *symbolically*. However, it is well-known how to translate *stochastic* Rabin games into (standard) Rabin games [5], and Rabin games into parity games, for which efficient solvers exist, e.g., oink [9]. Yet, efficient solutions of stochastic Rabin games via parity games are difficult to obtain, because: (i) the translation from a stochastic Rabin game to a Rabin game involves a quadratic blow-up, and the translation from a Rabin game to a parity game results in an exponential blow-up in the size of the game, (ii) symbolic fixpoint computations quickly become cumbersome for parity games as the number of vertices and/or colors in the game graph increases, leading to high computation times in practice, and (iii) the only known algorithms capable of handling fair and stochastic uncertainties efficiently are all *symbolic* in nature, while most of the efficient parity game solvers are non-symbolic. Additionally, unlike the Rabin fixpoint, the nesting of the parity fixpoint does not enable parallel execution.

While it is well known that for normal parity games, computational tractability can be achieved by different non-symbolic algorithms, such as Zielonka's algorithm [22], tangle learning [8] or strategy-improvement [19], implemented in oink [9], it is currently unclear if and how these algorithms allow for the efficient handling of fair or stochastic uncertainties. We are therefore unable to compare our toolchain to the translational workflow via parity games in a fair manner.

In the area of temporal logic control of stochastic systems, Mascot-SDS has two powerful features: (a) it can handle synthesis for the rich class of omega-regular (infinite-horizon) specifications, and (b) it provides both over- and under-approximations of the solution, thus enabling a quantitative refinement loop for improving the precision of the approximation. The features of Mascot-SDS are compared with those of other tools in the stochastic category of the recent ARCH competition (see the report [1] for the list of participating tools). As concluded in the competition report, the other state-of-the-art tools in the stochastic category are either limited to a fragment of ω-regular specifications or do not provide any indication of the quality of the involved approximations. The only tool [10] that supports ω-regular specifications uses a different, non-symbolic approach, against which Mascot-SDS fares significantly well in our experiments (see Sect. 4.2). Even leaving stochasticity aside, our tool implements a new and orthogonal heuristic for multi-threaded computation of Rabin fixpoints, which is not considered by other controller synthesis tools [11].

## 2 Theoretical Background

We briefly state the synthesis problems our toolchain is solving. We follow the same (standard) notation for two-player game graphs, winning regions, strategies and μ-calculus formulas, as in [4].

#### 2.1 Solving Rabin Games Symbolically

Given a game graph $G = (V, V_0, V_1, E)$, a Rabin game is specified using a set of Rabin pairs $\mathcal{R} = \{(Q_1, R_1), \dots, (Q_k, R_k)\}$, with $Q_i, R_i \subseteq V$ for every $i \in [1;k]$, and $\varphi := \bigvee_{i \in [1;k]} (\Diamond\Box\neg R_i \wedge \Box\Diamond Q_i)$ being the Rabin acceptance condition. Piterman and Pnueli [17] showed that the winning region of a Rabin game can be computed using the $\mu$-calculus expression given in (2), where the set transformers $Cpre \colon 2^V \to 2^V$ and $Apre \colon 2^V \times 2^V \to 2^V$ are defined for every $S, T \subseteq V$ as:

$$\begin{aligned} Cpre(S) &:= \{ v \in V_0 \mid \exists v' \in S \,.\, (v, v') \in E \} \\ &\quad\cup \{ v \in V_1 \mid \forall v' \in V \,.\, (v, v') \in E \implies v' \in S \}, \end{aligned} \tag{1a}$$

$$Apre(S, T) := Cpre(T). \tag{1b}$$
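For intuition, (1) can be transliterated directly for explicit-state games; this is an illustrative sketch, not the tool's symbolic (BDD-based) implementation:

```python
def cpre(v0, v1, E, S):
    """Eq. (1a): player-0 vertices with some successor in S, and
    player-1 vertices whose successors all lie in S."""
    succ = lambda v: {w for (x, w) in E if x == v}
    return {v for v in v0 if succ(v) & S} | {v for v in v1 if succ(v) <= S}

def apre(v0, v1, E, S, T):
    """Eq. (1b): in plain 2-player games Apre ignores its first argument."""
    return cpre(v0, v1, E, T)
```

For example, in a game with player-0 vertex 0, player-1 vertex 1, and edges 0→1, 1→0, 1→2, vertex 0 is in Cpre({0, 1}) (it can move to 1), while vertex 1 is not (it may escape to 2).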

The symbolic fixpoint algorithm for solving Rabin games with $\mathcal{R} = \{(Q_1, R_1), \dots, (Q_k, R_k)\}$ and $K = [1;k]$ is:

$$\nu Y_{p_0}.\, \mu X_{p_0}. \bigcup_{p_1 \in K} \nu Y_{p_1}.\, \mu X_{p_1}. \bigcup_{p_2 \in K \setminus \{p_1\}} \nu Y_{p_2}.\, \mu X_{p_2}. \;\cdots \bigcup_{p_k \in K \setminus \{p_1, \dots, p_{k-1}\}} \nu Y_{p_k}.\, \mu X_{p_k}. \left[ \bigcup_{j=0}^{k} \mathcal{C}_{p_j} \right], \tag{2}$$

where

$$\mathcal{C}_{p_j} := \left( \bigcap_{i=0}^{j} \overline{R}_{p_i} \right) \cap \left[ \left( Q_{p_j} \cap Cpre(Y_{p_j}) \right) \cup Apre(Y_{p_j}, X_{p_j}) \right],$$

and the definitions of $Cpre$ and $Apre$ are problem specific.

**Fair-Adversarial Rabin Games.** A Rabin game is called *fair-adversarial* when there is an additional fairness assumption on a set of edges originating from *Player* 1 vertices in $G$. Let $E^{\ell} \subseteq E \cap (V_1 \times V)$ be a given set of edges, called the *live* edges. Given $E^{\ell}$ and a Rabin winning condition $\varphi$, we say that *Player* 0 wins the *fair-adversarial Rabin game* from a vertex $v$ if *Player* 0 wins the (normal) game for the modified winning condition $\varphi' := \left( \bigwedge_{e=(v,v') \in E^{\ell}} (\Box\Diamond v \implies \Box\Diamond e) \right) \implies \varphi$. Based on the results of Banerjee et al. [4], fair-adversarial Rabin games can be solved via (2), by defining for every $S, T \subseteq V$

$$\begin{aligned} Cpre(S) &:= \{ v \in V_0 \mid \exists v' \in S \,.\, (v, v') \in E \} \\ &\quad\cup \{ v \in V_1 \mid \forall v' \in V \,.\, (v, v') \in E \implies v' \in S \}, \end{aligned} \tag{3a}$$

$$Apre(S, T) := Cpre(T) \cup \left\{ v \in Cpre(S) \cap V_1 \mid \exists v' \in T \,.\, (v, v') \in E^{\ell} \right\}. \tag{3b}$$

We see that (3) coincides with (1) if $E^{\ell}$ is empty.
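The live-edge operator (3b) admits the same explicit-state transliteration; the sketch below is illustrative (the toolchain computes these operators symbolically over BDDs). A player-1 vertex joins $Apre(S, T)$ when all its moves stay in $S$ and fairness on a live edge forces it into $T$ eventually:

```python
def cpre(v0, v1, E, S):
    """Eq. (3a), identical to (1a)."""
    succ = lambda v: {w for (x, w) in E if x == v}
    return {v for v in v0 if succ(v) & S} | {v for v in v1 if succ(v) <= S}

def apre(v0, v1, E, Elive, S, T):
    """Eq. (3b): Cpre(T) plus player-1 vertices of Cpre(S) that own a
    live edge into T."""
    return cpre(v0, v1, E, T) | {
        v for v in cpre(v0, v1, E, S) & v1
        if any((v, w) in Elive for w in T)
    }
```

With `Elive` empty, the second set vanishes and `apre` collapses to `cpre(T)`, matching the remark above.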

**2½-Player Rabin Games.** A 2½-player game is played on a game graph $(V, V_0, V_1, V_r, E)$; the only difference from a 2-player game graph is the additional set of vertices $V_r$, called the *random* vertices. The sets $V_0$, $V_1$, and $V_r$ partition $V$. Based on the results of [3], 2½-player Rabin games can be solved via (2) by defining, for all $S, T \subseteq V$,

$$\begin{aligned} Cpre(S) &:= \{ v \in V_0 \mid \exists v' \in S \,.\, (v, v') \in E \} \\ &\quad\cup \{ v \in V_1 \cup V_r \mid \forall v' \in V \,.\, (v, v') \in E \implies v' \in S \}, \end{aligned} \tag{4a}$$

$$Apre(S, T) := Cpre(T) \cup \{ v \in Cpre(S) \cap V_r \mid \exists v' \in T \,.\, (v, v') \in E \}. \tag{4b}$$

#### 2.2 Computing Symbolic Controllers for Stochastic Dynamical Systems

A discrete-time stochastic dynamical system $S$ is represented by a tuple $(X, U, W, f)$, where $X \subseteq \mathbb{R}^n$ is a *continuous* state space, $U$ is a *finite* set of control inputs, $W \subset \mathbb{R}^n$ is a *bounded* set of disturbances, and $f \colon X \times U \to X$ is the nominal dynamics. If $x^k \in X$ and $u^k \in U$ are the state and control input of $S$ at some time $k \in \mathbb{N}$, then the state at the next time step is given by:

$$x^{k+1} = f(x^k, u^k) + w^k,\tag{5}$$

where $w^k$ is the disturbance at time $k$, sampled from $W$ according to some (possibly unknown) distribution. Without loss of generality we assume that $W$ is centered around the origin, which can be easily achieved by shifting $f$ if needed. A *path* of $S$ originating at $x^0 \in X$ is an infinite sequence of states $x^0 x^1 \dots$ for a given infinite sequence of control inputs $u^0 u^1 \dots$, such that (5) is satisfied.
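For illustration, the path semantics of (5) can be simulated directly; the scalar dynamics `f` and the disturbance samples in the usage below are hypothetical:

```python
def step(f, x, u, w):
    """One step of x^{k+1} = f(x^k, u^k) + w^k, cf. eq. (5)."""
    fx = f(x, u)
    return tuple(fx[i] + w[i] for i in range(len(fx)))

def path(f, x0, us, ws):
    """Finite prefix of a path of S for input sequence us and disturbances ws."""
    xs = [x0]
    for u, w in zip(us, ws):
        xs.append(step(f, xs[-1], u, w))
    return xs
```

For instance, with the (hypothetical) dynamics `f = lambda x, u: (0.5 * x[0] + u,)`, the path from `(1.0,)` under inputs `[1.0, 1.0]` and disturbances `[(0.5,), (-0.5,)]` visits `(2.0,)` and then `(1.5,)`.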

Let $\varphi$ be a given Rabin specification, called the *control objective*, defined using a finite set of predicates over $X$. For every controller $C \colon X \to U$, the domain of $C$, written $Dom(C)$, is the set of states from where the property $\varphi$ can be satisfied with probability 1. For a fixed $\varphi$, a controller $\widehat{C}$ is called *optimal* if $Dom(\widehat{C})$ contains the domain of every other controller $C$. The problem of computing such an optimal controller for the system in (5) is in general undecidable. Following [15], we compute an approximate solution instead.

This approximate solution is obtained by a discretization of the state space. For this, we assume that the state space $X$ is a closed and bounded subset of the $n$-dimensional Euclidean space $\mathbb{R}^n$ for some $n > 0$, and use the notation $[\![a, b)\!)$ to denote the set $\prod_{i \in [1;n]} [a_i, b_i)$. Now, consider a grid-based discretization $\widehat{X}$ of $X$, i.e., a finite set of grid cells $\widehat{X} = \{[\![a, b)\!) \mid a, b \in \mathbb{R}^n\}$ whose union equals $X$. One of the key ingredients of our abstraction process is a function $\overline{f}$ providing a hyper-rectangular over-approximation of the one-step reachable set of the nominal dynamics $f$ of the system $S$: for every grid cell $\widehat{x} \in \widehat{X}$ and input $u \in U$, we have $\overline{f}(\widehat{x}, u) = [\![a', b')\!) \supseteq \{x' \in X \mid \exists x \in \widehat{x} \ . \ x' = f(x, u)\}$. The function $\overline{f}$ is known to be available for a wide class of commonly used forms of the function $f$; in our implementation we assume that $f$ is mixed-monotone and $\overline{f}$ is obtained from the so-called decomposition function (see standard literature for details [7]).
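
For a concrete instance of such an over-approximation: for linear nominal dynamics f(x) = Ax, a standard decomposition function is d(x, y) = A⁺x + A⁻y, where A⁺ and A⁻ keep the nonnegative and the negative entries of A, and the reach set of a cell [a, b] is over-approximated by [d(a, b), d(b, a)]. A minimal sketch (the matrix and all names are ours, not Mascot-SDS code):

```cpp
#include <array>

// Hyper-rectangular over-approximation of f(x) = A x on a grid cell,
// via the decomposition function d(x, y) = A+ x + A- y.
// Illustrative stand-in for the mixed-monotone machinery of [7].
using Vec2 = std::array<double, 2>;

struct Rect { Vec2 lo, hi; }; // hyper-rectangle [lo, hi]

const double A[2][2] = {{0.5, -0.25}, {0.25, 0.75}};

// Row i of d(x, y): nonnegative entries of A act on x, negative ones on y.
double decomp(int i, const Vec2& x, const Vec2& y) {
    double s = 0;
    for (int j = 0; j < 2; ++j)
        s += (A[i][j] >= 0) ? A[i][j] * x[j] : A[i][j] * y[j];
    return s;
}

// overApprox([a,b]) = [d(a,b), d(b,a)] contains {A x | x in [a,b]}.
Rect overApprox(const Rect& cell) {
    Rect r;
    for (int i = 0; i < 2; ++i) {
        r.lo[i] = decomp(i, cell.lo, cell.hi);
        r.hi[i] = decomp(i, cell.hi, cell.lo);
    }
    return r;
}
```
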

Given the over-approximation of the nominal dynamics obtained through $\overline{f}$, we define, respectively, the over- and the under-approximation of the *perturbed* dynamics as $\overline{g}(\widehat{x}, u) := W \oplus \overline{f}(\widehat{x}, u)$ and $\underline{g}(\widehat{x}, u) := W \ominus (-\overline{f}(\widehat{x}, u))$, where $\oplus$ and $\ominus$ respectively denote the Minkowski sum and the Minkowski difference. Next, we transfer $\overline{g}$ and $\underline{g}$ to the abstract state space $\widehat{X}$ to obtain, respectively, the over- and the under-approximation in terms of the *abstract transition* functions<sup>1</sup>, i.e., $\overline{h}(\widehat{x}, u) := \{\widehat{x}' \in \widehat{X} \mid \overline{g}(\widehat{x}, u) \cap \widehat{x}' \neq \emptyset\}$ and $\underline{h}(\widehat{x}, u) := \{\widehat{x}' \in \widehat{X} \mid \underline{g}(\widehat{x}, u) \cap \widehat{x}' \neq \emptyset\}$. With $\overline{h}$ and $\underline{h}$ available, it was shown by Majumdar et al. [16] that an over-approximation of the optimal controller can be computed using the fixpoint algorithm in (2), where the predecessor operators are defined for every $S, T \subseteq \widehat{X}$ as
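
On hyper-rectangles, both Minkowski operations reduce to componentwise interval arithmetic: with W = [−w, w] in each dimension, W ⊕ [a, b] = [a − w, b + w], while W ⊖ (−[a, b]) = [b − w, a + w], i.e., the set of points within distance w of *every* point of [a, b] (empty when b − a > 2w). A one-dimensional sketch, with names of our own:

```cpp
// One dimension of the perturbed-dynamics approximations for a centered
// disturbance set W = [-w, w]. Illustrative helper names, not tool code.
struct Interval { double lo, hi; };

// Over-approximation g-bar: Minkowski sum inflates the nominal rectangle by w.
Interval minkowskiSum(const Interval& r, double w) {
    return {r.lo - w, r.hi + w};
}

// Under-approximation g-underline: W (-) (-[a,b]) = [b - w, a + w] keeps the
// points reachable from every possible nominal image point; the result is
// empty (lo > hi) when the rectangle is wider than 2w.
Interval minkowskiDiff(const Interval& r, double w) {
    return {r.hi - w, r.lo + w};
}
```
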

$$Cpre(S) := \left\{ \widehat{x} \in \widehat{X} \mid \exists u \in U \text{ }. \,\,\overline{h}(\widehat{x}, u) \subseteq S \right\} \tag{6a}$$

$$Apre(S, T) \coloneqq \left\{ \widehat{x} \in \widehat{X} \mid \exists u \in U \ . \ \overline{h}(\widehat{x}, u) \subseteq S \land \underline{h}(\widehat{x}, u) \cap T \neq \emptyset \right\}.\tag{6b}$$

#### 3 Implementation Details

We develop three interconnected tools, Genie, FairSyn, and Mascot-SDS, which work in close harmony to implement efficient solvers for the solution of (2) with

<sup>1</sup> Here we assume that $\overline{f}(\widehat{x}, u) \subseteq X$; otherwise some extra steps are needed. Details can be found in the work by Majumdar et al. [16].

Fig. 1. A schematic diagram of interaction among the three tools. Each block represents one class in the respective tool, and an arrow from class A to class B denotes that B depends on A. The dependency within each tool is shown using solid arrows, while the dependencies of Mascot-SDS and FairSyn on Genie is shown using dashed arrows.

pre-operators defined via (3), (4), and (6), respectively. The tools use binary decision diagrams (BDDs) to symbolically manipulate sets of vertices/states of the underlying system. To manage the BDDs, we offer the flexibility to choose between two well-known BDD libraries, namely CUDD [20] and Sylvan [21]. The two libraries have complementary merits: CUDD has a significantly lower memory footprint, while Sylvan offers superior computation speed through multi-threaded BDD operations. The optimal choice of library therefore depends on the size of the problem, the computational time limit, and the memory budget; with our implementation, switching between the two requires changing only a single line of code or, in some cases, only the value of one flag. Moreover, we expect that integrating other BDD libraries supporting the same basic BDD operations will be easy and seamless, thanks to the programming abstraction offered by Genie. Such extensions may bring a more diverse set of computational strengths for solving the fundamental synthesis problems that we address.

The tools are primarily written in C++, with some small Python scripts implementing parts of the output visualization. The main classes of the three tools and their interactions are depicted in Fig. 1. We briefly describe the core functionalities of the tools below.

#### 3.1 Genie

Genie implements the fixpoint algorithm (2) in the class BaseFixpoint through two layers of abstraction. One abstraction is through the virtual definitions of the *Cpre* and *Apre* operators, whose concrete implementations are provided in the front-end synthesis tools (in our case FairSyn and Mascot-SDS). Using this abstraction, we implemented two different optimizations for the efficient iterative computation of the Rabin fixpoint in (2), independently of the actual implementations of the *Apre* and *Cpre* operators. The first optimization is a multi-threaded computation of the Rabin fixpoint, exploiting the fixpoint's inherent parallel structure due to the independence among the different sequences $(p_1, p_2, \ldots)$ used to compute $\bigcup_{j=0}^{k} C_{p_j}$. The second optimization is an accelerated computation of the Rabin fixpoint, achieved through bookkeeping of intermediate values of the BDD variables. The core of the acceleration procedure for general μ-calculus fixpoints was proposed by Long et al. [13], and the details specific to the fixpoint in (2) can be found in the paper by Banerjee et al. [4].

The other abstraction in Genie is the set of virtually defined low-level BDD operations in the auxiliary class BaseUBDD, which enable us to easily switch between different off-the-shelf BDD libraries. The virtual BDD operations in BaseUBDD are concretely realized in the classes CuddUBDD and SylvanUBDD, which work as interfaces between, respectively, the CUDD and the Sylvan BDD libraries. Support for additional BDD libraries can be easily built by creating new interface classes. More details on the functionalities of Genie can be found in the longer version of this paper [14].
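
The BaseUBDD idea can be sketched in a few lines: fixpoint code is written once against a small virtual interface, and each library is wrapped in a concrete subclass. The class and method names below are illustrative stand-ins (a toy 64-bit bitset plays the role of a BDD library); this is not Genie's actual API.

```cpp
#include <memory>

// Abstract interface over the handful of Boolean set operations the
// fixpoint code needs; concrete backends (here a toy bitset, in Genie
// CUDD- or Sylvan-backed classes) implement them.
struct BaseSet {
    virtual ~BaseSet() = default;
    virtual std::unique_ptr<BaseSet> conj(const BaseSet&) const = 0; // AND
    virtual std::unique_ptr<BaseSet> disj(const BaseSet&) const = 0; // OR
    virtual bool isEmpty() const = 0;
};

// Toy backend: a set over at most 64 elements, stored as a bit mask.
struct BitsetBackend : BaseSet {
    unsigned long long bits;
    explicit BitsetBackend(unsigned long long b) : bits(b) {}
    std::unique_ptr<BaseSet> conj(const BaseSet& o) const override {
        return std::make_unique<BitsetBackend>(
            bits & static_cast<const BitsetBackend&>(o).bits);
    }
    std::unique_ptr<BaseSet> disj(const BaseSet& o) const override {
        return std::make_unique<BitsetBackend>(
            bits | static_cast<const BitsetBackend&>(o).bits);
    }
    bool isEmpty() const override { return bits == 0; }
};

// Client code like this is written once against BaseSet; swapping in a
// different backend requires no change here.
bool disjoint(const BaseSet& a, const BaseSet& b) {
    return a.conj(b)->isEmpty();
}
```
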

#### 3.2 FairSyn

The core of FairSyn is written as a header-only library, which offers the infrastructure to solve (2) with pre-operators defined via (3) and (4). The main component of FairSyn is the class Fixpoint, which derives from the class BaseFixpoint from Genie, and implements the concrete definitions of *Cpre* and *Apre* in (3) and (4).

How to Use: For computing the winning region and the winning strategy in a fair-adversarial Rabin game (resp. a 2½-player Rabin game) using FairSyn, one needs to write a program that creates the game as a Fixpoint object. One possible way of constructing a Fixpoint object is through a synchronous product of a game graph (an object of class Arena) and a specification Rabin automaton (an object of class RabinAutomaton) with an input alphabet of sets of nodes of the Arena object. The following is a snippet:

```
// typedef Genie::CuddUBDD UBDD; // use this for CUDD
typedef Genie::SylvanUBDD UBDD;  // use this for Sylvan
UBDD base;
...
// the game graph
Arena<UBDD> A(base, vars, nodes, sys_nodes, env_nodes, edges, live_edges);
// the specification automaton
RabinAutomaton<UBDD> R(base, vars, inp_alphabet, filename);
// the synchronous product
Fixpoint<UBDD> Fp(base, "under", A, R);
// sequential fixpoint solver:
// UBDD strategy = Fp.Rabin(true, 20, Fp.nodes_, 0);
// parallel fixpoint solver:
UBDD strategy = Fp.Rabin(true, 20, Fp.nodes_, 0, Genie::ParallelRabinRecurse);
...
```
where vars is a (possibly initially empty) set of integers that will contain the newly created BDD variables; nodes, sys\_nodes, and env\_nodes are vectors of indices of the respective types of vertices; edges and live\_edges are vectors of the respective types of edges; inp\_alphabet is a std::map object that maps input symbols of the Rabin automaton to the BDDs representing the corresponding sets of nodes in the Arena; and filename is the name of the file in which the Rabin automaton is stored (in the standard HOA format [2]). The game is solved by calling Fp.Rabin, a member function of the Genie::BaseFixpoint class (see Sect. 3.1).

#### 3.3 Mascot-SDS

The core of Mascot-SDS is also written as a header-only library. It is built on top of the well-known tool SCOTS [18], and several classes of Mascot-SDS retain their original identities from SCOTS, owing to the close similarity of the basic uniform grid-based abstraction used in both tools. The main difference between the two tools is that Mascot-SDS synthesizes controllers for *stochastic* systems, whereas SCOTS supports only *non-stochastic* systems.

The two main classes of Mascot-SDS are called SymbolicSet and SymbolicModel, which respectively model the abstract spaces obtained through uniform grid-based discretizations (like $\widehat{X}$ in Sect. 2.2) and the abstract transition relations ($\overline{h}$ and $\underline{h}$ in Sect. 2.2). The abstract transition relations are computed using an auxiliary class called SymbolicModelMonotonic (not shown in Fig. 1). Notice that we offer the flexibility to use both CUDD and Sylvan when creating objects of SymbolicSet and SymbolicModel. A Fixpoint object is a child of the class BaseFixpoint from Genie, and is created by taking a synchronous product between a SymbolicModel object and a RabinAutomaton object specifying the control objective given as user input. The class Fixpoint implements the concrete definitions of the *Cpre* and *Apre* operators according to (6).

How to Use: For ease of use, we have written a pair of tools called Synthesize and Simulate using the library of Mascot-SDS. Synthesize synthesizes controllers for stochastic dynamical systems whose nominal dynamics is mixed-monotone, and Simulate visualizes simulated closed-loop trajectories using the synthesized controller. The inputs to Synthesize include the dynamic model of the system and the control objective; the latter can be specified either in LTL or using a Rabin automaton. Synthesize is invoked using the following syntax:

```
<path-to-Synthesize binary>/Synthesize <path-to-input-file>/<input.cfg>
    <sylvan/cudd flag>
```
where <input.cfg> is an input configuration file containing all the inputs, and <sylvan/cudd flag> is 1 to run the parallel version using Sylvan, or 0 to run the sequential version using CUDD.

Some of the main ingredients in the input.cfg file are: (a) the description of the dynamical system's variable spaces (like state space, input space, etc.) including their discretization parameters, (b) the file where the decomposition function of the nominal dynamics of the system is stored, (c) the maximum absolute disturbance, and (d) the specification, either as an LTL formula or as the filename where a Rabin automaton is stored (in HOA format [2]). The decomposition function must be given as a C-compatible header file so that Synthesize can link to (use) this function at runtime (see the mascot-sds/examples/ directory for examples). When the specification is given as a Rabin automaton (over a labeling alphabet of the system states), the automaton needs to be stored in a file in the HOA format. Alternatively, an LTL specification can be given, along with a mapping between the atomic predicates and the states of the system. In that case, Synthesize uses Owl [12] to convert the LTL specification to a Rabin automaton.
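
As an illustration, a configuration file covering ingredients (a)-(d) might look as follows. The field names and values here are hypothetical and only meant to convey the structure; the actual syntax is documented by the examples in the mascot-sds/examples/ directory.

```
# Hypothetical sketch of an input.cfg (field names illustrative,
# not Mascot-SDS's actual syntax)
state_space     = [-2, 2] x [-2, 2]    # (a) variable spaces
state_grid_eta  = 0.05                 #     discretization parameter
input_space     = {u1, u2, u3}
dynamics_header = bistable_decomp.hh   # (b) decomposition function
disturbance     = 0.1                  # (c) max absolute disturbance
specification   = spec.hoa             # (d) Rabin automaton in HOA format
```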

The output of Synthesize is a folder called data that contains pieces of the controller encoded as BDDs and stored in binary files, as well as various metadata stored in text files. These files can be processed by Simulate to visualize simulated closed-loop trajectories of the system. The usage of Simulate is similar to that of Synthesize:

```
<path-to-Simulate binary>/Simulate <path-to-input-file>/<input.cfg>
    <sylvan/cudd flag>
```
where the input.cfg file should, in this case, contain the information required to simulate the closed loop, such as the number of simulation time steps, the python script that plots the state space predicates (see the examples), etc.

#### 4 Examples

We present experimental results showcasing the practical usability of our tools and comparing their performance with the state of the art. All experiments were run on a computer with a 48-core Intel Xeon E7-8857 v2 processor and 1.5 TB RAM.

#### 4.1 Synthesizing Code-Aware Resource Managers Using FairSyn

We consider a case study introduced by Chatterjee et al. [6]. In this example, there are two bounded FIFO queues, namely the broadcast and output queues, which interact with each other and transmit and receive data packets through a common network. The two queues are implemented as separate threads running on a single CPU. For this multi-threaded program, we consider the problem of synthesizing a code-aware resource manager, whose task is to grant the different threads access to the shared synchronization resources (mutexes and counting semaphores). The specification is deadlock freedom across all threads at all times, while assuming a fair scheduler (scheduling every thread always eventually) and fair progress in every thread (i.e., taking every existing execution branch always eventually). The resource manager is code-aware, i.e., it has knowledge of how the threads acquire and release the different resources. This enables it to avoid deadlocks more effectively than a resource manager that does not have access to the code. Chatterjee et al. [6] showed that the synthesis problem (of the resource manager) can be reduced to the problem of computing the winning strategy in a 2½-player game, which we solved using FairSyn.

Table 1 compares the computational resources used by the CUDD- and Sylvan-based implementations of FairSyn; more details can be found in our earlier work [4]. The Sylvan-based implementation is significantly faster, although it consumes much more memory.


Table 1. Performance of FairSyn; code-aware resource management benchmark.

#### 4.2 Synthesizing Controllers for Stochastic Dynamical Systems Using Mascot-SDS

We use Mascot-SDS to synthesize controllers for two different applications.

A Bistable Switch. First, we compare our tool's performance against the state-of-the-art tool StochasticSynthesis (abbr. SS) [10] on a benchmark example proposed by the authors of SS. In this example, a 2-dimensional nonlinear bistable switch is perturbed with bounded stochastic noise. There are two synthesis problems with two different control objectives: first, a safety objective, and second, a Rabin objective with two Rabin pairs. The model of the system and the control objectives can be found in the original paper [10].

The tool SS uses graph-theoretic techniques to solve the controller synthesis problem, an alternative approach that is substantially different from our symbolic fixpoint-based technique. In Table 2, we summarize the performance of Mascot-SDS powered by CUDD and Sylvan, alongside the performance of SS.


Table 2. Performance comparison between Mascot-SDS and StochasticSynthesis (abbreviated as SS) [10] on the bistable switch. Col. 1 shows the specifications and the respective numbers of Rabin pairs, Col. 2 shows the approximation error ranges (smaller error means more intense computation), Col. 3, 4, and 5 compare the computation times and Col. 6, 7, and 8 compare the peak memory footprint (as measured using the "time" command) for Mascot-SDS with CUDD, Mascot-SDS with Sylvan, and SS, respectively. "TO" stands for timeout (5 h of cutoff time).


Table 3. Performance of Mascot-SDS with CUDD and Sylvan for the table-serving robot experiment.


Fig. 2. Closed-loop trajectories for 100 time steps with *kitchen* (green), *table* (blue), and *obstacle* (black). (Color figure online)

Both Mascot-SDS and SS compute controllers whose domains under-approximate the optimal controller domains. The second column of Table 2 shows a measure of the approximation error. For every comparable approximation error bound, both versions of Mascot-SDS significantly outperformed SS, both time- and memory-wise. In fact, Mascot-SDS with Sylvan was at least an order of magnitude faster in all instances. This is particularly striking, since SS uses a sophisticated *lazy* abstraction refinement technique, whereas Mascot-SDS uses a plain *uniform* abstraction, which is typically computationally more expensive. This shows the immense potential of our toolchain; we plan to extend Mascot-SDS with lazy gridding, an orthogonal optimization, in a future release to achieve further computational savings. For Mascot-SDS itself, as expected, Sylvan was significantly faster than CUDD. On the other hand, although Sylvan used less memory than CUDD in the simpler setups (the ones with larger error), the memory requirement of Sylvan quickly grew and surpassed that of CUDD for the more complicated setups.

Table-Serving Robot. We consider the controller synthesis problem for a table-serving robot that needs to satisfy the following specification: ♦*kitchen* ∧ ¬*obstacle* ∧ (♦*request* ↔ ♦*table*), where *table*, *kitchen*, *obstacle*, and *request* are predicates over the state space. The robot itself is modeled as the discrete-time abstraction of the standard 3-dimensional Dubins vehicle [15], with an additional (i.e., 4th) dimension that records whether a *request*, which is controlled by the environment, is pending. In Table 3, we summarize the computational resources, and, in Fig. 2, we show a simulated closed-loop trajectory plotted using our tool Simulate. We observe that Sylvan was much faster, but CUDD consumed much less memory.

## References



## Automated Tail Bound Analysis for Probabilistic Recurrence Relations

Yican Sun<sup>1</sup>, Hongfei Fu<sup>2(B)</sup>, Krishnendu Chatterjee<sup>3</sup>, and Amir Kafshdar Goharshady<sup>4</sup>

<sup>1</sup> School of Computer Science, Peking University, Beijing, China. sycpku@pku.edu.cn

<sup>2</sup> Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. fuhf@cs.sjtu.edu.cn

<sup>3</sup> Institute of Science and Technology Austria, Klosterneuburg, Austria. krishnendu.chatterjee@ist.ac.at

<sup>4</sup> Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China. goharshady@cse.ust.hk

Abstract. Probabilistic recurrence relations (PRRs) are a standard formalism for describing the runtime of a randomized algorithm. Given a PRR and a time limit κ, we consider the tail probability Pr[T ≥ κ], i.e., the probability that the randomized runtime T of the PRR exceeds κ. Our focus is the formal analysis of tail bounds, which aims at finding a tight asymptotic upper bound u ≥ Pr[T ≥ κ]. The classical and most well-known approach to this problem is the cookbook method by Karp (JACM 1994); other approaches are mostly limited to deriving tail bounds of specific PRRs via involved custom analysis.

In this work, we propose a novel approach for deriving the common exponentially decreasing tail bounds for PRRs whose preprocessing time and random passed sizes observe discrete or (piecewise) uniform distributions and whose recursive call is either a single procedure call or a divide-and-conquer. We first establish a theoretical approach via Markov's inequality, and then instantiate it with a template-based algorithmic approach via a refined treatment of exponentiation. Experimental evaluation shows that our algorithmic approach is capable of deriving tail bounds that (i) are asymptotically tighter than Karp's method, (ii) match the best-known manually derived asymptotic tail bound for QuickSelect, and (iii) are only slightly worse (by a log log n factor) than the manually proven optimal asymptotic tail bound for QuickSort. Moreover, our algorithmic approach handles all examples (including realistic PRRs such as QuickSort, QuickSelect, DiameterComputation, etc.) in less than 0.1 s, showing that our approach is efficient in practice.

Due to different academic norms, authors in Mainland China are ordered by contribution, whereas authors in Austria and Hong Kong SAR are ordered alphabetically. The code and benchmarks are available at https://github.com/boyvolcano/PRR.

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 16–39, 2023. https://doi.org/10.1007/978-3-031-37709-9\_2

#### 1 Introduction

Probabilistic program verification is a fundamental area in formal verification [3]. It extends classical (non-probabilistic) program verification by considering randomized computation in a program, and hence can be applied to the formal analysis of probabilistic computations such as probabilistic models [14], randomized algorithms [2,9,28,30], etc. In this line of research, verifying the time complexity of probabilistic recurrence relations (PRRs) is an important subject [9,30]. PRRs are a simplified form of recursive probabilistic programs and extend recurrence relations by incorporating randomization such as randomized preprocessing and divide-and-conquer. They are widely used in analyzing the time complexity of randomized algorithms (e.g., QuickSort [16], QuickSelect [17], and DiameterComputation [26, Chapter 9]). Compared with probabilistic programs, PRRs abstract away detailed computational aspects, such as problem-specific divide-and-conquer and data-structure manipulations, and include only the key information on the runtime of the underlying randomized algorithm. Hence, PRRs provide a clean model for time-complexity analysis of randomized algorithms and randomized computations in a general sense.

In this work, we focus on the formal analysis of PRRs and consider the fundamental problem of tail bound analysis, which aims at bounding the probability that a given PRR does not terminate within a prescribed time limit. In the literature, prominent works on tail bound analysis include the following. First, Karp proposed a classic "cookbook" formula [21] similar to the Master Theorem. This method has been further improved, extended, and mechanized by follow-up works [5,13,30]. While Karp's method has a clean form and is easy to use and automate, the bounds it produces are known to be not tight (see e.g. [15,25]). Second, the works [25] and [15] performed ad-hoc custom analyses to derive asymptotically tight tail bounds for the PRRs of QuickSort and QuickSelect, respectively. These methods require manual effort and do not have the generality to handle a wide class of PRRs.

From the literature, an algorithmic approach capable of deriving tight tail bounds over a wide class of PRRs remains a major unresolved problem. Motivated by this challenge, we make the following contributions in this work:


– Experiments show that our algorithmic approach derives asymptotically tighter tail bounds than Karp's method. Furthermore, the tail bounds derived by our approach match the best-known bound for QuickSelect [15], and are only slightly worse (by a log log n factor) than the optimal manually derived bound for QuickSort [25]. Moreover, our algorithm synthesizes each of these tail bounds in less than 0.1 s and is efficient in practice.

A limitation of our approach is that we do not consider the transformation from a realistic implementation of a randomized algorithm into its PRR representation. Such a transformation would require examining a diverse set of randomization patterns (e.g., randomized divide-and-conquer) in randomized algorithms, and is thus an orthogonal direction. In this work, we focus on tail bound analysis and present a novel approach to this problem. Due to space limitations, we relegate some details to the extended version [29].

## 2 Preliminaries

Below we present necessary background in probability theory and the tail bound analysis problem we consider.

A *probability space* is a triple (Ω, F, Pr) such that Ω is a non-empty set termed the *sample space*, F is a *σ-algebra* over Ω (i.e., a collection of subsets of Ω that contains the empty set ∅ and is closed under complement and countable union), and Pr(·) is a *probability measure* on F, i.e., a function F → [0, 1] such that Pr(Ω) = 1 and, for every pairwise-disjoint sequence of sets $A_1, A_2, \ldots$ in F, we have $\sum_{i \ge 1} \Pr(A_i) = \Pr\left(\bigcup_{i \ge 1} A_i\right)$.

A *random variable* X from a probability space (Ω, F, Pr) is an F-measurable function X : Ω → ℝ, i.e., for every d ∈ ℝ we have {ω ∈ Ω | X(ω) < d} ∈ F. We denote by E[X] its expected value; formally, $\mathbb{E}[X] := \int X \,\mathrm{d}\Pr$. A *discrete probability distribution* (DPD) over a countable set U is a function η : U → [0, 1] such that $\sum_{u \in U} \eta(u) = 1$. The *support* of the DPD is defined as supp(η) := {u ∈ U | η(u) > 0}. We abbreviate a finite-support DPD as FSDPD.
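
These definitions are easy to make concrete. The sketch below (names ours, in C++ as in the tool sections above) represents an FSDPD as a map from values to probabilities and computes its total mass, support size, and the expected value of the identity random variable.

```cpp
#include <map>

// A finite-support discrete probability distribution over integers:
// a map from value u to probability eta(u). Illustrative helper names.
using FSDPD = std::map<int, double>;

// Sum of all probabilities; must equal 1 for a valid DPD.
double totalMass(const FSDPD& eta) {
    double s = 0;
    for (const auto& [u, p] : eta) s += p;
    return s;
}

// |supp(eta)|: number of values with strictly positive probability.
int supportSize(const FSDPD& eta) {
    int n = 0;
    for (const auto& [u, p] : eta) if (p > 0) ++n;
    return n;
}

// E[X] for X the identity on the support: sum of u * eta(u).
double expectedValue(const FSDPD& eta) {
    double e = 0;
    for (const auto& [u, p] : eta) e += u * p;
    return e;
}
```
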

A *filtration* of a probability space (Ω, F, Pr) is an infinite sequence {Fn}n≥0 of σ-algebras over Ω such that Fn ⊆ Fn+1 ⊆ F for every n ≥ 0. Intuitively, Fn models the information available at the n-th step. A *discrete-time stochastic process* is an infinite sequence Γ = {Xn}n≥0 of random variables from the probability space (Ω, F, Pr). The process Γ is *adapted* to a filtration {Fn}n≥0 if, for all n ≥ 0, Xn is Fn-measurable. Given a filtration {Fn}n≥0, a *stopping time* is a random variable τ : Ω → ℕ such that, for every n ≥ 0, {ω ∈ Ω | τ(ω) ≤ n} ∈ Fn.

A discrete-time stochastic process <sup>Γ</sup> <sup>=</sup> {Xn}n∈<sup>N</sup> adapted to a filtration {Fn}n∈<sup>N</sup> is a *martingale* (resp. *supermartingale*) if for every <sup>n</sup> <sup>∈</sup> <sup>N</sup>, <sup>E</sup>[|Xn|] <sup>&</sup>lt; <sup>∞</sup> and it holds a.s. that <sup>E</sup>[Xn+1 | Fn] = <sup>X</sup>n (resp. <sup>E</sup>[Xn+1 | Fn] <sup>≤</sup> <sup>X</sup>n). Intuitively, a martingale (resp. supermartingale) is a discrete-time stochastic process in which for an observer who has seen the values of <sup>X</sup>0,...,Xn, the expected value at the next step, i.e. <sup>E</sup>[Xn+1 | Fn], is equal to (resp. no more than) the last observed value <sup>X</sup>n. Also, note that in a martingale, the observed values for <sup>X</sup>0,...,Xn−<sup>1</sup> do not matter given that <sup>E</sup>[Xn+1 | Fn] = <sup>X</sup>n. In contrast, in a supermartingale, the only requirement is that <sup>E</sup>[Xn+1 | Fn] <sup>≤</sup> <sup>X</sup>n and hence <sup>E</sup>[Xn+1 | Fn] may depend on <sup>X</sup>0,...,Xn−1. Also, note that <sup>F</sup>n might contain more information than just the observations of <sup>X</sup>i's.

*Example 1.* Consider the classical gambler's ruin: a gambler starts with Y<sup>0</sup> dollars and bets continuously until he loses all of his money. If the bets are unfair, i.e., the expected value of his money after a bet is less than its value before the bet, then the sequence {Yn}n∈N<sup>0</sup> is a supermartingale. In this case, Y<sup>n</sup> is the gambler's total money after n bets. On the other hand, if the bets are fair, then {Yn}n∈N<sup>0</sup> is a martingale.
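
For the gambler's ruin, the (super)martingale property can be checked with one line of arithmetic under a simple bet model (ours, not from the paper): a ±1 bet won with probability p gives E[Y(n+1) | Y(n) = y] = y + (2p − 1), so p < 1/2 yields a supermartingale and p = 1/2 a martingale.

```cpp
// One-step conditional expectation for a +/-1 bet won with probability p.
// Assumes the simple +/-1 bet model; returns y + (2p - 1).
double nextExpected(double y, double p) {
    return p * (y + 1) + (1 - p) * (y - 1);
}
```
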

We refer to standard textbooks (such as [6,34]) for a detailed treatment of all the concepts illustrated above.

#### 2.1 Probabilistic Recurrence Relations

In this work, we focus on probabilistic recurrence relations (PRRs) that describe the runtime behaviour of a single recursive procedure. Instead of having a direct syntax for a PRR, we propose a mini programming language *LRec* that captures a wide class of PRRs that have common probability distributions such as (piecewise) uniform distributions and discrete probability distributions, and whose recursive call consists of either a procedure call or two procedure calls in a divide-and-conquer style. We present the grammar of *LRec* in Fig. 1.

Fig. 1. The Grammar of *LRec*

In the grammar, we have two positive-integer-valued variables n, v, which stand for the input size and the value sampled in the randomization of the size passed to the recursive calls of a procedure, respectively. We use b > 0, c, cp to denote integer constants, and p to denote the name of the single procedure in the PRR. We consider arithmetic expressions expr as polynomials over v, v<sup>−1</sup>, ln v and n, n<sup>−1</sup>, ln n (which we call *pseudo-polynomials* in this work) and common probability distributions, including (i) the uniform distribution uniform(n) over {0, 1, ..., n−1}, (ii) the piecewise uniform distribution muniform(n) that returns max{i, n−i−1} where i observes the uniform distribution uniform(n), and (iii) any FSDPD (indicated by discrete) whose probabilities and values are constants and pseudo-polynomials, respectively. We also support other piecewise uniform distributions, e.g., the distribution in which each v ∈ {0, ..., n/2} has probability 2/(3n) and each v ∈ {n/2+1, ..., n−1} has probability 4/(3n).
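
The distribution muniform(n) is simple enough to tabulate exactly: push each uniform sample i through max{i, n−i−1} and accumulate the mass. The helper name below is ours.

```cpp
#include <algorithm>
#include <map>

// Exact probability mass function of muniform(n): the distribution of
// max{i, n-1-i} with i uniform over {0, ..., n-1}.
std::map<int, double> muniformPmf(int n) {
    std::map<int, double> pmf;
    for (int i = 0; i < n; ++i)
        pmf[std::max(i, n - 1 - i)] += 1.0 / n;
    return pmf;
}
```
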

The nonterminal proc generates the PRR in the form def p(n; cp) = {comm}, for which cp is an integer constant serving as the threshold of recursion, meaning that the procedure halts immediately when n < cp, and comm is the function body of the procedure. The nonterminal comm generates all statements, each taking one of the two forms as follows.


We restrict the recursive calls to be either a single recursive call p(v) or p(size − v), or a divide-and-conquer composed of two consecutive recursive calls p(v) and p(size − v). Here we consider a general setting in which the relevant overall size size is the input size n divided by some positive integer b, possibly with an offset c. Choosing b = 1 and c = −1 corresponds to the common situation in which the overall size is n − 1, i.e., one element is removed from the original input.

Given a PRR <sup>p</sup>, we use func(p) to represent its function body.

We always assume that the given PRR is *well-formed*, i.e., every c_i in a probabilistic choice is within [0, 1] and every random passed size (e.g., v, size − v) falls in [0, n]. Below, we present two examples of PRRs.

*Example 2 (QuickSelect).* Consider the problem of finding the d-th smallest element in an unordered array of n distinct elements. A classical randomized algorithm for this problem is QuickSelect [17], with O(n) expected running time. We model the algorithm as the following PRR:

$$\texttt{def } p(n;2) = \{\texttt{sample } v \leftarrow \texttt{muniform}(n) \text{ in } \{\texttt{pre}(n);\ \texttt{invoke } p(v);\}\}$$

Here, we use p(n; 2) to represent the number of comparisons performed by QuickSelect over an input of size n, and v is the variable that captures the size of the remaining array that has to be searched recursively. It observes the value max{i, n − 1 − i} where i is sampled uniformly from {0, ..., n−1}; we use muniform(n) to represent this distribution. 
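A minimal Monte Carlo simulation of this PRR (our own sketch, not the paper's tool) illustrates that the expected cost grows linearly, consistent with the solved expectation E[p(n)] = 4 · n given later in Example 6:

```python
import random

def quickselect_cost(n):
    # Simulates p(n; 2): pre-processing cost n, then one recursive call
    # on v = max(i, n-1-i) with i uniform over {0, ..., n-1}.
    cost = 0
    while n >= 2:          # recursion threshold c_p = 2
        cost += n          # pre(n): n comparisons against the pivot
        i = random.randrange(n)
        n = max(i, n - 1 - i)
    return cost

random.seed(1)
n = 1000
mean = sum(quickselect_cost(n) for _ in range(3000)) / 3000
# the empirical mean is close to the solved expectation 4 * n
```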

*Example 3 (QuickSort).* Consider the classical problem of sorting an array of n distinct elements. A well-known randomized algorithm for solving this problem is QuickSort [16]. We model the algorithm as the following PRR.

$$\texttt{def } p(n;2) = \{\texttt{sample } v \leftarrow \texttt{uniform}(n) \text{ in } \{\texttt{pre}(n);\ \texttt{invoke } p(v);\ p(n-1-v);\}\}$$

Here, v and n − 1 − v capture the sizes of the two sub-arrays. 
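Again as an informal sketch of ours, simulating this PRR shows the expected cost tracking 2 · n · ln n (the expectation used later in Example 7) up to lower-order terms:

```python
import math
import random
import sys

sys.setrecursionlimit(100000)  # depth is O(log n) with high probability

def quicksort_cost(n):
    # Simulates p(n; 2): pre-processing cost n, then two recursive calls
    # on sub-arrays of sizes v and n-1-v with v uniform over {0, ..., n-1}.
    if n < 2:              # recursion threshold c_p = 2
        return 0
    v = random.randrange(n)
    return n + quicksort_cost(v) + quicksort_cost(n - 1 - v)

random.seed(2)
n = 1000
mean = sum(quicksort_cost(n) for _ in range(500)) / 500
# the empirical mean lies below, but within a constant factor of, 2*n*ln(n)
```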

Below we present the semantics of a PRR in a nutshell. Consider a PRR generated by *LRec* with the procedure name p. A *configuration* σ is a pair σ = (*comm*, n) where *comm* represents the current statement to be executed and n ≥ c_p is the current value of the variable n. A *PRR state* μ is a triple (σ, C, **K**) for which:


We use emp to denote an empty stack, and say that a PRR state (σ, C, **K**) is *final* if **K** = emp and σ = halt. Note that in a final PRR state (halt, C, emp), the value C represents the total execution runtime of the PRR. The semantics of the PRR is defined as a discrete-time Markov chain whose state space is the set of all PRR states and whose transition function is **P**, where **P**(μ, μ′) is the probability that the next PRR state is μ′ given that the current PRR state is μ = ((*comm*, n), C, **K**). The probability is determined by the following cases.


With an initial PRR state ((func(p), n∗), 0, emp) where n∗ ≥ c_p is the input size, the Markov chain induces a probability space where the sample space is the set of all infinite sequences of PRR states, the σ-algebra is generated by all *cylinder sets* over infinite sequences of PRR states, and the probability measure is uniquely determined by the transition function **P**. We refer to [3] for details. We write Pr_{n∗} for the probability measure, where n∗ ≥ c_p is the input size.

We further define the random variable τ such that for any infinite sequence of PRR states ρ = μ_0, μ_1, ..., μ_t, ... with each μ_t = ((*comm*_t, n_t), C_t, **K**_t), τ(ρ) equals the first moment at which the sequence reaches a final PRR state, i.e., τ(ρ) = inf{t | the PRR state μ_t is final}, with the convention inf ∅ = ∞. We will always ensure that τ is almost-surely finite, i.e., Pr_{n∗}(τ < ∞) = 1. Note that the random cumulative processing time C_τ in the PRR state μ_τ ∈ ρ is the total execution time of the given PRR.

We formulate tail bound analysis over PRRs as follows. Given a time limit α · κ(n∗) symbolic in the initial input n∗ and the coefficient α, the goal of tail bound analysis is to infer an upper bound u(α, n∗), symbolic in n∗ and α, such that for every input size n∗ and every plausible value of α, we have that

$$\Pr\_{n^\*}\left[C\_\tau \ge \alpha \cdot \kappa(n^\*)\right] \le u(\alpha, n^\*). \tag{1}$$

As tail bounds are often evaluated asymptotically, we focus on deriving a tight u(α, n∗) when α, n∗ are sufficiently large. To compare the magnitude of two tail bounds, we follow the straightforward approach that first treats α as a fixed constant and compares the bounds over n∗; if the magnitudes over n∗ are identical, we further compare the magnitudes of the coefficients over α.

*Example 4 (Our result on QuickSelect).* Continuing with Example 2, suppose the user is interested in the tail bound Pr[C_τ ≥ α · n∗], where C_τ is the running time of the QuickSelect algorithm over an array of length n∗. Then, Karp's method produces the following symbolic tail bound.

$$\Pr[C\_{\tau} \ge \alpha \cdot n^\*] \le \exp(1.15 - 0.28 \cdot \alpha)$$

However, our method can produce the following tail bound.

$$\Pr[C\_{\tau} \ge \alpha \cdot n^\*] \le \exp(2 \cdot \alpha - \alpha \cdot \ln \alpha)$$

Note that our method produces tail bounds with a better magnitude on α. 
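To make the comparison concrete, one can evaluate both symbolic bounds numerically (our check): for large α, the term α · ln α eventually dominates any linear function of α, so our bound overtakes Karp's.

```python
import math

def karp_bound(alpha):
    # Karp's bound for QuickSelect from Example 4
    return math.exp(1.15 - 0.28 * alpha)

def our_bound(alpha):
    # our bound for QuickSelect from Example 4
    return math.exp(2 * alpha - alpha * math.log(alpha))

# Beyond roughly alpha = e^{2.28} ≈ 9.8, the exponent 2a - a*ln(a)
# turns negative and decreases super-linearly, beating Karp's bound.
for alpha in (15, 25, 50):
    assert our_bound(alpha) < karp_bound(alpha)
```

For small α (e.g., α = 5), Karp's bound is smaller; the advantage of our bound is asymptotic in α.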

*Example 5 (Our result on QuickSort).* Continuing with Example 3, consider the tail bound Pr[C_τ ≥ α · n∗ · ln n∗], where C_τ is the running time of QuickSort over a length-n∗ array. Then, Karp's method produces the symbolic tail bound:

$$\Pr[C\_{\tau} \ge \alpha \cdot n^\* \cdot \ln n^\*] \le \exp(0.5 - 0.5 \cdot \alpha),$$

while our method can produce the bound as:

$$\Pr[C\_{\tau} \ge \alpha \cdot n^\* \cdot \ln n^\*] \le \exp((4 - \alpha) \cdot \ln n^\*)$$

Note that our method produces tail bounds with a better magnitude on n∗. 

## 3 Exponential Tail Bounds via Markov's Inequality

In this section, we demonstrate our theoretical approach for deriving exponentially decreasing tail bounds based on Markov's inequality.

Before illustrating our approach, we first translate a PRR in the language *LRec* with the single procedure p into the canonical form as follows.

$$p(n; c\_p) = \mathsf{pre}(S(n)); \text{invoke } p(\text{size}\_1(n)); \dots; p(\text{size}\_r(n)) \tag{2}$$

where (i) *S*(n) is a random variable related to the input size n that represents the randomized pre-processing time and observes a probability distribution resulting from a discrete probabilistic choice of piecewise uniform distributions, and (ii) invoke p(size_1(n)); ... ; p(size_r(n)) is a statement that is either a single recursive call p(size_1(n)) or a divide-and-conquer p(size_1(n)); p(size_2(n)), depending on the resolution of the randomization. For the latter, we use a random variable r (which is either 1 or 2) to represent the number of recursive calls.

The translation can be implemented by a straightforward recursive procedure Tf(n, Prog) that takes as input a positive integer n (the input size) and a statement Prog (generated by the nonterminal comm) to be processed. Note that the procedure Tf(n, Prog) outputs the *joint* distribution of the random value *S*(n) and the recursive call p(size_1(n)); ... ; p(size_r(n)) with randomized input sizes. These random variables may be dependent.

Our theoretical approach then works directly on the canonical form (2). It consists of two major steps to derive an exponentially decreasing tail bound. In the first step, we apply Markov's inequality and reduce the tail bound analysis problem to the over-approximation of the moment generating function E[exp(t · C_τ)], where C_τ is the cumulative pre-processing time defined previously and t > 0 is a scaling factor that aids the derivation of the tail bound. In the second step, we apply the Optional Stopping Theorem (a classical theorem in martingale theory) to over-approximate the expected value E[exp(t · C_τ)]. Below we fix a PRR with procedure p in the canonical form (2), and a time limit α · κ(n∗).

Our first step applies Markov's inequality. Our approach relies on the well-known exponential form of Markov's inequality below.

Theorem 1. *For every random variable* X *and any scaling factor* t > 0*, we have that* Pr[X ≥ d] ≤ E[exp(t · X)] / exp(t · d)*.*
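As a quick sanity check of Theorem 1 (our own illustration), take X geometric on {1, 2, ...} with parameter 1/2, so Pr[X ≥ d] = 2^{-(d-1)} exactly, and E[exp(t · X)] has a closed form whenever e^t < 2:

```python
import math

p = 0.5                      # success probability; X = #flips until first heads
t = math.log(1.5)            # scaling factor, with e^t = 1.5 < 1/(1-p) = 2
# E[e^{tX}] = p e^t / (1 - (1-p) e^t) for a geometric X on {1, 2, ...}
mgf = p * 1.5 / (1 - (1 - p) * 1.5)      # = 0.75 / 0.25 = 3.0

for d in range(2, 20):
    exact_tail = 0.5 ** (d - 1)          # Pr[X >= d], known in closed form
    markov = mgf / math.exp(t * d)       # Theorem 1: E[e^{tX}] / e^{td}
    assert exact_tail <= markov          # the bound holds for every d
```

Both sides decrease exponentially in d; the bound is loose by the constant-ratio factor (3/2) · (3/4)^d relative to the exact tail.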

The detailed application of Markov's inequality to tail bound analysis requires choosing a scaling factor t := t(α, n) symbolic in α and n. After choosing the scaling factor, Markov's inequality gives the following tail bound:

$$\Pr[C\_{\tau} \ge \alpha \cdot \kappa(n^\*)] \le \mathbb{E}[\exp(t(\alpha, n^\*) \cdot C\_{\tau})] / \exp(t(\alpha, n^\*) \cdot \alpha \cdot \kappa(n^\*)).\tag{3}$$

The role of the scaling factor t(α, n∗) is to scale the exponent in the denominator exp(t(α, n∗) · α · κ(n∗)), and this is in many cases necessary, as a tail bound may not be exponentially decreasing directly in the time limit α · κ(n∗).

An unsolved part in the tail bound above is the estimation of the expected value E[exp(t(α, n∗) · C_τ)]. Our second step over-approximates this expected value. To achieve this goal, we impose a constraint on the scaling factor t(α, n) and an extra function f(α, n), and show that once the constraint is fulfilled, one can derive an upper bound for E[exp(t(α, n∗) · C_τ)] from t(α, n) and f(α, n). The theorem is proved via the Optional Stopping Theorem. The theorem requires the almost-sure termination of the given PRR, a natural prerequisite for exponential tail bounds. In this work, we consider PRRs with finite expected termination time, which implies almost-sure termination.

Theorem 2. *Suppose we have functions* t, f : [0,∞) <sup>×</sup> <sup>N</sup> <sup>→</sup> [0,∞) *such that*

$$\mathbb{E}[\exp(t(\alpha, n) \cdot \mathbb{E}\mathbf{x}(n \mid f))] \le \exp(t(\alpha, n) \cdot f(\alpha, n))\tag{4}$$

*for all sufficiently large* α, n∗ > 0 *and all* c_p ≤ n ≤ n∗*, where*

$$\mathsf{Ex}(n \mid f) := S(n) + \sum\_{i=1}^{r} f(\alpha, \text{size}\_i(n)).$$

*Then for* t∗(α, n∗) := min_{c_p ≤ n ≤ n∗} t(α, n)*, we have that*

$$\mathbb{E}[\exp(t\_\*(\alpha, n^\*) \cdot C\_\tau)] \le \exp(t\_\*(\alpha, n^\*) \cdot f(\alpha, n^\*)).$$

*Thus, we obtain the upper bound* u(α, n∗) := exp(t∗(α, n∗) · (f(α, n∗) − α · κ(n∗))) *for the tail bound in (1).*

*Proof Sketch.* We fix a procedure p, and some sufficiently large α and n∗. In general, we apply martingale theory to prove this theorem. To construct a martingale, we need to make two preparations.

First, by the convexity of exp(·), substituting <sup>t</sup>(α, n) with <sup>t</sup><sup>∗</sup>(α, n<sup>∗</sup>) in (4) does not affect the validity of (4).

Second, given an infinite sequence of PRR states ρ = μ_0, μ_1, ... in the sample space, we consider the subsequence ρ′ = μ′_0, μ′_1, ... that only contains states that are either final or at the entry of p, i.e., with comm = func(p), and we write μ′_i as ((func(p), n̂_i), C′_i, **K**′_i). We define τ′ := inf{t : μ′_t is final}; it is then straightforward that C′_{τ′} = C_τ. We observe that μ′_{i+1} represents the recursive calls of μ′_i. Thus, we can characterize the conditional distribution μ′_{i+1} | μ′_i by the transformation function Tf(n̂, func(p)) as follows.


Now we construct the super-martingale as follows. For each i ≥ 0, we write the stack **K**′_i of μ′_i as (func(p), s_{i,1}) ··· (func(p), s_{i,q_i}), where q_i is the stack size. We prove that the process y_0, y_1, ... forms a super-martingale, where

$$y\_i := \exp\left(t\_\*(\alpha, n^\*) \cdot \left(C'\_i + f(\alpha, \hat{n}\_i) + \sum\_{j=1}^{q\_i} f(\alpha, s\_{i,j})\right)\right).$$

Note that y_0 = exp(t∗(α, n∗) · f(α, n∗)) and y_{τ′} = exp(t∗(α, n∗) · C′_{τ′}) = exp(t∗(α, n∗) · C_τ). Thus we informally have that E[exp(t∗(α, n∗) · C_τ)] = E[y_{τ′}] ≤ E[y_0] = exp(t∗(α, n∗) · f(α, n∗)), and the theorem follows. 

It is natural to ask whether our theoretical approach can always find an exponentially decreasing tail bound over PRRs. We show that under a difference-boundedness condition and a monotonicity condition, the answer is yes. We first present the difference-boundedness condition (A1) and the monotonicity condition (A2) for a PRR Δ in the canonical form (2) as follows.

(A1) Δ is *difference-bounded* if there exist two real constants M_1 ≤ M_2 such that for every n ≥ c_p and every possible value (V, s_1, ..., s_k) in the support of the probability distribution Tf(n, func(p)), we have that

$$M\_1 \cdot \mathbb{E}[S(n)] \le V + \left(\sum\_{i=1}^k \mathbb{E}[p(s\_i)]\right) - \mathbb{E}[p(n)] \le M\_2 \cdot \mathbb{E}[S(n)].$$

(A2) Δ is *expected non-decreasing* if E[*S*(n)] does not decrease as n increases.

In other words, (A1) says that for any possible concrete pre-processing time V and passed sizes s_1, ..., s_k, the difference between the expected runtime before and after the recursive call is bounded by the magnitude of the expected pre-processing time. (A2) simply specifies that the expected pre-processing time be monotonically non-decreasing.

With the conditions (A1) and (A2), our theoretical approach guarantees a tail bound that is exponentially decreasing in the coefficient α and the ratio E[p(n∗)] / E[*S*(n∗)]. The theorem statement is as follows.

Theorem 3. *Let* Δ *be a PRR in the canonical form (2). If* Δ *satisfies (A1) and (A2), then for any function* <sup>w</sup> : [1,∞) <sup>→</sup> (1,∞)*, the functions* f, t *given by*

$$f(\alpha, n) := w(\alpha) \cdot \mathbb{E}[p(n)] \quad \text{and} \quad t(\alpha, n) := \frac{\lambda(\alpha)}{\mathbb{E}[S(n)]}, \quad \text{with} \quad \lambda(\alpha) := \frac{8(w(\alpha) - 1)}{w(\alpha)^2 (M\_2 - M\_1)^2}$$

*fulfill the constraint (4) in Theorem 2. Furthermore, by choosing* w(α) := 2α/(1+α) *in the functions* f, t *above and the time limit* α · κ(n∗) := α · E[p(n∗)]*, one obtains the tail bound*

$$\Pr[C\_{\tau} \ge \alpha \mathbb{E}[p(n^\*)]] \le \exp\left(-\frac{2(\alpha - 1)^2}{\alpha (M\_2 - M\_1)^2} \cdot \frac{\mathbb{E}[p(n^\*)]}{\mathbb{E}[S(n^\*)]}\right).$$

*Proof Sketch.* We first rephrase the constraint (4) as

$$\mathbb{E}\left[\exp\left(t(\alpha,n)\cdot(S(n)+\sum\_{i=1}^r f(\alpha,\text{size}\_i(n))-f(\alpha,n))\right)\right] \le 1$$

Then we focus on the exponent inside exp(·); by (A1), the exponent is a bounded random variable. By further calculating its expectation and applying Hoeffding's Lemma [18], we obtain the theorem above. 

Note that since E[p(n)] ≥ E[*S*(n)] when n ≥ c_p, the tail bound is at least exponentially decreasing with respect to the coefficient α. This implies that our theoretical approach derives tail bounds that are at least as tight as Karp's method when (A1) and (A2) hold. When E[p(n)] is of a strictly greater magnitude than E[*S*(n)], our approach derives asymptotically tighter bounds.
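The exponent in the bound of Theorem 3 can be re-derived from the chosen f, t: the exponent of u(α, n∗) is t∗ · (f − α · κ) = λ(α) · (w(α) − α) · E[p(n∗)] / E[S(n∗)], and plugging in w(α) = 2α/(1+α) collapses the α-dependent factor to −2(α−1)² / (α(M₂−M₁)²). A small numeric check of this algebra (ours):

```python
def exponent_via_template(alpha, D):
    # D stands for M2 - M1; we check only the alpha-dependent factor
    # lambda(alpha) * (w(alpha) - alpha), since E[p]/E[S] cancels.
    w = 2 * alpha / (1 + alpha)
    lam = 8 * (w - 1) / (w ** 2 * D ** 2)
    return lam * (w - alpha)

def exponent_closed_form(alpha, D):
    # the factor appearing in the bound of Theorem 3
    return -2 * (alpha - 1) ** 2 / (alpha * D ** 2)

for alpha in (2.0, 3.0, 7.5, 40.0):
    for D in (1.0, 2.0, 3.386):
        assert abs(exponent_via_template(alpha, D)
                   - exponent_closed_form(alpha, D)) < 1e-9
```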

Below, we apply the theorem above to prove tail bounds for QuickSelect (Example 2) and QuickSort (Example 3).

*Example 6.* For QuickSelect, its canonical form is p(n; 2) = n + p(size_1(n)), where size_1(n) observes muniform(n). Solving the recurrence relation, we obtain that E[p(n)] = 4 · n. We further find that this PRR satisfies (A1) with the constants M_1 = −1, M_2 = 1, and it obviously satisfies (A2). Hence, we apply Theorem 3 and derive the following tail bound for every sufficiently large α:

$$\Pr[C\_{\tau} \ge 4 \cdot \alpha \cdot n^\*] \le \exp\left(-\frac{2(\alpha - 1)^2}{\alpha}\right).$$

On the other hand, Karp's cookbook has the tail bound

$$\Pr[C\_{\tau} \ge 4 \cdot \alpha \cdot n^\*] \le \exp\left(1.15 - 1.12 \cdot \alpha\right).$$

Our bound is asymptotically the same as Karp's but has a better coefficient. 

*Example 7.* For QuickSort, its canonical form is p(n; 2) = n + p(size_1(n)) + p(size_2(n)), where size_1(n) observes uniform(n) and size_2(n) = n − 1 − size_1(n). As in the example above, we first calculate E[p(n)] = 2 · n · ln n. This PRR also satisfies the two assumptions above, with the constants M_1 = −2 ln 2, M_2 = 1. Hence, for every sufficiently large α, we can derive the tail bound as follows:

$$\Pr[C\_{\tau} \ge 2 \cdot \alpha \cdot n^\* \cdot \ln n^\*] \le \exp\left(-\frac{0.7(\alpha - 1)^2}{\alpha} \cdot \ln n^\*\right).$$

On the other hand, Karp's cookbook has the tail bound

$$\Pr[C\_{\tau} \ge 2 \cdot \alpha \cdot n^\* \cdot \ln n^\*] \le \exp\left(-\alpha + 0.5\right).$$

Note that our tail bound is tighter than Karp's by a factor of ln n∗ in the exponent. 

By the generality of Markov's inequality, our theoretical approach extends to general PRRs with three or more sub-procedure calls. However, the tail bounds derived from Theorem 3 are still not tight, since the theorem only uses the expectation and bounds of the given distribution. For example, for QuickSelect, the tightest known bound, exp(−Θ(α · ln α)) [15], is tighter than the one derived from Theorem 3. Below, we present an algorithmic approach that fully utilizes the distribution information and derives tight tail bounds that can match [15].

#### 4 An Algorithmic Approach

In this section, we present an algorithmic implementation of our theoretical approach (Theorem 2). Our algorithm synthesizes the functions t, f through templates and a refined estimation of the exponential terms from the inequality (4). The estimation is via integration and the monotonicity of the template. Below we fix a PRR p(n; c_p) in the canonical form (2) and a time limit α · κ(n∗).

Recall that to apply Theorem 2, one needs to find functions t, f that satisfy the constraint (4). Thus, the first step of our algorithm is to set up pseudo-monomial templates for f(α, n) and t(α, n) of the following form:

$$f(\alpha, n) := c\_f \cdot \alpha^{p\_f} \cdot \ln^{q\_f} \alpha \cdot n^{u\_f} \cdot \ln^{v\_f} n \tag{5}$$

$$t(\alpha, n) := c\_t \cdot \alpha^{p\_t} \cdot \ln^{q\_t} \alpha \cdot n^{u\_t} \cdot \ln^{v\_t} n \tag{6}$$

In the template, p_f, q_f, u_f, v_f, p_t, q_t, u_t, v_t are given integers, and c_f, c_t > 0 are unknown positive coefficients to be solved. For several compatibility reasons (see Propositions 1 and 2 in the following), we require that u_f, v_f ≥ 0 and u_t, v_t ≤ 0. We say that concrete values c̄_f, c̄_t for the unknown coefficients c_f, c_t > 0 are *valid* if the concrete functions f, t obtained by substituting c̄_f, c̄_t for c_f, c_t in the templates (5) and (6) satisfy the constraint (4) for every sufficiently large α, n∗ ≥ 0 and all c_p ≤ n ≤ n∗.

We consider pseudo-polynomial templates since the runtime behavior of randomized algorithms can mostly be captured by pseudo-polynomials. We choose monomial templates since our interest is the asymptotic magnitude of the tail bound; thus, only the monomial with the highest degrees matters.

Our algorithm searches for the values of p_f, q_f, u_f, v_f, p_t, q_t, u_t, v_t by enumeration within a bounded range {−B, ..., B}, where B is a manually specified positive integer. To avoid exhaustive enumeration, we use the following proposition to prune the search space.

Proposition 1. *Suppose that we have functions* t, f : [0, ∞) × N → [0, ∞) *that fulfill the constraint (4). Then it holds that (i)* (p_f, q_f) ≤ (1, 0) *and* (p_t, q_t) ≥ (−1, 0)*, and (ii)* f(α, n) = Ω(E[p(n)])*,* f(α, n) = O(κ(n)) *and* t(α, n) = Ω(κ(n)^{−1}) *for any fixed* α > 0*, where we write* (a, b) ≤ (c, d) *for the lexicographic order, i.e.,* (a ≤ c) ∧ (a = c → b ≤ d)*.*

*Proof.* Except for the constraint that f(α, n) = Ω(E[p(n)]), the constraints simply ensure that the tail bound is exponentially decreasing. To see why f(α, n) = Ω(E[p(n)]), we apply Jensen's inequality [27] to (4) and obtain f(n) ≥ E[Ex(n | f)] = E[S(n) + Σ_{i=1}^r f(size_i(n))]. Then we imitate the proof of Theorem 2 and derive that f(n) ≥ E[p(n)]. 

Proposition 1 shows that it suffices to consider (i) choices of u_f, v_f that make the magnitude of f lie between E[p(n)] and κ(n), (ii) choices of u_t, v_t that make the magnitude of t^{−1} within κ(n), and (iii) choices of p_f, q_f, p_t, q_t that fulfill (p_f, q_f) ≤ (1, 0) and (p_t, q_t) ≥ (−1, 0). Note that an over-approximation of E[p(n)] can either be obtained manually or derived by automated approaches [9].

*Example 8.* Consider the QuickSelect example (Example 2). Suppose we are interested in the tail bound Pr[C_τ ≥ α · n], and we enumerate the eight integers in the template from −1 to 1. Since E[p(n)] = 4 · n, by the proposition above we must have (u_f, v_f) = (1, 0), (u_t, v_t) ≥ (−1, 0), (p_t, q_t) ≥ (−1, 0), and (p_f, q_f) ≤ (1, 0). This reduces the number of choices for the template from 1296 to 128, where these numbers are automatically computed by our implementation. One choice is f(α, n) := c_f · α · (ln α)^{−1} · n and t(α, n) := c_t · ln α · n^{−1}. 

In the second step, our algorithm solves for the unknown coefficients c_t, c_f in the template. Once they are solved, our algorithm applies Theorem 2 to obtain the tail bound. In detail, our algorithm computes t∗(α, n∗) as the minimum of t(α, n) over c_p ≤ n ≤ n∗; since u_t, v_t ≤ 0, t∗(α, n∗) is simply t(α, n∗), so we obtain the tail bound u(α, n∗) = exp(t(α, n∗) · (f(α, n∗) − α · κ(n∗))).

*Example 9.* Continuing with Example 8, suppose we have successfully found that c_f = 2, c_t = 1 is a valid concrete choice for the unknown coefficients in the template. Then t∗(α, n∗) is t(α, n∗) = ln α · (n∗)^{−1}, and we have the tail bound u(α, n∗) = exp(2 · α − α · ln α), which has a better magnitude than the tail bounds obtained by Karp's method and by our Theorem 3 (see Example 6). 
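The bound in Example 9 follows by direct substitution: with t = ln α · n^{−1}, f = 2α · (ln α)^{−1} · n, and κ(n) = n, the exponent t · (f − α · κ) collapses to 2α − α · ln α, independently of n. A numeric check (ours):

```python
import math

def exponent(alpha, n):
    # t(alpha, n) * (f(alpha, n) - alpha * kappa(n)) for the concrete
    # choice c_f = 2, c_t = 1 of Example 9, with kappa(n) = n
    t = math.log(alpha) / n
    f = 2 * alpha / math.log(alpha) * n
    return t * (f - alpha * n)

for alpha in (3.0, 10.0, 100.0):
    for n in (10, 1000, 10 ** 6):
        expected = 2 * alpha - alpha * math.log(alpha)
        assert abs(exponent(alpha, n) - expected) < 1e-6
```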

Our algorithm follows the guess-and-check paradigm. The guess procedure explores possible values c̄_f, c̄_t for c_f, c_t and invokes the check procedure to verify whether the current choice is valid. Below we present the guess procedure in Sect. 4.1 and the check procedure in Sect. 4.2.

#### 4.1 The Guess Procedure **Guess(***f, t***)**

The pseudocode for our guess procedure Guess(f, t) is given in Algorithm 1. In detail, it first receives a positive integer M as the doubling and halving number (Line 1), then iteratively enumerates possible values for the unknown coefficients c_f and c_t by doubling and halving M times (Line 3 – Line 4), and finally calls the check procedure (Line 5). It is justified by the following theorem.

Theorem 4. *Given the templates for* f(α, n) *and* t(α, n) *as in (5) and (6), if* c̄_f, c̄_t *are valid choices, then (i) for every* k > 1*,* k · c̄_f, c̄_t *remain valid, and (ii) for every* 0 < k < 1*,* c̄_f, k · c̄_t *remain valid.*


By Theorem 4, if the check procedure is sound and complete (i.e., CheckCond always terminates, and c̄_f, c̄_t fulfill the constraint (4) iff CheckCond(c̄_f, c̄_t) returns true), then the guess procedure is guaranteed to find a solution c̄_f, c̄_t (if one exists) when the parameter M is large enough.

*Example 10.* Continuing with Example 8, suppose M = 2. We enumerate c_f from {1/2, 1, 2} and c_t from {1, 1/2, 1/4}. We try every possible combination, and we find that CheckCond(2, 1) returns true. Thus, we return (2, 1) as the result. In Sect. 4.2, we will show how to conclude that CheckCond(2, 1) is true. 
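A plausible reading of the guess procedure as code (our sketch only; the real CheckCond of Sect. 4.2 is replaced here by a hypothetical stand-in whose validity region has the monotone shape guaranteed by Theorem 4):

```python
def guess(check_cond, M):
    # Enumerate candidate coefficients obtained by doubling/halving
    # M times, mirroring Algorithm 1, and return the first valid pair.
    candidates = [2.0 ** i for i in range(-M, M + 1)]
    for cf in candidates:
        for ct in candidates:
            if check_cond(cf, ct):
                return cf, ct
    return None

def mock_check(cf, ct):
    # Hypothetical check with the shape of Theorem 4: if (cf, ct) is
    # valid, so are (k*cf, ct) for k > 1 and (cf, k*ct) for 0 < k < 1.
    return cf >= 2 and ct <= 1
```

With M = 2, `guess(mock_check, 2)` returns some valid pair; the monotonicity of Theorem 4 is what makes this coarse doubling/halving grid sufficient.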

## 4.2 The Check Procedure **CheckCond(***cf , ct***)**

The check procedure takes as input the concrete values c̄_f, c̄_t for the unknown coefficients in the template and outputs whether they are valid. It is the most involved part of our algorithm, due to the difficulty of tackling the validity of the constraint (4), which involves the composition of polynomials, exponentiation, and logarithms. The existence of a sound and complete decision procedure for such validity is extremely difficult and a long-standing open problem [1, 33].

To circumvent this difficulty, the check procedure first strengthens the original constraint (4) into a canonical constraint with a specific form, so that a decision algorithm that is sound and complete up to any additive error applies. Below we fix a PRR with procedure p in the canonical form (2). We also discuss possible extensions for the check procedure in Remark 1.

The Canonical Constraint. We first present the canonical constraint Q(α, n) and how to decide it. The constraint is given by (where ∀^∞ means "for all sufficiently large α", or formally ∃α_0. ∀α ≥ α_0):

$$Q(\alpha, n) := \forall^{\infty} \alpha. \forall n \ge c\_p. \left[ \sum\_{i=1}^k \gamma\_i \cdot \exp(f\_i(\alpha) + g\_i(n)) \le 1 \right] \tag{7}$$

subject to:

(C1) For each 1 ≤ i ≤ k, γ_i > 0 is a positive constant, f_i(α) is a pseudo-polynomial in α, and g_i(n) is a pseudo-polynomial in n.

(C2) For each 1 ≤ i ≤ k, the exponents of n and ln n in g_i(n) are non-negative.

We use Q_L(α, n) to denote the summation term Σ_{i=1}^k γ_i · exp(f_i(α) + g_i(n)) in (7). Below we show that the constraint can be checked by the algorithm *Decide* up to any additive error. We present an overview of this algorithm; its pseudo-code is given in Algorithm 2.

The algorithm *Decide* requires an external function NegativeLB(P(n)) that takes as input a pseudo-polynomial P(n) and outputs an integer T_n^∗ such that P(n) ≤ 0 for every n ≥ T_n^∗, or outputs +∞ if no such T_n^∗ exists. The idea behind this function is to exploit the monotonicity of pseudo-polynomials. With the function NegativeLB(P(n)), the algorithm *Decide* consists of the following two steps.
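One simple way to realize NegativeLB is sketched below (ours, and heuristic: it inspects the sign of the lexicographically dominant term and then scans for a threshold, confirming non-positivity at several consecutive doublings; a robust implementation would verify non-positivity beyond the returned point symbolically):

```python
import math

def negative_lb(terms, c_p=2, confirm=4):
    # terms: list of (coef, u, v) encoding P(n) = sum of coef * n^u * (ln n)^v.
    # If the dominant (u, v) term has a positive coefficient, P(n) -> +inf,
    # so no threshold exists; otherwise double n until P stays <= 0 at
    # `confirm` consecutive doublings and return that n.
    def P(n):
        return sum(c * n ** u * math.log(n) ** v for (c, u, v) in terms)
    dominant = max(terms, key=lambda t: (t[1], t[2]))  # lexicographic (u, v)
    if dominant[0] > 0:
        return math.inf
    n = c_p
    while any(P(n * 2 ** k) > 0 for k in range(confirm)):
        n *= 2
    return n
```

For instance, for P(n) = ln n − 0.1 · n the dominant term is −0.1 · n, and the scan returns a threshold beyond which P is non-positive.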

*First*, we change the range of n from [c_p, ∞) to [c_p, T_n], where T_n is a constant, without affecting soundness or completeness. This is achieved by the observation that either (i) we can conclude that Q(α, n) does not hold, or (ii) there is an integer T_n such that Q_L(α, n) is non-increasing when n ≥ T_n, so that it suffices to consider only c_p ≤ n ≤ T_n. Below we show how to compute T_n by a case analysis on the limit M_i of g_i(n) as n → ∞, for each 1 ≤ i ≤ k.


Finally, we set T_n as the maximum of the L_i's and c_p.

*Second*, for every integer c_p ≤ n̄ ≤ T_n, we substitute n with the concrete value n̄ to eliminate n in Q(α, n). Then each exponent f_i(α) + g_i(n̄) becomes a pseudo-polynomial solely over α. Since we are only concerned with sufficiently large α, we can compute the limit R_{n̄} of Q_L(α, n̄) as α → ∞. We decide based on the limit R_{n̄} as follows.


#### Algorithm 2: The Decision procedure for canonical constraints


Algorithm *Decide* is sound, and complete up to any additive error, as is illustrated by the following theorem.

Theorem 5. *Algorithm Decide has the following properties:*


The Strengthening Procedure. We now show how to strengthen the constraint (4) into the canonical constraint (7), so that Algorithm *Decide* applies. We rephrase (4) as

$$\mathbb{E}\left[\exp\left(t(\alpha,n)\cdot\left(S(n)+\sum\_{i=1}^{r}f(\alpha,\text{size}\_{i}(n))-f(\alpha,n)\right)\right)\right] \le 1\tag{8}$$

and consider the two functions f, t obtained by substituting the concrete values c̄_f, c̄_t for the unknown coefficients into the templates (5) and (6). We observe that the joint distribution of the random quantities S(n), r ∈ {1, 2} and size_1(n), ..., size_r(n) in the canonical form (2) of PRRs can be described by several probabilistic branches {c_1 : B_1, ..., c_k : B_k}, which correspond to the probabilistic choice commands in the PRR. Each probabilistic branch B_i has a constant probability c_i, a deterministic pre-processing time S_i(n), a fixed number of sub-procedure calls r_i, and a probability distribution for the variable v. The strengthening first handles each probabilistic branch, and then combines the strengthening results of all branches into a single canonical constraint.

The strengthening of each branch is an application of a set of rewriting rules. Intuitively, each rewriting step over-approximates and simplifies the expectation term on the LHS of (8). Through multiple rewriting steps, we eventually obtain the final canonical constraint. Below we present the details of the strengthening for a single probabilistic branch in the single recursion case. The divide-and-conquer case follows a similar treatment; see the extended version for details.

Consider the single recursion case r = 1, where a probabilistic branch has deterministic pre-processing time S(n), distribution dist for the variable v, and passed size H(v, n) for the recursive call. We perform a case analysis on the distribution dist as follows.

— *Case I*: dist is an FSDPD discrete{c′_1 : expr_1, ..., c′_k : expr_k}, where v takes the value expr_i with probability c′_i. Then the expectation in (8) is exactly:

$$\sum\_{i=1}^{k} c\_i' \cdot \exp\left(t(\alpha, n) \cdot S(n) + t(\alpha, n) \cdot f(\alpha, H(\textsf{expr}\_i, n)) - t(\alpha, n) \cdot f(\alpha, n)\right)$$

Thus it suffices to over-approximate the exponent X_i(α, n) := t(α, n) · S(n) + t(α, n) · f(α, H(expr_i, n)) − t(α, n) · f(α, n) into a form subject to (C1)–(C2). For this purpose, our strengthening repeatedly applies the following rewriting rules (R1)–(R4), where 0 < a < 1 and b > 0:

$$\begin{aligned} &\text{(R1) } f(\alpha, H(\text{expr}\_i, n)) \le f(\alpha, n) \\ &\text{(R2) } \ln(an - b) \le \ln n + \ln a \qquad \ln(an + b) \le \ln n + \ln\big(\min\{1, a + \tfrac{b}{c\_p}\}\big) \\ &\text{(R3) } 0 \le n^{-1} \le c\_p^{-1} \qquad 0 \le \ln^{-1} n \le \ln^{-1} c\_p \\ &\text{(R4) } \big\lfloor \tfrac{n}{b} \big\rfloor \le \tfrac{n}{b} \qquad \big\lceil \tfrac{n}{b} \big\rceil \le \tfrac{n}{b} + \tfrac{b - 1}{b} \end{aligned}$$

(R1) follows from the well-formedness condition 0 ≤ H(expr_i, n) ≤ n and the monotonicity of f(α, n) with respect to n. (R2)–(R4) are straightforward. Intuitively, (R1) can be used to cancel the term f(α, H(expr_i, n)) − f(α, n), (R2) simplifies the subexpression inside ln, (R3) removes n^{−c} and ln^{−c} n to satisfy the restriction (C2) of the canonical constraint, and (R4) removes floors and ceilings. To apply these rules, we consider the two strategies below.
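As a quick sanity check, the inequalities behind (R2)–(R4) can be verified numerically on a sampled range; this toy script uses our own sample values (c_p = 8, a = 0.5, b = 2) and checks only the first part of (R2).

```python
import math

# numeric sanity check of rewriting rules (R2)-(R4); c_p is the (assumed)
# threshold below which the recursion has terminated, a and b are sample constants
c_p, a, b = 8, 0.5, 2.0
for n in range(c_p, 200):
    # (R2), first part: ln(a*n - b) <= ln n + ln a   (needs a*n - b > 0)
    if a * n - b > 0:
        assert math.log(a * n - b) <= math.log(n) + math.log(a) + 1e-12
    # (R3): 0 <= 1/n <= 1/c_p  and  0 <= 1/ln n <= 1/ln c_p  (for n >= c_p)
    assert 0 <= 1 / n <= 1 / c_p
    assert 0 <= 1 / math.log(n) <= 1 / math.log(c_p)
    # (R4): floor(n/b) <= n/b  and  ceil(n/b) <= n/b + (b-1)/b
    assert math.floor(n / b) <= n / b
    assert math.ceil(n / b) <= n / b + (b - 1) / b
print("all sampled rewriting-rule inequalities hold")
```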


Our algorithm first tries to apply (S2-D); if that fails to derive a canonical constraint, we apply the alternative (S1-D) to the original constraint. If both strategies fail, we report failure and exit the checking procedure.

*Example 11.* Suppose v is distributed as {0.5 : n − 1, 0.5 : n − 2}, S(n) := ln n, t(α, n) := ln α / ln n, f(α, n) := 4 · (α / ln α) · n · ln n, and H(v, n) := v. We consider applying both strategies to the first term expr_1 := n − 1 and X_1(α, n) := t(α, n) · (S(n) + f(α, n − 1) − f(α, n)). If we apply (S1-D) to X_1, it is approximated as exp(ln α). If we apply (S2-D) to X_1, it is first over-approximated as (ln α / ln n) · (ln n + 4 · (α / ln α) · v · ln n − 4 · (α / ln α) · n · ln n); then we substitute v = n − 1 and derive the final result exp(ln α − 4 · α). Hence, both strategies succeed.

— *Case II*: dist is uniform(n) or muniform(n). Note that H(v, n) is linear in v, so H(v, n) is a bijection over v for every fixed n. Hence, if v is distributed as uniform(n), then

$$\mathbb{E}[\exp(t(\alpha, n) \cdot f(\alpha, H(v, n)))] \le \frac{1}{n} \sum\_{v=0}^{n-1} \exp(t(\alpha, n) \cdot f(\alpha, v)) \tag{9}$$

If v is distributed as muniform(n), a similar inequality holds with 1/n replaced by 2/n. Since f(α, v) is non-decreasing in v, we further over-approximate the summation in (9) by the integral ∫_0^n exp(t(α, n) · f(α, v)) dv.
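The sum-to-integral over-approximation can be checked on toy data; here t and f are instantiated as in Example 12 below (our own sample values α = 4, n = 32), and the integral is evaluated in closed form.

```python
import math

# check that the sum in (9) is bounded by the integral int_0^n exp(c*v) dv,
# using the fact that v -> exp(c*v) is non-decreasing for c > 0
alpha, n = 4.0, 32
# t(alpha,n) * f-coefficient = (ln alpha / n) * (2*alpha / ln alpha) = 2*alpha/n
c = (math.log(alpha) / n) * (2 * alpha / math.log(alpha))
total = sum(math.exp(c * v) for v in range(n))    # sum_{v=0}^{n-1} exp(c*v)
integral = (math.exp(c * n) - 1) / c              # closed form of the integral
assert total <= integral
print(total <= integral)  # True
```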

*Example 12.* Continuing Example 10, we need to check the constraint for t(α, n) = ln α / n and f(α, n) = 2 · (α / ln α) · n. By the inequality (9), we expand the constraint (8) into (2/n) · exp(ln α − 2 · α) · Σ_{v=0}^{n−1} exp(2 · α · v / n). By integration, this is further over-approximated as (2/n) · exp(ln α − 2 · α) · ∫_0^n exp(2 · α · v / n) dv.

Note that we still need to resolve the integral of an exponential function whose exponent is a pseudo-monomial over α, n, v. Below we denote by d_v the degree of the variable v and by l_v the degree of ln v. We first list the situations where the integral can be computed exactly.


Then we handle the situation where the exact computation of the integral is infeasible. In this situation, the strengthening further over-approximates the integral into simpler forms, first replacing ln v with ln n and then replacing v with n to reduce the degrees l_v and d_v. Eventually, the exponent in the integral boils down to one of the three situations above (where the integral can be computed exactly), and the strengthening returns the exact value of the integral.

*Example 13.* Continuing Example 12, we express the exponent as (2 · α / n) · v. Thus, we can plug 2 · α / n into W(α, n) and obtain the integration result exp(2 · α) / (2 · α / n). Furthermore, we can simplify the formula in Example 12 as exp(ln α) / α.

In the end, we move the term 1/n (or 2/n) that comes from the uniform (or muniform) distribution and the coefficient term W(α, n) into the exponent. Moving these terms directly may produce ln ln n and ln ln α, which come from taking the logarithm of ln n and ln α. Hence, we first apply ln c_p ≤ ln n ≤ n and 1 ≤ ln α ≤ α to remove all terms ln n and ln α outside the exponent (e.g., ln α / ln n is over-approximated as α / ln c_p). After this over-approximation, the terms outside the exponentiation form a polynomial over α and n, and we can trivially move them into the exponent by taking the logarithm. Finally, we apply (R3) from Case I to remove n^{−c} and ln^{−c} n. If we fail to obtain the canonical constraint, the strengthening reports failure.

*Example 14.* Continuing Example 13, we move the term α into the exponentiation and simplify the over-approximation result to exp(ln α − ln α) = 1. As a result, we over-approximate the LHS of (8) by 1 and conclude that CheckCond(2, 1) holds.
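The chain of over-approximations in Examples 12–14 can also be verified numerically; a toy check on our own parameter grid (the closed-form value of the over-approximated LHS works out to 1 − exp(−2α)):

```python
import math

# numeric check of Examples 12-14: the over-approximated LHS of (8),
# (2/n) * exp(ln a - 2a) * int_0^n exp(2av/n) dv, simplifies to 1 - exp(-2a)
for alpha in (4.0, 16.0, 64.0):
    for n in (8, 64, 512):
        c = 2 * alpha / n                      # exponent coefficient 2*alpha/n
        integral = (math.exp(c * n) - 1) / c   # int_0^n exp(c*v) dv in closed form
        lhs = (2 / n) * math.exp(math.log(alpha) - 2 * alpha) * integral
        assert lhs <= 1.0 + 1e-9
print("over-approximated LHS of (8) is at most 1 on the sample grid")
```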

The details of the divide-and-conquer case are similar and omitted. Furthermore, we present how to combine the strengthening results for different branches into a single canonical constraint. Suppose that for every probabilistic branch B_i, we have successfully obtained the canonical constraint Q_{L,i}(α, n) ≤ 1 as the strengthening of the original constraint (8). Then, the canonical constraint for the whole distribution is Σ_{i=1}^{k} c_i · Q_{L,i}(α, n) ≤ 1. Intuitively, the branch B_i is taken with probability c_i, so the combination follows by simply expanding the expectation term.

A natural question is whether our algorithm can always succeed in obtaining the canonical constraint. We have the following proposition.

Proposition 2. *If the template for* t *has a lower magnitude than* S(n)^{−1} *for every branch, then the rewriting always succeeds.*

*Proof.* We first consider the single recursion case. When dist is an FSDPD, we can apply (S1-D) to over-approximate the exponent as t(α, n) · S(n). Since t(α, n) has a lower magnitude than S(n)^{−1}, by further applying (R3) to eliminate n^{−c} and ln^{−c} n, we obtain the canonical constraint. If dist is uniform(n) or muniform(n), we observe that the over-approximation result for the integral is either exp(f(α, n)) / (f(α, n) · t(α, n)) (when d_v > 0) or ln n · exp(f(α, n)) / (f(α, n) · t(α, n)) (when d_v = 0). Thus, we can cancel the term f(α, n) in the exponent and obtain the canonical constraint by the subsequent steps. The proof is the same for the divide-and-conquer case.

By Proposition 2, we restrict u_t, v_t ≤ 0 in the template to ensure that our algorithm never fails.

*Remark 1.* Our algorithm can be extended to support piecewise uniform distributions (e.g., each of 0, ..., n/2 with probability 2/(3n) and each of n/2 + 1, ..., n − 1 with probability 4/(3n)) by handling each piece separately.

## 5 Experimental Results

In this section, we evaluate our algorithm on classical randomized algorithms such as QuickSort (Example 3), QuickSelect (Example 2), DiameterComputation [26, Chapter 9], RandomizedSearch [24, Chapter 9], and ChannelConflictResolution [22, Chapter 13], the examples Rdwalk and Rdadder from the literature [7], and four manually-crafted examples (MC1–MC4). For each example, we manually compute its expected running time for the pruning.

We implemented our algorithm in C++. We choose B = 2 (as the bounded range for the template), M = 4 (in the guess procedure), and Q = 8 (for the number of parts in the integral), and prune the search space by Theorem 1. All results were obtained on an Ubuntu 18.04 machine with an 8-Core Intel i7-7900x Processor (4.30 GHz) and 40 GB of RAM.

We report the tail bounds derived by our algorithm in Table 1, where "Benchmark" lists the benchmarks, "α·κ(n∗)" lists the time limit of interest, "Our bound" lists the tail bound by our approach, "Time(s)" lists the runtime (in seconds) of our approach, and "Karp's bound" lists the bounds by Karp's method. From the table, our algorithm consistently derives asymptotically tighter tail bounds than Karp's method. Moreover, all these bounds are obtained in a few seconds, demonstrating the efficiency of our algorithm. Furthermore, our algorithm obtains bounds with tighter magnitude than our completeness theorem (Theorem 3) guarantees in 9 benchmarks, and bounds with the same magnitude in the others.

Fig. 2. Plot for QuickSelect

For an intuitive comparison, we also report the concrete bounds of our method and Karp's method, together with their plots. We choose three concrete choices of α and n∗ and plot the concrete bounds over 10 ≤ α ≤ 15 with n∗ = 17. For the concrete bounds, we also report the ratio (Karp's bound)/(our bound) to show the strength of our method. Due to space limitations, we only report the results for QuickSelect (Example 2) in Table 2 and Fig. 2.


Table 1. Experimental Result

Table 2. Concrete Bounds for QuickSelect


## 6 Related Work

*Karp's Cookbook.* Our approach is orthogonal to Karp's cookbook method [21]: we base our approach on Markov's inequality, while the core of Karp's method is a dedicated proof that an intricate tail bound function is a prefixed point of the higher-order operator derived from the given PRR. Our automated approach derives asymptotically tighter tail bounds than Karp's method over all 12 PRRs in our benchmark. Our approach can also handle randomized pre-processing times, which is beyond the reach of Karp's method: since Karp's prefixed-point proof is ad hoc, it is non-trivial to extend his method to handle randomized costs. Nevertheless, there are PRRs (e.g., Coupon-Collector) that can be handled by Karp's method but not by ours. Thus, our approach provides a novel way to obtain asymptotically tighter tail bounds than Karp's method.

The recent work [30] extends Karp's method for deriving tail bounds for parallel randomized algorithms. This method derives the same tail bounds as Karp's method over PRRs with a single recursive call (such as QuickSelect) and cannot handle randomized pre-processing time. Compared with this approach, our approach derives tail bounds with tighter magnitude on 11/12 benchmarks.

*Custom Analysis.* Custom analysis of PRRs [15,25] has successfully derived tight tail bounds for QuickSelect and QuickSort. Compared with the custom analysis that requires ad-hoc proofs, our approach is automated, has the generality from Markov's inequality, and is capable of deriving bounds identical or very close to the tail bounds from the custom analysis.

*Probabilistic Programs.* There are also relevant approaches in probabilistic program verification. These approaches are based on martingale concentration inequalities (for exponentially-decreasing tail bounds) [7,10–12,19], Markov's inequality (for polynomially-decreasing tail bounds) [8,23,31], fixed-point synthesis [32], or weakest precondition reasoning [4,20]. Compared with these approaches, ours is dedicated to PRRs (a lightweight representation of recursive probabilistic programs) and involves specific treatment of common recursive patterns in randomized algorithms (such as randomized pivoting and divide-and-conquer), which these approaches usually do not consider. Below we give detailed technical comparisons with these approaches.


Acknowledgement. We thank Prof. Bican Xia for valuable information on the exponential theory of reals. The work is partially supported by the National Natural Science Foundation of China (NSFC) with Grant No. 62172271, ERC CoG 863818 (ForM-SMArt), the Hong Kong Research Grants Council ECS Project Number 26208122, the HKUST-Kaisa Joint Research Institute Project Grant HKJRI3A-055 and the HKUST Startup Grant R9272.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Compositional Probabilistic Model Checking with String Diagrams of MDPs**

Kazuki Watanabe<sup>1,2(B)</sup>, Clovis Eberhart<sup>1,3</sup>, Kazuyuki Asada<sup>4</sup>, and Ichiro Hasuo<sup>1,2</sup>

<sup>1</sup> National Institute of Informatics, Tokyo, Japan {kazukiwatanabe,eberhart,hasuo}@nii.ac.jp
<sup>2</sup> The Graduate University for Advanced Studies (SOKENDAI), Hayama, Japan
<sup>3</sup> Japanese-French Laboratory of Informatics, IRL 3527, CNRS, Tokyo, Japan
<sup>4</sup> Tohoku University, Sendai, Japan kazuyuki.asada.b6@tohoku.ac.jp

**Abstract.** We present a compositional model checking algorithm for Markov decision processes, in which they are composed in the categorical graphical language of *string diagrams*. The algorithm computes optimal expected rewards. Our theoretical development of the algorithm is supported by category theory, while what we call decomposition equalities for expected rewards act as a key enabler. Experimental evaluation demonstrates its performance advantages.

**Keywords:** model checking · compositionality · Markov decision process · category theory · monoidal category · string diagram

## **1 Introduction**

*Probabilistic model checking* is a topic that attracts both theoretical and practical interest. On the practical side, probabilistic system models can naturally accommodate uncertainties inherent in many real-world systems; moreover, probabilistic model checking can give quantitative answers, enabling more fine-grained assessment than qualitative verification. Model checking of Markov decision processes (MDPs)—the target problem of this paper—has additional practical values since it not only verifies a specification but also synthesizes an optimal control strategy. On the theoretical side, it is notable that probabilistic model checking has a number of efficient algorithms, despite the challenge that the problem involves continuous quantities (namely probabilities). See e.g. [1].

However, even those efficient algorithms can struggle when a model is enormous. Models can easily become enormous—the so-called *state-space explosion problem*—due to the growing complexity of modern verification targets. Models that exceed the memory size of a machine for verification are common.

The authors are supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST. K.W. is supported by the JST grant No. JPMJFS2136.

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 40–61, 2023. https://doi.org/10.1007/978-3-031-37709-9\_3

Among possible countermeasures to state-space explosion, one with both mathematical blessings and a proven track record is *compositionality*. It takes as input a model with a compositional structure—where smaller *component* models are combined, sometimes with many layers—and processes the model in a divide-and-conquer manner. In particular, when there is repetition among components, compositional methods can exploit the repetition and reuse intermediate results, leading to a clear performance advantage.

Focusing our attention on MDP model checking, many compositional methods have been proposed for various settings. One example is [14]: it studies probabilistic automata (they are only slightly different from MDPs) and in particular their *parallel composition*; the proposed method is a compositional framework, in an assume-guarantee style, based on multi-objective probabilistic model checking. Here, *contracts* among parallel components are not always automatically obtained. Another example is [11], where the so-called *hierarchical model checking* method for MDPs is introduced. It deals with *sequential composition* rather than parallel composition; assuming what can be called *parametric homogeneity* of components—they must be of the same shape while parameter values may vary—it presents a model-checking algorithm that computes a guaranteed interval for the optimal expected reward.

In this work, inspired by these works and technically building on another recent work of ours [20], we present another compositional MDP model checking algorithm. We compose MDPs in *string diagrams*—a graphical language of category theory [15, Chap. XI] that has found applications in computer science [3,8,17]—that are more sequential than parallel. Our algorithm computes the optimal expected reward, unlike [11].

One key ingredient of the algorithm is the identification of compositionality as the *preservation of algebraic structures*; more specifically, we identify a compositional solution as a "homomorphism" of suitable *monoidal categories*. This identification guided our development, explicating the requirements of a desired compositional semantic domain (Sect. 2).

Another key ingredient is a couple of *decomposition equalities* for reachability probabilities, extended to expected rewards (Sect. 3). Those for reachability probabilities are well-known—one of them is *Girard's execution formula* [7] in linear logic—but our extension to expected rewards seems new.

These two key ingredients are combined in Sect. 4 to formulate a compositional solution. Here we benefit from general categorical constructions, namely the Int *construction* [10] and *change of base* [5,6].

We implemented the algorithm (it is called CompMDP) and present its experimental evaluation. Using benchmarks inspired by real-world problems, we show that 1) CompMDP can solve huge models in realistic time (e.g., 10<sup>8</sup> positions in 6–130 s); 2) compositionality does boost performance (shown by ablation experiments); and 3) the choice of the degree of compositionality is important. The last is enabled in CompMDP by the operator we call *freeze*.

**Fig. 1.** String diagrams of MDPs, an example (the Patrol benchmark in Sect. 5).

**Fig. 2.** Sequential composition ;, sum <sup>⊕</sup>, and loops of MDPs, illustrated.

**Compositional Description of MDPs by String Diagrams.** The calculus we use for composing MDPs is that of *string diagrams*. Figure 1 shows an example used in experiments. String diagrams offer two basic composition operations, *sequential composition* ; and *sum* ⊕, illustrated in Fig. 2. The rearrangement of wires in A⊕B is for bundling up wires of the same direction. It is not essential.

We note that *loops* in MDPs can be described using these algebraic operations, as shown in Fig. 2. We extend MDPs with open ends so that they allow such composition; they are called *open MDPs*.

The formalism of string diagrams originates from category theory, specifically from the theory of *monoidal categories* (see e.g. [15, Chap. XI]). Capturing the mathematical essence of the algebraic structure of arrow composition ◦ and tensor product ⊗—they correspond to ; and ⊕ in this work, respectively—monoidal categories and string diagrams have found their application in a wide variety of scientific disciplines, such as quantum field theory [12], quantum mechanics and computation [8], linguistics [17], signal flow diagrams [3], and so on.

Our reason for using string diagrams to compose MDPs is twofold. Firstly, string diagrams offer a rich metatheory—developed over the years together with its various applications—that we can readily exploit. Specifically, the theory covers *functors*, which are (structure-preserving) homomorphisms between monoidal categories. We introduce a *solution functor* <sup>S</sup> : **oMDP** <sup>→</sup> <sup>S</sup> from a category **oMDP** of open MDPs to a semantic category S that consists of solutions. We show that the functor S preserves two composition operations, that is,

$$\mathcal{S}(\mathcal{A}; \mathcal{B}) = \mathcal{S}(\mathcal{A}) \; ; \mathcal{S}(\mathcal{B}), \quad \mathcal{S}(\mathcal{A} \oplus \mathcal{B}) = \mathcal{S}(\mathcal{A}) \oplus \mathcal{S}(\mathcal{B}), \tag{1}$$

where ; and ⊕ on the right-hand sides are *semantic composition* operations on S. The equalities (1) are nothing but *compositionality*: the solution of the whole (on the left) is computed from the solutions of its parts (on the right).

The second reason for using string diagrams is that they offer an expressive language for composing MDPs—one that enables an efficient description of a number of realistic system models—as we demonstrate with benchmarks in Sect. 5.
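To make the compositionality equalities (1) concrete, here is a minimal sketch (our own toy data structures, not the paper's implementation) of a compositional semantics for loop-free rightward open Markov chains: an arrow is summarized by a matrix P of entrance-to-exit reachability probabilities and a matrix E of probability-weighted expected rewards, and sequential composition obeys the product rule E_{A;B} = E_A·P_B + P_A·E_B, in the spirit of the decomposition equalities of Sect. 3.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

class OpenMC:
    """Toy summary of a loop-free rightward open MC from m entrances to
    n exits: P[i][j] is the probability of traversing from entrance i to
    exit j; E[i][j] is the probability-weighted expected reward on the way."""
    def __init__(self, P, E):
        self.P, self.E = P, E

    def seq(self, other):
        # sequential composition ';': probabilities multiply along paths,
        # rewards add, giving E_{A;B} = E_A P_B + P_A E_B
        return OpenMC(matmul(self.P, other.P),
                      matadd(matmul(self.E, other.P), matmul(self.P, other.E)))

# A: one entrance, two exits; reward 2 is collected before branching 0.4 / 0.6
A = OpenMC(P=[[0.4, 0.6]], E=[[0.4 * 2, 0.6 * 2]])
# B: two entrances, one exit; rewards 3 and 5 on the respective branches
B = OpenMC(P=[[1.0], [1.0]], E=[[3.0], [5.0]])

AB = A.seq(B)
# monolithic check: expected total reward = 2 + 0.4*3 + 0.6*5 = 6.2
print(AB.P, AB.E)  # [[1.0]] and [[6.2]] up to floating-point rounding
```

This illustrates why the semantic domain must carry reachability probabilities alongside expected rewards: the reward of a composite depends on both matrices of each component.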

**Granularity of Semantics: A Challenge Towards Compositionality.** Now the main technical challenge is the design of a semantic domain S (it is a category in our framework). We shall call it the challenge of *granularity of semantics*; it is encountered generally when one aims at compositional solutions.


Therefore, in choosing S, one should find the smallest enrichment<sup>1</sup> of the original semantic domain that addresses all relevant interactions between components and thus enables compositional solutions. This is a theoretical challenge.

In this work, following our recent work [20] that pursued a compositional solution of parity games, we use category theory as guidance in tackling the above challenge. Our goal is to obtain a solution functor <sup>S</sup> : **oMDP** <sup>→</sup> <sup>S</sup> that preserves suitable algebraic structures (see (1)); the specific notion of algebra of our interest is that of *compact closed categories (compCC)*.

– The category **oMDP** organizes open MDPs as a category. It is a compCC, and its algebraic operations are defined as in Fig. 2.

<sup>1</sup> *Enrichment* here is in the natural language sense; it has nothing to do with the technical notion of *enriched category*.


Specifically, we find that S must be enriched with *reachability probabilities*, in addition to the desired solutions (namely expected rewards), to be a compCC. This enrichment is based on the *decomposition equalities* we observe in Sect. 3.

In the end, our semantic category S is as follows: 1) an object is a pair of natural numbers describing an interface (how many entrances and exits); 2) an arrow is a collection of "semantics," collected over all possible (memoryless) schedulers τ, recording the expected reward that the scheduler τ yields when it traverses from each entrance to each exit. This "semantics" is enriched so that it also records the reachability probability, for the sake of compositionality.

**Related Work.** Compositional model checking is studied e.g. in [4,19,20]. Besides, probabilistic model checking is an actively studied topic; see [1, Chap. 10] for a comprehensive account. We shall make a detailed comparison with the works [11,14] that study compositional probabilistic model checking.

The work [14] introduces an assume-guarantee reasoning framework for parallel composition, as we already discussed. Parallel composition is out of our current scope; in fact, we believe that compositionality with respect to parallel composition requires a much bigger enrichment of the semantic domain S than the mere reachability probabilities used in our work. The work [14] is remarkable in that its solution to this granularity problem—namely assume-guarantee reasoning—is practically sensible (domain experts often have ideas about what contract to impose) and comes with automata-theoretic automation. That said, such contracts are not always automatically synthesized in [14], while our algorithm is fully automatic.

The work [11] is probably the closest to ours in the type of composition (sequential rather than parallel) and automation. However, the technical bases of the two works are quite different: theirs is the theory of *parametric MDPs* [18], which is why their emphasis is on parametrized components and interval solutions; ours is monoidal categories and some decomposition equalities (Sect. 3).

We note that the work [11] and ours are not strictly comparable. On the one hand, we do not need a crucial assumption in [11], namely that a locally optimal scheduler in each component is part of a globally optimal scheduler. The assumption limits the applicability of [11]—it practically forces each component to have only one exit. The assumption does not hold in our benchmarks Patrol and Wholesale (see Sect. 5). Our algorithm does not need the assumption since it collects the semantics of all relevant memoryless schedulers.

On the other hand, unlike [11], our algorithm is not parametric, so it cannot exploit the similarity of components if they only differ in parameter values. Note that the target problems are different, too (interval [11] vs. exact here).

**Notations.** For natural numbers m and n, we let [m, n] := {m, m + 1,...,n − 1, n}; as a special case, we let [m] := {1, 2,...,m} (we let [0] = ∅ by convention). The disjoint union of two sets X, Y is denoted by X + Y .

**Fig. 3.** Categories of MDPs/MCs, semantic categories, and solution functors.

## **2 String Diagrams of MDPs**

We introduce our calculus for composing MDPs, namely *string diagrams of MDPs*. Our formal definition is via their *unidirectional* and *Markov chain (MC)* restrictions. This apparent detour simplifies the theoretical development, allowing us to exploit the existing categorical infrastructure on (monoidal) categories.

### **2.1 Outline**

We first give an overview of our technical development. Although we use some categorical terminology, prior knowledge of it is not needed in this outline.

Figure 3 is an overview of relevant categories and functors. The verification targets—*open MDPs*—are arrows in the compact closed category (compCC) **oMDP**. The operations ;, ⊕ of compCCs compose MDPs, as shown in Fig. 2. Our semantic category is denoted by S, and our goal is to define a solution functor **oMDP** <sup>→</sup> <sup>S</sup> that is compositional. Mathematically, such a functor with the desired compositionality (cf. (1)) is called a *compact closed functor*.

Since its direct definition is tedious, our strategy is to obtain it from a unidirectional *rightward* framework S_r : **roMDP** → S_r, which canonically induces the desired bidirectional framework via the celebrated Int *construction* [10]. In particular, the category **oMDP** is defined by **oMDP** = Int(**roMDP**); so are the semantic category S = Int(S_r) and the solution functor S = Int(S_r).

Going this way, a complication that one would encounter in a direct definition of **oMDP** (namely potential loops of transitions) is nicely taken care of by the Int construction. Another benefit is that some natural equational axioms in **oMDP**—such as the associativity of sequential composition ;—follow automatically from those in **roMDP**, which are much easier to verify.

Mathematically, the unidirectional framework S_r : **roMDP** → S_r consists of *traced symmetric monoidal categories (TSMCs)* and *traced symmetric monoidal functors*; these are "algebras" of unidirectional graphs. The Int construction turns TSMCs into compCCs, which are "algebras" of bidirectional graphs.

Yet another restriction is given by *(rightward open) Markov chains (MCs)*; see the bottom row of Fig. 3. This MDP-to-MC restriction greatly simplifies our semantic development, freeing us from the bookkeeping of different schedulers. In fact, we can introduce (optimal memoryless) schedulers systematically by the categorical construction called *change of base* [5,6]; this way we obtain the semantic category S_r from its MC counterpart S_r^MC.

### **2.2 Open MDPs**

We first introduce *open MDPs*; they have open ends via which they compose. They come with a notion of *arity*—the numbers of open ends on their left and right, distinguishing leftward and rightward ones. For example, the one on the right is from (2, 1) to (1, 3).

**Definition 2.1 (open MDP (oMDP)).** *Let* A *be a non-empty finite set, whose elements are called* actions*. An* open MDP A *(* over *the action set* A*) is the tuple* (m, n, Q, A, E, P, R) *of the following data. We say that it is* from m to n*.*

- *For all* s, s′ ∈ [m_r + n_l] + Q*, if* exits(s) ∩ exits(s′) ≠ ∅*, then* s = s′*.*
- *We further require that each exit is reached from an identical position by at most one action. That is, for each exit* t ∈ [n_r + m_l]*,* s ∈ Q*, and* a, b ∈ A*, if both* P(s, a, t) > 0 *and* P(s, b, t) > 0*, then* a = b*.*

Note that the condition of unique access to each exit is for technical convenience; it can be easily enforced by adding an extra "access" position in front of an exit.

We define the semantics of open MDPs, which is essentially the standard semantics of MDPs given by expected cumulative rewards. In this paper, it suffices to consider memoryless schedulers (see Remark 2.1).

**Definition 2.2 (path and scheduler).** *Let* A = (m, n, Q, A, E, P, R) *be an open MDP. A (finite)* path π(i,j) *in* A *from an entrance* i ∈ [m**<sup>r</sup>** + n**l**] *to an exit* j ∈ [n**<sup>r</sup>** + m**l**] *is a finite sequence* i, s1, ..., sn, j *such that* E(i) = s1 *and* sk ∈ Q *for all* k ∈ [n]*. For each* k ∈ [n]*,* π(i,j)k *denotes* sk*, and* π(i,j)n+1 *denotes* j*. The set of all paths in* A *from* i *to* j *is denoted by* PathA(i, j)*.*

*A* (memoryless) scheduler τ *of* A *is a function* τ : Q → A*.*

*Remark 2.1.* It is well-known (as hinted in [2]) that we can restrict to memoryless schedulers for optimal expected rewards, *assuming that* the MDP in question is almost surely terminating under any scheduler (†). We require the assumption (†) in our compositional framework, too, and it is true in all benchmarks in this paper. The assumption (†) must be checked only for the top-level (composed) MDP; (†) for its components can then be deduced.

**Definition 2.3 (probability and reward of a path).** *Let* A = (m, n, Q, A, E, P, R) *be an open MDP,* τ : Q → A *be a scheduler of* A*, and* π(i,j) *be a path in* A*. The* probability PrA,τ (π(i,j)) *of* π(i,j) *under* τ *is*

$$\mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)}) := \prod\_{k=1}^{n} P\left(\pi^{(i,j)}\_{k},\, \tau(\pi^{(i,j)}\_{k}),\, \pi^{(i,j)}\_{k+1}\right).$$

*The* reward RwA(π(i,j)) *along the path* π(i,j) *is the sum of the position rewards, that is,*

$$\mathrm{Rw}^{\mathcal{A}}(\pi^{(i,j)}) := \sum\_{k \in [n]} R\left(\pi^{(i,j)}\_{k}\right).$$
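As a concrete illustration, here is a minimal Python sketch of Definition 2.3, under a hypothetical encoding that is not from the paper: a path is a list `[entrance, s_1, ..., s_n, exit]`, `P` is a dict from `(state, action, successor)` to a probability, `R` a dict from states to rewards, and a memoryless scheduler `tau` a dict from states to actions.

```python
# A minimal sketch of Definition 2.3 (hypothetical encodings, not the paper's).

def path_probability(P, tau, path):
    inner = path[1:-1]                             # the positions s_1 .. s_n
    steps = zip(inner, inner[1:] + [path[-1]])     # pairs (s_k, s_{k+1}), ending at the exit
    prob = 1.0
    for s, t in steps:
        prob *= P.get((s, tau[s], t), 0.0)         # product of transition probabilities
    return prob

def path_reward(R, path):
    return sum(R[s] for s in path[1:-1])           # sum of position rewards

# Toy open-MDP fragment: entrance "in", positions q0, q1, exit "out".
P = {("q0", "a", "q1"): 0.5, ("q0", "a", "q0"): 0.5, ("q1", "a", "out"): 1.0}
R = {"q0": 2.0, "q1": 3.0}
tau = {"q0": "a", "q1": "a"}

print(path_probability(P, tau, ["in", "q0", "q1", "out"]))  # 0.5
print(path_reward(R, ["in", "q0", "q1", "out"]))            # 5.0
```

Note how the self-loop at `q0` gives rise to infinitely many paths from `in` to `out`, whose probabilities sum up in Definition 2.4 below.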

Our target problem on open MDPs is to compute the *expected cumulative reward* collected in a passage from a specified entrance i to a specified exit j. This is defined below, together with reachability probability, in the usual manner.

**Definition 2.4 (reachability probability and expected (cumulative) reward of open MDPs).** *Let* A *be an open MDP and* τ *be a scheduler, as in Definition 2.2. Let* i *be an entrance and* j *be an exit.*

*The* reachability probability RPrA,τ (i, j) *from* i *to* j*, in* A *under* τ*, is defined by*

$$\mathrm{RPr}^{\mathcal{A},\tau}(i,j) := \sum\_{\pi^{(i,j)} \in \mathrm{Path}^{\mathcal{A}}(i,j)} \mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)}).$$

*The* expected (cumulative) reward ERwA,τ (i, j) *from* i *to* j*, in* A *under* τ*, is defined by*

$$\mathrm{ERw}^{\mathcal{A},\tau}(i,j) := \sum\_{\pi^{(i,j)} \in \mathrm{Path}^{\mathcal{A}}(i,j)} \mathrm{Pr}^{\mathcal{A},\tau}(\pi^{(i,j)}) \cdot \mathrm{Rw}^{\mathcal{A}}(\pi^{(i,j)}).$$

*Note that the infinite sum here always converges to a finite value; this is because there are only finitely many positions in* A*. See e.g. [1].*

*Remark 2.2.* In standard definitions such as Definition 2.4, it is common to either 1) assume RPrA,τ (i, j) = 1 for technical convenience [11], or 2) allow RPrA,τ (i, j) <sup>&</sup>lt; 1, but in that case define ERwA,τ (i, j) := <sup>∞</sup> [1]. These definitions are not suited for our purpose (and for compositional model checking in general), since we take into account multiple exits, to each of which the reachability probability is typically < 1, and we need non-∞ expected rewards over those exits for compositionality. Note that our definition of expected reward is not conditional (unlike [1, Rem. 10.74]): when the reachability probability from i to j is small, it makes the expected reward small as well. Our notion of expected reward can be thought of as a "weighted sum" of rewards.

#### **2.3 Rightward Open MDPs and Traced Monoidal String Diagrams**

Following the outline (Sect. 2.1), in this section we focus on (unidirectional) *rightward* open MDPs and introduce the "algebra" **roMDP** of them. The operations ;, ⊕,tr of *traced symmetric monoidal categories (TSMCs)* compose rightward open MDPs in string diagrams.


**Fig. 4.** The trace operator.

**Definition 2.5 (rightward open MDP (roMDP)).** *An open MDP* A = (m, n, Q, A, E, P, R) *is* rightward *if all its entrances are on the left and all its exits are on the right, that is,* m = (m**r**, 0**l**) *and* n = (n**r**, 0**l**) *for some* m**<sup>r</sup>** *and* n**r***. We write* A = (m**r**, n**r**, Q, A, E, P, R)*, dropping* 0 *from the arities.*

*We say that a rightward open MDP* A *is* from m to n*, writing* A : m → n*, if it is from* (m, 0) *to* (n, 0) *as an open MDP.*

We quotient roMDPs by an equivalence relation, *roMDP isomorphism*, so that they satisfy the TSMC axioms given in Sect. 2.4. See [21, Appendix A] for details.

We move on to introduce algebraic operations for composing rightward open MDPs. Two of them, namely *sequential composition* ; and *sum* ⊕, look like Fig. 2 except that all wires are rightward. The other major operation is the *trace operator* tr that realizes (unidirectional) loops, as illustrated in Fig. 4.

**Definition 2.6 (sequential composition** ; **of roMDPs).** *Let* A: m → k *and* B: k → n *be rightward open MDPs with the same action set* A *and with matching arities. Their* sequential composition A ; B: m → n *is given by* A ; B := (m, n, Q<sup>A</sup> + Q<sup>B</sup>, A, E<sup>A;B</sup>, P<sup>A;B</sup>, [R<sup>A</sup>, R<sup>B</sup>])*, where*


$$\begin{aligned} &P^{\mathcal{A};\mathcal{B}}(s^{\mathcal{A}},a,s') := \begin{cases} P^{\mathcal{A}}(s^{\mathcal{A}},a,s') & \text{if } s'\in Q^{\mathcal{A}},\\ \sum\_{i\in[k]} P^{\mathcal{A}}(s^{\mathcal{A}},a,i)\cdot\delta\_{E^{\mathcal{B}}(i)=s'} & \text{otherwise (i.e. } s'\in Q^{\mathcal{B}}+[n]),\end{cases}\\ &P^{\mathcal{A};\mathcal{B}}(s^{\mathcal{B}},a,s') := \begin{cases} P^{\mathcal{B}}(s^{\mathcal{B}},a,s') & \text{if } s'\in Q^{\mathcal{B}}+[n],\\ 0 & \text{otherwise}, \end{cases}\end{aligned}$$

*where* δ *is the characteristic function (returning* 1 *if the condition is true and* 0 *otherwise), and* [R<sup>A</sup>, R<sup>B</sup>]: Q<sup>A</sup> + Q<sup>B</sup> → R≥0 *combines* R<sup>A</sup> *and* R<sup>B</sup> *by case distinction.*
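The redirection of A-transitions in the definition of P<sup>A;B</sup> can be sketched in Python, under a hypothetical dict encoding (not the paper's): `PA` maps `(state, action, target)` to a probability, where a target is either a state of A or an exit index in `1..k`, and `EB` maps each exit index to the entry position of B.

```python
# Sketch of the A-part of P^{A;B} from Definition 2.6 (hypothetical encoding).

def compose_P(PA, EB, k):
    P = {}
    for (s, a, t), p in PA.items():
        # A transition into an exit i of A is redirected to B's entry E^B(i);
        # probabilities are accumulated when several exits share the same
        # entry, matching the sum over i in [k] in the definition.
        target = EB[t] if isinstance(t, int) and 1 <= t <= k else t
        P[(s, a, target)] = P.get((s, a, target), 0.0) + p
    return P

PA = {("s0", "a", 1): 0.3, ("s0", "a", 2): 0.7}  # s0 moves to exit 1 or exit 2
EB = {1: "t0", 2: "t0"}                          # both exits enter B at t0
print(compose_P(PA, EB, 2))                      # the two probabilities are summed at t0
```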

Defining sum ⊕ of roMDPs is straightforward, following Fig. 2. See [21, Appendix A] for details.

The trace operator tr is primitive in the TSMC **roMDP**; it is crucial in defining bidirectional sequential composition shown in Fig. 2 (cf. Definition 2.9).

**Definition 2.7 (the trace operator** trl;m,n **over roMDPs).** *Let* A : l + m → l + n *be a rightward open MDP. The* trace trl;m,n(A) : m → n *of* A *with respect to* l *is the roMDP* trl;m,n(A) := (m, n, Q<sup>A</sup>, A, E, P, R<sup>A</sup>) *(cf. Fig. 4), where*


$$P(q,a,q') := \begin{cases} P^{\mathcal{A}}(q,a,q'+l) + \sum\_{i \in \mathrm{prec}(q'+l)} P^{\mathcal{A}}(q,a,i) & \text{if } q' \in [n],\\ P^{\mathcal{A}}(q,a,q') + \sum\_{i \in \mathrm{prec}(q')} P^{\mathcal{A}}(q,a,i) & \text{otherwise, i.e. if } q' \in Q^{\mathcal{A}}. \end{cases}$$

*Here* Q<sup>A</sup> *and* [l] *are assumed to be disjoint without loss of generality.*

*Remark 2.3.* In string diagrams, it is common to annotate a wire with its type, such as <sup>n</sup> −→ for id<sup>n</sup> : n → n. It is also common to separate a wire for a sum type into wires of its component types, such as below on the left. Therefore the two diagrams below on the right designate the same mathematical entity. Note that, on its right-hand side, the type annotation 1 to each wire is omitted.

#### **2.4 TSMC Equations Between roMDPs**

Here we show that the three operations ;, ⊕,tr on roMDPs satisfy the equational axioms of TSMCs [10], shown in Fig. 5. These equational axioms are not directly needed for compositional model checking. We nevertheless study them because 1) they validate some natural bookkeeping equivalences of roMDPs needed for their efficient handling, and 2) they act as a sanity check of the mathematical authenticity of our compositional framework. For example, the handling of open ends is subtle in Sect. 2.3—e.g. whether they should be positions or not—and the TSMC equational axioms led us to our current definitions.

The TSMC axioms use some "positionless" roMDPs as wires, such as *identities* <sup>I</sup><sup>m</sup> ( <sup>m</sup> —— in string diagrams) and *swaps* Sm,n (×). See [21, Appendix A] for details. The proof of the following is routine. For details, see [21, Appendix B].

**Theorem 2.1.** *The three operations* ;, ⊕,tr *on roMDPs, defined in Sect. 2.3, satisfy the equational axioms in Fig. 5 up to isomorphism (see [21, Appendix A] for details).*

**Corollary 2.1 (a TSMC roMDP).** *Let* **roMDP** *be the category whose objects are natural numbers and whose arrows are roMDPs over the action set* A *modulo isomorphisms. Then the operations* ;, ⊕,tr, I, S *make* **roMDP** *a traced symmetric monoidal category (TSMC).*

**Fig. 5.** The equational axioms of TSMCs, expressed for roMDPs, with some string diagram illustrations. Here we omit types of roMDPs; see [10] for details.

#### **2.5 Open MDPs and "Compact Closed" String Diagrams**

Following the outline in Sect. 2.1, we now introduce a bidirectional "compact closed" calculus of open MDPs (oMDPs), using the Int construction [10] that turns TSMCs in general into compact closed categories (compCCs).

The following definition simply says **oMDP** := Int(**roMDP**), although it uses concrete terms adapted to the current context.

**Definition 2.8 (the category oMDP).** *The* category **oMDP** of open MDPs *is defined as follows. Its objects are pairs* (m**r**, m**l**) *of natural numbers. Its arrows are defined by rightward open MDPs as follows:*

$$\frac{\text{an arrow } (m\_{\mathbf{r}}, m\_{\mathbf{l}}) \longrightarrow (n\_{\mathbf{r}}, n\_{\mathbf{l}}) \text{ in } \mathbf{oMDP}}{\text{an arrow } \mathcal{A}\colon m\_{\mathbf{r}} + n\_{\mathbf{l}} \longrightarrow n\_{\mathbf{r}} + m\_{\mathbf{l}} \text{ in } \mathbf{roMDP}, \text{ i.e. an roMDP}} \tag{3}$$

*where the double lines* == *mean "is the same thing as."*

The definition may not immediately justify its name: no open MDPs appear there; only roMDPs do. The point is that we identify the roMDP A in (3) with the oMDP Ψ(A) of the designated type, using "twists" in Fig. 6. See [21, Appendix A] for details.

We move on to describe algebraic operations for composing oMDPs. These operations come from the structure of **oMDP** as a compCC; the latter, in turn, arises canonically from the Int construction.

**Definition 2.9 (**; **of oMDPs).** *Let* A : (m**r**, m**l**) → (l**r**, l**l**) *and* B : (l**r**, l**l**) → (n**r**, n**l**) *be arrows in* **oMDP** *with the same action set* A*. Their* sequential composition A ; B : (m**r**, m**l**) → (n**r**, n**l**) *is defined by the string diagram in Fig. 7,*

**Fig. 6.** Turning oMDPs to roMDPs, and vice versa, via twists.

**Fig. 7.** String diagrams in **roMDP** for <sup>A</sup> ; <sup>B</sup>, A⊕B in **oMDP**.

*formulated in* **roMDP***. Textually, the definition is* A ; B := tr<sup>l</sup>**l**;m**r**+n**<sup>l</sup>**,n**r**+m**<sup>l</sup>** ((S<sup>l</sup>**l**,m**<sup>r</sup>** ⊕ I<sup>n</sup>**<sup>l</sup>**) ; (A⊕I<sup>n</sup>**<sup>l</sup>**) ; (I<sup>l</sup>**<sup>r</sup>** ⊕ S<sup>m</sup>**l**,n**<sup>l</sup>**) ; (B⊕I<sup>m</sup>**<sup>l</sup>**) ; (S<sup>n</sup>**r**,l**<sup>l</sup>** ⊕ I<sup>m</sup>**<sup>l</sup>**))*.*

The definition of *sum* ⊕ of oMDPs is similarly shown in the string diagram in Fig. 7, formulated in **roMDP**. Definition of "wires" such as identities, swaps, *units* (⊂ in string diagrams) and *counits* (⊃) is easy, too.

**Theorem 2.2 (oMDP is a compCC).** *The category* **oMDP** *(Definition 2.8), equipped with the operations* ;, ⊕*, is a compCC.*

#### **3 Decomposition Equalities for Open Markov Chains**

Here we exhibit some basic equalities that decompose the behavior of (rightward open) Markov chains. We start with such equalities on *reachability probabilities* (which are widely known) and extend them to equalities on *expected rewards* (which seem less known). Notably, the latter equalities involve not only expected rewards but also reachability probabilities.

Here we focus on *rightward open Markov chains (roMCs)*, since the extension to richer settings is taken care of by categorical constructions. See Fig. 3.

**Definition 3.1 (roMC).** *A* rightward open Markov chain (roMC) C *from* m *to* n *is an roMDP from* m *to* n *over the singleton action set* {⋆}*.*

*For an roMC* C*, its* reachability probability RPrC(i, j) *and* expected reward ERwC(i, j) *are defined as in Definition 2.4. The scheduler* τ *is omitted since it is unique.*

*Rightward open MCs, as a special case of roMDPs, form a TSMC (Corollary 2.1). It is denoted by* **roMC***.*

The following equalities are well-known, although they are not stated in terms of open MCs. Recall that RPrC(i, k) is the probability of reaching the exit k from the entrance i in C (Definition 2.4). Recall also the definitions of C ; D (Definition 2.6) and trl;m,n(E) (Definition 2.7), which are essentially as in Fig. 2 and Fig. 4.

**Proposition 3.1 (decomposition equalities for** RPr**).** *Let* C : m → l*,* D : l → n *and* E : l + m → l + n *be roMCs. The following matrix equalities hold.*

$$\begin{aligned} \left[\mathrm{RPr}^{\mathcal{C};\mathcal{D}}(i,j)\right]\_{i \in [m], j \in [n]} &= \left[\mathrm{RPr}^{\mathcal{C}}(i,k)\right]\_{i \in [m], k \in [l]} \cdot \left[\mathrm{RPr}^{\mathcal{D}}(k,j)\right]\_{k \in [l], j \in [n]}, \quad \text{(4)}\\ \left[\mathrm{RPr}^{\mathrm{tr}\_{l;m,n}(\mathcal{E})}(i,j)\right]\_{i \in [m], j \in [n]} &= \left[\mathrm{RPr}^{\mathcal{E}}(l+i,l+j)\right]\_{i \in [m], j \in [n]} + \sum\_{d \in \mathbb{N}} A \cdot B^{d} \cdot C. \end{aligned} \tag{5}$$

*Here* [RPr<sup>C;D</sup>(i, j)]<sup>i</sup>∈[m],j∈[n] *denotes the* m × n *matrix with the designated components; the other matrices are similar. The matrices* A, B, C *are given by* A := [RPr<sup>E</sup>(l + i, k)]<sup>i</sup>∈[m],k∈[l]*,* B := [RPr<sup>E</sup>(k, k′)]<sup>k</sup>∈[l],k′∈[l]*, and* C := [RPr<sup>E</sup>(k′, l + j)]<sup>k′</sup>∈[l],j∈[n]*. In the last line, note that* B<sup>d</sup> *is the* d*-th matrix power.*

The first equality is easy, distinguishing cases on the intermediate open end k (mutually exclusive since MCs are rightward). The second unfolds the trace into the passages that go around the loop wires d times, for each d ∈ ℕ; its string-diagram illustration (omitted here) is intuitive, with small circles corresponding to dead ends. The equality is known as *Girard's execution formula* [7] in linear logic.
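Numerically, the infinite sum in (5) is a matrix geometric series; for a substochastic B it agrees with the closed form A · (I − B)<sup>−1</sup> · C. The following pure-Python check illustrates this on toy matrices (the numbers are made up for illustration, not from the paper):

```python
# Toy numeric check of the infinite sum in (5): the truncated series
# sum_d A . B^d . C agrees with the closed form A . (I - B)^{-1} . C.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[0.5, 0.2]]                  # 1 x 2: probabilities of entering the loop wires
B = [[0.1, 0.3], [0.2, 0.1]]      # 2 x 2: probabilities of going around once more
C = [[0.4], [0.6]]                # 2 x 1: probabilities of leaving to the exit

# Truncated series sum_{d=0}^{199} A . B^d . C; converges fast since B is substochastic.
term, total = A, [[0.0]]
for _ in range(200):
    total = matadd(total, matmul(term, C))
    term = matmul(term, B)

# Closed form A . (I - B)^{-1} . C, with the 2x2 inverse written out by hand.
det = (1 - B[0][0]) * (1 - B[1][1]) - B[0][1] * B[1][0]
inv = [[(1 - B[1][1]) / det, B[0][1] / det],
       [B[1][0] / det, (1 - B[0][0]) / det]]
closed = matmul(matmul(A, inv), C)

print(abs(total[0][0] - closed[0][0]) < 1e-9)  # True
```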

We now extend Prop. 3.1 to expected rewards ERwC(i, j).

**Proposition 3.2 (decomposition eq. for** ERw**).** *Let* C : m → l*,* D : l → n *and* E : l + m → l + n *be roMCs. The following equalities of matrices hold.*

$$\begin{array}{c} \left[ \text{ERw}^{\mathcal{C};\mathcal{D}}(i,j) \right]\_{i \in [m], j \in [n]} = \left[ \text{RPr}^{\mathcal{C}}(i,k) \right]\_{i \in [m], k \in [l]} \cdot \left[ \text{ERw}^{\mathcal{D}}(k,j) \right]\_{k \in [l], j \in [n]} \\ \quad + \left[ \text{ERw}^{\mathcal{C}}(i,k) \right]\_{i \in [m], k \in [l]} \cdot \left[ \text{RPr}^{\mathcal{D}}(k,j) \right]\_{k \in [l], j \in [n]}, \end{array} \tag{6}$$
 
$$\left[ \text{ERw}^{\text{tr}\_{l;m,n}(\mathcal{E})}(i,j) \right]\_{i \in [m], j \in [n]} = \left[ \text{ERw}^{\mathcal{E}}(l+i,l+j) \right]\_{i \in [m], j \in [n]} + \sum\_{d \in \mathbb{N}} A \cdot B^{d} \cdot C. \tag{7}$$

*Here* A, B, C *are the following* m × 2l*,* 2l × 2l*, and* 2l × n *matrices, respectively;* O *denotes the* l × l *zero matrix.*

$$\begin{aligned} A &= \Bigl( \left[ \operatorname{RPr}^{\mathcal{E}}(l+i,k) \right]\_{i \in [m], k \in [l]} \;\; \left[ \operatorname{ERw}^{\mathcal{E}}(l+i,k) \right]\_{i \in [m], k \in [l]} \Bigr), \\ B &= \begin{pmatrix} \left[ \operatorname{RPr}^{\mathcal{E}}(k,k') \right]\_{k \in [l], k' \in [l]} & \left[ \operatorname{ERw}^{\mathcal{E}}(k,k') \right]\_{k \in [l], k' \in [l]} \\ O & \left[ \operatorname{RPr}^{\mathcal{E}}(k,k') \right]\_{k \in [l], k' \in [l]} \end{pmatrix}, \\ C &= \begin{pmatrix} \left[ \operatorname{ERw}^{\mathcal{E}}(k',l+j) \right]\_{k' \in [l], j \in [n]} \\ \left[ \operatorname{RPr}^{\mathcal{E}}(k',l+j) \right]\_{k' \in [l], j \in [n]} \end{pmatrix}. \end{aligned}$$

Proposition 3.2 seems new, although proving it is not hard once the statements are given (see [21, Appendix C] for details). The equalities enable one to compute the expected rewards of the composite roMCs C ; D and trl;m,n(E) from those of the component roMCs C, D, E. They also signify the role of reachability probabilities in such computation, suggesting their use in the definition of semantic categories (cf. granularity of semantics in Sect. 1).

The last equalities in Propositions 3.1 and 3.2 involve infinite sums over d ∈ ℕ, and one may wonder how to compute them. A key is their characterization as *least fixed points* via the Kleene theorem: the desired quantity on the left-hand side (RPr or ERw) is a solution of a suitable linear equation; see Proposition 3.3. With the given definitions, the proof of Propositions 3.1 and 3.2 is (lengthy but) routine work (see e.g. [1, Thm. 10.15]).

**Proposition 3.3 (linear equation characterization for** (5) **and** (7)**).** *Let* E : l + m → l + n *be an roMC, and* k ∈ [l + 1, l + n] *be a specified exit of* E*. Consider the following linear equation on an unknown vector* [xi]<sup>i</sup>∈[l+m]*:*

$$\left[x\_{i}\right]\_{i\in[l+m]} = \left[\mathrm{RPr}^{\mathcal{E}}(i,k)\right]\_{i\in[l+m]} + \left[\mathrm{RPr}^{\mathcal{E}}(i,j)\right]\_{i\in[l+m],j\in[l]} \cdot \left[x\_{j}\right]\_{j\in[l]}.\tag{8}$$

*Consider the least solution* [˜xi]<sup>i</sup>∈[l+m] *of the equation. Then its part* [˜xi+l]<sup>i</sup>∈[m] *is given by the vector* RPrtr*l*;*m,n*(E) (i, k − l) <sup>i</sup>∈[m] *of suitable reachability probabilities.*

*Moreover, consider the following linear equation on an unknown* [yi]<sup>i</sup>∈[l+m]*:*

$$\begin{split} \left[y\_{i}\right]\_{i \in [l+m]} &= \left[\text{ERw}^{\mathcal{E}}(i,k)\right]\_{i \in [l+m]} + \left[\text{ERw}^{\mathcal{E}}(i,j)\right]\_{i \in [l+m], j \in [l]} \cdot \left[x\_{j}\right]\_{j \in [l]} \\ &\quad + \left[\text{RPr}^{\mathcal{E}}(i,j)\right]\_{i \in [l+m], j \in [l]} \cdot \left[y\_{j}\right]\_{j \in [l]}, \end{split} \tag{9}$$

*where the unknown* [x<sup>j</sup> ]<sup>j</sup>∈[l] *is shared with* (8)*. Consider the least solution* [˜yi]<sup>i</sup>∈[l+m] *of the equation. Then its part* [˜yi+l]<sup>i</sup>∈[m] *is given by the vector of suitable expected rewards, that is,* [˜yi+l]<sup>i</sup>∈[m] = ERwtr*l*;*m,n*(E) (i, k − l) i∈[m] *.*

We can modify the linear Eqs. (8, 9)—specifically, by removing unreachable positions—so that they have unique solutions without changing the least ones. One can then solve these linear equations to compute the reachability probabilities and expected rewards in (5, 7). This is a well-known technique for computing reachability probabilities [1, Thm. 10.19]; it is not hard to confirm the correctness of our extension to expected rewards.
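A tiny worked instance of Eq. (8) with l = 1 loop wire and m = 1 proper entrance can be solved by hand; the Python sketch below does exactly that (the probabilities are made up for illustration):

```python
# Toy instance of the linear equation (8), with l = 1 loop wire and m = 1
# proper entrance. The unknown x_i is the probability of eventually reaching
# the chosen exit k from the open end i.
rpr_exit = {1: 0.3, 2: 0.5}   # RPr^E(i, k): reach exit k directly from i
rpr_loop = {1: 0.4, 2: 0.5}   # RPr^E(i, 1): reach the loop wire from i

# (8) reads x_i = RPr^E(i, k) + RPr^E(i, 1) * x_1; the i = 1 row is solved
# first, then substituted into the i = 2 row.
x1 = rpr_exit[1] / (1 - rpr_loop[1])    # 0.3 / 0.6 = 0.5
x2 = rpr_exit[2] + rpr_loop[2] * x1     # 0.5 + 0.5 * 0.5 = 0.75

# x_2 (i.e. x_{l+1}) is the reachability probability of the traced MC.
print(x1, x2)  # 0.5 0.75
```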

#### **4 Semantic Categories and Solution Functors**

We build on the decomposition equalities (Proposition 3.2) and define the semantic category S for compositional model checking. This is the main construct in our framework. Our definitions proceed in three steps, from roMCs to roMDPs to oMDPs (Fig. 3). The gaps between them are filled in using general constructions from category theory.

#### **4.1 Semantic Category for Rightward Open MCs**

We first define the semantic category SMC **<sup>r</sup>** for roMCs (Fig. 3, bottom right).

**Definition 4.1 (objects and arrows of** SMC **<sup>r</sup> ).** *The* category SMC **<sup>r</sup>** *has natural numbers* m *as objects. Its arrow* f : m → n *is given by an assignment, for each pair* (i, j) *of* i ∈ [m] *and* j ∈ [n]*, of a pair* (pi,j , ri,j ) *of nonnegative real numbers. These pairs* (pi,j , ri,j ) *are subject to the natural conditions reflecting their reading as reachability probabilities and expected rewards: for each* i ∈ [m]*, the sum* Σj∈[n] pi,j *is at most* 1*, and* ri,j = 0 *whenever* pi,j = 0*.*
An illustration is in Fig. 8. For an object m, each i ∈ [m] is identified with an open end, much like in **roMC** and **roMDP**. For an arrow f : m → n, the pair f(i, j)=(pi,j , ri,j ) encodes a reachability probability and an expected reward, from an open end i to j; together they represent a possible roMC behavior.

We go on to define the algebraic operations of SMC **<sup>r</sup>** as a TSMC. While there is a categorical description of SMC **<sup>r</sup>** using a *monad* [16], we prefer a concrete definition here. See [21, Appendix D] for the categorical definition of SMC **<sup>r</sup>** .

**Fig. 8.** An arrow <sup>f</sup> : 2 <sup>→</sup> 2 in <sup>S</sup>MC **<sup>r</sup>** .

**Definition 4.2 (sequential composition** ; **of** SMC **<sup>r</sup> ).** *Let* f : m → l *and* g : l → n *be arrows in* SMC **<sup>r</sup>** *. The* sequential composition f ; g : m → n *of* f *and* g *is defined as follows: letting* f(i, k)=(p<sup>f</sup> i,k , r<sup>f</sup> i,k ) *and* g(k, j)=(p<sup>g</sup> k,j , r<sup>g</sup> k,j )*, then* (f ; g)(i, j) := (p<sup>f;g</sup> i,j , r<sup>f;g</sup> i,j ) *is given by*

$$\begin{aligned} \left[ p\_{i,j}^{f;g} \right]\_{i \in [m], j \in [n]} &= \left[ p\_{i,k}^f \right]\_{i \in [m], k \in [l]} \cdot \left[ p\_{k,j}^g \right]\_{k \in [l], j \in [n]}, \\ \left[ r\_{i,j}^{f;g} \right]\_{i \in [m], j \in [n]} &= \left[ p\_{i,k}^f \right]\_{i \in [m], k \in [l]} \cdot \left[ r\_{k,j}^g \right]\_{k \in [l], j \in [n]} + \left[ r\_{i,k}^f \right]\_{i \in [m], k \in [l]} \cdot \left[ p\_{k,j}^g \right]\_{k \in [l], j \in [n]}. \end{aligned}$$

The sum <sup>⊕</sup> and the trace operator tr of <sup>S</sup>MC **<sup>r</sup>** are defined similarly. To define and prove axioms of the trace operator (Fig. 5), we exploit the categorical theory of *strong unique decomposition categories* [9]. See [21, Appendix D].
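The sequential composition of Definition 4.2 is a few lines of code; here is a Python sketch under a hypothetical encoding (an arrow f : m → n of SMC **<sup>r</sup>** as an m × n matrix of pairs `f[i][j] = (p_ij, r_ij)`; none of the names are from the paper):

```python
# Sketch of Definition 4.2 (hypothetical matrix-of-pairs encoding).

def seq_compose(f, g):
    m, l, n = len(f), len(g), len(g[0])
    h = [[(0.0, 0.0) for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # probability part: matrix product of the p-components
            p = sum(f[i][k][0] * g[k][j][0] for k in range(l))
            # reward part: p^f . r^g + r^f . p^g, mirroring (6)
            r = sum(f[i][k][0] * g[k][j][1] + f[i][k][1] * g[k][j][0]
                    for k in range(l))
            h[i][j] = (p, r)
    return h

f = [[(0.5, 1.0)]]        # reach with probability 0.5, expected reward 1.0
g = [[(0.8, 0.4)]]
print(seq_compose(f, g))  # p = 0.5*0.8 = 0.4, r = 0.5*0.4 + 1.0*0.8 = 1.0
```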

**Definition 4.3 (**SMC **<sup>r</sup> as a TSMC).** SMC **<sup>r</sup>** *is a TSMC, with its operations* ;, ⊕,tr*.*

Once we expand the above definitions to concrete terms, it is evident that they mirror the decomposition equalities. Indeed, the sequential composition ; mirrors the first equalities in Propositions 3.1 and 3.2. The same holds for the trace operator, too. Therefore, one can think of the above categorical development in Definition 4.2 and Definition 4.3 as a structured *lifting* of the (local) equalities in Propositions 3.1 and 3.2 to the (global) categorical structures, as shown in Fig. 3.

Once we have found the semantic domain SMC **<sup>r</sup>** , the following definition is easy.

**Definition 4.4 (**SMC **<sup>r</sup> ).** *The* solution functor <sup>S</sup>MC **<sup>r</sup>** : **roMC** <sup>→</sup> <sup>S</sup>MC **<sup>r</sup>** *is defined as follows. It carries an object* m *(a natural number) to the same* m*; it carries an arrow* <sup>C</sup> : <sup>m</sup> <sup>→</sup> <sup>n</sup> *in* **roMC** *to the arrow* <sup>S</sup>MC **<sup>r</sup>** (C): <sup>m</sup> <sup>→</sup> <sup>n</sup> *in* <sup>S</sup>MC **<sup>r</sup>** *, defined by*

$$\mathcal{S}^{\rm MC}\_{\rm r}(\mathcal{C})(i,j) := \left(\mathrm{RPr}^{\mathcal{C}}(i,j), \mathrm{ERw}^{\mathcal{C}}(i,j)\right),\tag{10}$$

*using reachability probabilities and expected rewards (Definition 2.4).*

**Theorem 4.1 (**SMC **<sup>r</sup> is compositional).** *The correspondence* <sup>S</sup>MC **<sup>r</sup>** *, defined in* (10)*, is a traced symmetric monoidal functor. That is,* <sup>S</sup>MC **<sup>r</sup>** (<sup>C</sup> ; <sup>D</sup>) = <sup>S</sup>MC **<sup>r</sup>** (C) ; <sup>S</sup>MC **<sup>r</sup>** (D)*,* <sup>S</sup>MC **<sup>r</sup>** (C⊕D) = <sup>S</sup>MC **<sup>r</sup>** (C)⊕SMC **<sup>r</sup>** (D)*, and* <sup>S</sup>MC **<sup>r</sup>** (tr(E)) = tr(SMC **<sup>r</sup>** (E))*. Here* ;, ⊕,tr *on the left are from Sect. 2.3; those on the right are from Definition 4.3.*

#### **4.2 Semantic Category of Rightward Open MDPs**

We extend the theory in Sect. 4.1 from MCs to MDPs (Fig. 3). In particular, on the semantics side, we have to bundle up all possible behaviors of an MDP under different schedulers. We find that this is done systematically by *change of base* [5,6]. We use the following notation for fixing a scheduler τ.

**Definition 4.5 (roMC** MC(A, τ ) **induced by** A, τ **).** *Let* A : m → n *be a rightward open MDP and* τ : Q<sup>A</sup> → A *be a memoryless scheduler. The* rightward open MC MC(A, τ ) induced by A and τ *is* (m, n, Q<sup>A</sup>, {⋆}, E<sup>A</sup>, P<sup>MC(A,τ)</sup>, R<sup>A</sup>)*, where for each* s ∈ Q *and* t ∈ ([n**<sup>r</sup>** + m**l**] + Q)*,* P<sup>MC(A,τ)</sup>(s, ⋆, t) := P<sup>A</sup>(s, τ(s), t)*.*

Much like in Sect. 4.1, we first describe the semantic category S**<sup>r</sup>** in concrete terms. We later use the categorical machinery to define its algebraic structure.

**Definition 4.6 (objects and arrows of** S**r).** *The* category S**<sup>r</sup>** *has natural numbers* m *as objects. Its arrow* F : m → n *is given by a set* {f<sup>i</sup> : m → n *in* SMC **<sup>r</sup>**}<sup>i</sup>∈<sup>I</sup> *of arrows of the same type in* SMC **<sup>r</sup>** *(*I *is an arbitrary index set).*

The above definition of arrows—collecting arrows in SMC **<sup>r</sup>** , each of which corresponds to the behavior of MC(A, τ ) for each τ—follows from the change of base construction (specifically with the powerset functor P on the category **Set** of sets). Its general theory gives sequential composition ; for free (concretely described in Definition 4.7), together with equational axioms. See [21, Appendix D]. Sum ⊕ and trace tr are not covered by general theory, but we can define them analogously to ; in the current setting. Thus, for ⊕ and tr as well, we are using change of base as an inspiration.

Here is a concrete description of the algebraic operations. Each applies the corresponding operation of SMC **<sup>r</sup>** in an elementwise manner.

**Definition 4.7 (**;, <sup>⊕</sup>,tr **in** <sup>S</sup>**r).** *Let* <sup>F</sup> : <sup>m</sup> <sup>→</sup> <sup>l</sup>*,* <sup>G</sup> : <sup>l</sup> <sup>→</sup> <sup>n</sup>*,* <sup>H</sup> : <sup>l</sup> <sup>+</sup> <sup>m</sup> <sup>→</sup> <sup>l</sup> <sup>+</sup> <sup>n</sup> *be arrows in* S**r***. Their* sequential composition F ; G *of* F *and* G *is given by* F ; G := {f ; g | f ∈ F, g ∈ G} *where* f ; g *is the sequential composition of* f *and* g *in* SMC **<sup>r</sup>** *. The* trace trl;m,n(H) : m → n *of* H *with respect to* l *is given by* trl;m,n(H) := {trl;m,n(h) | h ∈ H} *where* trl;m,n(h) *is the trace of* h *with respect to* l *in* SMC **<sup>r</sup>** *.*

Sum <sup>⊕</sup> *in* <sup>S</sup>**<sup>r</sup>** *is defined analogously, applying the operation in* <sup>S</sup>MC **<sup>r</sup>** *elementwise. See [21, Appendix A] for details.*
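The elementwise nature of these operations is easy to sketch in Python, under a hypothetical encoding (an arrow of S**<sup>r</sup>** as a set of SMC **<sup>r</sup>** arrows, here simplified to single (p, r) pairs, i.e. the 1 → 1 case):

```python
# Sketch of Definition 4.7 (hypothetical set-of-pairs encoding, 1 -> 1 case).

def smc_compose(f, g):                 # 1 -> 1 case of Definition 4.2
    return (f[0] * g[0], f[0] * g[1] + f[1] * g[0])

def s_compose(F, G):                   # F ; G := { f ; g | f in F, g in G }
    return {smc_compose(f, g) for f in F for g in G}

F = {(1.0, 2.0), (0.5, 0.0)}           # behaviours under two schedulers
G = {(1.0, 1.0)}
print(sorted(s_compose(F, G)))         # [(0.5, 0.5), (1.0, 3.0)]
```

Note how each element of the result corresponds to one combination of component schedulers, as in (11) below.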

**Theorem 4.2.** <sup>S</sup>**<sup>r</sup>** *is a TSMC.*

We now define a solution functor and prove its compositionality.

**Definition 4.8 (**S**r).** *The* solution functor <sup>S</sup>**<sup>r</sup>** : **roMDP** <sup>→</sup> <sup>S</sup>**<sup>r</sup>** *is defined as follows. It carries an object* <sup>m</sup> <sup>∈</sup> <sup>N</sup> *to* <sup>m</sup>*, and an arrow* <sup>A</sup>: <sup>m</sup> <sup>→</sup> <sup>n</sup> *in* **roMDP** *to* <sup>S</sup>**r**(A): <sup>m</sup> <sup>→</sup> <sup>n</sup> *in* <sup>S</sup>**r***. The latter is defined in the following elementwise manner, using* <sup>S</sup>MC **<sup>r</sup>** *in Definition 4.4.*

$$\mathcal{S}\_{\mathbf{r}}(\mathcal{A}) := \left\{ \mathcal{S}\_{\mathbf{r}}^{\mathrm{MC}}(\mathrm{MC}(\mathcal{A}, \tau)) \,\middle|\, \tau : Q^{\mathcal{A}} \to A \text{ a (memoryless) scheduler} \right\}. \tag{11}$$

**Theorem 4.3 (compositionality).** *The correspondence* <sup>S</sup>**<sup>r</sup>** : **roMDP** <sup>→</sup> <sup>S</sup>**<sup>r</sup>** *is a traced symmetric monoidal functor, preserving* ;, ⊕,tr *as in Thm. 4.1.*

*Remark 4.1 (memoryless schedulers).* Our restriction to memoryless schedulers (cf. Definition 2.2) plays a crucial role in the proof of Theorem 4.3, specifically for the trace operator (i.e. loops, cf. Fig. 4). Intuitively, a *memoryful* scheduler for a loop may act differently in different iterations. Its technical consequence is that the elementwise definition of tr, as in Definition 4.7, no longer works for memoryful schedulers.

#### **4.3 Semantic Category of MDPs**

Finally, we extend from (unidirectional) roMDPs to (bidirectional) oMDPs (i.e. from the second row to the first row in Fig. 3). The system-side construction was already presented in Sect. 2.5; the semantic side, described here, follows the same Int construction [10]. The common intuition is that of twists; see Fig. 6.

**Definition 4.9 (the semantic category** S**).** *We define* S = Int(S**r**)*. Concretely, its objects are pairs* (m**r**, m**l**) *of natural numbers. Its arrows are given by arrows of* S**<sup>r</sup>** *as follows:*

$$\frac{\text{an arrow } F \colon (m\_{\mathbf{r}}, m\_{\mathbf{l}}) \longrightarrow (n\_{\mathbf{r}}, n\_{\mathbf{l}}) \text{ in } \mathbb{S}}{\text{an arrow } F \colon m\_{\mathbf{r}} + n\_{\mathbf{l}} \longrightarrow n\_{\mathbf{r}} + m\_{\mathbf{l}} \text{ in } \mathbb{S}\_{\mathbf{r}}} \tag{12}$$

*By general properties of* Int*,* S *is a compact closed category (compCC).*

The Int construction applies not only to categories but also to functors.

**Definition 4.10 (**$\mathcal{S}$**).** *The* solution functor $\mathcal{S} \colon \mathbf{oMDP} \to \mathbb{S}$ *is defined by* $\mathcal{S} = \mathrm{Int}(\mathcal{S}_{\mathbf{r}})$*.*

The following is our main theorem.

**Theorem 4.4 (the solution** $\mathcal{S}$ **is compositional).** *The solution functor* $\mathcal{S} \colon \mathbf{oMDP} \to \mathbb{S}$ *is a compact closed functor, preserving operations* ;, ⊕ *as in*

$$\mathcal{S}(\mathcal{A}; \mathcal{B}) = \mathcal{S}(\mathcal{A}) \; ; \mathcal{S}(\mathcal{B}), \quad \mathcal{S}(\mathcal{A} \oplus \mathcal{B}) = \mathcal{S}(\mathcal{A}) \oplus \mathcal{S}(\mathcal{B}). \qquad \Box$$

We can easily confirm, from Definitions 4.4 and 4.8, that S computes the solution we want. Given an open MDP A, an entrance i and an exit j, S returns the set

$$\left\{ \left( \text{RPr}^{\text{MC}(\mathcal{A},\tau)}(i,j), \, \text{ERw}^{\text{MC}(\mathcal{A},\tau)}(i,j) \right) \Big| \, \tau \text{ is a memoryless scheduler} \right\} \tag{13}$$

of pairs of a reachability probability and expected reward, under different schedulers, in a passage from i to j.

*Remark 4.2 (synthesizing an optimal scheduler).* The compositional solution functor S abstracts away schedulers and only records their results (see (13) where τ is not recorded). At the implementation level, we can explicitly record schedulers so that our compositional algorithm also synthesizes an optimal scheduler. We do not do so here for theoretical simplicity.

#### **5 Implementation and Experiments**

*Meager Semantics.* Since our problem is to compute optimal expected rewards, our compositional algorithm can ignore those intermediate results which are *totally subsumed* by other results (i.e., those which come from clearly suboptimal schedulers). This notion of *subsumption* is formalized as an order ≤ between parallel arrows in $\mathbb{S}^{\mathrm{MC}}_{\mathbf{r}}$ (cf. Definition 4.1): $(p_{i,j}, r_{i,j})_{i,j} \le (p'_{i,j}, r'_{i,j})_{i,j}$ if $p_{i,j} \le p'_{i,j}$ and $r_{i,j} \le r'_{i,j}$ for each $i, j$. Our implementation works with this *meager semantics* for better performance; specifically, it removes elements of $\mathcal{S}_{\mathbf{r}}(\mathcal{A})$ in (11) that are subsumed by others. It is possible to formulate this meager semantics as categories and functors, compare it with the semantics in Sect. 4, and prove its correctness. We defer this to another venue for lack of space.
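The pruning behind the meager semantics can be sketched as a Pareto filter over result tuples (a minimal sketch: each result is assumed to be a tuple of (probability, reward) pairs, one per entrance/exit pair; the function names are illustrative):

```python
def subsumes(a, b):
    """a subsumes b if every (probability, reward) entry of a dominates b's."""
    return all(pa >= pb and ra >= rb for (pa, ra), (pb, rb) in zip(a, b))

def prune_subsumed(results):
    """Keep only results that no other result totally subsumes
    (the 'meager' set of intermediate solutions)."""
    return [r for r in results
            if not any(subsumes(o, r) for o in results if o != r)]
```

A result that is dominated in every component comes from a clearly suboptimal scheduler and can never contribute to an optimal composite solution, which is why dropping it is sound.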

*Implementation.* We implemented the compositional solution functor S : **oMDP** <sup>→</sup> <sup>S</sup>, using the meager semantics as discussed. This prototype implementation is in Python and called CompMDP.

CompMDP takes a string diagram A of open MDPs as input; it is expressed in a textual format that uses the operations ;, ⊕ (such as the textual expression in Definition 2.9). Note that we abuse notation here, identifying a string diagram of oMDPs with the composite oMDP A that it denotes.

Given such input A, CompMDP returns the arrow S(A), which is concretely given by pairs of a reachability probability and expected reward shown in (13) (we have suboptimal pairs removed, as discussed above). Since different pairs correspond to different schedulers, we choose a pair in which the expected reward is the greatest. This way we answer the optimal expected reward problem.

*Freezing.* In the input format of CompMDP, we have an additional *freeze* operator: any expression inside it is considered monolithic, and thus CompMDP does not solve it compositionally. Those frozen oMDPs—i.e., those expressed by frozen expressions—are solved by PRISM [13] in our implementation.

Freezing allows us to choose how deep—in the sense of the nesting of string diagrams—we go compositional. For example, when a component oMDP $\mathcal{A}_0$ is small but has many loops, fully compositional model checking of $\mathcal{A}_0$ can be more expensive than (monolithic) PRISM. Freezing is useful in such situations.

We have found experimentally that the degree of freezing should usually not be extremal (i.e., neither none nor all). The optimal degree, which is thus somewhere intermediate, is not known a priori.

However, there are not too many options (the number of layers in the compositional model description), and freezing half of the layers is a recommended starting point, both from our experience and for the purpose of a binary search.

We require that a frozen oMDP have a unique exit. Otherwise, an oMDP can reach a specified exit with probability < 1, in which case PRISM returns ∞ as the expected reward; the latter differs from our definition of expected reward (Remark 2.2).

*Research Questions.* We posed the following questions.


*Experiment Setting.* We conducted experiments on an Apple machine with a 2.3 GHz Dual-Core Intel Core i5 and 16 GB of RAM. We designed three benchmarks, called Patrol, Wholesale, and Packets, as string diagrams of MDPs. Patrol is sketched in Fig. 1; it has layers of *tasks*, *rooms*, *floors*, *buildings* and a *neighborhood*.

Wholesale is similar to Patrol, with four layers (*item*, *dispatch*, *pipeline*, *wholesale*), but their transition structures are more complex: they have more loops, and more actions are enabled in each position, compared to Patrol. The lowest-level component MDP is much larger, too: an *item* in Wholesale has 5000 positions, while a *task* in Patrol has a unique position.

Packets has two layers: the lower layer models the transmission of 100 packets with probabilistic failure. The upper layer is a sequence of copies of 2–5 variations of the lower layer—in total, we have 50 copies—modeling 50 batches of packets.

For Patrol and Wholesale, we conducted experiments with varying *degree of identification (DI)*; this can be seen as an ablation study. These benchmarks have identical copies of a component MDP in their string diagrams; high DI means that these copies are indeed expressed as multiple occurrences of the same variable, informing CompMDP to reuse the intermediate solution. As DI goes lower, we introduce new variables for these copies and let them look different to CompMDP. Specifically, we have twice as many variables for DI-mid, and three (Patrol) or four (Wholesale) times as many for DI-low, as for DI-high.

For Packets, we conducted experiments with different degrees of freezing (FZ). FZ-none indicates no freezing, where our compositional algorithm digs all the way down to individual positions as component MDPs. FZ-all freezes everything, which means we simply used PRISM (no compositionality). FZ-int. (*intermediate*) freezes the lower of the two layers. Note that this includes the performance comparison between CompMDP and PRISM (i.e. FZ-all).

For Patrol and Wholesale, we also compared the performance of CompMDP and PRISM using their simpler variations Patrol5 and Wholesale5. We did not use the other variations (Patrol/Wholesale1–4) since the translation of those models to the PRISM format blew up.


**Table 1.** Experimental results.


|Q| is the number of positions; |E| is the number of transitions (only counting action branching, not probabilistic branching); execution time is the average of five runs, in sec.; timeout (TO) is 1200 sec.

*Results and Discussion.* Table 1 summarizes the experiment results.

**RQ1.** A big advantage of compositional verification is that it can reuse intermediate results. This advantage is clearly observed in the ablation experiments with the benchmarks Patrol1–4 and Wholesale1–4: as the degree of reuse drops to 1/2 and to 1/3–1/4 (see above), the execution time grows roughly in inverse proportion. Moreover, with the benchmarks Packets1–4, Patrol5 and Wholesale5, we see that compositionality greatly improves performance compared to PRISM (FZ-all). Overall, we can say that compositionality has clear performance advantages in probabilistic model checking.


**RQ2.** The Packets experiments show that controlling the degree of compositionality is important. Packet's lower layer (frozen in FZ-int.) is a large and complex model, without a clear compositional structure; its fully compositional treatment turned out to be prohibitively expensive. The performance advantage of FZ-int. compared to PRISM (FZ-all) is encouraging. The Patrol5 and Wholesale5 experiments also show the advantage of compositionality.

**RQ3.** We find the absolute performance of CompMDP quite satisfactory. The Patrol and Wholesale benchmarks are huge models, with so many positions that fitting their explicit state representation in memory is already nontrivial. CompMDP, exploiting their succinct presentation by string diagrams, successfully model-checked them in realistic time (6–130 s with DI-high).

**RQ4.** The experiments suggest that string diagrams are a practical modeling formalism, allowing faster solutions of realistic benchmarks. It seems likely that the formalism is more suited for *task compositionality* (where components are sub-*tasks* and they are sequentially composed with possible fallbacks and loops) rather than *system compositionality* (where components are sub-*systems* and they are parallelly composed).

**RQ5.** It seems that the number of locally optimal schedulers is an important factor: if there are many of them, then we have to record more in the intermediate solutions of the meager semantics. This number typically increases when more actions are available, as the comparison between Patrol and Wholesale shows.

## **References**



## Efficient Sensitivity Analysis for Parametric Robust Markov Chains

Thom Badings<sup>1(B)</sup>, Sebastian Junges<sup>1</sup>, Ahmadreza Marandi<sup>2</sup>, Ufuk Topcu<sup>3</sup>, and Nils Jansen<sup>1</sup>

<sup>1</sup> Radboud University, Nijmegen, The Netherlands (thom.badings@ru.nl)
<sup>2</sup> Eindhoven University of Technology, Eindhoven, The Netherlands
<sup>3</sup> University of Texas at Austin, Austin, USA

Abstract. We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of *k* parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.

## 1 Introduction

Discrete-time Markov chains (MCs) are ubiquitous in stochastic systems modeling [8]. A classical assumption is that all probabilities of an MC are precisely known—an assumption that is difficult, if not impossible, to satisfy in practice [4]. Robust MCs (rMCs), or uncertain MCs, alleviate this assumption by using *sets of probability distributions*, e.g., intervals of probabilities in the simplest case [12,39]. A typical verification problem for rMCs is to compute upper or lower bounds on measures of interest, such as the expected cumulative reward, under *worst-case realizations* of these probabilities in the set of distributions [52,59]. Thus, verification results are *robust* against any selection of probabilities in these sets.

This research has been partially funded by NWO grant NWA.1160.18.238 (PrimaVera), the ERC Starting Grant 101077178 (DEUCE), and grants ONR N00014-21-1-2502 and AFOSR FA9550-22-1-0403.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 62–85, 2023. https://doi.org/10.1007/978-3-031-37709-9\_4

*Where to improve my model?* As a running example, consider a ground vehicle navigating toward a target location in an environment with different terrain types. On each terrain type, there is some probability that the vehicle will slip and fail to move. Assume that we obtain a sufficient number of *samples* to infer upper and lower bounds (i.e., intervals) on the slipping probability on each terrain. We use these probability intervals to model the grid world as an rMC. However, from the rMC, it is unclear how our model (and thus the measure of interest) will change if we obtain more samples. For instance, if we take one more sample for a particular terrain, some of the intervals of the rMC will change, but how can we expect the verification result to change? And if the verification result is unsatisfactory, for which terrain type should we obtain more samples?

*Parametric Robust MCs.* To reason about how additional samples will change our model and thus the verification result, we employ a sensitivity analysis [29]. To that end, we use parametric robust MCs (prMCs), which are rMCs whose sets of probability distributions are defined as a function of a set of *parameters* [26], e.g., intervals with parametric upper/lower bounds. With these functions over the parameters, we can describe dependencies between the model's states. The assignment of values to each of the parameters is called an *instantiation*. Applying an instantiation to a prMC induces an rMC by replacing each occurrence of the parameters with their assigned values. For this induced rMC, we compute a (robust) value for a given measure, and we call this verification result the *solution* for this instantiation. Thus, we can associate a prMC with a function, called the *solution function*, that maps parameter instantiations to values.

*Differentiation for prMCs.* For our running example, we choose the parameters to represent the number of samples we have obtained for each terrain. Naturally, the *derivative of this solution function* with respect to each parameter (a.k.a. sample size) then corresponds to the expected change in the solution upon obtaining more samples. Such differentiation for parametric MCs (pMCs), where parameter instantiations yield one precise probability distribution, has been studied in [34]. For prMCs, however, it is unclear how to compute derivatives and under what conditions the derivative exists. We thus consider the following problem:

Problem 1 *(Computing derivatives)*. Given a prMC and a parameter instantiation, compute the partial derivative of the solution function (evaluated at this instantiation) with respect to each of the parameters.

*Our Approach.* We compute derivatives for prMCs by solving a parameterized linear optimization problem. We build upon results from convex optimization theory for differentiating the optimal solution of this optimization problem [9,15]. We also present sufficient conditions for the derivative to exist.

*Improving Efficiency.* However, computing the derivative for every parameter explicitly does not scale to more realistic models with thousands of parameters. Instead, we observe that to determine for which parameter we should obtain more samples, we do not need to know *all partial derivatives explicitly*. Instead, it may suffice to know which parameters have *the highest* (or lowest, depending on the application) derivative. Thus, we also solve the following (related) problem:

Fig. 1. Grid world environment (a). The vehicle must deliver the package to the warehouse. We obtain the MLEs in (b), leading to the MC in (c).

Problem 2 *(*k*-highest derivatives)*. Given a prMC with $|V|$ parameters, determine the $k < |V|$ parameters with the highest (or lowest) partial derivative.

We develop novel and efficient methods for solving Problem 2. Concretely, we design a linear program (LP) that finds the k parameters with the highest (or lowest) partial derivative without computing all derivatives explicitly. This LP constitutes a polynomial-time algorithm for Problem 2 and is, in practice, *orders of magnitude faster* than computing all derivatives explicitly, especially if the number of parameters is high. Moreover, if the concrete values for the partial derivatives are required, one can additionally solve Problem 1 for only the resulting k parameters. In our experiments, we show that we can compute derivatives for models with over a million states and thousands of parameters.
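For contrast, the explicit baseline that the LP-based method sidesteps simply computes every partial derivative and keeps the $k$ largest (a sketch; the list of (parameter, derivative) pairs is illustrative input, not the paper's LP):

```python
import heapq

def k_highest_derivatives(derivatives, k):
    """Explicit baseline for Problem 2: given ALL partial derivatives as
    (parameter, value) pairs, return the k pairs with the largest value.
    The LP-based method avoids computing the full list in the first place."""
    return heapq.nlargest(k, derivatives, key=lambda pair: pair[1])
```

This baseline costs one derivative computation per parameter, which is exactly the linear overhead in $|V|$ that the LP formulation is designed to avoid.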

*Learning Framework.* Learning in stochastic environments is very data-intensive in general, and millions of samples may be required to obtain sufficiently tight bounds on measures of interest [43,47]. Several methods exist to obtain intervals on probabilities based on sampling, including statistical methods such as Hoeffding's inequality [14] and Bayesian methods that iteratively update intervals [57]. Motivated by this challenge of reducing the sample complexity of learning algorithms, we embed our methods in an iterative learning scheme that profits from having access to sensitivity values for the parameters. In our experiments, we show that derivative information can be used effectively to guide sampling when learning an unknown Markov chain with hundreds of parameters.
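One standard way to obtain such intervals from samples is Hoeffding's inequality, as cited above (a minimal sketch; the exact bound functions used in the paper may differ):

```python
import math

def hoeffding_interval(p_hat, n, delta=0.05):
    """Interval on a probability from n samples with confidence 1 - delta:
    half-width eps = sqrt(ln(2/delta) / (2 n)), clipped to [0, 1]."""
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)
```

The interval shrinks as $n$ grows, which is what makes the derivative of a robust bound with respect to the sample size a meaningful guide for where to sample next.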

*Contributions.* Our contributions are threefold: (1) We present a first algorithm to compute partial derivatives for prMCs. (2) For both pMCs and prMCs, we develop an efficient method to determine a subset of parameters with the highest derivatives. (3) We apply our methods in an iterative learning scheme. We give an overview of our approach in Sect. 2 and formalize the problem statement in Sect. 3. In Sect. 4, we solve Problems (1) and (2) for pMCs, and in Sect. 5 for prMCs. Finally, the learning scheme and experiments are in Sect. 6.

## 2 Overview

We expand the example from Sect. 1 to illustrate our approach more concretely. The environment, shown in Fig. 1a, is partitioned into five regions of the same terrain type. The vehicle can move in the four cardinal directions. Recall that

Fig. 2. Parametric MC. Fig. 3. Parametric robust MC.

the slipping probabilities are the same for all states with the same terrain. The vehicle follows a dedicated route to collect and deliver a package to a warehouse. Our goal is to estimate the expected number of steps f to complete the mission.

*Estimating Probabilities.* Classically, we would derive maximum likelihood estimates (MLEs) of the probabilities by sampling. Consider that, using N samples per slipping probability, we obtained the rough MLEs shown in Fig. 1b and thus the MC in Fig. 1c. Verifying the MC shows that the expected travel time (called the solution) under these estimates is $\hat{f} = 25.51$ steps, which is far from the travel time of $f^\star = 21.62$ steps under the true slipping probabilities. We want to close this *verification-to-real gap* by taking more samples for one of the terrain types. For which of the five terrain types should we obtain more samples?

*Parametric Model.* We can model the grid world as a pMC, i.e., an MC with symbolic probabilities. The solution function for this pMC is the travel time $\hat{f}$, a function of these symbolic probabilities. We sketch four states of this pMC in Fig. 2. The most relevant parameter is then naturally defined as the parameter with the *largest partial derivative of the solution function*. As shown in Fig. 1b, parameter $v_4$ has the highest partial derivative, $\partial \hat{f} / \partial v_4 = 22.96$, while the derivative of $v_3$ is zero, as no states related to this parameter are ever visited.

*Parametric Robust Model.* The approach above does not account for the uncertainty in each MLE. Terrain type $v_4$ has the highest derivative but also the largest sample size, so sampling $v_4$ once more likely has less impact than for, e.g., $v_1$. So, is $v_4$ actually the best choice to obtain additional samples for? The prMC that allows us to answer this question is shown in Fig. 3, where we use (parametric) intervals as uncertainty sets. The parameters are the sample sizes $N_1, \ldots, N_5$ for all terrain types (contrary to the pMC, where parameters represent slipping probabilities). Now, if we obtain one additional sample for a particular terrain type, how can we expect the uncertainty sets to change?

*Derivatives for prMCs.* We use the prMC to compute an upper bound $f^+$ on the true solution $f^\star$. Obtaining one more sample for terrain type $v_i$ (i.e., increasing $N_i$ by one) shrinks the interval $[\underline{g}(N_i), \bar{g}(N_i)]$ in expectation, which in turn decreases our upper bound $f^+$. Here, $\underline{g}$ and $\bar{g}$ are functions mapping sample sizes to interval bounds. The partial derivatives $\partial f^+ / \partial N_i$ for the prMC are also shown in Fig. 1b and give a very different outcome than the derivatives for the pMC. In fact, sampling $v_1$ yields the biggest decrease in the upper bound $f^+$, so we ultimately decide to sample for terrain type $v_1$ instead of $v_4$.

*Efficient Differentiation.* We remark that we do not need to know all derivatives explicitly to determine where to obtain samples. Instead, it suffices to know *which parameter has the highest (or lowest) derivative*. In the rest of the paper, we develop efficient methods for computing either all or only the $k \in \mathbb{N}$ highest partial derivatives of the solution functions for pMCs and prMCs.

*Supported Extensions.* Our approaches are applicable to general pMCs and prMCs whose parameters can be shared between distributions (and thus capture dependencies, being a common advantage of parametric models in general [40]). Besides parameters in transition probabilities, we can handle parametric initial states, rewards, and policies. We could, e.g., use parameters to model the policy of a surveillance drone in our example and compute derivatives for these parameters.

## 3 Formal Problem Statement

Let $V = \{v_1, \ldots, v_\ell\}$, $v_i \in \mathbb{R}$, be a finite and ordered set of parameters. A parameter instantiation is a function $u \colon V \to \mathbb{R}$ that maps a parameter to a real valuation. The vector function $\mathbf{u}(v_1, \ldots, v_\ell) = [u(v_1), \ldots, u(v_\ell)]^\top \in \mathbb{R}^\ell$ denotes an ordered instantiation of all parameters in $V$ through $u$. The set of polynomials over the parameters $V$ is $\mathbb{Q}[V]$. A polynomial $f$ can be interpreted as a function $f \colon \mathbb{R}^\ell \to \mathbb{R}$, where $f(\mathbf{u})$ is obtained by substituting each occurrence of $v_i$ by $u(v_i)$. We denote these substitutions with $f[\mathbf{u}]$.

For any set $X$, let $\mathit{pFun}_V(X) = \{f \mid f \colon X \to \mathbb{Q}[V]\}$ be the set of functions that map from $X$ to the polynomials over the parameters $V$. We denote by $\mathit{pDist}_V(X) \subset \mathit{pFun}_V(X)$ the set of *parametric probability distributions* over $X$, i.e., the functions $f \colon X \to \mathbb{Q}[V]$ such that $f(x)[\mathbf{u}] \in [0, 1]$ and $\sum_{x \in X} f(x)[\mathbf{u}] = 1$ for all parameter instantiations $\mathbf{u}$.

**Parametric Markov Chain.** We define a pMC as follows:

Definition 1 (pMC). *A pMC* $\mathcal{M}$ *is a tuple* $(S, s_I, V, P)$*, where* $S$ *is a finite set of states,* $s_I \in \mathit{Dist}(S)$ *a distribution over initial states,* $V$ *a finite set of parameters, and* $P \colon S \to \mathit{pDist}_V(S)$ *a parametric transition function.*

Applying an instantiation $\mathbf{u}$ to a pMC yields an MC $\mathcal{M}[\mathbf{u}]$ by replacing each transition probability $f \in \mathbb{Q}[V]$ by $f[\mathbf{u}]$. We consider expected reward measures based on a state reward function $R \colon S \to \mathbb{R}$. Each parameter instantiation for a pMC yields an MC for which we can compute the solution for the expected reward measure [8]. We call the function that maps instantiations to a solution the *solution function*. The solution function is smooth over the set of graph-preserving instantiations [41]. Concretely, the solution function sol for the expected cumulative reward under instantiation $\mathbf{u}$ is written as follows:

$$\text{sol}(\mathbf{u}) = \sum\_{s \in S} \left( s\_I(s) \sum\_{\omega \in \Omega(s)} \text{rew}(\omega) \cdot \text{Pr}(\omega, \mathbf{u}) \right), \tag{1}$$

where $\Omega(s)$ is the set of paths starting in $s \in S$, $\text{rew}(\omega) = R(s_0) + R(s_1) + \cdots$ is the cumulative reward over $\omega = s_0 s_1 \cdots$, and $\Pr(\omega, \mathbf{u})$ is the probability of a path $\omega \in \Omega(s)$. If a terminal (sink) state is reached from state $s \in S$ with probability one, the infinite sum over $\omega \in \Omega(s)$ in Eq. (1) exists [53].

**Parametric Robust Markov Chains.** The convex polytope $T_{A,b} \subseteq \mathbb{R}^n$ defined by matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$ is the set $T_{A,b} = \{p \in \mathbb{R}^n \mid Ap \le b\}$. We denote by $\mathbb{T}_n$ the set of all convex polytopes of dimension $n$, i.e.,

$$\mathbb{T}\_n = \{ T\_{A,b} \mid A \in \mathbb{R}^{m \times n}, \ b \in \mathbb{R}^m, \ m \in \mathbb{N} \}. \tag{2}$$

A robust MC (rMC) [54,58] is a tuple $(S, s_I, \mathcal{P})$, where $S$ and $s_I$ are defined as for pMCs and the uncertain transition function $\mathcal{P} \colon S \to \mathbb{T}_{|S|}$ maps states to convex polytopes $T \in \mathbb{T}_{|S|}$. Intuitively, an rMC is an MC with possibly infinite *sets of probability distributions*. To obtain robust bounds on the verification result for any of these MCs, an *adversary* nondeterministically chooses a precise transition function by fixing a probability distribution $\hat{P}(s) \in \mathcal{P}(s)$ for each $s \in S$.
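For the simplest uncertainty sets, intervals per transition, the adversary's worst-case choice in one state can be computed greedily, without a general-purpose LP (a sketch under the assumption of interval bounds; arbitrary polytopes do require an LP):

```python
def worst_case_expectation(values, lo, hi):
    """Minimize sum_i p[i] * values[i] over distributions p with
    lo[i] <= p[i] <= hi[i] and sum(p) = 1 (interval uncertainty set)."""
    p = list(lo)
    budget = 1.0 - sum(lo)
    # Assign the remaining probability mass to the lowest-value successors first.
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(hi[i] - lo[i], budget)
        p[i] += extra
        budget -= extra
    assert budget < 1e-9, "intervals must admit a probability distribution"
    return sum(pi * vi for pi, vi in zip(p, values))
```

The greedy order is optimal because shifting any unit of mass from a lower-value to a higher-value successor can only increase the expectation.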

We extend rMCs with polytopes whose halfspaces are defined by polynomials $\mathbb{Q}[V]$ over $V$. To this end, let $\mathbb{T}_n[V]$ be the set of all such *parametric polytopes*:

$$\mathbb{T}\_n[V] = \{ T\_{A,b} \mid A \in \mathbb{Q}[V]^{m \times n}, \ b \in \mathbb{Q}[V]^m, m \in \mathbb{N} \}. \tag{3}$$

An element $T \in \mathbb{T}_n[V]$ can be interpreted as a function $T \colon \mathbb{R}^\ell \to 2^{(\mathbb{R}^n)}$ that maps an instantiation $\mathbf{u}$ to a (possibly empty) convex polytopic subset of $\mathbb{R}^n$. The set $T[\mathbf{u}]$ is obtained by substituting each $v_i$ in $T$ by $u(v_i)$, for all $i = 1, \ldots, \ell$.

*Example 1.* The uncertainty set for state $s_1$ of the prMC in Fig. 3 is the parametric polytope $T \in \mathbb{T}_2[V]$ with singleton parameter set $V = \{N_1\}$, such that

$$\begin{aligned} T = \left\{ \left[ p\_{1,1}, p\_{1,2} \right]^\top \in \mathbb{R}^2 \; \middle| \; \underline{g}\_1(N\_1) \le p\_{1,1} \le \bar{g}\_1(N\_1), \\ 1 - \bar{g}\_1(N\_1) \le p\_{1,2} \le 1 - \underline{g}\_1(N\_1), \; p\_{1,1} + p\_{1,2} = 1 \right\}. \end{aligned}$$

We use parametric convex polytopes to define prMCs:

Definition 2 (prMC). *A prMC* $\mathcal{M}_R$ *is a tuple* $(S, s_I, V, \mathcal{P})$*, where* $S$*,* $s_I$*, and* $V$ *are defined as for pMCs (Def. 1), and where* $\mathcal{P} \colon S \to \mathbb{T}_{|S|}[V]$ *is a parametric and uncertain transition function that maps states to parametric convex polytopes.*

Applying an instantiation $\mathbf{u}$ to a prMC yields an rMC $\mathcal{M}_R[\mathbf{u}]$ by replacing each parametric polytope $T \in \mathbb{T}_{|S|}[V]$ by $T[\mathbf{u}]$, i.e., a polytope defined by a concrete matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$. Without loss of generality, we consider adversaries minimizing the expected cumulative reward until reaching a set of terminal states $S_T \subseteq S$. This minimum expected cumulative reward $\text{sol}_R(\mathbf{u})$, called the *robust solution* on the instantiated prMC $\mathcal{M}_R[\mathbf{u}]$, is defined as

$$\mathsf{sol}\_{R}(\mathbf{u}) = \sum\_{s \in S} \left( s\_I(s) \cdot \min\_{P \in \mathcal{P}[\mathbf{u}]} \sum\_{\omega \in \Omega(s)} \mathsf{rew}(\omega) \cdot \Pr(\omega, \mathbf{u}, P) \right). \tag{4}$$

We refer to the function $\text{sol}_R \colon \mathbb{R}^\ell \to \mathbb{R}$ as the *robust solution function*.

*Assumptions on pMCs and prMCs.* For both pMCs and prMCs, we assume that transitions cannot vanish under any instantiation (graph-preservation). That is, for every $s, s' \in S$, we have that $P(s)[\mathbf{u}](s')$ (for pMCs) and $\mathcal{P}(s)[\mathbf{u}](s')$ (for prMCs) are either zero or strictly positive for all instantiations $\mathbf{u}$.

**Problem Statement.** Let $f(q_1, \ldots, q_n) \in \mathbb{R}^m$ be a differentiable multivariate function with $m \in \mathbb{N}$. We denote the *partial derivative* of $f$ with respect to $q_i$ by $\frac{\partial f}{\partial q_i} \in \mathbb{R}^m$. The *gradient* of $f$ combines all partial derivatives in a single vector as $\nabla_q f = [\frac{\partial f}{\partial q_1}, \ldots, \frac{\partial f}{\partial q_n}] \in \mathbb{R}^{m \times n}$. We only use gradients $\nabla_\mathbf{u} f$ with respect to the parameter instantiation $\mathbf{u}$, so we simply write $\nabla f$ in the remainder.

The gradient of the robust solution function evaluated at the instantiation $\mathbf{u}$ is $\nabla \text{sol}_R[\mathbf{u}] = \left[ \frac{\partial \text{sol}_R}{\partial u(v_1)}[\mathbf{u}], \ldots, \frac{\partial \text{sol}_R}{\partial u(v_\ell)}[\mathbf{u}] \right]$. We solve the following problem.

*Problem 1. Given a prMC* M<sup>R</sup> *and a parameter instantiation* **u***, compute the gradient* ∇solR[**u**] *of the robust solution function evaluated at* **u***.*

The cost of solving Problem 1 is linear in the number of parameters, which may lead to significant overhead if the number of parameters is large. Typically, it suffices to obtain only the parameters with the highest derivatives:

*Problem 2. Given a prMC* MR*, an instantiation* **u***, and a* k ≤ |V |*, compute a subset* V *of* k *parameters for which the partial derivatives are maximal.*

For both problems, we present polynomial-time algorithms for pMCs (Sect. 4) and prMCs (Sect. 5). Section 6 defines problem variations that we study empirically.

## 4 Differentiating Solution Functions for pMCs

We can compute the solution of an MC $\mathcal{M}[\mathbf{u}]$ with instantiation $\mathbf{u}$ based on a system of $|S|$ linear equations; here for an expected reward measure [8]. Let $x = [x_{s_1}, \ldots, x_{s_{|S|}}]^\top$ and $r = [r_{s_1}, \ldots, r_{s_{|S|}}]^\top$ be variables for the expected cumulative reward and the instantaneous reward in each state $s \in S$, respectively. Then, for a set of terminal (*sink*) states $S_T \subset S$, we obtain the equation system

$$x\_s = 0, \qquad \qquad \forall s \in S\_T \tag{5a}$$

$$x\_s = r\_s + P(s)[\mathbf{u}]x, \quad \forall s \in S \backslash S\_T. \tag{5b}$$

Let us set $P(s)[\mathbf{u}] = 0$ for all $s \in S_T$ and define the matrix $P[\mathbf{u}] \in \mathbb{R}^{|S| \times |S|}$ by stacking the rows $P(s)[\mathbf{u}]$ for all $s \in S$. Then, Eq. (5) is written in matrix form as $(I_{|S|} - P[\mathbf{u}])\,x = r$. The equation system in Eq. (5) can be efficiently solved by, e.g., Gaussian elimination or more advanced iterative equation solvers.
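As a concrete sketch of this matrix form, the system $(I_{|S|} - P[\mathbf{u}])x = r$ can be solved with an off-the-shelf sparse solver. The 4-state chain and rewards below are hypothetical illustration data, not one of the paper's benchmarks (the paper's implementation uses Storm for parsing and SciPy for solving):

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

# Hypothetical 4-state MC: states 0..3, state 3 is the sink (S_T = {3}).
# Rows of sink states are zeroed out, matching the convention
# P(s)[u] = 0 for all s in S_T.
P = csr_matrix(np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],   # sink row set to zero
]))
r = np.array([1.0, 1.0, 1.0, 0.0])  # instantaneous rewards; r_s = 0 at the sink

# Solve (I - P[u]) x = r, i.e., Eq. (5) in matrix form.
x = spsolve(identity(4, format="csr") - P, r)
# x[s] is the expected cumulative reward from state s; x = 0 at the sink.
```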

#### 4.1 Computing Derivatives Explicitly

We differentiate the equation system in Eq. (5) with respect to an instantiation $u(v_i)$ for parameter $v_i \in V$, similar to, e.g., [34]. For all $s \in S_T$, the derivative $\frac{\partial x_s}{\partial u(v_i)}$ is trivially zero. For all $s \in S \setminus S_T$, we obtain via the product rule that

$$\frac{\partial x\_s}{\partial u(v\_i)} = \frac{\partial P(s)x}{\partial u(v\_i)}[\mathbf{u}] = (x^\star)^\top \frac{\partial P(s)^\top}{\partial u(v\_i)}[\mathbf{u}] + P(s)[\mathbf{u}]\frac{\partial x}{\partial u(v\_i)},\tag{6}$$

where $x^\star \in \mathbb{R}^{|S|}$ is the solution to Eq. (5). In matrix form for all $s \in S$, this yields

$$\left(I\_{|S|} - P[\mathbf{u}]\right) \frac{\partial x}{\partial u(v\_i)} = \frac{\partial P x^\star}{\partial u(v\_i)}[\mathbf{u}].\tag{7}$$

The solution defined in Eq. (1) is computed as $\mathrm{sol}[\mathbf{u}] = s_I^\top x^\star$. Thus, the partial derivative of the solution function with respect to $u(v_i)$ in closed form is

$$\left(\frac{\partial \mathbf{sol}}{\partial u(v\_i)}\right)[\mathbf{u}] = s\_I^\top \frac{\partial x}{\partial u(v\_i)} = s\_I^\top \left(I\_{|S|} - P[\mathbf{u}]\right)^{-1} \frac{\partial P x^\star}{\partial u(v\_i)}[\mathbf{u}].\tag{8}$$

*Algorithm for Problem 1.* Let us provide an algorithm to solve Problem 1 for pMCs. Eq. (8) provides a closed-form expression for the partial derivative of the solution function, which is a function of the vector $x^\star$ in Eq. (5). However, due to the inversion of $(I_{|S|} - P[\mathbf{u}])$, it is generally more efficient to solve the system of equations in Eq. (7). Doing so, the partial derivative of the solution with respect to $u(v_i)$ is obtained by: (1) solving Eq. (5) with $\mathbf{u}$ to obtain $x^\star \in \mathbb{R}^{|S|}$, and (2) solving the equation system in Eq. (7) with $|S|$ unknowns for this vector $x^\star$. We repeat step 2 for each of the $|V|$ parameters. Thus, we can solve Problem 1 by solving $|V| + 1$ linear equation systems with $|S|$ unknowns each.
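Steps (1) and (2) can be sketched as follows. The toy chain below (one hypothetical parameter $u$ entering the first row of the transition matrix) is illustration only; for this chain, $\mathrm{sol}[\mathbf{u}] = 2 + 0.5u$, so the derivative should come out as $0.5$:

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

# Hypothetical pMC: P(0)[u] = [0, u, 1-u, 0], instantiated at u = 0.5.
u = 0.5
P = csr_matrix(np.array([
    [0.0, u, 1.0 - u, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],   # sink state
]))
r = np.array([1.0, 1.0, 1.0, 0.0])
A = identity(4, format="csr") - P

x_star = spsolve(A, r)                 # step (1): solve Eq. (5)

# dP/du is nonzero only in row 0: d/du [0, u, 1-u, 0] = [0, 1, -1, 0].
dP = np.zeros((4, 4))
dP[0, 1], dP[0, 2] = 1.0, -1.0
dx = spsolve(A, dP @ x_star)           # step (2): solve Eq. (7)

s_I = np.array([1.0, 0.0, 0.0, 0.0])   # initial distribution
dsol = s_I @ dx                        # Eq. (8): d(sol)/du at u = 0.5
```

Repeating step (2) with a different right-hand side $\frac{\partial P}{\partial u(v_i)} x^\star$ for each parameter gives the full gradient.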

#### 4.2 Computing *k*-Highest Derivatives

To solve Problem 2 for pMCs, we present a method to compute only the $k \leq \ell = |V|$ parameters with the highest (or lowest) partial derivative without computing all derivatives explicitly. Without loss of generality, we focus on the highest derivative. We can determine these parameters by solving a combinatorial optimization problem with binary variables $z_i \in \{0, 1\}$ for $i = 1, \ldots, \ell$. Our goal is to formulate this optimization problem such that an optimal value of $z^\star_i = 1$ implies that parameter $v_i \in V$ belongs to the set of $k$ highest derivatives. Concretely, we formulate the following *mixed integer linear problem* (MILP) [60]:

$$\max\_{y \in \mathbb{R}^{|S|}, z \in \{0, 1\}^{\ell}} s\_I^{\top} y \tag{9a}$$

$$\text{subject to } \left( I\_{|S|} - P[\mathbf{u}] \right) y = \sum\_{i=1}^{\ell} z\_i \frac{\partial P x^\star}{\partial u(v\_i)} [\mathbf{u}] \tag{9b}$$

$$z\_1 + \dots + z\_\ell = k.\tag{9c}$$

Constraint (9c) ensures that any feasible solution to Eq. (9) has exactly $k$ nonzero entries. Since matrix $(I_{|S|} - P[\mathbf{u}])$ is invertible by construction (see, e.g., [53]), Eq. (9) has a unique solution in $y$ for each choice of $z \in \{0, 1\}^\ell$. Thus, the objective value $s_I^\top y$ is the sum of the derivatives for the parameters $v_i \in V$ for which $z_i = 1$. Since we maximize this objective, an optimal solution $y^\star, z^\star$ to Eq. (9) is guaranteed to correspond to the $k$ parameters that maximize the derivative of the solution in Eq. (8). We state this correctness claim for the MILP:

Proposition 1. *Let* $y^\star, z^\star$ *be an optimal solution to Eq. (9). Then, the set* $V^\star = \{v_i \in V \mid z^\star_i = 1\}$ *is a subset of* $k \leq \ell$ *parameters with maximal derivatives.*

The set $V^\star$ may not be unique. However, to solve Problem 2, it suffices to obtain *a set* of $k$ parameters for which the partial derivatives are maximal. Therefore, the set $V^\star$ provides a solution to Problem 2. We remark that, to solve Problem 2 for the $k$ lowest derivatives, we change the objective in Eq. (9a) to minimize $s_I^\top y$.

*Linear Relaxation.* The MILP in Eq. (9) is computationally intractable for high values of $\ell$ and $k$. Instead, we compute the set $V^\star$ via a *linear relaxation* of the MILP. Specifically, we relax the binary variables $z \in \{0, 1\}^\ell$ to continuous variables $z \in [0, 1]^\ell$. As such, we obtain the following LP relaxation of Eq. (9):

$$\underset{y \in \mathbb{R}^{|S|},\, z \in \mathbb{R}^{\ell}}{\text{maximize}} \ s_I^{\top} y \tag{10a}$$

$$\text{subject to } \left( I\_{|S|} - P[\mathbf{u}] \right) y = \sum\_{i=1}^{\ell} z\_i \frac{\partial P x^\star}{\partial u(v\_i)} [\mathbf{u}] \tag{10b}$$

$$0 \le z\_i \le 1, \quad \forall i = 1, \dots, \ell \tag{10c}$$

$$z\_1 + \dots + z\_\ell = k.\tag{10d}$$

Denote by $y^+, z^+$ the solution of the LP relaxation in Eq. (10). For details on such linear relaxations of integer problems, we refer to [36,46]. In our case, every optimal solution $y^+, z^+$ to the LP relaxation with only binary values $z^+_i \in \{0, 1\}$ is also optimal for the MILP, resulting in the following theorem.

Theorem 1. *The LP relaxation in Eq.* (10) *has an optimal solution* $y^+, z^+$ *with* $z^+ \in \{0, 1\}^\ell$ *(i.e., every optimal variable* $z^+_i$ *is binary), and every such solution is also an optimal solution of the MILP in Eq. (9).*

*Proof.* From invertibility of $(I_{|S|} - P[\mathbf{u}])$, we know that Eq. (9) is equivalent to

$$\underset{z \in \{0, 1\}^{\ell}}{\text{maximize}} \sum\_{i=1}^{\ell} z\_i \left( s\_I^{\top} \left( I\_{|S|} - P[\mathbf{u}] \right)^{-1} \frac{\partial P x^{\star}}{\partial u(v\_i)}[\mathbf{u}] \right) \tag{11a}$$

$$\text{subject to } z_1 + \dots + z_\ell = k. \tag{11b}$$

The linear relaxation of Eq. (11) is an LP whose feasible region has integer vertices (see, e.g., [37]). Therefore, both Eq. (11) and its relaxation Eq. (10) have an integer optimal solution $z^+$, which yields an optimal solution $z^\star$ of Eq. (9). $\square$

The binary solutions $z^+ \in \{0, 1\}^\ell$ are the vertices of the feasible set of the LP in Eq. (10). A simplex-based LP solver can be set to return such a solution.<sup>1</sup>

*Algorithm for Problem 2.* We provide an algorithm to solve Problem 2 for pMCs consisting of two steps. First, for pMC $M$ and parameter instantiation $\mathbf{u}$, we solve the linear equation system in Eq. (5) for $x$ to obtain the solution $\mathrm{sol}[\mathbf{u}] = s_I^\top x^\star$. Second, we fix a number of parameters $k \leq \ell$ and solve the LP relaxation in Eq. (10). The set $V^\star$ of parameters with maximal derivatives is then obtained as defined in Proposition 1. The parameter set $V^\star$ is a solution to Problem 2.
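The LP relaxation in Eq. (10) can be sketched with a generic LP solver (the paper's implementation uses Gurobi; `scipy.optimize.linprog` with the simplex-based HiGHS backend serves the same purpose here). The matrix $A = I_{|S|} - P[\mathbf{u}]$ and the right-hand-side vectors $c_i = \frac{\partial P x^\star}{\partial u(v_i)}[\mathbf{u}]$ below are hypothetical toy data with $\ell = 2$ parameters whose derivatives are $0.5$ and $0.1$:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: A = I - P[u] for a 4-state chain, and the right-hand-side
# vectors c_i = (dP/du(v_i)) x_star for ell = 2 hypothetical parameters.
A = np.array([
    [1.0, -0.5, -0.5, 0.0],
    [0.0, 1.0, -0.5, -0.5],
    [0.0, 0.0, 1.0, -1.0],
    [0.0, 0.0, 0.0, 1.0],
])
C = np.array([[0.5, 0.0, 0.0, 0.0],     # c_1: derivative RHS for v_1
              [0.0, 0.2, 0.0, 0.0]]).T  # c_2: derivative RHS for v_2
s_I = np.array([1.0, 0.0, 0.0, 0.0])
ell, k = 2, 1

# Decision vector [y (4 entries), z (ell entries)];
# maximize s_I^T y  ==  minimize -s_I^T y.
cost = np.concatenate([-s_I, np.zeros(ell)])
# Eq. (10b): A y - sum_i z_i c_i = 0, and Eq. (10d): sum_i z_i = k.
A_eq = np.block([[A, -C], [np.zeros((1, 4)), np.ones((1, ell))]])
b_eq = np.concatenate([np.zeros(4), [k]])
bounds = [(None, None)] * 4 + [(0.0, 1.0)] * ell   # Eq. (10c): relax z to [0, 1]

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
z = res.x[4:]
top_k = np.argsort(-z)[:k]   # indices of the k parameters with maximal derivative
```

Per Theorem 1, a vertex solution has binary $z$, so `top_k` recovers $V^\star$; here the first parameter (derivative $0.5$) wins over the second (derivative $0.1$).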

#### 5 Differentiating Solution Functions for prMCs

We shift focus to prMCs. Recall that solutions $\mathrm{sol}_R[\mathbf{u}]$ are computed for the worst-case realization of the uncertainty, called the robust solution. We derive the following equation system, where, as for pMCs, $x \in \mathbb{R}^{|S|}$ represents the expected cumulative reward in each state.

$$x_s = 0, \qquad \forall s \in S_T \tag{12a}$$

$$x_s = r_s + \inf_{p \in \mathcal{P}(s)[\mathbf{u}]} \left( p^\top x \right), \qquad \forall s \in S \setminus S_T. \tag{12b}$$

Solving Eq. (12) directly corresponds to solving a system of nonlinear equations due to the inner infimum in Eq. (12b). The standard approach from robust optimization [12] is to leverage the dual problem for each inner infimum, e.g., as is done in [20,52]. For each $s \in S$, $\mathcal{P}(s)$ is a parametric convex polytope $T_{A,b}$ as defined in Eq. (3). The dimensionality of this polytope depends on the number of successor states, which is typically much lower than the total number of states. To make the number of successor states explicit, we denote by $\mathrm{post}(s) \subseteq S$ the successor states of $s \in S$ and define $T_{A,b} \in \mathbb{T}^{|\mathrm{post}(s)|}[V]$ with $A_s \in \mathbb{Q}^{m_s \times |\mathrm{post}(s)|}$ and $b_s[\mathbf{u}] \in \mathbb{Q}^{m_s}$ (recall $m_s$ is the number of halfspaces of the polytope). Then, the infimum in Eq. (12b) for each $s \in S \setminus S_T$ is

$$\text{minimize } p^\top x \tag{13a}$$

$$\text{subject to } A\_s[\mathbf{u}]p \le b\_s[\mathbf{u}] \tag{13b}$$

$$1^\top p = 1,\tag{13c}$$

where $\mathbb{1}$ denotes a column vector of ones of appropriate size. Let $x_{\mathrm{post}(s)} = [x_{s'}]_{s' \in \mathrm{post}(s)}$ be the vector of decision variables corresponding to the (ordered) successor states in $\mathrm{post}(s)$. The dual problem of Eq. (13), with dual variables $\alpha \in \mathbb{R}^{m_s}$ and $\beta \in \mathbb{R}$ (see, e.g., [11] for details), is written as follows:

$$\text{maximize } \, -b\_s [\mathbf{u}]^\top \alpha - \beta \tag{14a}$$

$$\begin{array}{c} \text{subject to } A\_s[\mathbf{u}]^\top \alpha + x\_{\mathsf{post}(s)} + \beta \mathbb{1} = 0 \end{array} \tag{14b}$$

$$\alpha \ge 0. \tag{14c}$$

<sup>1</sup> Even if a non-vertex solution $y^+, z^+$ is obtained, we can use an arbitrary tie-break rule on $z^+$, which forces each $z^+_i$ binary and preserves the sum in Eq. (10d).

Fig. 4. Three polytopic uncertainty sets (blue shade), with the vector $x$, the worst-case points $p^\star$, and the active constraints shown in red. (Color figure online)

By using this dual problem in Eq. (12b), we obtain the following LP with decision variables $x \in \mathbb{R}^{|S|}$, and with $\alpha_s \in \mathbb{R}^{m_s}$ and $\beta_s \in \mathbb{R}$ for every $s \in S$:

$$\text{maximize } s_I^\top x \tag{15a}$$

$$\text{subject to } x_s = 0, \qquad \forall s \in S_T \tag{15b}$$

$$x_s = r_s - \left(b_s[\mathbf{u}]^\top \alpha_s + \beta_s\right), \qquad \forall s \in S \setminus S_T \tag{15c}$$

$$A_s[\mathbf{u}]^\top \alpha_s + x_{\mathrm{post}(s)} + \beta_s \mathbb{1} = 0, \quad \alpha_s \ge 0, \qquad \forall s \in S \setminus S_T. \tag{15d}$$

The reformulation of Eq. (12) to Eq. (15) requires that $s_I \geq 0$, which is trivially satisfied because $s_I$ is a probability distribution. Denote by $x^\star, \alpha^\star, \beta^\star$ an optimal point of Eq. (15). The $x^\star$ element of this optimum is also an optimal solution of Eq. (12) [12]. Thus, the robust solution defined in Eq. (4) is $\mathrm{sol}_R[\mathbf{u}] = s_I^\top x^\star$.

#### 5.1 Computing Derivatives via pMCs (and When It Does Not Work)

Toward solving Problem 1, we provide some intuition about computing robust solutions for prMCs. The infimum in Eq. (12) finds the *worst-case* point $p^\star$ in each set $\mathcal{P}(s)[\mathbf{u}]$ that minimizes $(p^\star)^\top x$. This minimization is visualized in Fig. 4a for an uncertainty set that captures three probability intervals $\underline{p}_i \leq p_i \leq \bar{p}_i$, $i = 1, 2, 3$. Given the optimization direction $x$ (arrow in Fig. 4a), the point $p^\star$ (red dot) is attained at the vertex where the constraints $\underline{p}_1 \leq p_1$ and $\underline{p}_2 \leq p_2$ are active.<sup>2</sup> Thus, we obtain that the point in the polytope that minimizes $(p^\star)^\top x$ is $p^\star = [\underline{p}_1, \underline{p}_2, 1 - \underline{p}_1 - \underline{p}_2]^\top$. Using this procedure, we can obtain a worst-case point $p^\star_s$ for each state $s \in S$. We can use these points to convert the prMC into an induced pMC with transition function $P(s) = p^\star_s$ for each state $s \in S$.
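The inner minimization of Eq. (13) over such an interval polytope can be sketched directly as an LP. The interval bounds and the direction $x$ below are hypothetical numbers chosen so that, as in Fig. 4a, two lower-bound constraints are active at the worst case:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical interval polytope: p_i in [lo_i, hi_i], plus 1^T p = 1.
lo = np.array([0.1, 0.2, 0.1])
hi = np.array([0.6, 0.5, 0.7])
x = np.array([3.0, 1.0, 2.0])   # optimization direction (successor values)

# Eq. (13): minimize p^T x subject to the interval bounds and 1^T p = 1.
res = linprog(x, A_eq=np.ones((1, 3)), b_eq=[1.0],
              bounds=list(zip(lo, hi)), method="highs")
p_star = res.x   # worst-case point; the induced pMC uses P(s) = p_star
```

Here the solver pushes mass toward the cheapest successors: $p^\star = [0.1, 0.5, 0.4]$, with the lower bound on $p_1$ and the upper bound on $p_2$ active.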

For small changes in the parameters, the point $p^\star$ in Fig. 4a changes smoothly, and its closed-form expression (i.e., the functional form) remains the same. As such, it is intuitive that we could apply the methods from Sect. 4 to compute partial derivatives on the induced pMC. However, this approach does not always work, as illustrated by the following two corner cases.

<sup>2</sup> An inequality constraint $g^\top x \le h$ is active under the optimal solution $x^\star$ if $g^\top x^\star = h$ [15].


These examples show that computing derivatives via an induced pMC by obtaining each point $p^\star_s$ can be tricky or is, in some cases, not possible at all. In what follows, we present a method that directly derives a set of linear equations to obtain derivatives for prMCs (all or only the $k$ highest) based on the solution to the LP in Eq. (15), which intrinsically identifies the corner cases above in which the derivative is not defined.

#### 5.2 Computing Derivatives Explicitly

We now develop a dedicated method for identifying whether the derivative of the solution function for a prMC exists, and if so, to compute this derivative. Observe from Fig. 4 that the point $p^\star$ is uniquely defined and has a smooth derivative only in Fig. 4a with two active constraints. For only one active constraint (Fig. 4b), the point is *underdetermined*, while for three active constraints (Fig. 4c), the derivative may *not be smooth*. In the general case, having exactly $n - 1$ active constraints (whose facets are nonparallel) is a sufficient condition for obtaining a unique and smoothly changing point $p^\star$ in the $n$-dimensional probability simplex.

*Optimal Dual Variables.* The optimal dual variables $\alpha^\star_s \geq 0$ for each $s \in S \setminus S_T$ in Eq. (15) indicate which constraints of the polytope $A_s[\mathbf{u}]p \leq b_s[\mathbf{u}]$ are active, i.e., for which rows $a_{s,i}[\mathbf{u}]$ of $A_s[\mathbf{u}]$ it holds that $a_{s,i}[\mathbf{u}]\,p^\star = b_{s,i}[\mathbf{u}]$. Specifically, a value of $\alpha^\star_{s,i} > 0$ implies that the $i$th constraint is active, and $\alpha^\star_{s,i} = 0$ indicates a nonactive constraint [15]. We define $E_s = [e_1, \ldots, e_{m_s}] \in \{0, 1\}^{m_s}$ as a vector whose binary values $e_i$ for all $i \in \{1, \ldots, m_s\}$ are given as $e_i = [\![\alpha^\star_{s,i} > 0]\!]$.<sup>3</sup> Moreover, denote by $\mathbf{D}(E_s)$ the matrix with $E_s$ on the diagonal and zeros elsewhere. We reduce the LP in Eq. (15) to a system of linear equations that encodes only the constraints that are active under the worst-case point $p^\star_s$ for each $s \in S \setminus S_T$:

$$x_s = 0, \qquad \forall s \in S_T \tag{16a}$$

$$x_s = r_s - \left(b_s[\mathbf{u}]^\top \mathbf{D}(E_s)\,\alpha_s + \beta_s\right), \qquad \forall s \in S \setminus S_T \tag{16b}$$

$$A_s[\mathbf{u}]^\top \mathbf{D}(E_s)\,\alpha_s + x_{\mathrm{post}(s)} + \beta_s \mathbb{1} = 0, \quad \alpha_s \ge 0, \qquad \forall s \in S \setminus S_T. \tag{16c}$$

*Differentiation.* However, when does Eq. (16) have a (unique) optimal solution? To provide some intuition, let us write the equation system in matrix form, i.e.,

<sup>3</sup> We use Iverson-brackets: [[*x*]] = 1 if *x* is true and [[*x*]] = 0 otherwise.

$C\,[x \ \alpha \ \beta]^\top = d$, where we omit an explicit definition of matrix $C$ and vector $d$ for brevity. It is apparent that if matrix $C$ is nonsingular, then Eq. (16) has a unique solution. This requires matrix $C$ to be square, which is achieved if, for each $s \in S \setminus S_T$, we have $|\mathrm{post}(s)| = \sum_{i=1}^{m_s} e_i + 1$. In other words, the number of successor states of $s$ is equal to the number of active constraints of the polytope plus one. This confirms our previous intuition from Sect. 5.1 on a polytope for $|\mathrm{post}(s)| = 3$ successor states, which required $\sum_{i=1}^{m_s} e_i = 2$ active constraints.

Let us formalize this intuition about computing derivatives for prMCs. We can compute the derivative of the solution x by differentiating the equation system in Eq. (16) through the product rule, in a very similar manner to the approach in Sect. 4. We state this key result in the following theorem.

Theorem 2. *Given a prMC* $\mathcal{M}_R$ *and an instantiation* $\mathbf{u}$*, compute* $x^\star, \alpha^\star, \beta^\star$ *for Eq.* (15) *and choose a parameter* $v_i \in V$*. The partial derivatives* $\frac{\partial x}{\partial u(v_i)}$*,* $\frac{\partial \alpha}{\partial u(v_i)}$*, and* $\frac{\partial \beta}{\partial u(v_i)}$ *are obtained as the solution to the linear equation system*

$$\frac{\partial x_s}{\partial u(v_i)} = 0, \qquad \forall s \in S_T \tag{17a}$$

$$\frac{\partial x_s}{\partial u(v_i)} + b_s[\mathbf{u}]^\top \mathbf{D}(E_s) \frac{\partial \alpha_s}{\partial u(v_i)} + \frac{\partial \beta_s}{\partial u(v_i)} = -(\alpha_s^\star)^\top \mathbf{D}(E_s) \frac{\partial b_s[\mathbf{u}]}{\partial u(v_i)}, \qquad \forall s \in S \setminus S_T \tag{17b}$$

$$A_s[\mathbf{u}]^\top \mathbf{D}(E_s) \frac{\partial \alpha_s}{\partial u(v_i)} + \frac{\partial x_{\mathrm{post}(s)}}{\partial u(v_i)} + \frac{\partial \beta_s}{\partial u(v_i)} \mathbb{1} = -\frac{\partial A_s[\mathbf{u}]^\top}{\partial u(v_i)} \mathbf{D}(E_s)\,\alpha_s^\star, \qquad \forall s \in S \setminus S_T. \tag{17c}$$

The proof follows from applying the product rule to Eq. (16) and is provided in [6, Appendix A.1]. To compute the derivative for a parameter $v_i \in V$, we thus solve a system of linear equations of size $|S| + \sum_{s \in S \setminus S_T} |\mathrm{post}(s)|$. Using Theorem 2, we obtain sufficient conditions for the solution function to be differentiable.

Lemma 1. *Write the linear equation system in Eq.* (17) *in matrix form, i.e.,*

$$C\left[\frac{\partial x}{\partial u(v\_i)}, \frac{\partial \alpha}{\partial u(v\_i)}, \frac{\partial \beta}{\partial u(v\_i)}\right]^\top = d,\tag{18}$$

*for* $C \in \mathbb{R}^{q \times q}$ *and* $d \in \mathbb{R}^q$*,* $q = |S| + \sum_{s \in S \setminus S_T} |\mathrm{post}(s)|$*, which are implicitly given by Eq. (17). The solution function* $\mathrm{sol}_R[\mathbf{u}]$ *is differentiable at instantiation* $\mathbf{u}$ *if matrix* $C$ *is nonsingular, in which case we obtain* $\left(\frac{\partial \mathrm{sol}_R}{\partial u(v_i)}\right)[\mathbf{u}] = s_I^\top \frac{\partial x}{\partial u(v_i)}$*.*

*Proof.* The partial derivative of the solution function is $\frac{\partial \mathrm{sol}_R}{\partial u(v_i)}[\mathbf{u}] = s_I^\top \frac{\partial x}{\partial u(v_i)}$, where $\frac{\partial x}{\partial u(v_i)}$ is (a part of) the solution to Eq. (17). Thus, the solution function is differentiable if there is a (unique) solution to Eq. (17), which is guaranteed if matrix $C$ is nonsingular. Thus, the claim in Lemma 1 follows. $\square$

*Algorithm for Problem 1.* We use Theorem 2 to solve Problem 1 for prMCs, similarly as for pMCs. Given a prMC $\mathcal{M}_R$ and an instantiation $\mathbf{u}$, we first solve Eq. (15) to obtain $x^\star, \alpha^\star, \beta^\star$. Second, we use $\alpha^\star_s$ to compute the vector $E_s$ of active constraints for each $s \in S \setminus S_T$. Third, for every parameter $v \in V$, we solve the equation system in Eq. (17). Thus, to compute the gradient of the solution function, we solve one LP and $|V|$ linear equation systems.
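The second step, deriving the active-constraint indicators $E_s$ and the selection matrix $\mathbf{D}(E_s)$ from the optimal duals of Eq. (15), can be sketched as follows. The tolerance is an implementation choice to guard against floating-point noise in the solver's duals, not part of the formal definition $e_i = [\![\alpha^\star_{s,i} > 0]\!]$; the dual values are hypothetical:

```python
import numpy as np

def active_constraints(alpha_s, tol=1e-8):
    """Return E_s (binary indicator vector of active constraints) and the
    diagonal selection matrix D(E_s), given the optimal duals alpha_s."""
    E_s = (np.asarray(alpha_s) > tol).astype(int)  # e_i = [[alpha_s_i > 0]]
    return E_s, np.diag(E_s)

# Hypothetical duals for a state with m_s = 3 halfspaces:
E, D = active_constraints([0.35, 0.0, 1.2])
# E == [1, 0, 1]: constraints 1 and 3 are active at the worst-case point,
# matching the |post(s)| = 3, two-active-constraints case of Fig. 4a.
```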

#### 5.3 Computing *k*-Highest Derivatives

We directly apply the same procedure from Sect. 4.2 to compute the parameters with the $k \leq \ell$ highest derivatives. As for pMCs, we can compute the $k$ highest derivatives by solving a MILP encoding the equation system in Eq. (17) for every parameter $v \in V$, which we present in [6, Appendix A.2] for brevity. This MILP has the same structure as Eq. (9), and thus we may apply the same linear relaxation to obtain an LP with the guarantees as stated in Theorem 1. In other words, solving the LP relaxation yields the set $V^\star$ of parameters with maximal derivatives as in Proposition 1. This set $V^\star$ is a solution to Problem 2 for prMCs.

#### 6 Numerical Experiments

We perform experiments to answer the following questions about our approach:

- **Q1.** How does the cost of computing derivatives compare to computing solutions?
- **Q2.** What is the runtime improvement of computing only $k$ derivatives?
- **Q3.** Can derivatives effectively guide exploration in a learning framework?
Let us briefly summarize the computations involved in answering these questions. First of all, computing the solution $\mathrm{sol}[\mathbf{u}]$ for a pMC, which is defined in Eq. (1), means solving the linear equation system in Eq. (5). Similarly, computing the robust solution $\mathrm{sol}_R[\mathbf{u}]$ for a prMC means solving the LP in Eq. (15). Then, solving Problem 1, i.e., computing all $|V|$ partial derivatives, amounts to solving a linear equation system for each parameter $v \in V$ (namely, Eq. (7) for a pMC and Eq. (17) for a prMC). In contrast, solving Problem 2, i.e., computing a subset $V^\star$ of parameters with maximal (or minimal) derivative, means for a pMC that we solve the LP in Eq. (10) (or the equivalent LP for a prMC) and thereafter extract the subset $V^\star$ of $k$ parameters using Proposition 1.

*Problem 3: Computing the* $k$*-highest Derivatives.* A solution to Problem 2 is a set $V^\star$ of $k$ parameters but does not include the computation of the derivatives themselves. However, it is straightforward to also obtain the actual derivatives $\frac{\partial \mathrm{sol}}{\partial u(v)}[\mathbf{u}]$ for each parameter $v \in V^\star$. Specifically, we solve Problem 1 for the $k$ parameters in $V^\star$, such that we obtain the partial derivatives for all $v \in V^\star$. We remark that, for $k = 1$, the derivative follows directly from the optimal value $s_I^\top y^+$ of the LP in Eq. (10), so this additional step is not necessary. We will refer to computing the actual values of the $k$ highest derivatives as *Problem 3*.

*Setup.* We implement our approach in Python 3.10, using Storm [35] to parse pMCs, Gurobi [31] to solve LPs, and the SciPy sparse solver to solve equation systems. All experiments run on a computer with a 4GHz Intel Core i9 CPU and 64 GB RAM, with a timeout of one hour. Our implementation is available at https://doi.org/10.5281/zenodo.7864260.

*Grid World Benchmarks.* We use scaled versions of the grid world from the example in Sect. 2 with over a million states and up to 10 000 terrain types. The vehicle only moves right or down, both with 50% probability (wrapping around when leaving the grid). Slipping only occurs when moving down and (slightly different from the example in Sect. 2) means that the vehicle moves *two cells instead of one*. We obtain between $N = 500$ and $1\,000$ samples of each slipping probability. For the pMCs, we use the maximum likelihood estimates (the sample means $\bar{p}$) obtained from these samples as probabilities, whereas, for the prMCs, we infer probability intervals using Hoeffding's inequality (see Q3 for details).

*Benchmarks from Literature.* We also use several instances of parametric extensions of MCs and Markov decision processes (MDPs) from standard benchmark suites [33,44]. We also use pMC benchmarks from [5,23], as these models have more parameters than the traditional benchmarks. We extend these benchmarks to prMCs by constructing probability intervals around the pMC's probabilities.

*Results.* The results for all benchmarks are shown in [6, Appendix B, Tab. 2–3].

#### Q1. Computing Solutions vs. Derivatives

We investigate whether computing derivatives is feasible on p(r)MCs. In particular, we compare the computation times for computing derivatives on p(r)MCs (Problems 1 and 3) with the times for computing the solution for these models.

Fig. 5. Runtimes (log-scale) for computing a single derivative (left, Problem 1) or the highest derivative (right, Problem 3), vs. computing the solution sol[**u**]/solR[**u**].


Table 1. Model sizes, runtimes, and derivatives for a selection of grid world models. <sup>a</sup>Extrapolated from the runtimes for 10 to all $|V|$ parameters. <sup>b</sup>Timeout (1 h) occurred for verifying the p(r)MC, not for computing derivatives.

In Fig. 5, we show for all benchmarks the times for computing the solution (defined in Eqs. (1) and (4)), versus computing either a single derivative for Problem 1 (left) or the highest derivative of all parameters resulting from Problem 3 (right). A point $(x, y)$ in the left plot means that computing a single derivative took $x$ seconds while computing the solution took $y$ seconds. A point above the (center) diagonal means we obtained a speed-up over the time for computing the solution; a point above the upper diagonal indicates a 10× speed-up or larger.

*One Derivative.* The left plot in Fig. 5 shows that, for pMCs, the times for computing the solution and a single derivative are approximately the same. This is expected since both problems amount to solving a single equation system with |S| unknowns. Recall that, for prMCs, computing the solution means solving the LP in Eq. (15), while for derivatives we solve an equation system. Thus, computing a derivative for a prMC is relatively cheap compared to computing the solution, which is confirmed by the results in Fig. 5.

*Highest Derivative.* The right plot in Fig. 5 shows that, for pMCs, computing the highest derivative is slightly slower than computing the solution (the LP to compute the highest derivative takes longer than the equation system to compute the solution). On the other hand, computing the highest derivative for a prMC is still cheap compared to computing the solution. Thus, if we are using a prMC anyway, computing the derivatives is relatively cheap.

#### Q2. Runtime Improvement of Computing only *k* Derivatives

We want to understand the computational benefits of solving Problem 3 over solving Problem 1. For Q2, we consider all models with |V | ≥ 10 parameters.

An excerpt of results for the grid world benchmarks is presented in Table 1. Recall that, after obtaining the (robust) solution, solving Problem 1 amounts to solving $|V|$ linear equation systems, whereas Problem 3 involves solving a single LP and $k$ equation systems. From Table 1, it is clear that computing $k$ derivatives is orders of magnitude faster than computing all $|V|$ derivatives, especially if the total number of parameters is high.

Fig. 6. Runtimes (log-scale) for computing the highest (left) or 10 highest (right) derivatives (Problem 3), versus computing all derivatives (Problem 1).

We compare the runtimes for computing all derivatives (Problem 1) with computing only the k = 1 or 10 highest derivatives (Problem 3). The left plot of Fig. 6 shows the runtimes for k = 1, and the right plot for the k = 10 highest derivatives. The interpretation for Fig. 6 is the same as for Fig. 5. From Fig. 6, we observe that computing only the k highest derivatives generally leads to significant speed-ups, often of more than 10 times (except for very small models). Moreover, the difference between k = 1 and k = 10 is minor, showing that retrieving the actual derivatives after solving Problem 2 is relatively cheap.

*Numerical Stability.* While our algorithm is exact, our implementation uses floating-point arithmetic for efficiency. To evaluate the numerical stability, we compare the highest derivatives (solving Problem 3 for $k = 1$) with an empirical approximation of the derivative obtained by perturbing the parameter by $1 \times 10^{-3}$. The difference (column *'Error %'* in Table 1 and [6, Appendix B, Table 2]) between both is marginal, indicating that our implementation is sufficiently numerically stable to return accurate derivatives.

#### Q3. Application in a Learning Framework

Reducing the sample complexity is a key challenge in learning under uncertainty [43,47]. In particular, learning in stochastic environments is very data-intensive, and realistic applications tend to require millions of samples to provide tight bounds on measures of interest [16]. Motivated by this challenge, we apply our approach in a learning framework to investigate if derivatives can be used to effectively guide exploration, compared to alternative exploration strategies.

Fig. 7. Robust solutions for each sampling strategy in the learning framework for the grid world (a) and drone (b) benchmarks. Average values of 10 (grid world) or 5 (drone) repetitions are shown, with shaded areas the min/max.

*Models.* We consider the problem of where to sample in 1) a slippery grid world with $|S| = 800$ states and $|V| = 100$ terrain types, and 2) the drone benchmark from [23] with $|S| = 4\,179$ states and $|V| = 1\,053$ parameters. As in the motivating example in Sect. 2, we learn a model of the unknown MC in the form of a prMC, where the parameters are the sample sizes for each parameter. We assume access to a model that can arbitrarily sample each parameter (i.e., the slipping probability in the case of the grid world). We use an initial sample size of $N_i = 100$ for each parameter $i \in \{1, \ldots, |V|\}$, from which we infer a $\beta = 0.9$ (90%) confidence interval using Hoeffding's inequality. The interval for parameter $i$ is $[\hat{p}_i - \epsilon_i,\ \hat{p}_i + \epsilon_i]$, with $\hat{p}_i$ the sample mean and $\epsilon_i = \sqrt{\frac{\log 2 - \log(1-\beta)}{2N_i}}$ (see, e.g., [14] for details).
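The Hoeffding interval above is a one-line computation; the sketch below shows it for a hypothetical sample mean, clipping the interval to $[0, 1]$ since it bounds a probability (the clipping is an implementation detail, not part of the inequality):

```python
import numpy as np

def hoeffding_interval(p_hat, N, beta=0.9):
    """Confidence interval [p_hat - eps, p_hat + eps] at confidence beta,
    with eps = sqrt((log 2 - log(1 - beta)) / (2 N)) per Hoeffding's
    inequality; clipped to [0, 1] since it bounds a probability."""
    eps = np.sqrt((np.log(2.0) - np.log(1.0 - beta)) / (2.0 * N))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

# Hypothetical parameter with sample mean 0.3 after N = 100 samples:
lo, hi = hoeffding_interval(p_hat=0.3, N=100, beta=0.9)
```

Increasing $N_i$ for a parameter shrinks its $\epsilon_i$, which is exactly how the learning scheme below tightens the prMC's intervals with additional samples.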

*Learning Scheme.* We iteratively choose for which parameter $v_i \in V$ to obtain 25 (for the grid world) or 250 (for the drone) additional samples. We compare four strategies for choosing the parameter $v_i$ to sample: 1) with the highest derivative, i.e., solving Problem 3 for $k = 1$; 2) with the biggest interval width $\epsilon_i$; 3) uniformly; and 4) sampling according to the expected number of visits times the interval width (see [6, Appendix B.1] for details). After each step, we update the robust upper bound on the solution for the prMC with the additional samples.

*Results.* The upper bounds on the solution for each sampling strategy, as well as the solution for the MC with the true parameter values, are shown in Fig. 7. For both benchmarks, our derivative-guided sampling strategy converges to the true solution faster than the other strategies. Notably, our derivative-guided strategy accounts for both the uncertainty and importance of each parameter, which leads to a lower sample complexity required to approach the true solution.

## 7 Related Work

We discuss related work in three areas: pMCs, their extension to parametric interval Markov chains (piMCs), and general sensitivity analysis methods.

*Parametric Markov Chains.* pMCs [24,45] have traditionally been studied in terms of computing the solution function [13,25,28,29,32]. Much recent literature considers synthesis (find a parameter valuation such that a specification is satisfied) or verification (prove that all valuations satisfy a specification). We refer to [38] for a recent overview. For our paper, particularly relevant are [55], which checks whether a derivative is positive (for all parameter valuations), and [34], which solves parameter synthesis via gradient descent. We note that all these problems are (co-)ETR complete [41] and that the solution function is exponentially large in the number of parameters [7], whereas we consider a polynomial-time algorithm. Furthermore, practical *verification* procedures for uncontrollable parameters (as we do) are limited to less than 10 parameters. Parametric verification is used in [51] to guide model refinement by detecting for which parameter values a specification is satisfied. In contrast, we consider slightly more conservative rMCs and aim to stepwise optimize an objective. Solution functions also provide an approach to compute and refine confidence intervals [17]; however, the size of the solution function hampers scalability.

*Parametric interval Markov Chains (piMCs).* While prMCs have, to the best of our knowledge, not been studied, their slightly more restricted version are piMCs. In particular, piMCs have interval-valued transitions with parametric bounds. Work on piMCs falls into two categories. First, *consistency* [27,50]: is there a parameter instantiation such that the (reachable fragment of the) induced interval MC contains valid probability distributions? Second, parameter synthesis for quantitative and qualitative reachability in piMCs with up to 12 parameters [10].

*Perturbation Analysis.* Perturbation analysis considers the change in the solution under any perturbation vector X of the parameter instantiation whose norm is upper bounded by δ, i.e., ||X|| ≤ δ (or, conversely, which δ ensures that the perturbation of the solution stays below a given maximum). Likewise, [21] uses the distance between two instantiations of a pMC (called an augmented interval MC) to bound the change in reachability probability. Similar analyses exist for stationary distributions [1]. These problems are closely related to the verification problem in pMCs and are equally (in)tractable if there are dependencies over multiple parameters. To improve tractability, a follow-up [56] derives asymptotic bounds based on first- or second-order Taylor expansions. Other approaches to perturbation analysis analyze individual paths of a system [18,19,30]. Sensitivity analysis in (parameter-free) imprecise MCs, a variation of rMCs, is thoroughly studied in [22].

*Exploration in Learning.* Similar to Q3 in Sect. 6, determining where to sample is relevant in many learning settings. Approaches such as probably approximately correct (PAC) statistical model checking [2,3] and model-based reinforcement learning [47] commonly use optimistic exploration policies [48]. By contrast, we guide exploration based on the sensitivity analysis of the solution function with respect to the parametric model.

## 8 Concluding Remarks

We have presented efficient methods to compute partial derivatives of the solution functions for pMCs and prMCs. For both models, we have shown how to compute these derivatives explicitly *for all parameters*, as well as how to compute only the k *highest derivatives*. Our experiments have shown that we can compute derivatives for models with over a million states and thousands of parameters. In particular, computing the k highest derivatives yields significant speed-ups compared to computing all derivatives explicitly and is feasible for prMCs that can be verified. In the future, we want to support nondeterminism in the models and apply our methods in (online) learning frameworks, in particular for settings where reducing the uncertainty is computationally expensive [42,49].

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## MDPs as Distribution Transformers: Affine Invariant Synthesis for Safety Objectives

S. Akshay¹(B), Krishnendu Chatterjee², Tobias Meggendorfer²,³, and Ðorđe Žikelić²

¹ Indian Institute of Technology Bombay, Mumbai, India
akshayss@cse.iitb.ac.in
² Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
{krishnendu.chatterjee,dzikelic}@ist.ac.at
³ Technical University of Munich, Munich, Germany
tobias.meggendorfer@cit.tum.de

Abstract. Markov decision processes can be viewed as transformers of probability distributions. While this view is useful from a practical standpoint to reason about trajectories of distributions, basic reachability and safety problems are known to be computationally intractable (i.e., Skolem-hard) to solve in such models. Further, we show that even for simple examples of MDPs, strategies for safety objectives over distributions can require infinite memory and randomization.

In light of this, we present a novel overapproximation approach to synthesize strategies in an MDP, such that a safety objective over the distributions is met. More precisely, we develop a new framework for template-based synthesis of certificates as affine distributional and inductive invariants for safety objectives in MDPs. We provide two algorithms within this framework. One can only synthesize memoryless strategies, but has relative completeness guarantees, while the other can synthesize general strategies. The runtime complexity of both algorithms is in PSPACE. We implement these algorithms and show that they can solve several non-trivial examples.

Keywords: Markov decision processes · invariant synthesis · distribution transformers · Skolem hardness

## 1 Introduction

Markov decision processes (MDPs) are a classical model for probabilistic decision making systems. They extend the basic probabilistic model of Markov chains with non-determinism and are widely used across different domains and contexts. In the

© The Author(s) 2023

This work was supported in part by the ERC CoG 863818 (FoRM-SMArt) and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 665385 as well as DST/CEFIPRA/INRIA project EQuaVE and SERB Matrices grant MTR/2018/00074.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 86–112, 2023. https://doi.org/10.1007/978-3-031-37709-9_5

verification community, MDPs are often viewed through an automata-theoretic lens, as state transformers, with runs being sequences of states, each run taken with a certain probability (see e.g., [9]). With this view, reachability probabilities can be computed using simple fixed point equations, and model checking can be done over appropriately defined logics such as PCTL\*. However, in several contexts such as modelling biochemical networks, queueing theory or probabilistic dynamical systems, it is more convenient to view MDPs as transformers of probability distributions over the states, and to define objectives over these distributions [1,5,12,17,44,47]. In this framework, we can, for instance, easily reason about properties such as the probability in a set of states always being above a given threshold, or comparing the probabilities in two states at some future time point. More concretely, in a chemical reaction network, we may require that the concentration of a particular complex is never above 10%. Such distribution-based properties cannot be expressed in PCTL\* [12], and thus several orthogonal logics have been defined [1,12,44] that reason about distributions.

Unfortunately, and perhaps surprisingly, when we view them as distribution transformers, even the simplest reachability and safety problems with respect to probability distributions over states remain unsolved. The reason is a number-theoretic hardness result that lies at the core of these questions. In [3], it is shown that, even for Markov chains, reachability is as hard as the so-called Skolem problem, and safety is as hard as the Positivity problem [55,56]; the decidability of both is a long-standing open problem in the theory of linear recurrence sequences. Moreover, synthesizing strategies that resolve the non-determinism in MDPs to achieve an objective (whether reachability or safety) is further complicated by the question of how much memory must be allowed for the strategy. As we show in Sect. 3, even for very simple examples, strategies for safety can require infinite memory as well as randomization.

In light of these difficulties, what can one do to tackle these problems *in theory and in practice*? In this paper, we take an over-approximation route to approach these questions, not only to check the existence of strategies for safety but also to synthesize them. Inspired by the success of invariant synthesis in program verification, our goal is to develop a novel invariant-synthesis-based approach towards strategy synthesis in MDPs, viewed as transformers of distributions. In this paper, we restrict our attention to a class of safety objectives on MDPs, which are already general enough to capture several interesting and natural problems on MDPs. Our contributions are the following:

	- The first algorithm is restricted to synthesizing memoryless strategies but is *relatively complete*, i.e., whenever a memoryless strategy and an affine inductive distributional invariant that witness safety exist, we are guaranteed to find them.
	- The second algorithm can synthesize general strategies as well as memoryless strategies, but is incomplete in general.

In both cases, we employ a template-based synthesis approach and reduce synthesis to the existential first-order theory of reals, which gives a PSPACE complexity upper bound. In the first case, this reduction depends on Farkas' lemma. In the second case, we need to use Handelman's theorem, a specialized result for strictly positive polynomials.
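To illustrate the role of Farkas' lemma in such reductions: an affine inequality c·x ≥ d holds for every solution of a feasible system Ax ≥ b if and only if there exist nonnegative multipliers λ with λᵀA = c and λᵀb ≥ d. Checking a given certificate λ is then simple arithmetic. The following is our own minimal sketch on a toy system (the system and the certificate are invented for illustration, not taken from the paper):

```python
from fractions import Fraction as F

# Farkas-style certificate check: under a feasible system Ax >= b, the
# inequality c.x >= d holds for all solutions x iff some lambda >= 0
# satisfies lambda^T A = c and lambda^T b >= d. Toy instance:
A = [[F(1), F(0)],   # x >= 1
     [F(0), F(1)]]   # y >= 2
b = [F(1), F(2)]
c, d = [F(1), F(1)], F(3)   # claim: x + y >= 3
lam = [F(1), F(1)]          # candidate certificate (made up)

def certifies(A, b, c, d, lam):
    """Verify that lam is a valid Farkas certificate for c.x >= d."""
    if any(l < 0 for l in lam):
        return False
    combo = [sum(lam[i] * A[i][j] for i in range(len(A)))
             for j in range(len(c))]
    return combo == c and sum(l * bi for l, bi in zip(lam, b)) >= d

assert certifies(A, b, c, d, lam)
```

Template-based synthesis turns this around: the multipliers (and template coefficients) become unknowns, yielding constraints in the existential theory of reals.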

	- We implement our approaches and show that for several practical and non-trivial examples, affine invariants suffice. Further, we demonstrate that our prototype tool can synthesize these invariants and associated strategies.

Finally, we discuss the generalization of our approach from affine to polynomial invariants and some variants that our approach can handle.

#### 1.1 Related Work

*Distribution-based Safety Analysis in MDPs.* The problem of checking distribution-based safety objectives for MDPs was defined in [5], but a solution was provided only in the *uninitialized* setting, where the initial distribution is not given, and only under the assumption that the target set is closed and bounded. In contrast, we tackle both the initialized and the uninitialized setting, our target sets are general affine sets, and we focus on actually synthesizing strategies, not just proving their existence.

*Template-based Program Analysis.* Template-based synthesis by means of linear/polynomial constraint solving is a standard approach in program analysis for synthesizing certificates that prove properties of programs. Many of these methods utilize Farkas' lemma or Handelman's theorem to automate the synthesis of program invariants [20,27], termination proofs [6,14,23,28,57], reachability proofs [8] or cost bounds [16,39,64]. The works [2,18,19,21,22,24,25,62,63] utilize Farkas' lemma or Handelman's theorem to synthesize certificates for these properties in probabilistic programs. While our algorithms build on the ideas from the works on template-based inductive invariant synthesis in programs [20,27], the key novelty of our algorithms is that they synthesize a fundamentally different kind of invariants, i.e. *distributional invariants* in MDPs. In contrast, the existing works on (probabilistic) program analysis synthesize *state* invariants. Furthermore, our algorithms synthesize distributional invariants *together* with MDP strategies. While it is common in controller synthesis to synthesize an MDP strategy for a *state* invariant, we are not aware of any previous work that uses template-based synthesis methods to compute MDP strategies for a *distributional* invariant.

*Other Approaches to Invariant Synthesis in Programs.* Alternative approaches to invariant synthesis in programs have also been considered, for instance via abstract interpretation [29,30,33,60], counterexample guided invariant synthesis (CEGIS) [7,10,34], recurrence analysis [32,42,43] or learning [35,61]. While some of these approaches can be more scalable than constraint solving-based methods, they typically do not provide relative completeness guarantees. An interesting direction of future work would be to explore whether these alternative approaches could be used for synthesizing distributional invariants together with MDP strategies more efficiently.

*Weakest Pre-expectation Calculus.* Expectation transformers and the weakest pre-expectation calculus generalize Dijkstra's weakest precondition calculus to the setting of probabilistic programs. Expectation transformers were introduced in the seminal work on probabilistic propositional dynamic logic (PPDL) [45] and were extended to the setting of probabilistic programs with non-determinism in [48,52]. A weakest pre-expectation calculus for reasoning about the expected runtime of probabilistic programs was presented in [40]. Intuitively, given a function over probabilistic program outputs, the weakest pre-expectation calculus can be used to reason about the supremum or the infimum expected value of the function upon executing the probabilistic program, where the supremum and the infimum are taken over the set of all possible schedulers (i.e. strategies) used to resolve non-determinism. When the function is the indicator function of some output set of states, this yields a method for reasoning about the probability of reaching that set of states. Thus, the weakest pre-expectation calculus allows reasoning about safety with respect to *sets of states*. In contrast, we are interested in reasoning about safety with respect to *sets of probability distributions over states*. Moreover, while the expressiveness of this calculus allows reasoning about very complex programs, its automation typically requires user input. In this work, we aim for a fully automated approach to checking distribution-based safety.

## 2 Preliminaries

In this section, we recall basics of probabilistic systems and set up our notation. We assume familiarity with the central ideas of measure and probability theory; see [13] for a comprehensive overview. We write [n] := {1, ..., n} to denote the set of all natural numbers from 1 to n. For any set S, we write S̄ to denote its complement. A *probability distribution* on a countable set X is a mapping μ : X → [0, 1] such that ∑_{x∈X} μ(x) = 1. Its *support* is denoted by supp(μ) = {x ∈ X | μ(x) > 0}. We write Δ(X) to denote the set of all probability distributions on X. An event happens *almost surely* (a.s.) if it happens with probability 1. We assume that countable sets of states S are equipped with an arbitrary but fixed numbering.
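As a tiny illustration of these definitions (our own sketch, with made-up values), a distribution and its support can be written directly:

```python
# A probability distribution on the countable set X = {"x", "y", "z"}
# (the values are invented for illustration).
mu = {"x": 0.5, "y": 0.5, "z": 0.0}

# supp(mu) = { x in X | mu(x) > 0 }
support = {x for x, p in mu.items() if p > 0}

assert abs(sum(mu.values()) - 1.0) < 1e-9  # mu sums to 1
assert support == {"x", "y"}
```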

Fig. 1. Our running example MDP. It comprises three states S = {A, B, C}, depicted by rounded rectangles. In state A, there are two actions available, namely a and b. We have δ(A, a, A) = 1 and δ(A, b, B) = 1, indicated by arrows. States B and C have only one available action each, thus we omit explicitly labelling them.

#### 2.1 Markov Systems

A *(discrete time) Markov chain (MC)* is a tuple M = (S, δ), where S is a finite set of *states* and δ : S → Δ(S) a *transition function*, assigning to each state a probability distribution over successor states. A *Markov decision process (MDP)* is a tuple M = (S, *Act*, δ), where S is a finite set of *states*, *Act* is a finite set of *actions*, overloaded to yield for each state s the set of *available actions* *Act*(s) ⊆ *Act*, and δ : S × *Act* → Δ(S) is a *transition function* that for each state s and (available) action a ∈ *Act*(s) yields a probability distribution over successor states. For readability, we write δ(s, s′) and δ(s, a, s′) instead of δ(s)(s′) and δ(s, a)(s′), respectively. By abuse of notation, we redefine S × *Act* := {(s, a) | s ∈ S ∧ a ∈ *Act*(s)} to refer to the set of state-action pairs. See Fig. 1 for an example MDP. This MDP is our running example and we refer to it throughout this work to point out some of the peculiarities.

An *infinite path* in an MC is an infinite sequence ρ = s1 s2 ··· ∈ S^ω such that for every i ∈ ℕ we have δ(si, si+1) > 0. A *finite path* is a finite prefix of an infinite path. Analogously, infinite paths in an MDP are infinite sequences ρ = s1 a1 s2 a2 ··· ∈ (S × *Act*)^ω such that ai ∈ *Act*(si) and δ(si, ai, si+1) > 0 for every i ∈ ℕ, and finite paths are finite prefixes thereof. We use ρi to refer to the i-th state in a given (in)finite path, and IPaths_M and FPaths_M for the sets of all infinite and finite paths of a system M.

*Semantics.* A Markov chain evolves by repeatedly applying the probabilistic transition function in each step. For example, if we start in state s1, we obtain the next state s2 by drawing a random state according to the probability distribution δ(s1). Repeating this ad infinitum produces a random infinite path. Indeed, together with an initial state s, a Markov chain M induces a unique probability measure Pr_{M,s} over the (uncountable) set of infinite paths [9].

This reasoning can be lifted to distributions over states, as follows. Suppose we begin in μ0 = {s1 → 0.5, s2 → 0.5}, meaning that initially we are in state s1 or s2 with probability 0.5 each. Then, μ1(s′) = μ0(s1) · δ(s1, s′) + μ0(s2) · δ(s2, s′), i.e. the probability to be in a state s′ in the next step is 0.5 times the probability of moving there from s1 and from s2, respectively. For an initial distribution, we likewise obtain a probability measure over infinite paths by setting Pr_{M,μ0}[S] := ∑_{s∈S} μ0(s) · Pr_{M,s}[S] for measurable S ⊆ IPaths_M.
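This one-step lifting can be sketched in a few lines (our own illustration; the two-state chain and its probabilities are invented, not from the paper):

```python
from fractions import Fraction as F

# Transition function delta of a hypothetical two-state Markov chain:
# delta[s][t] is the probability of moving from s to t.
delta = {
    "s1": {"s1": F(1, 2), "s2": F(1, 2)},
    "s2": {"s1": F(1, 4), "s2": F(3, 4)},
}

def step(mu):
    """One application of the chain as a distribution transformer:
    mu_next(t) = sum over s of mu(s) * delta(s, t)."""
    nxt = {t: F(0) for t in delta}
    for s, p in mu.items():
        for t, q in delta[s].items():
            nxt[t] += p * q
    return nxt

mu0 = {"s1": F(1, 2), "s2": F(1, 2)}
mu1 = step(mu0)
assert sum(mu1.values()) == 1  # still a probability distribution
```

Using exact rationals (`fractions.Fraction`) keeps the induced stream of distributions free of floating-point drift.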

In contrast to Markov chains, MDPs also feature non-determinism, which needs to be resolved in order to obtain probabilistic behaviour. This is achieved by *(path) strategies*, recipes to resolve non-determinism. Formally, a strategy on an MDP is classically defined as a function π : FPaths_M → Δ(*Act*), which given a finite path ϱ = s0 a0 s1 a1 ... sn yields a probability distribution π(ϱ) ∈ Δ(*Act*(sn)) on the actions to be taken next. We write Π to denote the set of all strategies. Fixing any strategy π induces a Markov chain M^π = (FPaths_M, δ^π), where for a state ϱ = s0 a0 ... sn ∈ FPaths_M the successor distribution is defined as δ^π(ϱ, ϱ an+1 sn+1) = π(ϱ)(an+1) · δ(sn, an+1, sn+1). (Note that the state space of this Markov chain is in general countably infinite.) Consequently, for each strategy π and initial distribution μ0 we also obtain a unique probability measure Pr_{M^π,μ0} on the infinite paths of M. (Technically, the MC M^π induces a probability measure over paths in M^π, i.e. paths where each element is a finite path of M, however this can be directly projected to a measure over IPaths_M.)

A *one-step strategy* (also known as *memoryless* or *positional* strategy) corresponds to a fixed choice in each state, independent of the history, i.e. a mapping π : S → Δ(*Act*). Fixing such a strategy induces a finite state Markov chain M^π = (S, δ^π), where δ^π(s, s′) = ∑_{a∈*Act*(s)} π(s)(a) · δ(s, a, s′). We write Π1 for the set of all one-step strategies.

A sequence of one-step strategies (πi)_{i∈ℕ} ∈ Π1^ω induces a general strategy which in each step i and state s chooses πi(s). Observe that aside from the state, such a strategy only depends on the current step; it is also called a *Markov strategy*.

#### 2.2 MDPs as Distribution Transformers

Probabilistic systems are typically viewed as "random generators" for paths, and we consequently investigate the (expected) behaviour of a generated path, i.e. path properties. However, in this work we follow a different view, and treat systems as *transformers of distributions*. Formally, fix a Markov chain M. For a given initial distribution μ0, we can define the distribution at step i by μi(s) = Pr_{M,μ0}[{ρ ∈ IPaths_M | ρi = s}]. We write μi = M(μ0, i) for the i-th distribution and μ1 = M(μ0) for the "one-step" application of this transformation. Likewise, we obtain the same notion for an MDP M combined with a strategy π, and write μi = M^π(μ0, i), μ1 = M^π(μ0). In summary, for a given initial distribution, a Markov chain induces a unique stream of distributions, and an MDP provides one such stream for each strategy.
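Concretely, the distribution-transformer view of the running example from Fig. 1 can be sketched as follows (our own sketch: the transitions of B and C are taken from the update equations given later in Example 1, and the action names for B and C are placeholders, since the figure omits them):

```python
from fractions import Fraction as F

# Running-example MDP from Fig. 1. delta(A,a,A) = 1 and delta(A,b,B) = 1;
# B -> C and C -> 1/2 A + 1/2 C are inferred from Example 1's equations.
delta = {
    ("A", "a"): {"A": F(1)},
    ("A", "b"): {"B": F(1)},
    ("B", "c"): {"C": F(1)},                  # "c" is a placeholder name
    ("C", "d"): {"A": F(1, 2), "C": F(1, 2)},  # "d" is a placeholder name
}

def apply(mu, strategy):
    """mu_next = M^pi(mu), for a one-step strategy: state -> {action: prob}."""
    nxt = {s: F(0) for s in "ABC"}
    for s, p in mu.items():
        for a, w in strategy[s].items():
            for t, q in delta[(s, a)].items():
                nxt[t] += p * w * q
    return nxt

always_b = {"A": {"b": F(1)}, "B": {"c": F(1)}, "C": {"d": F(1)}}
mu0 = {"A": F(1, 3), "B": F(1, 3), "C": F(1, 3)}
mu1 = apply(mu0, always_b)
```

Iterating `apply` with a (possibly step-dependent) strategy produces exactly the stream μ0, μ1, μ2, ... discussed above.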

This naturally invites questions about the induced stream of distributions. In the path interpretation, queries such as *reachability* or *safety*, i.e. asking for the probability of reaching or avoiding a set of states, allow for simple, polynomial-time solutions [9,58]. However, the corresponding notions are already surprisingly difficult in the space of distributions. Thus, we restrict ourselves to the *safety problem*, which we introduce in the following. Intuitively, given a *safe set* of distributions over states H ⊆ Δ(S), we are interested in deciding whether the MDP can be controlled such that the stream of distributions always remains inside H.

## 3 Problem Statement and Examples

Let M = (S, *Act*, δ) be an MDP and H ⊆ Δ(S) be a safe set. A distribution μ0 is called H*-safe under* π if M^π(μ0, i) ∈ H for all i ≥ 0, and H*-safe* if there exists a strategy under which μ0 is safe. We mention two variants of the resulting decision problem as defined in [5]:


Note that we have discussed neither the shape nor the representation of H, which naturally plays an important role for decidability and complexity.

One may be tempted to think that the initialized variant is simpler, as more input is given. However, this problem is known to be Positivity*-hard*¹ already in simple cases, and already when H is defined in terms of rational constants!

Theorem 1 ([3]). *The initialized safety problem for Markov chains and* H *given as a linear inequality constraint* (H = {μ | μ(s) ≤ r} *for some* s ∈ S *and* r ∈ ℚ ∩ [0, 1]) *is* Positivity*-hard.*

*Proof.* In [3, Corollary 4], the authors show that the inequality version of the Markov reachability problem, i.e. deciding whether there exists an i such that μi(s) > r for a given rational r, is Positivity-hard. The result follows by observing that safety is the negation of reachability. □

Thus, finding a decision procedure for this problem is unlikely, since it would answer several fundamental questions of number theory, see e.g. [41,55,56]. In contrast, the uninitialized problem is known to be decidable for safe sets H given as closed, convex polytopes (see [5] for details and [1] for a different approach specific to Markov chains). In a nutshell, we can restrict to the potential fixpoints of M, i.e. all distributions μ such that μ = M^π(μ, i) for some strategy π and step i. It turns out that this set of distributions is a polytope, and the problem – glossing over subtleties – reduces to checking whether the intersection of H with this polytope is non-empty. However, we note that the solution of [5] does not yield a witness strategy. In the following, we thus primarily focus on the initialized question. In Sect. 6, we then show how our approach, which also synthesizes a witness strategy, is directly applicable to the uninitialized case.

In light of the daunting hardness results for the general initialized problem, we restrict to *affine linear safe sets*, i.e. H which are specified by a finite set of affine linear inequalities. Formally, these sets are of the form H = {μ ∈ Δ(S) | ∧_{j=1}^{N} (c_0^j + ∑_{i=1}^{n} c_i^j · μ(s_i)) ≥ 0}, where S = {s1, ..., sn}, the c_i^j are real-valued

¹ Intuitively, the Positivity problem asks, for a given rational (or integer or real) matrix M, whether (M^n)_{1,1} > 0 for all n [54]. This problem (and its many variants) has been the subject of intense research over the last 10–15 years, see e.g. [55]. Yet, quite surprisingly, it still remains open in its full generality.

constants and N is the number of affine linear inequalities that define H. Our problem formally is given by the following query.

Problem Statement Given an MDP M, initial distribution μ0, and affine linear safe set H, (i) decide whether μ<sup>0</sup> is H-safe, and (ii) if yes, then synthesize a strategy for M which ensures safety.
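An affine linear safe set in this form admits a direct membership test. A minimal sketch (our own encoding as coefficient rows), instantiated with the safe set H = {μ | μ(C) ≥ 1/4} used in Example 1 below:

```python
from fractions import Fraction as F

# H = { mu | c0^j + sum_i ci^j * mu(si) >= 0 for all j }.
# For H = { mu | mu(C) >= 1/4 }: a single inequality -1/4 + mu(C) >= 0.
states = ["A", "B", "C"]
constraints = [            # each row: (c0, {state: ci})
    (F(-1, 4), {"C": F(1)}),
]

def in_H(mu):
    """Check whether the distribution mu satisfies all inequalities of H."""
    return all(c0 + sum(ci * mu[s] for s, ci in cs.items()) >= 0
               for c0, cs in constraints)

assert in_H({"A": F(1, 3), "B": F(1, 3), "C": F(1, 3)})
assert not in_H({"A": F(1, 2), "B": F(1, 2), "C": F(0)})
```

Deciding whether the whole stream of distributions stays in H is, of course, the hard part; the membership test only checks a single step.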

Note that the problem strictly subsumes the special case when H is defined in terms of rational constants, and our approach aims to solve both problems. Also, note that Theorem 1 still applies, i.e. this "simplified" problem is Positivity-hard, too. We thus aim for a sound and *relatively complete* approach. Intuitively, this means that we restrict our search to a sub-space of possible solutions and within this space provide a complete answer. To give an intuition for the required reasoning, we provide an example safety query together with a manual proof.

*Example 1.* Consider our running example from Fig. 1. Suppose the initial distribution is μ0 = {A → 1/3, B → 1/3, C → 1/3} and the (affine linear) safe set is H = {μ | μ(C) ≥ 1/4}. This safety query is satisfiable, e.g., by always choosing action b, as we show in the following. First, observe that the (i+1)-th distribution is given by μi+1(A) = 1/2 · μi(C), μi+1(B) = μi(A), and μi+1(C) = μi(B) + 1/2 · μi(C). Thus, we cannot directly prove by induction that μi(C) ≥ 1/4; we also need some information about μi(B) or μi(A) to exclude, e.g., μi = {A → 3/4, C → 1/4}, for which μi+1 would violate the safety constraint. We invite the interested reader to try to prove that μ0 is indeed H-safe under the given strategy to appreciate the subtleties.

We proceed by proving that μi(C) ≥ 1/4 and additionally μi(A) ≤ μi(C) by induction. The base case follows immediately, thus suppose that μi satisfies these constraints. For μi+1(A) ≤ μi+1(C), observe that μi+1(A) = 1/2 · μi(C) and μi+1(C) = 1/2 · μi(C) + μi(B). Since μi(B) ≥ 0, the claim follows. To prove μi+1(C) ≥ 1/4, observe that μi(A) ≤ 1/2, since μi(A) ≤ μi(C) by the induction hypothesis and distributions sum up to 1. Moreover, μi+1(C) = μi(B) + 1/2 · μi(C) = 1/2 · μi(B) + 1/2 − 1/2 · μi(A), by again inserting the fact that distributions sum up to 1. Then, μi+1(C) = 1/2 − 1/2 · μi(A) + 1/2 · μi(B) ≥ 1/2 − 1/2 · μi(A) ≥ 1/2 − 1/4 ≥ 1/4. □
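The inductive step above can also be machine-checked: since the update map for action b is affine and the candidate invariant I = {μ | μ(C) ≥ 1/4 and μ(A) ≤ μ(C)} is an intersection of halfspaces, the image of I is the convex hull of the images of its vertices, so it suffices to check the vertices. A sketch of this check (the vertex list is computed by hand and is our own addition, not part of the paper's proof):

```python
from fractions import Fraction as F

def step_b(mu):
    """Update equations from Example 1 (always choose b); mu = (A, B, C)."""
    A, B, C = mu
    return (F(1, 2) * C, A, B + F(1, 2) * C)

def in_inv(mu):
    """Candidate invariant I = { mu | mu(C) >= 1/4 and mu(A) <= mu(C) }."""
    A, B, C = mu
    return C >= F(1, 4) and A <= C

# Vertices of I within the probability simplex (hand-computed).
vertices = [
    (F(0), F(3, 4), F(1, 4)),
    (F(1, 4), F(1, 2), F(1, 4)),
    (F(1, 2), F(0), F(1, 2)),
    (F(0), F(0), F(1)),
]

# Inductiveness: the image of every vertex lies in I again.
assert all(in_inv(step_b(v)) for v in vertices)
```

This is exactly the kind of "invariant plus inductiveness check" that the algorithms in Sect. 4 and 5 automate, with the invariant itself as the unknown.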

Thus, already for rather simple examples the reasoning is non-trivial. To further complicate things, the structure of strategies can also be surprisingly complex:

*Example 2.* Again consider our running example from Fig. 1, with initial distribution μ0 = {A → 3/4, B → 1/4} and safe set H = {μ | μ(B) = 1/4}. This safety condition is indeed satisfiable, however the (unique) optimal strategy requires both infinite memory as well as randomization with arbitrarily small fractions! In step 1, we require choosing a with probability 2/3 and b with probability 1/3 to satisfy the safety constraint in the second step, getting μ1 = {A → 1/2, B → 1/4, C → 1/4}. For step 2, we require choosing both a and b with probability 1/2 each, yielding μ2 = {A → 3/8, B → 1/4, C → 3/8}. Continuing this strategy, we obtain at step i that μi = {A → 1/4 + 1/2^(i+1), B → 1/4, C → 1/2 − 1/2^(i+1)}, and action a is chosen with probability 1/(2^(i−1) + 1), which converges to 0.

In the following, we provide two algorithms that handle both examples. Our first algorithm focusses on memoryless strategies, the second considers a certain type of infinite memory strategies. Essentially, the underlying idea is to automatically synthesize a strategy together with such inductive proofs of safety.
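The step-dependent strategy of Example 2 can be replayed exactly (our own sanity-check simulation; it re-uses the update equations of the running example):

```python
from fractions import Fraction as F

def apply(mu, p_a):
    """One MDP step where A plays a with probability p_a and b with 1 - p_a;
    B and C behave as in the running example. mu = (A, B, C)."""
    A, B, C = mu
    return (p_a * A + F(1, 2) * C, (1 - p_a) * A, B + F(1, 2) * C)

mu = (F(3, 4), F(1, 4), F(0))  # mu0 from Example 2
for i in range(10):
    # At step i (with current distribution mu_i), play a with 1/(2^(i-1)+1).
    p_a = 1 / (F(2) ** (i - 1) + 1)
    mu = apply(mu, p_a)
    assert mu[1] == F(1, 4)    # mu(B) stays exactly 1/4 in every step
    assert sum(mu) == 1
```

Exact rationals are essential here: the required probabilities shrink geometrically, so any fixed finite-memory rounding of the strategy eventually violates μ(B) = 1/4.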

## 4 Proving Safety by Invariants

We now discuss our principled idea of proving safety by means of (inductive) invariants, taking inspiration from research on safety analysis in programs [20,27]. We first show that strategies which are purely based on the current distribution over states are sufficient. Then, we show that inductive invariants are a *sound and complete* certificate for safety. Together, we obtain that an initial distribution is H-safe *if and only if* there exists an invariant set I and a distribution strategy π such that (i) the initial distribution is contained in I, (ii) I is a subset of the safe set H, and (iii) I is inductive under π, i.e. if μ ∈ I then M^π(μ) ∈ I. In the following section, we then show how we search for invariants and distribution strategies *of a particular shape*.

#### 4.1 Distribution Strategies

We show that *distribution strategies* π : Δ(S) → Π1, yielding for each distribution over states a one-step strategy to take next, are sufficient for the problem at hand. More formally, we want to show that an H-safe distribution strategy exists if and only if there exists any H-safe strategy.

First, observe that distribution strategies are a special case of regular path strategies. In particular, for any given initial distribution, we obtain a uniquely determined stream of distributions by μi+1 = M^{π(μi)}(μi), i.e. the distribution μi+1 is obtained by applying the one-step strategy π(μi) to μi. In turn, this lets us define the Markov strategy π̂i(s) = π(μi)(s). For simplicity, we identify distribution strategies with their induced path strategies.

Next, we argue that restricting to distribution strategies is sufficient.

Theorem 2. *An initial distribution* μ<sup>0</sup> *is* H*-safe if and only if there exists a distribution strategy* π *such that* μ<sup>0</sup> *is* H*-safe under* π*.*

*Proof (Sketch).* The full proof can be found in [4, Sec. 4.1]. Intuitively, only the "distribution" behaviour of a strategy is relevant and we can sufficiently replicate the behaviour of any safe strategy by a distribution strategy. 

In this way, each MDP corresponds to an (uncountably infinite) transition system T_M = (Δ(S), T) where (μ, μ′) ∈ T if there exists a one-step strategy π such that μ′ = M^π(μ). Note that T_M is a purely non-deterministic system, without any probabilistic behaviour. So, our decision problem is equivalent to asking whether the induced transition system T_M can be controlled in a safe way. Note that T_M is uncountably large and uncountably branching.

#### 4.2 Distributional Invariants for MDP Safety

We now define distributional invariants in MDPs and show that they provide sound and complete certificates for proving initialized (and uninitialized) safety.

*Distributional Invariants in MDPs.* Intuitively, a distributional invariant is a set of probability distributions over MDP states that contains every distribution that can arise from applying a strategy to an initial probability distribution, i.e. the complete stream μ_0, μ_1, .... Hence, similar to the safe set H, distributional invariants are also defined to be subsets of Δ(S).

Definition 1 (Distributional Invariants). *Let* μ_0 ∈ Δ(S) *be a probability distribution over* S *and* π *be a strategy in* M*. A set* I ⊆ Δ(S) *is said to be a* distributional invariant for μ_0 under π *if the sequence of probability distributions induced by applying the strategy* π *to the initial probability distribution* μ_0 *is contained in* I*, i.e. if* M^π(μ_0, i) ∈ I *for each* i ≥ 0*.*

*A distributional invariant* I *is said to be* inductive under π*, if we furthermore have that* M^π(μ) ∈ I *holds for any* μ ∈ I*, i.e. if* I *is "closed" under application of* M^π *to any probability distribution contained in* I*.*
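Inductiveness in the sense of Definition 1 can be falsified numerically: the sketch below checks closure of a candidate set I under one application of the distribution transformer on sampled distributions. This is only a testing aid, not a proof; the two-state dynamics and the candidate sets are our own toy assumptions (the full simplex Δ(S) is trivially inductive under any strategy).

```python
import random

def is_inductive_on_samples(inv, step, samples):
    """Necessary condition for inductiveness: every sampled distribution in I
    must be mapped back into I by step. A failure disproves inductiveness;
    success is only evidence, not a proof."""
    return all(inv(step(mu)) for mu in samples if inv(mu))

def random_distribution(n, rng):
    """Sample a point of the n-dimensional probability simplex."""
    cuts = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
    return [b - a for a, b in zip(cuts, cuts[1:])]

# Toy dynamics (an illustrative assumption): a two-state chain in which each
# state keeps half of its mass and sends the other half to the other state.
step = lambda mu: [0.5 * (mu[0] + mu[1]), 0.5 * (mu[0] + mu[1])]

rng = random.Random(0)
samples = [random_distribution(2, rng) for _ in range(1000)]

# The full simplex is trivially inductive under distribution-preserving steps.
simplex = lambda mu: all(x >= 0 for x in mu) and abs(sum(mu) - 1) < 1e-9
assert is_inductive_on_samples(simplex, step, samples)

# In contrast, I = {mu : mu(s_1) >= 0.6} is not closed: step maps everything
# to (1/2, 1/2), which leaves this set.
not_inductive = lambda mu: mu[0] >= 0.6
assert not is_inductive_on_samples(not_inductive, step, samples)
```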

*Soundness and Completeness for MDP Safety.* The following theorem shows that, in order to solve the initialized (and uninitialized) safety problem, one can equivalently search for a distributional invariant that is fully contained in H. Furthermore, it shows that one can without loss of generality restrict the search to inductive distributional invariants.

Theorem 3 (Sound and Complete Certificate). *Let* μ_0 ∈ Δ(S) *be a probability distribution over* S*,* π *be a strategy in* M*, and* H ⊆ Δ(S) *be a safe set. Then* μ_0 *is* H*-safe under* π *if and only if there exists an inductive distributional invariant* I *for* μ_0 *and* π *such that* I ⊆ H*.*

The proof can be found in [4, Sec. 4.2].

Thus, in order to solve the initialized safety problem for μ_0, it suffices to search for (i) a strategy π and (ii) an inductive distributional invariant I for μ_0 and π such that I ⊆ H. On the other hand, in order to solve the uninitialized safety problem, it suffices to search for (i) an initial probability distribution μ_0, (ii) a strategy π, and (iii) an inductive distributional invariant I for μ_0 and π such that I ⊆ H. In the following, we provide a fully automated, sound and *relatively* complete method of deciding the existence of such an invariant and strategy.

#### 5 Algorithms for Distributional Invariant Synthesis

We now present two algorithms for automated synthesis of strategies and inductive distributional invariants towards solving distribution safety problems in MDPs. The two algorithms differ in the kind of strategies they consider and, as a consequence of differences in the involved expressions, also in their completeness guarantees. For readability, we describe the algorithms in their basic form applied to the initialized variant of the safety problem and discuss further extensions in Sect. 6. In particular, our approach is also directly applicable to the uninitialized variant, as we describe there.

We say that an inductive distributional invariant is *affine* if it can be specified in terms of (non-strict) affine inequalities, which we formalize below. Both algorithms jointly synthesize a strategy and an affine inductive distributional invariant by employing a *template-based synthesis* approach. In particular, they fix symbolic templates for each object that needs to be synthesized, encode the defining properties of each object as constraints over unknown template variables, and solve the system of constraints by reduction to the existential first-order theory of the reals.

For example, a template for an affine linear constraint on distributions Δ(S) is given by aff(μ) ≡ (c_0 + c_1 · μ(s_1) + ··· + c_n · μ(s_n) ≥ 0). Here, the variables c_0 to c_n, written in grey for emphasis, are the *template variables*. For fixed values of these variables, the expression aff is a concrete affine linear predicate over distributions. Thus, we can ask questions like "Do there exist values for the c_i such that for all distributions μ we have that aff(μ) implies aff(M^π(μ))?". This is a sentence in the theory of reals, however with quantifier alternation. As a next step, template-based synthesis approaches then employ various quantifier elimination techniques to convert such expressions into equisatisfiable sentences in, e.g., the existential theory of the reals, which is decidable in PSPACE [15].

*Difference between the Algorithms.* Our two algorithms differ in their applicability and the kind of completeness guarantees that they provide. In terms of applicability, the first algorithm only considers *memoryless* strategies, while the second algorithm searches for *distribution* strategies specified as fractions of affine linear expressions. (We discuss an extension to rational functions in Sect. 6.) In terms of completeness guarantees, the first algorithm is *(relatively) complete* in the sense that it is guaranteed to compute a memoryless strategy and an affine inductive distributional invariant that prove safety *whenever they exist*. In contrast, the second algorithm does not provide the same level of completeness.

*Notation.* In what follows, we write ≡ to denote (syntactic) equivalence of expressions, to distinguish it from relational symbols used inside these expressions, such as "=". For example, Φ(x) ≡ (x = 0) means that Φ(x) is the predicate x = 0. Moreover, (x_1, ..., x_n) denotes a symbolic probability distribution over the state space S = (s_1, ..., s_n), where x_i is a symbolic variable that encodes the probability of the system being in s_i. We use boldface notation *x* = (x_1, ..., x_n) to denote the vector of symbolic variables. Thus, the above example would be written aff(*x*) ≡ c_0 + c_1 · x_1 + ··· + c_n · x_n ≥ 0. Since we often require vectors to represent a distribution, we write *x* ∈ Δ(S) as an abbreviation for the predicate ⋀_{i=1}^n (0 ≤ x_i ≤ 1) ∧ (∑_{i=1}^n x_i = 1).

*Algorithm Input and Assumptions.* Both algorithms take as input an MDP M = (S, *Act*, δ) with S = {s_1, ..., s_n}. They also take as input a safe set H ⊆ Δ(S). We assume that H is specified by a boolean predicate over n variables, given as a logical conjunction of N_H ∈ ℕ_0 *affine* inequalities, and that it has the form

$$H(x) \equiv (x \in \Delta(S)) \land \bigwedge\_{i=1}^{N\_H} (h^i(x) \ge 0),$$

where the first term imposes that *x* is a probability distribution over S and h^i(*x*) = h^i_0 + h^i_1 · x_1 + ··· + h^i_n · x_n is an affine expression over *x* with real-valued coefficients h^i_j for each i ∈ [N_H] and j ∈ {0, ..., n}. (Note that the h^i_j are not template variables but fixed values, given as input.) Next, the algorithms take as input an initial probability distribution μ_0 ∈ Δ(S). Finally, the algorithms also take as input technical parameters. Intuitively, these describe the size of the used *symbolic templates*, explained later. For the remainder of the section, fix an initialized safety problem, i.e. an MDP M, a safe set H of the required form, and an initial distribution μ_0.
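This input format translates directly into code: each affine inequality h^i(x) ≥ 0 is just its coefficient vector (h^i_0, h^i_1, ..., h^i_n). The sketch below encodes the safe set H = {μ | μ(C) ≥ 1/4} of our running example over three states; the concrete representation choices are our own.

```python
from fractions import Fraction

def in_simplex(x):
    """x in Delta(S): all coordinates in [0, 1] and summing to 1."""
    return all(0 <= xi <= 1 for xi in x) and sum(x) == 1

def affine(h, x):
    """Evaluate h(x) = h_0 + h_1*x_1 + ... + h_n*x_n from its coefficients."""
    return h[0] + sum(c * xi for c, xi in zip(h[1:], x))

def in_H(ineqs, x):
    """H(x) = (x in Delta(S)) and h^i(x) >= 0 for all i."""
    return in_simplex(x) and all(affine(h, x) >= 0 for h in ineqs)

# Safe set of the running example over states (A, B, C):
# mu(C) >= 1/4, written as -1/4 + 0*x_1 + 0*x_2 + 1*x_3 >= 0.
H = [[Fraction(-1, 4), 0, 0, 1]]

assert in_H(H, [Fraction(1, 3)] * 3)                                   # 1/3 >= 1/4
assert not in_H(H, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])   # 1/6 < 1/4
```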

#### 5.1 Synthesis of Affine Invariants and Memoryless Strategies

We start by presenting our first algorithm, which synthesizes memoryless strategies and affine inductive distributional invariants. We refer to this algorithm as AlgMemLess. The algorithm proceeds in four steps: (1) setting up templates for the strategy and the invariant, (2) collecting constraints over the template variables, (3) quantifier elimination, and (4) constraint solving.

We now describe each step in detail.

*Step 1: Setting up Templates.* The algorithm sets templates for π and I as follows:

– Since this algorithm searches for memoryless strategies, the probability of taking an action a_j in state s_i is always the same, independent of the current distribution. Hence, our template for π consists of a symbolic template variable p_{s_i,a_j} for each s_i ∈ S and a_j ∈ *Act*(s_i). We write p_{s_i,◦} = (p_{s_i,a_1}, ..., p_{s_i,a_m}) to refer to the corresponding distribution in state s_i.

– The template of I is given by a boolean predicate specified by a conjunction of N<sup>I</sup> affine inequalities, where N<sup>I</sup> is the *template size* and is an algorithm parameter. In particular, the template of I looks as follows:

$$I(x) \equiv (x \in \Delta(S)) \land \bigwedge\_{i=1}^{N\_I} (a\_0^i + a\_1^i \cdot x\_1 + \dots + a\_n^i \cdot x\_n \ge 0).$$

The first predicate enforces that I only contains vectors that define probability distributions over S.

*Step 2: Constraint Collection.* We now collect the constraints over symbolic template variables which encode that π is a memoryless strategy, that I contains the initial distribution μ0, that I is an inductive distributional invariant under π, and that I is contained in H.

– For π to be a strategy, we only need to ensure that each p_{s_i,◦} is a probability distribution over the set of available actions at every state s_i. Thus, we set

$$\Phi\_{\text{strat}} \equiv \bigwedge\_{i=1}^{n} \left( p\_{s\_i, \circ} \in \Delta(Act(s\_i)) \right).$$

– For I to be a distributional invariant for π and μ_0 as well as to be inductive, it suffices to enforce that I contains μ_0 and that I is closed under application of π. Thus, we collect two constraints:

$$\begin{aligned} \Phi\_{\text{initial}} & \equiv I(\mu\_0) \equiv \bigwedge\_{i=1}^{N\_I} (a\_0^i + a\_1^i \cdot \mu\_0^1 + \dots + a\_n^i \cdot \mu\_0^n \ge 0), \text{ and} \\\Phi\_{\text{inductive}} & \equiv \left( \forall x \in \mathbb{R}^n. \, I(x) \Longrightarrow I(\text{step}(x)) \right), \end{aligned}$$

where step(*x*)(x_i) = ∑_{s_k ∈ S, a_j ∈ *Act*(s_k)} p_{s_k,a_j} · δ(s_k, a_j, s_i) · x_k yields the distribution after applying one step of the strategy induced by Φstrat to *x*.

– For I to be contained in H, we enforce the constraint:

$$\Phi\_{\text{safe}} \equiv \left( \forall x \in \mathbb{R}^n. \ I(x) \Longrightarrow H(x) \right).$$
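For fixed values of the strategy template variables, step(*x*) is linear in *x*. The following sketch assembles the induced matrix M with M[i][k] = Σ_{a_j ∈ Act(s_k)} p_{s_k,a_j} · δ(s_k, a_j, s_i), so that step(x) = Mx; the concrete three-state MDP (modelled on the dynamics visible in Fig. 2) and the uniform choice in state 0 are illustrative assumptions.

```python
from fractions import Fraction

# Toy MDP, states indexed 0..2: delta[state][action] = successor distribution.
delta = [
    {"a1": [1, 0, 0], "a2": [0, 1, 0]},          # state 0: a1 loops, a2 -> state 1
    {"b": [0, 0, 1]},                             # state 1: moves to state 2
    {"c": [Fraction(1, 2), 0, Fraction(1, 2)]},   # state 2: 1/2 -> state 0, 1/2 loops
]

def step_matrix(delta, p):
    """M[i][k] = sum over a in Act(s_k) of p[k][a] * delta(s_k, a, s_i),
    so that step(x) = M x for a fixed memoryless strategy p."""
    n = len(delta)
    return [[sum(p[k][a] * delta[k][a][i] for a in delta[k]) for k in range(n)]
            for i in range(n)]

def step(M, x):
    """Apply one step of the induced dynamics to a distribution vector x."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

# Memoryless strategy: state 0 plays a1 and a2 with probability 1/2 each.
p = [{"a1": Fraction(1, 2), "a2": Fraction(1, 2)}, {"b": 1}, {"c": 1}]
M = step_matrix(delta, p)

# Columns of M are stochastic, so step maps distributions to distributions.
assert all(sum(M[i][k] for i in range(3)) == 1 for k in range(3))
assert step(M, [Fraction(1, 3)] * 3) == [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
```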

*Step 3: Quantifier Elimination.* Constraints Φstrat and Φinitial are purely existentially quantified over symbolic template variables, thus we can solve them directly. However, Φinductive and Φsafe contain both universal and existential quantifiers, which are difficult to handle. In what follows, we show how the algorithm translates these constraints into equisatisfiable *purely existentially quantified* constraints. In particular, our translation exploits the fact that both Φinductive and Φsafe can, upon splitting the conjunctions on the right-hand side of implications into conjunctions of implications, be expressed as conjunctions of constraints of the form

$$\forall \mathbf{x} \in \mathbb{R}^n. \ (\text{affexp}\_1(\mathbf{x}) \ge 0) \land \dots \land (\text{affexp}\_N(\mathbf{x}) \ge 0) \implies (\text{affexp}(\mathbf{x}) \ge 0).$$

Here, each affexpi(*x*) and affexp(*x*) is an affine expression over *x* whose affine coefficients are either concrete real values or symbolic template variables.


In particular, we use Farkas' lemma [31] to remove universal quantification and translate the constraint into an equisatisfiable existentially quantified system of constraints over the symbolic template variables, as well as fresh auxiliary variables that are introduced by the translation. For completeness, we briefly recall (a strengthened and adapted version of) Farkas' lemma.

Lemma 1 ([31,37]). *Let* X = {x_1, ..., x_n} *be a finite set of real-valued variables, and consider the following system of* N ∈ ℕ *affine inequalities over* X*:*

$$\Phi: \begin{cases} c\_0^1 + c\_1^1 \cdot x\_1 + \dots + c\_n^1 \cdot x\_n \ge 0 \\ \vdots \\ c\_0^N + c\_1^N \cdot x\_1 + \dots + c\_n^N \cdot x\_n \ge 0 \end{cases}$$

*Suppose that* Φ *is satisfiable. Then* Φ *entails an affine inequality* φ ≡ (c_0 + c_1 · x_1 + ··· + c_n · x_n ≥ 0)*, i.e.* Φ ⟹ φ*, if and only if* φ *can be written as a* non-negative *linear combination of the affine inequalities in* Φ*, i.e. if and only if there exist* y_1, ..., y_N ≥ 0 *such that* c_1 = ∑_{j=1}^N y_j · c^j_1*, ...,* c_n = ∑_{j=1}^N y_j · c^j_n *and* c_0 ≥ ∑_{j=1}^N y_j · c^j_0*.*

Note that, for any implication appearing in Φinductive and Φsafe, the system of constraints on the left-hand side is simply I(*x*), and the satisfiability of I(*x*) is enforced by Φinitial. Hence, we may apply Farkas' lemma to translate each constraint with universal quantification into an equisatisfiable, purely existentially quantified constraint. In particular, for any constraint of the form

$$\forall \mathbf{x} \in \mathbb{R}^n. \left( \text{affexp}\_1(\mathbf{x}) \ge 0 \right) \land \dots \land \left( \text{affexp}\_N(\mathbf{x}) \ge 0 \right) \implies \left( \text{affexp}(\mathbf{x}) \ge 0 \right),$$

we introduce fresh template variables y1,...,y<sup>N</sup> and translate it into the system of purely existentially quantified constraints

$$(y\_1 \ge 0) \land \dots \land (y\_N \ge 0) \land (\text{affexp}(\mathbf{x}) \equiv\_F y\_1 \cdot \text{affexp}\_1(\mathbf{x}) + \dots + y\_N \cdot \text{affexp}\_N(\mathbf{x})) .$$

Here, we use affexp(*x*) ≡_F y_1 · affexp_1(*x*) + ··· + y_N · affexp_N(*x*) to denote the set of n + 1 constraints over the symbolic template variables and y_1, ..., y_N which equate the linear coefficients of each x_i on the two sides of the equivalence and require the constant coefficient of the left-hand side to be at least that of the right-hand side, i.e. exactly those constraints which we obtain from applying Farkas' lemma. We highlight that the expressions affexp are affine linear only for *fixed* existentially quantified variables, i.e. they are in general quadratic.
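A Farkas certificate is cheap to verify once multipliers are known: the linear coefficients of the entailed inequality must equal the non-negative combination of the premises, while (in the affine form of the lemma) the constant coefficient may only exceed it. The sketch below checks exactly these conditions; the concrete system and multipliers are our own toy example, not the constraints of the algorithm.

```python
def farkas_check(premises, target, y):
    """Verify a Farkas certificate. Each inequality c_0 + c_1*x_1 + ... +
    c_n*x_n >= 0 is given as its coefficient list. For multipliers y_j >= 0,
    the linear coefficients of the target must equal those of the combination,
    and the target's constant coefficient may only exceed it."""
    if any(yj < 0 for yj in y):
        return False
    combo = [sum(yj * prem[i] for yj, prem in zip(y, premises))
             for i in range(len(target))]
    return target[0] >= combo[0] and target[1:] == combo[1:]

# Toy system over (x1, x2): x1 >= 0, x2 >= 0, 1 - x1 - x2 >= 0.
premises = [[0, 1, 0], [0, 0, 1], [1, -1, -1]]

# 2 - x1 >= 0 holds on this triangle; a certificate is y = (0, 1, 1),
# since 1*(x2) + 1*(1 - x1 - x2) = 1 - x1 and the constant satisfies 2 >= 1.
assert farkas_check(premises, [2, -1, 0], [0, 1, 1])

# x1 - 1 >= 0 does not hold; e.g. the zero multipliers fail the check.
assert not farkas_check(premises, [-1, 1, 0], [0, 0, 0])
```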

*Step 4: Constraint Solving.* Finally, we feed the resulting system of existentially quantified polynomial constraints over the symbolic template variables as well as the auxiliary variables introduced by applying Farkas' lemma to an off-the-shelf constraint solver. If the solver outputs a solution, we conclude that the computed invariant I is an inductive distributional invariant for the strategy π and initial distribution μ0, and that I is contained in H. Therefore, by Theorem 3, we conclude that μ<sup>0</sup> is H-safe under π.

$$\begin{aligned} \Phi\_{\text{init}} &: c\_0 + c\_1 \cdot \frac{1}{3} + c\_2 \cdot \frac{1}{3} + c\_3 \cdot \frac{1}{3} \ge 0 \\ \Phi\_{\text{safe}} &: \{c\_0 + c\_1 \cdot A + c\_2 \cdot B + c\_3 \cdot C \ge 0\} \Longrightarrow C \ge \frac{1}{4} \\ \Phi\_{\text{inductive}} &: \{c\_0 + c\_1 \cdot A + c\_2 \cdot B + c\_3 \cdot C \ge 0\} \Longrightarrow \\ &\qquad c\_0 + c\_1 \cdot (A \cdot p\_{A, a\_1} + \frac{1}{2}C) + c\_2 \cdot A \cdot p\_{A, a\_2} + c\_3 \cdot (B + \frac{1}{2}C) \ge 0 \\ \Phi\_{\text{strat}} &: p\_{A, a\_1} \ge 0 \quad p\_{A, a\_2} \ge 0 \quad p\_{A, a\_1} + p\_{A, a\_2} = 1 \end{aligned}$$

Fig. 2. List of constraints generated in Step 2 for Example 1 with N_I = 1. The upper-case letters correspond to variables indicating the distribution in these states, i.e. A refers to μ(A). These are also the universally quantified variables, which will be handled by the quantifier elimination in Step 3. The template variables are written in grey. For readability, we omit the constraints requiring that state distributions satisfy μ ∈ Δ(S), i.e. A ≥ 0 etc. The actual query sent to the solver in Step 4 after quantifier elimination comprises 27 constraints with 21 variables.

Theorem 4. Soundness*: Suppose* AlgMemLess *returns a memoryless strategy* π *and an affine inductive distributional invariant* I*. Then,* μ_0 *is* H*-safe under* π*.*

Completeness*: If there exist a memoryless strategy* π *and an affine inductive distributional invariant* I *such that* I ⊆ H *and* μ_0 *is* H*-safe under* π*, then there exists a minimal value of the template size* N_I ∈ ℕ *such that* π *and* I *are produced by* AlgMemLess*.*

Complexity*: The runtime of* AlgMemLess *is in PSPACE in the size of the MDP, the encoding of the safe set* H*, and the template size parameter* N_I ∈ ℕ*.*

The proof can be found in [4, Sec. 5.1]. We comment on the PSPACE upper bound on the complexity of AlgMemLess. The upper bound holds since the application of Farkas' lemma reduces synthesis to solving a sentence in the existential first-order theory of the reals, and since the size of this sentence is polynomial in the sizes of the MDP, the encoding of the safe set H, and the invariant template size N_I. However, it is unclear whether the resulting constraints could be solved more efficiently, and the best known upper bound on the time complexity of algorithms for template-based affine inductive invariant synthesis in programs is also PSPACE [8,27]. Designing more efficient algorithms for solving constraints of this form would lead to better algorithms both for the safety problem studied in this work and for template-based affine inductive invariant synthesis in programs.

*Example 3.* For completeness, we provide in Fig. 2 the constraints generated in Step 2 for Example 1, i.e. our running example of Fig. 1 with μ_0 = {A ↦ 1/3, B ↦ 1/3, C ↦ 1/3} and H = {μ | μ(C) ≥ 1/4}, using N_I = 1 for readability.

To conclude this section, we emphasize that our algorithm *simultaneously* synthesizes both the invariant and the witnessing strategy, which is the key component to achieve relative completeness.

#### 5.2 Synthesis of Affine Invariants and General Strategies

We now present our second algorithm, which synthesizes *distribution strategies* (of a particular shape) together with an affine inductive distributional invariant. We refer to it as AlgDist. The second algorithm proceeds in four steps analogous to those of the first algorithm, AlgMemLess. Hence, in the interest of space, we only discuss the differences compared to AlgMemLess.

*Step 1: Setting up Templates.* The algorithm sets up templates for π and I. The template for I is defined analogously to Sect. 5.1. However, as we now want to search for a strategy π that need not be memoryless but instead may depend on the current distribution, we need to consider a more general template. In particular, the template for the probability p_{s_i,a_j} of taking an action a_j in state s_i is no longer a constant value. Instead, p_{s_i,a_j}(*x*) is a function of the probability distribution *x* of the current state of the MDP, and we define its template to be a quotient of two affine expressions for each s_i ∈ S and a_j ∈ *Act*(s_i):

$$p\_{s\_i, a\_j}(\mathbf{x}) \equiv \frac{\text{num}(s\_i, a\_j)(\mathbf{x})}{\text{den}(s\_i)(\mathbf{x})} \equiv \frac{r\_0^{i,j} + r\_1^{i,j} \cdot x\_1 + \dots + r\_n^{i,j} \cdot x\_n}{s\_0^i + s\_1^i \cdot x\_1 + \dots + s\_n^i \cdot x\_n}.$$

(In Sect. 6, we discuss how to extend our approach to polynomial expressions for the numerator and denominator, i.e. rational functions.) Note that the coefficients in the numerator depend both on the state s_i and the action a_j, whereas the coefficients in the denominator depend only on the state s_i. This is because we only use the affine expression in the denominator as a normalization factor to ensure that p_{s_i,a_j} indeed defines a probability.

*Step 2: Constraint Collection.* As before, the algorithm now collects the constraints over symbolic template variables which encode that π is a strategy, that I is an inductive distributional invariant, and that I is contained in H. The constraints Φinitial, Φinductive, and Φsafe are defined analogously as in Sect. 5.1, with the necessary adaptation to step(*x*). For the strategy constraint Φstrat we now need to take additional care to ensure that each quotient template defined above does not induce division by 0 and that these values indeed correspond to a distribution over the available actions. We ensure this by the following constraint:

$$\Phi\_{\text{strat}} \equiv \forall x \in \mathbb{R}^n. \ I(x) \Longrightarrow \bigwedge\_{i=1}^n \begin{pmatrix} \bigwedge\_{a\_j \in Act(s\_i)} \text{num}(s\_i, a\_j)(x) \ge 0 \land \\ \text{den}(s\_i)(x) \ge 1 \land \\ \sum\_{a\_j \in Act(s\_i)} \text{num}(s\_i, a\_j)(x) = \text{den}(s\_i)(x) . \end{pmatrix}.$$

The first two constraints ensure that all numerators are non-negative and that we never divide by 0. The third ensures that the numerators sum up to the denominator. Together, this yields the desired result, i.e. p_{s_i,◦}(*x*) ∈ Δ(*Act*(s_i)) whenever *x* ∈ Δ(S). Note that the ≥ 1 constraint for the denominator can be replaced by an arbitrary constant > 0, since we can always rescale all involved coefficients.
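Evaluating the quotient templates is straightforward once the Φstrat side conditions hold. In the sketch below, the coefficient choices num(s, a_1) = 2·x_1, num(s, a_2) = 1 − x_1, and den(s) = 1 + x_1 are our own toy instantiation for a single state with two actions: on the simplex both numerators are non-negative and sum to the denominator, so the quotients form a distribution.

```python
from fractions import Fraction

def affine(coeffs, x):
    """Evaluate c_0 + c_1*x_1 + ... + c_n*x_n from its coefficient list."""
    return coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))

def strategy_probs(nums, den, x):
    """Evaluate the quotient templates p_{s,a}(x) = num(s,a)(x) / den(s)(x)
    for one state. The Phi_strat side conditions (numerators non-negative,
    denominator >= 1, numerators summing to the denominator) make the
    result a probability distribution over the state's actions."""
    d = affine(den, x)
    assert d >= 1, "den(s)(x) >= 1 must be enforced by Phi_strat"
    return [affine(num, x) / d for num in nums]

# Toy templates over three states for one state with two actions:
# num(s, a1) = 2*x1, num(s, a2) = 1 - x1, den(s) = 1 + x1.
nums = [[0, 2, 0, 0], [1, -1, 0, 0]]
den = [1, 1, 0, 0]

probs = strategy_probs(nums, den, [Fraction(1, 3)] * 3)
assert sum(probs) == 1 and all(0 <= q <= 1 for q in probs)
```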

*Step 3: Quantifier Elimination.* The constraints Φstrat, Φinitial, and Φsafe can be handled analogously to Sect. 5.1. In particular, by applying Farkas' lemma these can be translated into an equisatisfiable purely existentially quantified system of polynomial constraints, and our algorithm applies this translation.

However, the constraint Φinductive now involves quotients of affine expressions: upon splitting the conjunction on the right-hand side of the implication in Φinductive into a conjunction of implications, the inequalities on the right-hand sides of these implications contain templates for strategy probabilities p_{s_i,a_j}(*x*). The algorithm removes the quotients by multiplying both sides of each inequality by the denominators of the quotients. (Recall that each denominator is positive by the constraint Φstrat.) This results in the multiplication of symbolic affine expressions, hence Φinductive becomes a conjunction of implications of the form

$$\forall x \in \mathbb{R}^n. \left( \text{affexp}\_1(x) \ge 0 \right) \land \dots \land \left( \text{affexp}\_N(x) \ge 0 \right) \implies \left( \text{polyexp}(x) \ge 0 \right).$$

Here, each affexp_i(*x*) is an affine expression over *x*, but polyexp(*x*) is now a polynomial expression over *x*. Hence, we cannot apply a Farkas' lemma-style result to remove the universal quantifiers.

Instead, we motivate our translation by recalling Handelman's theorem [38], which characterizes *strictly* positive polynomials over a set of affine inequalities. It will allow us to soundly translate Φinductive into an existentially quantified system of constraints over the symbolic template variables, as well as fresh auxiliary variables that are introduced by the translation.

Theorem 5 ([38]). *Let* X = {x_1, ..., x_n} *be a finite set of real-valued variables, and consider the following system of* N ∈ ℕ *non-strict affine inequalities over* X*:*


$$\Phi: \begin{cases} c\_0^1 + c\_1^1 \cdot x\_1 + \dots + c\_n^1 \cdot x\_n \ge 0 \\ \vdots \\ c\_0^N + c\_1^N \cdot x\_1 + \dots + c\_n^N \cdot x\_n \ge 0 \end{cases}$$

*Let Prod*(Φ) = {∏_{i=1}^t φ_i | t ∈ ℕ_0, φ_i ∈ Φ} *be the set of all products of finitely many affine expressions in* Φ*, where the product of* 0 *affine expressions is the constant expression* 1*. Suppose that* Φ *is satisfiable and that* {*y* | *y* ⊨ Φ}*, the set of values satisfying* Φ*, is topologically compact, i.e. closed and bounded. Then* Φ *entails a polynomial inequality* φ(*x*) > 0 *if and only if* φ *can be written as a non-negative linear combination of finitely many products in Prod*(Φ)*, i.e. if and only if there exist* y_1, ..., y_M ≥ 0 *and* φ_1, ..., φ_M ∈ *Prod*(Φ) *such that* φ = y_1 · φ_1 + ··· + y_M · φ_M*.*

Notice that we cannot directly apply Handelman's theorem to a constraint

$$\forall \mathbf{x} \in \mathbb{R}^n. \ (\text{affexp}\_1(\mathbf{x}) \ge 0) \land \dots \land (\text{affexp}\_N(\mathbf{x}) \ge 0) \implies (\text{polyexp}(\mathbf{x}) \ge 0),$$

since the polynomial inequality on the right-hand side of the implication is non-strict whereas the polynomial inequality in Handelman's theorem is strict. However, the direction needed for the soundness of the translation holds even with a non-strict polynomial inequality on the right-hand side. In particular, it clearly holds that if polyexp can be written as a non-negative linear combination of finitely many products of affine inequalities, then polyexp is non-negative whenever all affine inequalities are non-negative. Hence, we may use the translation in Handelman's theorem to translate each implication in Φinductive into a system of purely existentially quantified constraints.

As Handelman's theorem does not impose a bound on the number of products of affine expressions that might appear in the translation, we *parametrize* the algorithm with an upper bound K on the maximal number of affine inequalities appearing in each product. To that end, we define Prod_K(Φ) = {∏_{i=1}^t φ_i | 0 ≤ t ≤ K, φ_i ∈ Φ}. Let M_K = |Prod_K(Φ)| be the total number of such products and write Prod_K(Φ) = {φ_1, ..., φ_{M_K}}. Then, for any constraint of the form

$$\forall \mathbf{x} \in \mathbb{R}^n. \left( \text{affexp}\_1(\mathbf{x}) \ge 0 \right) \land \dots \land \left( \text{affexp}\_N(\mathbf{x}) \ge 0 \right) \implies \left( \text{polyexp}(\mathbf{x}) \ge 0 \right),$$

we introduce fresh template variables y_1, ..., y_{M_K} and translate it into the system of purely existentially quantified constraints

$$(y\_1 \ge 0) \land \dots \land (y\_{M\_K} \ge 0) \land (\text{polyexp}(\mathbf{x}) \equiv\_H y\_1 \cdot \phi\_1(\mathbf{x}) + \dots + y\_{M\_K} \cdot \phi\_{M\_K}(\mathbf{x})).$$

Here, polyexp(*x*) ≡_H y_1 · φ_1(*x*) + ··· + y_{M_K} · φ_{M_K}(*x*) denotes the set of equalities over the template variables and y_1, ..., y_{M_K} which equate the constant coefficients as well as the coefficients of each monomial over {x_1, ..., x_n} of degree at most K on the two sides of the equivalence, as specified by Handelman's theorem.
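The coefficient equalities abbreviated by ≡_H can be checked mechanically with sparse polynomial arithmetic. The sketch below verifies a one-variable toy certificate of our own choosing: over Φ = {x ≥ 0, 1 − x ≥ 0}, the polynomial x − x² is certified non-negative by the single product φ_1 · φ_2 with multiplier 1.

```python
def poly_mul(p, q):
    """Multiply sparse polynomials given as {exponent-tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def poly_add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def handelman_combination(phis, products, ys):
    """Form y_1*phi_1 + ... + y_M*phi_M, where each phi is the product of
    the premises selected by an index tuple in `products`."""
    n = len(next(iter(phis[0])))          # number of variables
    one = {(0,) * n: 1}
    acc = {}
    for idxs, y in zip(products, ys):
        prod = one
        for i in idxs:
            prod = poly_mul(prod, phis[i])
        acc = poly_add(acc, poly_mul({(0,) * n: y}, prod))
    return acc

# Premises over one variable: x >= 0 and 1 - x >= 0 (i.e. x in [0, 1]).
phis = [{(1,): 1}, {(0,): 1, (1,): -1}]

# Target: x - x^2 >= 0 on [0, 1]; the certificate is the single product
# phi_1 * phi_2 with multiplier y = 1, since x * (1 - x) = x - x^2.
target = {(1,): 1, (2,): -1}
assert handelman_combination(phis, [(0, 1)], [1]) == target
```

Checking a candidate certificate thus reduces to comparing two coefficient dictionaries, which is exactly the equality system the solver receives in Step 4.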

While our translation into purely existentially quantified constraints is not complete, due to the non-strict polynomial inequality and due to the parametrization by K, Handelman's theorem justifies the translation, as it indicates that the translation is "close to complete" for sufficiently large values of K.

*Step 4: Constraint Solving.* This step is analogous to Sect. 5.1 and we use an offthe-shelf polynomial constraint solver to handle the resulting system of purely existentially quantified polynomial constraints. If the solver outputs a solution, we conclude that the computed I is an inductive distributional invariant for the computed strategy π and initial distribution μ0, and that I is contained in H. Therefore, by Theorem 3, we conclude that μ<sup>0</sup> is H-safe under π.

Theorem 6. Soundness*: Suppose* AlgDist *returns a strategy* π *and an affine inductive distributional invariant* I*. Then,* μ_0 *is* H*-safe under* π*.*

Complexity*: For any fixed parameter* K ∈ ℕ*, the runtime of* AlgDist *is in PSPACE in the size of the MDP and the template size parameter* N_I ∈ ℕ*.*

The proof can be found in [4, Sec. 5.2].

#### 6 Discussion, Extensions, and Variants

With our two algorithms in place, we remark on several interesting details and possibilities for extensions.

*Polynomial Expressions.* Our second algorithm can also be extended to synthesizing *polynomial* inductive distributional invariants, i.e. instead of defining the invariant I through a conjunction of affine linear expressions we could synthesize polynomial expressions such as x_1^2 + x_2 · x_3 ≤ 0.5. This can be achieved by using Putinar's Positivstellensatz [59] instead of Handelman's theorem in Step 3. This technique has recently been used for generating polynomial inductive invariants in programs [20], and our translation in Step 3 can be analogously adapted to synthesize polynomial inductive distributional invariants up to a specified degree. In the same way, instead of requiring that H is given as a conjunction of affine linear constraints, we can also handle the case of polynomial constraints. The same holds true for the probabilities p_{s_i,a_j}(*x*) of choosing certain actions. While we have defined these as fractions of affine linear expressions, we could replace them with rational functions, which we chose to exclude for the sake of readability.

*Uninitialized and Restricted Initial Case.* We remark that we can directly incorporate the uninitialized case into our algorithm. In particular, instead of requiring that I(μ_0) holds for concretely given initial values, we can existentially quantify over the values μ_0(s_i) and add the constraint that μ_0 is a distribution, i.e. μ_0 ∈ Δ(S). This does not add universal quantification, thus we do not need to apply any quantifier elimination for these variables. This also subsumes and generalizes the ideas of [5], which observes that checking whether a fixpoint of the transition dynamics lies within H is sufficient: choosing I = {μ*} where μ* is such a fixpoint satisfies all of our constraints. See [4, Sec. 6] for details.

Our algorithm is also able to handle an "intermediate" case, as follows. The uninitialized case leaves complete freedom in the choice of initial distribution, while the initialized case concretely specifies one initial distribution. In between, we could impose *some* constraints on the initial distribution without fixing it completely, i.e. ask whether there exists an H-safe initial distribution μ_0 which satisfies a predicate Φinit. If Φinit is a conjunction of affine linear constraints, we can directly handle this query, too. Note that both the initialized and the uninitialized case are special cases thereof.

*Non-Inductive Initial Steps.* Instead of requiring the synthesized invariant to contain the initial distribution, we can explicitly write down the first k distributions and only then require an invariant and strategy to be found. More concretely, the set of distributions that can be achieved in a given step k while remaining in H can be explicitly computed; denote this set by Δ^k. From a different perspective, this describes the set of states reachable in T_M within k steps and corresponds to "unrolling" the MDP for a fixed number of steps. This then goes hand in hand with the above "restricted initial case", where we ask whether there exists an H-safe distribution in Δ^k. We conjecture that this could simplify the search for distributional invariants for systems which have a lot of "transient" behaviour, as observed in searching for invariants for state reachability [11].

Fig. 3. Our Split toy example. The MDP comprises two disconnected parts. Probability mass flows from A to B and from C to D under all strategies.

#### 7 Implementation and Evaluation

While the main focus of our contribution lies on the theory, we validate the applicability through an unoptimized prototype implementation. We implemented our approach in Python 3.10, using SymPy 1.11 [50] to handle and simplify symbolic expressions, and PySMT 0.9 [36] to abstract communication with constraint solvers. We use z3 4.8 [53] and mathsat 5.6 [26] as back-ends. Our experiments were executed on consumer hardware (AMD Ryzen 3600 CPU with 16 GB RAM).

*Caveats.* While the existential (non-linear) theory of the reals is known to be decidable, practical algorithms are less explored than, for example, SAT solving. In particular, runtimes are quite sensitive to minor changes in the input structure and initial randomization (many solvers apply randomized algorithms). We observed differences of several orders of magnitude (going from seconds to hours) simply due to restarting the computation (leading to different initial seeds). Similarly, by strengthening the antecedents of implications by known facts, we also observed significant improvements. Concretely, given that we have constraints of the form I(*x*) =⇒ H(*x*) and I(*x*) =⇒ Φ(*x*), we observed that changing the second constraint to I(*x*) ∧ H(*x*) =⇒ Φ(*x*) would drastically improve the runtime even though the two are semantically equivalent.
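The antecedent-strengthening trick is purely logical: once I(*x*) ⟹ H(*x*) is asserted, the two forms of the second constraint have the same models. A brute-force sanity check over a grid, with toy affine predicates standing in for I, H, and Φ (illustrative only, not our actual invariants):

```python
# Once I(x) ==> H(x) holds, the constraints I(x) ==> Phi(x) and
# I(x) and H(x) ==> Phi(x) are semantically equivalent; the strengthened
# form merely hands the solver a richer antecedent. Toy affine predicates:
I = lambda x, y: x + y <= 1 and x >= 0 and y >= 0
H = lambda x, y: x <= 1 and y <= 1
Phi = lambda x, y: 2 * x + y <= 2

grid = [(i / 10, j / 10) for i in range(-5, 16) for j in range(-5, 16)]
assert all(H(x, y) for x, y in grid if I(x, y))           # I ==> H holds
plain = all(Phi(x, y) for x, y in grid if I(x, y))
strengthened = all(Phi(x, y) for x, y in grid if I(x, y) and H(x, y))
assert plain == strengthened                              # same verdict
```

In the actual implementation, the same transformation is applied to the symbolic constraints handed to the solvers; the benefit is purely in solver performance, not in semantics.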

This suggests that both improvements of our implementation as well as further work on constraint solvers are likely to have a significant impact on the runtime.

*Models.* Aside from our running example of Fig. 1, which we refer to as Running here, we consider two further toy examples.

The first model, called Chain, is a Markov chain defined as follows: We consider the states S = {s_1, ..., s_10} and set δ(s_i) = {s_{i+1} → 1} for all i < 10 and δ(s_10) = {s_9 → 1/2, s_10 → 1/2}. The initial distribution is given by μ_0(s_i) = 1/10 for all s_i ∈ S and the safe set by H = {μ(s_10) ≥ 1/10}. We are mainly interested in this model to demonstrate applicability to "larger" systems.
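The distribution dynamics of Chain can be simulated directly; a small sketch checking that the safe set H is never left along the way:

```python
# Distribution transformer of the Chain Markov chain:
# s_i -> s_{i+1} for i < 10, and s_10 -> s_9 or s_10 with prob. 1/2 each.
def chain_step(mu):
    nxt = [0.0] * 10                      # indices 0..9 stand for s_1..s_10
    for i in range(9):
        nxt[i + 1] += mu[i]
    nxt[8] += mu[9] / 2
    nxt[9] += mu[9] / 2
    return nxt

mu = [1 / 10] * 10                        # uniform initial distribution
for _ in range(100):
    mu = chain_step(mu)
    assert abs(sum(mu) - 1) < 1e-9        # stays a distribution
    assert mu[9] >= 1 / 10 - 1e-9         # safe set H: mu(s_10) >= 1/10
```

In the long run all mass accumulates on the s_9/s_10 loop and μ(s_10) converges to 2/3, comfortably inside H.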

The second model, called Split, is an MDP which comprises two independent subsystems. We depict the model in Fig. 3. The initial distribution is μ_0 = {A → 1/2, C → 1/2} and the safe set H = {μ(A) + μ(D) ≥ 1/2}. This aims to explore both disconnected models as well as a safe set which imposes a constraint on multiple states at once. In particular, observe that initially μ_0(D) = 0, but μ_i(D) converges to 1 while μ_i(A) converges to 0, even when choosing action a_1. Thus, the invariant needs to identify the simultaneous flow from A to B and from C to D.
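The compensation between the two halves can be simulated; since the concrete transition probabilities live in Fig. 3, the sketch below *assumes*, purely for illustration, that both halves shift mass at the same rate p per step. Under that assumption the loss at A is exactly offset by the gain at D, so μ(A) + μ(D) stays at 1/2:

```python
# Illustrative Split dynamics under the ASSUMED symmetric rate p:
# A keeps its mass with prob. 1-p and sends it to B with prob. p;
# likewise C sends mass to D.
p = 0.25
mu = {"A": 0.5, "B": 0.0, "C": 0.5, "D": 0.0}
for _ in range(50):
    mu = {
        "A": (1 - p) * mu["A"],
        "B": mu["B"] + p * mu["A"],
        "C": (1 - p) * mu["C"],
        "D": mu["D"] + p * mu["C"],
    }
    # Safe set H: mu(A) + mu(D) >= 1/2, here with equality throughout.
    assert abs(mu["A"] + mu["D"] - 0.5) < 1e-9

assert mu["D"] > 0.49 and mu["A"] < 0.01   # mass has flowed A->B and C->D
```

This is exactly the "simultaneous flow" the invariant has to capture: neither μ(A) nor μ(D) is bounded away from 0 or 1 individually, only their sum is.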

Table 1. Overview of our results for the five considered models. From left to right, we list the name of the model, the runtime, and size of the invariant, followed by the number of variables, constraints, and total size of the query passed to the constraint solvers. For Running, we provided additional hints to the solver to achieve a more consistent runtime, indicated by the dagger symbol.


Table 2. The invariants and strategies computed for our models. We omit the invariants for the two real-world scenarios since they are too large to fit.


We additionally consider two examples from the literature, namely the PageRank example from [1, Fig. 3], based on [51], and Insulin-<sup>131</sup>I, a pharmacokinetics system [1, Example 2], based on [17]. Both are Markov chains.

*Results.* We summarize our findings briefly in Table 1. We again underline that not too much attention should be paid to runtimes, since they are very sensitive to minimal changes in the model. The evaluation is mainly intended to demonstrate that our methods are actually able to provide results. For completeness, we report the size of the invariant N_I and the size of the constraint problem in terms of the number of variables, constraints, and operations inside these constraints. We also provide the invariants and strategies identified by our method in Table 2. Note that for Running we used AlgDist, while the other two examples are handled by AlgMemLess. For Running, we observed a significant dependence on the initialization of the solvers. Thus we added several "hints", i.e. known correct values for some variables. (To be precise, we set the value for eight of the 92 variables.)

*Discussion.* We remark on two related points: Firstly, we observe that very often most of the auxiliary variables introduced by the quantifier elimination have a value of zero. Thus, a potential optimization is to explicitly set most such variables to zero, check whether the formula is satisfiable, and, if not, gradually remove these constraints, either at random or guided by unsat cores if available (i.e. clauses which are the "reason" for unsatisfiability). Moreover, we observed significant differences between the solvers: while z3 seems to be much quicker at identifying unsatisfiability, mathsat usually is better at finding satisfying assignments. Hence, using both solvers in tandem seems to be very beneficial.

## 8 Conclusion

We developed a framework for defining certificates for safety objectives in MDPs as distributional inductive invariants. Using this, we devised two algorithms that synthesize linear/affine invariants and corresponding memoryless/general strategies for safety in MDPs. To the best of our knowledge, this is the first time the template-based invariant approach, already known to be successful for programs, has been applied to synthesize strategies in MDPs for distributional safety properties. Our experimental results show that affine invariants are sufficient for many interesting examples. However, the second approach can be lifted to synthesize polynomial invariants, and hence potentially handle a larger class of MDPs. Exploring this could be a future line of work. It would also be interesting to explore how one can automate distributional invariant synthesis if the safe set H is specified in terms of both strict and non-strict inequalities. Finally, in terms of applicability, we would like to apply this approach to solve more benchmarks and problems, e.g., to synthesize risk-aware strategies for MDPs [46,49].

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Search and Explore: Symbiotic Policy Synthesis in POMDPs**

Roman Andriushchenko<sup>1</sup>, Alexander Bork<sup>2</sup>, Milan Češka<sup>1(B)</sup>, Sebastian Junges<sup>3</sup>, Joost-Pieter Katoen<sup>2</sup>, and Filip Macák<sup>1</sup>

<sup>1</sup> Brno University of Technology, Brno, Czech Republic ceskam@fit.vutbr.cz <sup>2</sup> RWTH Aachen University, Aachen, Germany

<sup>3</sup> Radboud University, Nijmegen, The Netherlands

**Abstract.** This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central issue is to find a POMDP controller—that solely decides based on the observations seen so far—to achieve a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesising good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based and an inductive approach. The former method obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that tightly integrates both approaches such that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.

## **1 Introduction**

A formidable synthesis challenge is to find a decision-making policy that satisfies temporal constraints even in the presence of stochastic noise. *Markov decision processes (MDPs)* [26] are a prominent model to reason about such policies under stochastic uncertainty. The underlying decision problems are efficiently solvable and probabilistic model checkers such as PRISM [22] and Storm [13] are well-equipped to synthesise policies that provably (and optimally) satisfy a given specification. However, a major shortcoming of MDPs is the assumption that the policy can depend on the precise state of a system. This assumption is unrealistic whenever the state of the system is only observable via sensors. *Partially*

This work has been supported by the Czech Science Foundation grant GA23-06963S (VESCAA), the ERC AdG Grant 787914 (FRAPPANT) and the DFG RTG 2236/2 (UnRAVeL).

**Fig. 1.** Schematic depiction of the symbiotic approach

*observable MDPs (POMDPs)* overcome this shortcoming, but policy synthesis for POMDPs and specifications such as *the probability to reach the exit is larger than 50%* requires solving undecidable problems [23]. Nevertheless, in recent years, a variety of approaches have been successfully applied to challenging benchmarks, yet these approaches also fail somewhat spectacularly on seemingly tiny problem instances. From a user perspective, it is hard to pick the right approach without detailed knowledge of the underlying methods. This paper sets out to develop a framework in which conceptually orthogonal approaches symbiotically alleviate each other's weaknesses and find policies that maximise, e.g., the expected reward before a target is reached. We show empirically that the combined approach can find compact policies achieving a significantly higher reward than the policies that either individual approach constructs.

*Belief Exploration.* Several approaches for solving POMDPs use the notion of *beliefs* [27]. The key idea is that each sequence of observations and actions induces a belief—a distribution over POMDP states that reflects the probability to be in a state conditioned on the observations. POMDP policies can decide optimally solely based on the belief. The evolution of beliefs can be captured by a fully observable, yet possibly infinite *belief MDP*. A practical approach (see the lower part of Fig. 1) is to unfold a finite fragment of this belief MDP and make its frontier absorbing. This finite fragment can be analysed with off-the-shelf MDP model checkers. Its accuracy can be improved by using an arbitrary but fixed cut-off policy from the frontier onwards. Crucially, the probability to reach the target under such a policy can be efficiently pre-computed for all beliefs. This paper considers the belief exploration method from [8] realised in Storm [13].

*Policy Search.* An orthogonal approach searches a (finite) space of policies [14, 24] and evaluates these policies by verifying the induced Markov chain. To ensure scalability, sets of policies must be efficiently analysed. However, policy spaces explode whenever they require memory. The open challenge is to adequately define the space of policies to search in. In this paper, we consider the policy-search method from [5] as implemented in Paynt [6], which explores spaces of finite-state controllers (FSCs), represented as deterministic Mealy machines [2], using a combination of abstraction-refinement, counterexamples (to prune sets of policies), and increasing a controller's memory, see the upper part of Fig. 1.

*Our Symbiotic Approach.* In essence, our idea relies on the fact that a policy found via one approach can boost the other approach. The key observation is that such a policy is beneficial even when it is sub-optimal in terms of the objective at hand. Figure 1 sketches the symbiotic approach. The FSCs F_I obtained by policy search are used to guide the partial belief MDP to the target. Vice versa, the FSCs F_B obtained from belief exploration are used to shrink the set of policies and to steer the abstraction. Our experimental evaluation, using a large set of POMDP benchmarks, reveals that (a) belief exploration can yield better FSCs (sometimes also faster) using FSCs F_I from Paynt—even if the latter FSCs are far from optimal, (b) policy search can find much better FSCs when using FSCs from belief exploration, and (c) the FSCs from the symbiotic approach are superior in value to the ones obtained by the standalone approaches.

*Beyond Exploration and Policy Search.* In this work, we focus on two powerful orthogonal methods from the set of belief-based and search-based methods. Alternatives exist. Exploration can also be done using a fixed set of beliefs [25]. Prominently, HSVI [18] and SARSOP [20] are belief-based policy synthesis approaches typically used for discounted properties. They also support undiscounted properties, but represent policies with α-vectors. Bounded policy synthesis [29] uses a combination of belief exploration and inductive synthesis over paths and addresses finite-horizon reachability. α-vector policies lead to more complex analysis downstream: the resulting policies must track the belief and perform floating-point computations to select actions. For policy search, prominent alternatives are to search for randomised controllers via gradient descent [17] or via convex optimization [1,12,19]. Alternatively, FSCs can be extracted via deep reinforcement learning [9]. However, randomised policies limit predictability, which hampers testing and explainability. The area of programmatic reinforcement learning [28] combines inductive synthesis ideas with RL. While our empirical evaluation is method-specific, the lessons carry over to integrating other methods.

*Contributions.* The key contribution of this paper is the symbiosis of belief exploration [8] and policy search [5]. Though this seems natural, various technical obstacles had to be addressed, e.g., obtaining F_B from the finite fragment of the belief MDP and the policies for its frontier, and developing an interplay between the exploration and search phases that minimises the overhead. The benefits of the symbiotic algorithm are manifold, as we show by a thorough empirical evaluation. It can solve POMDPs that cannot be tackled with either of the two approaches alone. It outputs FSCs that are superior in value (with relative improvements of up to 40%) as well as FSCs that are more succinct (with a reduction of up to two orders of magnitude) with only a small penalty in their values. Additionally, the integration reduces the memory footprint compared to belief exploration by a factor of 4. In conclusion, the proposed symbiosis offers a powerful push-button, anytime synthesis algorithm producing, in the given time, superior and/or more succinct FSCs compared to the state-of-the-art methods.

**Fig. 2.** (a) and (b) contain two POMDPs. Colours encode observations. Unlabelled transitions have probability 1. Omitted actions (e.g. γ, δ in state B_2) execute a self-loop. (c) Markov chain induced by the minimising policy σ_B in the finite abstraction M̄^B_a of the POMDP from Fig. 2a. In the rightmost state, policy F is applied (cut-off), allowing to reach the target in ρ steps. (Color figure online)

## **2 Motivating Examples**

We give a sample POMDP that is hard for the belief exploration, a POMDP that challenges the policy search approach, and indicate why a symbiotic approach overcomes this. A third sample POMDP is shown to be unsolvable by either approach alone but can be treated by the symbiotic one.

**A Challenging POMDP for Belief-Based Exploration.** Consider POMDP M_a in Fig. 2a. The objective is to minimise the expected number of steps to the target T_a. An optimal policy is to always take action α, yielding 4 expected steps. An FSC realising this policy can be found by policy search in under 1 s.

*Belief MDPs.* States in the *belief MDP* M^B_a are *beliefs*, probability distributions over POMDP states with equal observations. The initial belief is {S ↦ 1}. By taking action α, 'yellow' is observed and the belief becomes {L ↦ 1/2, R ↦ 1/2}. Closer inspection shows that the set of reachable beliefs is infinite, rendering M^B_a infinite. Belief exploration constructs a finite fragment M̄^B_a by exploring M^B_a up to some depth while *cutting off* the frontier states. From cut-off states, a shortcut is taken directly to the target. These shortcuts are heuristic over-approximations of the true number of expected steps from the cut-off state to the target. The finite MDP M̄^B_a can be analysed using off-the-shelf tools, yielding the minimising policy σ_B assigning to each belief state the optimal action.

*Admissible Heuristics.* A simple way to over-approximate the minimal expected number of steps to the target is to take an arbitrary controller F and use the expected number of steps under F. The latter is cheap to compute if F is compact, as detailed in Sect. 4.2. Figure 2c shows the Markov chain induced by σ_B in M̄^B_a, where the belief {L ↦ 7/8, R ↦ 1/8} is cut off using F. The belief exploration in Storm [8] unfolds 1000 states of M^B_a and finds controller F that uniformly randomises over all actions in the rightmost state. The resulting sub-optimal controller F_B reaches the target in ≈ 4.1 steps. Exploring only a few states suffices when replacing F by a (not necessarily optimal) FSC provided by a policy search.

**A Challenging POMDP for Policy Search.** Consider POMDP M_b in Fig. 2b. The objective is to minimise the expected number of steps to T_b. Its 9-state belief MDP M^B_b is trivial for the belief-based method. Its optimal controller σ_B first picks action γ; on observing 'yellow' it plays β twice, otherwise it always picks α. This is realised by an FSC with 3 memory states. The inductive policy search in Paynt [5] explores families of FSCs of increasing complexity, i.e., of increasing memory size. It finds the optimal FSC after consulting about 20 billion candidate policies. This requires 545 model-checking queries; the optimal FSC is found after 105 queries, while the remaining queries prove that no better 3-state FSC exists.

*Reference Policies.* The policy search is guided by a reference policy, in this case the fully observable MDP policy that picks (senseless) action δ in B_1 first. Using policy σ_B—obtained by the belief method—instead, δ is never taken. As σ_B picks a different action in each 'blue' state, mimicking this requires at least three memory states. Using σ_B reduces the total number of required model-checking queries by a factor of ten; the optimal 3-state FSC is found after 23 queries.

**The Potential of Symbiosis.** To further exemplify the limitations of the two approaches and the potential of their symbiosis, we consider a synthetic POMDP, called Lanes+, combining a Lane model with larger variants of the POMDPs in Fig. 2; see Table 2 on page 14 for the model statistics and Appendix C of [3] for the model description. We consider minimisation of the expected number of steps and a 15-min timeout. The belief-based approach of Storm yields the value 18870. The policy-search method of Paynt finds an FSC with 2 memory states achieving the value 8223. This sub-optimal FSC significantly improves the belief MDP approximation and enables Storm to find an FSC with value 6471. The symbiotic synthesis loop finds the optimal FSC with value 4805.

## **3 Preliminaries and Problem Statement**

A (discrete) *distribution* over a countable set A is a function μ: A → [0, 1] s.t. Σ_{a∈A} μ(a) = 1. The set supp(μ) := {a ∈ A | μ(a) > 0} is the *support* of μ. The set Distr(A) contains all distributions over A. We use Iverson bracket notation, where [x] = 1 if the Boolean expression x evaluates to true and [x] = 0 otherwise.
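These notions translate one-to-one into code; a minimal sketch with dictionaries as distributions (helper names are ours):

```python
def support(mu):
    """supp(mu): the elements with positive probability."""
    return {a for a, p in mu.items() if p > 0}

def iverson(x):
    """Iverson bracket: [x] = 1 if x holds, else 0."""
    return 1 if x else 0

mu = {"a": 0.5, "b": 0.5, "c": 0.0}
assert abs(sum(mu.values()) - 1) < 1e-9        # mu is a distribution
assert support(mu) == {"a", "b"}               # c has probability 0
assert iverson(2 > 1) == 1 and iverson(2 < 1) == 0
```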

**Definition 1 (MDP).** *A* Markov decision process (MDP) *is a tuple* M = (S, s_0, Act, P) *with a countable set* S *of states, an initial state* s_0 ∈ S*, a finite set* Act *of actions, and a partial transition function* P: S × Act ⇀ Distr(S)*.* Act(s) := {α ∈ Act | P(s, α) ≠ ⊥} *denotes the set of actions available in state* s ∈ S*. An MDP with* |Act(s)| = 1 *for each* s ∈ S *is a* Markov chain (MC)*.*

Unless stated otherwise, we assume Act(s) = Act for each s ∈ S for conciseness. We denote P(s, α, s′) := P(s, α)(s′). A (finite) *path* of an MDP M is a sequence π = s_0 α_0 s_1 α_1 ... s_n where P(s_i, α_i, s_{i+1}) > 0 for 0 ≤ i < n. We use last(π) to denote the last state of path π. Let Paths^M denote the set of all finite paths of M. State s is absorbing if supp(P(s, α)) = {s} for all α ∈ Act.

**Definition 2 (POMDP).** *A* partially observable MDP (POMDP) *is a tuple* M = (M, Z, O)*, where* M *is the underlying MDP,* Z *is a finite set of observations and* O: S → Z *is a (deterministic) observation function.*

For POMDP M with underlying MDP M, an *observation trace* of path π = s_0 α_0 s_1 α_1 ... s_n is the sequence O(π) := O(s_0) α_0 O(s_1) α_1 ... O(s_n). Every MDP can be interpreted as a POMDP with Z = S and O(s) = s for all s ∈ S.

A (deterministic) *policy* is a function σ: Paths^M → Act. Policy σ is *memoryless* if last(π) = last(π′) =⇒ σ(π) = σ(π′) for all π, π′ ∈ Paths^M. A memoryless policy σ maps a state s ∈ S to action σ(s). Policy σ is *observation-based* if O(π) = O(π′) =⇒ σ(π) = σ(π′) for all π, π′ ∈ Paths^M. For POMDPs, we always consider observation-based policies. We denote by Σ_obs the set of all observation-based policies. A policy σ ∈ Σ_obs induces the MC M^σ.

We consider indefinite-horizon reachability or expected total reward properties. Formally, let M = (S, s_0, Act, P) be an MC, and let T ⊆ S be a set of *target states*. P^M[s |= ♦T] denotes the probability of reaching T from state s ∈ S. We use P^M[♦T] to denote P^M[s_0 |= ♦T] and omit the superscript if the MC is clear from context. Now assume POMDP M with underlying MDP M = (S, s_0, Act, P), and a set T ⊆ S of absorbing target states. Without loss of generality, we assume that the target states are associated with a unique observation z_T ∈ Z, i.e. s ∈ T iff O(s) = z_T. For a POMDP M and T ⊆ S, the *maximal reachability probability* of T for state s ∈ S in M is P^M_max[s |= ♦T] := sup_{σ∈Σ_obs} P^{M^σ}[s |= ♦T]. The minimal reachability probability P^M_min[s |= ♦T] is defined analogously.
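For a Markov chain, the reachability probabilities P^M[s |= ♦T] satisfy x(s) = 1 for s ∈ T and x(s) = Σ_{s′} P(s, s′)·x(s′) otherwise, and can be approximated by iterating this equation. A minimal value-iteration sketch on a toy chain (names are illustrative):

```python
def reach_prob(P, targets, iters=1000):
    """Approximate P[s |= <>T] in a Markov chain by value iteration:
    x(s) = 1 for s in T, otherwise x(s) = sum_{s'} P(s,s') * x(s')."""
    x = {s: (1.0 if s in targets else 0.0) for s in P}
    for _ in range(iters):
        x = {s: (1.0 if s in targets else
                 sum(pr * x[t] for t, pr in P[s].items())) for s in P}
    return x

# Toy MC: from s0, reach the target t with prob. 1/2, an absorbing sink otherwise.
P = {"s0": {"t": 0.5, "sink": 0.5}, "t": {"t": 1.0}, "sink": {"sink": 1.0}}
assert abs(reach_prob(P, {"t"})["s0"] - 0.5) < 1e-9
```

The fixed iteration count is a simplification; production model checkers use convergence criteria with sound stopping rules instead.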

Finite-state controllers are automata that compactly encode policies.

**Definition 3 (FSC).** *A* finite-state controller (FSC) *is a tuple* F = (N, n_0, γ, δ)*, with a finite set* N *of* nodes*, the* initial node n_0 ∈ N*, the* action function γ: N × Z → Act*, and the* update function δ: N × Z × Z → N*.*

A k*-FSC* is an FSC with |N| = k. If k = 1, the FSC encodes a memoryless policy. We use F^M (F^M_k) to denote the family of all (k-)FSCs for POMDP M. For a POMDP in state s, an agent receives observation z = O(s). An agent following an FSC F executes action α = γ(n, z) associated with the current node n and the current (prior) observation z. The POMDP state is updated accordingly to some s′ with P(s, α, s′) > 0. Based on the next (posterior) observation z′ = O(s′), the FSC evolves to node n′ = δ(n, z, z′). The *induced MC* for FSC F is M^F = (S × N, (s_0, n_0), {α}, P^F), where for all (s, n), (s′, n′) ∈ S × N we have

P^F((s, n), α, (s′, n′)) = [n′ = δ(n, O(s), O(s′))] · P(s, γ(n, O(s)), s′).

We emphasise that for MDPs with infinite state space and for POMDPs, an FSC realising the maximal reachability probability generally does not exist. For FSC F ∈ F^M with the set N of memory nodes, let P^{M^F}[(s, n) |= ♦T] := P^{M^F}[(s, n) |= ♦(T × N)] denote the probability of reaching target states T from state (s, n) ∈ S × N. Analogously, P^{M^F}[♦T] := P^{M^F}[♦(T × N)] denotes the probability of reaching target states T in the MC M^F induced on M by F.
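The induced-MC transition function above can be sketched directly in code; the POMDP, FSC, and all names below are a made-up toy instance, not one of the paper's models:

```python
# Sketch of the induced-MC transition function for an FSC (Def. 3):
# P^F((s,n), (s',n')) = [n' = delta(n, O(s), O(s'))] * P(s, gamma(n, O(s)), s').
def induced_mc_prob(P, O, gamma, delta, s, n, s2, n2):
    z, z2 = O[s], O[s2]
    if n2 != delta[(n, z, z2)]:          # Iverson bracket: wrong memory update
        return 0.0
    return P.get((s, gamma[(n, z)]), {}).get(s2, 0.0)

# Tiny illustrative POMDP: two states sharing observation 'y', one target 'g'.
O = {"s0": "y", "s1": "y", "goal": "g"}
P = {("s0", "a"): {"s1": 1.0},
     ("s1", "a"): {"goal": 0.5, "s0": 0.5},
     ("goal", "a"): {"goal": 1.0}}
gamma = {(0, "y"): "a", (0, "g"): "a"}               # memoryless 1-FSC
delta = {(0, z, z2): 0 for z in "yg" for z2 in "yg"}  # single memory node

assert induced_mc_prob(P, O, gamma, delta, "s1", 0, "goal", 0) == 0.5
assert induced_mc_prob(P, O, gamma, delta, "s0", 0, "goal", 0) == 0.0
```

With a single memory node the update function is trivial; richer FSCs differ only in `gamma` and `delta` having more nodes.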

**Problem Statement.** The classical synthesis problem [23] for POMDPs asks: given POMDP M, a set T of targets, and a threshold λ, find an FSC F such that P^{M^F}[♦T] ≥ λ, if one exists. We take a more practical stance and aim instead to optimise the value P^{M^F}[♦T] in an anytime fashion: the faster we can find FSCs with a high value, the better.

*Remark 1.* Variants of the maximising synthesis problem for the expected total reward and minimisation are defined analogously. For conciseness, in this paper, we always assume that we want to maximise the value.

In addition to the value of the FSC F, another key characteristic of the controller is its *size*, which we treat as a secondary objective and discuss in detail in Sect. 6.

### **4 FSCs for and from Belief Exploration**

We consider *belief exploration* as described in [8]. A schematic overview is given in the lower part of Fig. 1. We recap the key concepts of belief exploration. This section explains two contributions: we discuss how arbitrary FSCs are included and present an approach to export the associated POMDP policies as FSCs.

#### **4.1 Belief Exploration with Explicit FSC Construction**

Finite-state controllers for a POMDP can be obtained by analysing the (fully observable) *belief MDP* [27]. The state space of this MDP consists of *beliefs*: probability distributions over states of the POMDP M having the same observation. Let S_z := {s ∈ S | O(s) = z} denote the set of all states of M with observation z ∈ Z. Let the set of all beliefs be B_M := ⋃_{z∈Z} Distr(S_z), and denote for b ∈ B_M by O(b) ∈ Z the unique observation O(s) of any s ∈ supp(b).

In a belief b, taking action α yields an updated belief as follows: let P(b, α, z′) := Σ_{s∈S_{O(b)}} b(s) · Σ_{s′∈S_{z′}} P(s, α, s′) denote the probability of observing z′ ∈ Z upon taking action α ∈ Act in belief b ∈ B_M. If P(b, α, z′) > 0, the corresponding successor belief b′ = ⟦b|α, z′⟧ with O(b′) = z′ is defined component-wise as

$$\llbracket b|\alpha, z'\rrbracket(s') := \frac{\sum_{s \in S_{O(b)}} b(s) \cdot \mathcal{P}(s, \alpha, s')}{\mathcal{P}(b, \alpha, z')}$$

for all s′ ∈ S_{z′}. Otherwise, ⟦b|α, z′⟧ is undefined.
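The belief update can be sketched as a short function; the POMDP fragment below is a made-up toy (names are placeholders), not one of the paper's models:

```python
# Sketch of the belief update: given belief b, action alpha and observed z',
# compute P(b, alpha, z') and the successor belief [b|alpha,z'] component-wise.
def belief_update(P, O, b, alpha, z2):
    prob_z2 = sum(pr * b[s]
                  for s in b for t, pr in P[(s, alpha)].items() if O[t] == z2)
    if prob_z2 == 0:
        return 0.0, None                       # [b|alpha,z'] is undefined
    b2 = {}
    for s in b:
        for t, pr in P[(s, alpha)].items():
            if O[t] == z2:                     # renormalise over S_{z'}
                b2[t] = b2.get(t, 0.0) + b[s] * pr / prob_z2
    return prob_z2, b2

# Illustrative POMDP fragment.
O = {"L": "yellow", "R": "yellow", "goal": "green"}
P = {("L", "a"): {"goal": 0.5, "L": 0.5}, ("R", "a"): {"R": 1.0}}
pz, b2 = belief_update(P, O, {"L": 0.5, "R": 0.5}, "a", "yellow")
assert abs(pz - 0.75) < 1e-9                   # P(b, a, 'yellow')
assert abs(b2["L"] - 1/3) < 1e-9 and abs(b2["R"] - 2/3) < 1e-9
```

Iterating this update from the initial belief is exactly the unfolding of the belief MDP described around Definition 4.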

**Definition 4 (Belief MDP).** *The* belief MDP *of POMDP* M *is the MDP* M^B = (B_M, b_0, Act, P^B)*, with initial belief* b_0 := {s_0 ↦ 1} *and transition function* P^B(b, α, b′) := [b′ = ⟦b|α, z′⟧] · P(b, α, z′) *where* z′ = O(b′)*.*

The belief MDP captures the behaviour of its POMDP. It can be unfolded by starting in the initial belief and computing all successor beliefs.

*Deriving FSCs from Finite Belief MDPs.* Let T^B := {b ∈ B_M | O(b) = z_T} denote the set of *target beliefs*. If the reachable state space of the belief MDP M^B is finite, e.g. because the POMDP is acyclic, standard model checking techniques can be applied to compute the memoryless policy σ_B: B_M → Act that selects in each belief state b ∈ B_M the action that maximises P[b |= ♦T^B]<sup>1</sup>. We can translate the deterministic, memoryless policy σ_B into the corresponding FSC F_B = (B_M, b_0, γ, δ) with action function γ(b, z) = σ_B(b) and update function δ(b, z, z′) = ⟦b|σ_B(b), z′⟧ for all z, z′ ∈ Z.<sup>2</sup>

*Handling Large and Infinite Belief MDPs.* In case the reachable state space of the belief MDP M^B is infinite or too large for a complete unfolding, a finite approximation M̄^B is used instead [8]. Assuming M^B is unfolded up to some depth, let E ⊂ B_M denote the set of explored beliefs and let U ⊂ B_M \ E denote the *frontier*: the set of unexplored beliefs reachable from E in one step. To complete the finite abstraction, we require handling of the frontier beliefs. The idea is to use for each b ∈ U a *cut-off value* V(b): an under-approximation of the maximal reachability probability P^{M^B}_max[b |= ♦T^B] for b in the belief MDP. We explain how to compute cut-off values systematically given an FSC in Sect. 4.2.

Ultimately, we define a finite MDP M̄^B = (E ∪ U ∪ {b⊤, b⊥}, b_0, Act, P̄^B) with the transition function P̄^B(b, α) := P^B(b, α) for explored beliefs b ∈ E and all α ∈ Act, and P̄^B(b, α) := {b⊤ ↦ V(b), b⊥ ↦ 1 − V(b)} for frontier beliefs b ∈ U and all α ∈ Act, where b⊤ and b⊥ are fresh sink states, i.e. P̄^B(b⊤, α) := {b⊤ ↦ 1}

<sup>1</sup> Memoryless policies suffice to maximise the value in a fully observable MDP [26].

<sup>2</sup> The assignments of missing combinations where z ≠ O(b) are irrelevant.

and P<sup>B</sup>(b<sub>⊥</sub>, α) := {b<sub>⊥</sub> ↦ 1} for all α ∈ Act. The reachable state space of M<sup>B</sup> is finite, enabling its automated analysis; since our method to compute cut-off values emulates an FSC, a policy maximising P<sub>max</sub>[♦(T<sup>B</sup> ∪ {b<sub>⊤</sub>})] induces an FSC for the original POMDP M. We discuss how to obtain this FSC in Sect. 4.3.
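For illustration, the construction of this finite approximation from an explored fragment can be sketched as follows. This is a toy Python sketch with hypothetical data structures (beliefs as opaque keys, distributions as dictionaries), not the Storm implementation:

```python
def build_cutoff_mdp(explored, frontier, P, V, actions):
    """Build the transition function of the finite MDP
    (E ∪ U ∪ {b_top, b_bot}, b0, Act, P^B) described above.

    explored: set of explored beliefs E
    frontier: set of frontier beliefs U
    P:        dict (belief, action) -> {successor: probability}, for b in E
    V:        dict belief -> cut-off value V(b), for b in U
    actions:  list of available actions Act
    """
    P_fin = {}
    for b in explored:                      # keep explored transitions as-is
        for a in actions:
            P_fin[(b, a)] = dict(P[(b, a)])
    for b in frontier:                      # cut off: move to sinks with prob. V(b)
        for a in actions:
            P_fin[(b, a)] = {"b_top": V[b], "b_bot": 1.0 - V[b]}
    for a in actions:                       # fresh absorbing sink states
        P_fin[("b_top", a)] = {"b_top": 1.0}
        P_fin[("b_bot", a)] = {"b_bot": 1.0}
    return P_fin
```

Model checking the resulting finite MDP for ♦(T<sup>B</sup> ∪ {b<sub>⊤</sub>}) then under-approximates the value of the full belief MDP.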

#### **4.2 Using FSCs for Cut-Off Values**

A crucial aspect when applying belief exploration with cut-offs is the choice of suitable cut-off values. The closer the cut-off value is to the actual optimum in a belief, the better the obtained approximation. In particular, if the cut-off values coincide with the optimal values, already cutting off the initial state yields the optimum. However, finding optimal values is as hard as solving the original POMDP. We instead consider *under-approximative value functions* induced by applying *any*<sup>3</sup> FSC to the POMDP and lifting the results to the belief MDP. The better the FSC, the better the cut-off values. We generalise belief exploration with cut-offs such that the approach supports arbitrary sets of FSCs.

Let F<sup>I</sup> ∈ F<sup>M</sup> be an arbitrary but fixed FSC for POMDP M. Let p<sub>s,n</sub> := P[(s, n) |= ♦T] for each state (s, n) ∈ S × N of the corresponding induced MC M<sup>F<sup>I</sup></sup>. For fixed n ∈ N, V(b, n) := Σ<sub>s∈S<sub>O(b)</sub></sub> b(s) · p<sub>s,n</sub> denotes the cut-off value for belief b and memory node n. It corresponds to the probability of reaching a target state in M<sup>F<sup>I</sup></sup> when starting in memory node n ∈ N and in a state s ∈ S drawn according to the probability distribution b. We define the overall cut-off value for b induced by F<sup>I</sup> as V(b) := max<sub>n∈N</sub> V(b, n). It follows straightforwardly that V(b) ≤ P<sub>max</sub>[b |= ♦T<sup>B</sup>]. As the values p<sub>s,n</sub> only need to be computed once, computing V(b) for a given belief b is relatively cheap. However, the complexity of the FSC-based cut-off approach depends on the size of the induced MC. Therefore, it is essential that the FSCs used to compute cut-off values are concise.
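Given the precomputed values p<sub>s,n</sub>, the cut-off value V(b) can be computed as in the following minimal Python sketch (the dictionary representations are hypothetical placeholders for illustration):

```python
def cutoff_value(belief, p, memory_nodes):
    """V(b) = max over memory nodes n of sum_s b(s) * p[(s, n)],
    where belief maps POMDP states to probabilities and p[(s, n)] is the
    reachability value of state (s, n) in the induced MC of the FSC F_I."""
    return max(
        sum(prob * p[(s, n)] for s, prob in belief.items())
        for n in memory_nodes
    )
```

Note that p is obtained once, by model checking the induced MC; each V(b) is then a cheap weighted sum per memory node.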

#### **4.3 Extracting FSC from Belief Exploration**

Model checking the finite approximation MDP M<sup>B</sup> with cut-off values induced by an FSC F<sup>I</sup> yields a maximising memoryless policy σ<sup>B</sup>. Our goal is to represent this policy as an FSC F<sup>B</sup>. We construct F<sup>B</sup> from both F<sup>I</sup> and memory nodes for the explored beliefs. Concretely, for each explored belief b ∈ E, we introduce a corresponding memory node, in which the action σ<sup>B</sup>(b) is selected. For the memory update, we distinguish two cases based on the successor belief after executing σ<sup>B</sup>(b) in M<sup>B</sup>. If for observation z′ ∈ Z the successor belief b′ = ⟨b|σ<sup>B</sup>(b), z′⟩ lies in E, the memory is updated to the corresponding node. Otherwise, b′ ∈ U holds, i.e., the successor is part of the frontier. The memory is then updated to the memory node n of FSC F<sup>I</sup> that maximises the cut-off value V(b′, n). This corresponds to the notion that once the frontier is encountered, we switch from acting according to policy σ<sup>B</sup> to following F<sup>I</sup> (initialised in the correct memory node). This is formalised as follows:

<sup>3</sup> We remark that [8] considers memoryless FSCs only.

**Definition 5 (Belief-based FSC with cut-offs).** *Let* F<sup>I</sup> = (N, n<sub>0</sub>, γ<sub>I</sub>, δ<sub>I</sub>) *and* M<sup>B</sup> *be as before. The* belief-based FSC with cut-offs *is* F<sup>B</sup> = (E ∪ N, b<sub>0</sub>, γ, δ) *with action function* γ(b, z) = σ<sup>B</sup>(b) *for* b ∈ E *and* γ(n, z) = γ<sub>I</sub>(n, z) *for* n ∈ N *and arbitrary* z ∈ Z*. The update function* δ *is defined for all* z, z′ ∈ Z *by* δ(n, z, z′) = δ<sub>I</sub>(n, z, z′) *if* n ∈ N*, and for* b ∈ E *with* b′ = ⟨b|σ<sup>B</sup>(b), z′⟩ *by:*

δ(b, z, z′) = b′ *if* b′ ∈ E, *and* δ(b, z, z′) = argmax<sub>n∈N</sub> V(b′, n) *otherwise.*
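The update function of Definition 5 can be sketched as follows (a Python sketch with hypothetical data structures; beliefs and memory nodes are opaque keys):

```python
def belief_fsc_update(successor, explored, V, memory_nodes):
    """delta(b, z, z') for an explored belief b: if the successor belief
    b' = <b|sigma_B(b), z'> was explored, move to its memory node;
    otherwise switch to the node of F_I maximising the cut-off value V(b', n).

    successor:    the successor belief b'
    explored:     the set E of explored beliefs
    V:            dict (belief, node) -> cut-off value V(b', n)
    memory_nodes: the nodes N of the FSC F_I
    """
    if successor in explored:
        return successor
    return max(memory_nodes, key=lambda n: V[(successor, n)])
```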

## **5 Accelerated Inductive Synthesis**

In this section, we consider inductive synthesis [5], an approach for finding POMDP controllers within a set of FSCs. We briefly recap the main idea, then explain how to use a reference policy, and finally introduce and discuss in detail a novel search space for the controllers considered in this paper.

## **5.1 Inductive Synthesis with** *k***-FSCs**

In the scope of this paper, inductive synthesis [4] considers a finite family F<sup>M</sup><sub>k</sub> of k-FSCs with memory nodes N = {n<sub>0</sub>, ..., n<sub>k−1</sub>}, and the family M<sup>F<sup>M</sup><sub>k</sub></sup> := {M<sup>F</sup> | F ∈ F<sup>M</sup><sub>k</sub>} of associated induced MCs. The states of each MC are tuples (s, n) ∈ S × N. For conciseness, we only discuss the abstraction-refinement framework [10] within the inductive synthesis loop. The overall picture is as in Fig. 1. Informally, the *MDP abstraction* of the family M<sup>F<sup>M</sup><sub>k</sub></sup> of MCs is an MDP MDP(F<sup>M</sup><sub>k</sub>) with the set S × N of states such that, if some MC M ∈ M<sup>F<sup>M</sup><sub>k</sub></sup> executes action α in state (s, n) ∈ S × N, then this action (with the same effect) is also enabled in state (s, n) of MDP(F<sup>M</sup><sub>k</sub>). Essentially, MDP(F<sup>M</sup><sub>k</sub>) over-approximates the behaviour of all the MCs in the family M<sup>F<sup>M</sup><sub>k</sub></sup>: it simulates an arbitrary family member in every step, but it may switch members between steps.<sup>4</sup>

**Definition 6.** *The* MDP abstraction *for POMDP* M *and family* F<sup>M</sup><sub>k</sub> = {F<sub>1</sub>, ..., F<sub>m</sub>} *of* k*-FSCs is the MDP* MDP(F<sup>M</sup><sub>k</sub>) := (S × N, (s<sub>0</sub>, n<sub>0</sub>), {1, ..., m}, P<sup>F<sup>M</sup><sub>k</sub></sup>) *with*

P<sup>F<sup>M</sup><sub>k</sub></sup>((s, n), i) = P<sup>F<sub>i</sub></sup>((s, n)).

While this MDP has m actions, in practice many of them coincide. Below, we show how to utilise the structure of the FSCs. Here, we conclude by observing that the MDP is a proper abstraction:

**Lemma 1 [10].** *For all* F ∈ F<sup>M</sup><sub>k</sub>*:* P<sub>min</sub><sup>MDP(F<sup>M</sup><sub>k</sub>)</sup>[♦T] ≤ P<sup>M<sup>F</sup></sup>[♦T] ≤ P<sub>max</sub><sup>MDP(F<sup>M</sup><sub>k</sub>)</sup>[♦T]*.*

With this result, we can naturally start with the set of all k-FSCs and search through this family by selecting suitable subsets [10]. Since the number k of memory nodes necessary is not known in advance, one can iteratively explore the sequence F<sup>M</sup><sub>1</sub>, F<sup>M</sup><sub>2</sub>, ... of families of FSCs of increasing complexity.
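The abstraction of Definition 6 can be sketched as follows. This is a toy Python sketch over explicitly given induced MCs; representing an MC as a dictionary from states to successor distributions is an assumption for illustration:

```python
def mdp_abstraction(induced_mcs):
    """Given the induced MCs M^{F_1}, ..., M^{F_m} as dicts mapping a state
    (s, n) to its successor distribution, build the abstraction MDP in which
    action i in state (s, n) behaves like one step of the i-th MC."""
    P = {}
    for i, mc in enumerate(induced_mcs, start=1):
        for state, dist in mc.items():
            P[(state, i)] = dist
    return P
```

Minimising (resp. maximising) reachability in this MDP then lower- (resp. upper-) bounds the value of every family member, as stated in Lemma 1.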

<sup>4</sup> The MDP is a game-based abstraction [21] of the all-in-one MC [11].

#### **5.2 Using Reference Policies to Accelerate Inductive Synthesis**

Consider the synthesis of the optimal k-FSC F ∈ F<sup>M</sup><sub>k</sub> for POMDP M. To accelerate the search for F within this family, we consider a reference policy, e.g., a policy σ<sup>B</sup> extracted from an (approximation of the) belief MDP, and shrink the FSC family. For each observation z ∈ Z, we collect the set Act[σ<sup>B</sup>](z) := {σ<sup>B</sup>(b) | b ∈ B<sup>M</sup>, O(b) = z} of actions that the reference policy σ<sup>B</sup> selects in beliefs with observation z. We focus the search on these actions by constructing the subset of FSCs {(N, n<sub>0</sub>, γ, δ) ∈ F<sup>M</sup><sub>k</sub> | ∀n ∈ N, z ∈ Z: γ(n, z) ∈ Act[σ<sup>B</sup>](z)}.

Restricting the action selection may exclude the optimal k-FSC. It also does not guarantee that the optimal FSC in the restricted family achieves the same value as the reference policy σ<sup>B</sup>, as σ<sup>B</sup> may effectively use more memory nodes. We therefore first search the restricted space of FSCs before searching the complete space. This also accelerates the search: the earlier a good policy is found, the easier it is to discard other candidates (because they are provably not optimal). Furthermore, in case the algorithm terminates early (recall the anytime aspect of our problem statement), we are more likely to have found a reasonable policy.
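The restriction induced by a reference policy can be sketched as follows (a Python sketch; the dictionary encodings of the policy and of an FSC's action function are hypothetical):

```python
def actions_per_observation(sigma_B, obs):
    """Compute Act[sigma_B](z): the actions that the reference policy
    sigma_B (a dict belief -> action) selects in beliefs with observation z
    (obs maps each belief to its observation)."""
    act = {}
    for b, a in sigma_B.items():
        act.setdefault(obs[b], set()).add(a)
    return act

def respects_reference(gamma, act):
    """Keep a candidate FSC (action function gamma: (node, z) -> action)
    only if every selected action was used by the reference policy in z."""
    return all(a in act.get(z, set()) for (_, z), a in gamma.items())
```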

**Fig. 3.** (a) A POMDP where colours and capital letters encode observations; unlabelled transitions have probability 1/2; omitted actions (e.g. action β in the initial state) are self-loops; the objective is to minimise the expected number of steps to reach state G. (b) The optimal posterior-aware 2-FSC. (Color figure online)

Additionally, we could use the sets Act[σ<sup>B</sup>] to determine which k to search with. If in some observation z ∈ Z the belief policy σ<sup>B</sup> uses |Act[σ<sup>B</sup>](z)| distinct actions, then in order to enable the use of all of these actions, we require at least k = max<sub>z∈Z</sub> |Act[σ<sup>B</sup>](z)| memory nodes. However, this may lead to families that are too large; we therefore use the more refined view discussed below.

#### **5.3 Inductive Synthesis with Adequate FSCs**

In this section, we discuss the set of candidate FSCs in more detail. In particular, we take a more refined look at the families that we consider.

*More Granular FSCs.* We consider memory models [5] that describe per observation how much memory may be used:

**Definition 7 (**μ**-FSC).** *A* memory model *for POMDP* M *is a function* μ: Z → ℕ*. Let* k = max<sub>z∈Z</sub> μ(z)*. The* k*-FSC* F ∈ F<sup>M</sup><sub>k</sub> *with nodes* N = {n<sub>0</sub>, ..., n<sub>k−1</sub>} *is a* μ-FSC *iff for all* z ∈ Z *and all* i ≥ μ(z) *it holds that* γ(n<sub>i</sub>, z) = γ(n<sub>0</sub>, z) *and* δ(n<sub>i</sub>, z, z′) = δ(n<sub>0</sub>, z, z′) *for all* z′ ∈ Z*.*

F<sup>M</sup><sub>μ</sub> denotes the family of all μ-FSCs. Essentially, the memory model μ dictates that for prior observation z only μ(z) memory nodes are utilised, while the remaining nodes behave exactly as the default memory node n<sub>0</sub>. Using a memory model μ with μ(z) < k for some observations z ∈ Z greatly reduces the number of candidate controllers. For example, if |S<sub>z</sub>| = 1 for some z ∈ Z, then upon reaching this state, the history becomes irrelevant. It is thus sufficient to set μ(z) = 1 (for the specifications in this paper). It also significantly reduces the size of the abstraction, see Appendix A of [3].
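The collapsing of memory nodes prescribed by a memory model μ can be sketched as follows (a Python sketch with a hypothetical encoding of the action function):

```python
def mu_fsc_lookup(gamma, mu, i, z):
    """Action selection of a mu-FSC: memory nodes n_i with i >= mu(z)
    behave exactly like the default node n_0 (Definition 7)."""
    return gamma[(i if i < mu[z] else 0, z)]

def free_gamma_entries(mu):
    """Number of unconstrained entries of gamma in the family F^M_mu:
    only mu(z) nodes per observation z are free, vs. k*|Z| for all k-FSCs."""
    return sum(mu.values())
```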

*Posterior-aware or Posterior-unaware.* The technique outlined in [5] considers *posterior-unaware FSCs* [2]. An FSC with update function δ is posterior-unaware if the posterior observation is not taken into account when updating the memory node of the FSC, i.e. δ(n, z, z′) = δ(n, z, z″) for all n ∈ N and z, z′, z″ ∈ Z. This restriction reduces the policy space and thus the MDP abstraction MDP(F<sup>M</sup><sub>k</sub>). On the other hand, general (posterior-aware) FSCs can utilise information about the next observation to make an informed decision about the next memory node. As a result, fewer memory nodes are needed to encode complex policies. Consider Fig. 3a, which depicts a simple POMDP. First, notice that in the yellow states Y<sub>i</sub> we want to be able to execute two different actions, implying that we need at least


```
Input : POMDP M, set T of target states, timeout values t, t_I, t_B
Output: Best FSCs F_I and F_B found so far
 1:  F_I ← ⊥;  F ← F^M_1;  k ← 1;  μ ← {z ↦ 1 | z ∈ Z};  F_B ← ⊥;  σ_B ← ⊥
 2:  while not timeout t do
 3:      while not timeout t_I do
 4:          if F = ∅ then
 5:              k ← k + 1
 6:              ∀z ∈ Z : μ(z) ← max{μ(z), k}
 7:              F ← F^M_μ
 8:          F, F_I ← search(F, F_I, Act[σ_B] if P^{M^{F_I}}[♦T] > P^{M^{F_B}}[♦T] else ⊥)
 9:      σ_B, F_B ← explore(t_B, F_I)
10:      if P^{M^{F_I}}[♦T] ≤ P^{M^{F_B}}[♦T] and ∃z ∈ Z : μ(z) < |Act[σ_B](z)| then
11:          ∀z ∈ Z : μ(z) ← |Act[σ_B](z)|
12:          F ← F^M_μ
13:      yield F_I, F_B
```

two memory nodes to distinguish between the two states, and the same is true for the blue states B<sub>i</sub>. Second, notice that in each state the visible action always leads to states having different observations, implying that the posterior observation z′ is crucial for optimal decision making. If z′ is ignored, it is impossible to optimally update the memory node. Figure 3b depicts the optimal posterior-aware 2-FSC, which allows reaching the target within 12 steps in expectation. The optimal posterior-unaware FSC has at least 4 memory nodes, and the optimal posterior-unaware 2-FSC needs 14 steps.

*MDP Abstraction.* To efficiently and precisely create and analyse MDP abstractions, Definition 6 is overly simplified. In Appendix A of [3], we present the construction for general, posterior-aware FSCs including memory models.

## **6 Integrating Belief Exploration with Inductive Synthesis**

We clarify the symbiotic approach from Fig. 1 and review FSC sizes.

**Symbiosis by Closing the Loop.** Section 4 shows the potential to improve belief exploration using FSCs, e.g., obtained from an inductive synthesis loop, whereas Sect. 5 shows the potential to improve inductive synthesis using policies from, e.g., belief exploration. A natural next step is to use improved inductive synthesis for belief exploration and improved belief exploration for inductive synthesis, i.e., to alternate between both techniques. This section briefly clarifies the symbiotic approach from Fig. 1 using Algorithm 1.


**Table 1.** Sizes of different types of FSCs.

We iterate until a global timeout t; in each iteration, we make both controllers available to the user as soon as they are computed (Algorithm 1, l. 13). We start in the inductive mode (l. 3-8), where we initially consider the 1-FSCs represented by F<sup>M</sup><sub>μ</sub>. Method search (l. 8) investigates F and outputs the new maximising FSC F<sup>I</sup> (if it exists). If the timeout t<sup>I</sup> interrupts the synthesis process, the method additionally returns the yet unexplored parameter assignments. If F is fully explored within the timeout t<sup>I</sup> (l. 4), we increase k and repeat the process. After the timeout t<sup>I</sup>, we run belief exploration explore for t<sup>B</sup> seconds, using F<sup>I</sup> as the backup controller (l. 9). After the timeout t<sup>B</sup> (exploration will continue from a stored configuration in the next belief phase), we use F<sup>I</sup> to obtain cut-off values at unexplored states, compute the optimal policy σ<sup>B</sup> (see Sect. 4) and extract the FSC F<sup>B</sup>, which incorporates F<sup>I</sup>. Before we continue the search, we check whether the belief-based FSC is better and whether that FSC gives any reason to update the memory model (l. 10). If so, we update μ and reset F (l. 11-12).
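The control flow of Algorithm 1 can be sketched as follows. This is a schematic Python rendering in which the timeouts are replaced by a fixed round count and `search`/`explore` are hypothetical stand-ins for the inductive and belief phases (FSC quality is abstracted to a numeric value; maximisation is assumed):

```python
def saynt_loop(search, explore, observations, rounds=3):
    """Schematic main loop: alternate an inductive phase (search) and a
    belief phase (explore), exchanging the best FSC values and growing the
    memory model mu when the belief policy uses more actions."""
    mu = {z: 1 for z in observations}
    v_I = v_B = 0.0          # values of the best FSCs F_I and F_B so far
    sigma_B = {}             # reference actions: observation -> set of actions
    history = []
    for _ in range(rounds):
        # inductive phase (l. 3-8): seed with reference actions only if
        # the belief-based FSC currently dominates
        reference = sigma_B if v_B >= v_I else None
        v_I = max(v_I, search(mu, reference))
        # belief phase (l. 9): explore with F_I as backup for cut-offs
        sigma_B, v_new = explore(v_I)
        v_B = max(v_B, v_new)
        # l. 10-12: enlarge the memory model if the belief policy needs it
        if v_I <= v_B:
            for z, acts in sigma_B.items():
                mu[z] = max(mu[z], len(acts))
        history.append((v_I, v_B))   # l. 13: yield both controllers
    return history, mu
```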

**The Size of an FSC.** We have considered several sub-classes of FSCs and wish to compare the sizes of these controllers. For an FSC F = (N, n<sub>0</sub>, γ, δ), we define its size size(F) := size(γ) + size(δ) as the memory required to encode the functions γ and δ. Encoding γ : N × Z → Act of a general k-FSC requires size(γ) = Σ<sub>n∈N</sub> Σ<sub>z∈Z</sub> 1 = k·|Z| memory. Encoding δ : N × Z × Z → N requires k·|Z|<sup>2</sup> memory. However, it is uncommon that in each state-memory pair (s, n) all posterior observations can occur. We therefore encode δ(n, z, ·) as a sparse adjacency list, i.e., as a list of pairs (z′, δ(n, z, z′)). To define the size of such a list properly, consider the induced MC M<sup>F</sup> = (S × N, (s<sub>0</sub>, n<sub>0</sub>), {α}, P<sup>F</sup>). Let post(n, z) := {O(s′) | ∃s ∈ S<sub>z</sub> : (s′, ·) ∈ supp(P<sup>F</sup>((s, n), α))} denote the set of posterior observations reachable when taking a transition from a state (s, n) of M<sup>F</sup> with O(s) = z. Table 1 summarises the resulting sizes of FSCs of the various sub-classes. The derivation is included in Appendix B of [3]. Table 4 on p. 18 shows that we typically find much smaller μ-FSCs (F<sup>I</sup>) than belief-based FSCs (F<sup>B</sup>).
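The size computation can be sketched as follows (a Python sketch; counting two entries per pair in the sparse adjacency list is one plausible accounting, following the definitions above rather than Table 1 verbatim):

```python
def fsc_size_dense(k, num_obs):
    """size(F) for a general k-FSC: k*|Z| entries for gamma plus
    k*|Z|^2 entries for a dense delta."""
    return k * num_obs + k * num_obs * num_obs

def fsc_size_sparse(k, num_obs, post):
    """Sparse encoding: delta(n, z, .) stored as a list of pairs
    (z', delta(n, z, z')), i.e. 2 entries per posterior in post(n, z)."""
    delta = sum(2 * len(post[(n, z)]) for n in range(k) for z in range(num_obs))
    return k * num_obs + delta
```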

## **7 Experiments**

Our evaluation focuses on the following three questions:



**Table 2.** Information about the benchmark POMDPs.

**Selected Benchmarks and Setup.** Our baselines are the recent belief exploration technique [8] implemented in Storm [13] and the inductive (policy) synthesis method [5] implemented in Paynt [6]. Paynt uses Storm for parsing and model checking of MDPs, but not for solving POMDPs. Our symbiotic framework (Algorithm 1) has been implemented on top of Paynt and Storm. In the following, we use Storm and Paynt to refer to the implementations of belief exploration and inductive synthesis, respectively, and Saynt to refer to the symbiotic framework. The implementation of Saynt and all benchmarks are publicly available<sup>5</sup>. Additionally, the implementation and the benchmarks are also available as an artifact at https://doi.org/10.5281/zenodo.7874513.

*Setup.* The experiments are run on a single core of a machine equipped with an Intel i5-12600KF @4.9GHz CPU and 64GB of RAM. Paynt searches for posterior-unaware FSCs using abstraction-refinement, as suggested by [5]. By default, Storm applies the cut-offs as presented in Sect. 4.1. Saynt uses the default settings for Paynt and Storm, with t<sup>I</sup> = 60s and t<sup>B</sup> = 10s in Algorithm 1. Under Q3, we discuss the effect of changing these values.

*Benchmarks.* We evaluate the methods on a selection of models from [5,7,8], supplemented by larger variants of these models (Drone-8-2 and Refuel-20), by one model from [16] (Milos-97), and by the synthetic model (Lanes+) described in Appendix C of [3]. We excluded benchmarks for which Paynt or Storm finds the (expected) optimal solution in a matter of seconds. The benchmarks were selected to illustrate the advantages as well as the drawbacks of all three synthesis approaches: belief exploration, inductive (policy) search, and the symbiotic technique. Table 2 lists for each POMDP the number |S| of states, the total number ΣAct := Σ<sub>s</sub> |Act(s)| of actions, the number |Z| of observations, the specification (either maximising or minimising a reachability probability P or an expected reward R), and a known over-approximation of the optimal value computed using the technique from [7]. These over-approximations are used solely as rough estimates of the optimal values. Table 5 on p. 20 reports the quality of the resulting FSCs on a broader range of benchmarks and demonstrates the impact of non-default settings.

#### **Q1: FSCs provide better approximations of the belief MDP**

In these experiments, Paynt is used to obtain a sub-optimal F<sup>I</sup> within 10s, which is then used by Storm. Table 3 (left) lists the results. Our main finding is that *belief exploration can yield better FSCs (and sometimes faster) when using FSCs from* Paynt, even if the latter FSCs are far from optimal. For instance, Storm with a provided F<sup>I</sup> finds an FSC with value 0.97 for the Drone-4-2 benchmark within a total of 10s (1s+9s for obtaining F<sup>I</sup>), compared to obtaining an FSC of value 0.95 in 56s on its own. A value improvement is also obtained if Storm runs longer. For the Network model, the value improves by 37% (short-term) and 47% (long-term), respectively, at the expense of investing 3s to find F<sup>I</sup>. For the other models, the relative improvement ranges from 3% to 25%. A further value improvement can be achieved when using better FSCs F<sup>I</sup> from Paynt;

<sup>5</sup> https://github.com/randriu/synthesis.

see Q3. Sometimes, belief exploration does not profit from F<sup>I</sup>. For Hallway, the unexplored part of the belief MDP becomes insignificant rather quickly, and so does the impact of F<sup>I</sup>. Clipping [8], a computationally expensive extension of cut-offs, is beneficial only for Rocks-12, rendering F<sup>I</sup> useless. Even in this case, however, using F<sup>I</sup> significantly improves the short Storm run, which did not have enough time to apply clipping.

### **Q2: Belief-based FSCs improve inductive synthesis**

In this experiment, we run Storm for at most 1s and use the result in Paynt. Table 3 (right) lists the results. Our main finding is that *inductive synthesis can find much better FSCs, and sometimes much faster, when using FSCs from belief exploration.* For instance, for the 4 × 5 × 2 benchmark, an FSC is obtained about six times faster while improving the value by 116%. On some larger models, Paynt alone struggles to find any good F<sup>I</sup>, and using F<sup>B</sup> boosts this; e.g., the value for the Refuel-20 model is raised by a factor of 20 at almost no run-time penalty. For the Tiger benchmark, a value improvement of 860% is achieved (albeit not as good as F<sup>B</sup> itself) at the expense of doubling the run time. Thus: *even a shallow exploration of the belief MDP pays off in the inductive synthesis.* The inductive search typically profits even more when the belief MDP is explored further. This is demonstrated, e.g., in the Rocks-12 model: using the FSC F<sup>B</sup> computed with clipping (see Table 3 (left)) enables Paynt to find an FSC F<sup>I</sup> with the same (optimal) value 20 as F<sup>B</sup> within 1s. Similarly, for the Milos-97 model, running Storm for 45s (producing a more precise F<sup>B</sup>) enables Paynt to find an FSC F<sup>I</sup> achieving a better value than the controllers found by Storm or Paynt alone within the timeout. (These results are not reported in the tables.) However, as opposed to Q1, where a better FSC F<sup>I</sup> naturally improves the belief MDP, exploring the belief MDP for longer does not always yield a better F<sup>I</sup>: a larger M<sup>B</sup> with a better F<sup>B</sup> may yield a larger memory model μ, thus inducing a significantly larger family in which Paynt struggles to identify good FSCs.

### **Q3: The practical benefits of the symbiotic approach**

The goals of these experiments are to investigate whether the symbiotic approach improves the run time (can FSCs of a certain value be obtained faster?), the memory footprint (how is the total memory consumption affected?), the controller's value (can better FSCs be obtained with the same computational resources?) and the controller's size (are more compact FSCs obtained?).

*Value of the Synthesised FSCs.* Figure 4 plots the value of the FSCs produced by Storm, Paynt, and Saynt versus the computation time. Note that for maximising objectives, the aim is to obtain a high value (the first 4 plots), whereas for minimising objectives a lower value is better. From the plots, it follows that *the FSCs from the symbiotic approach are superior in value to the ones obtained by the standalone approaches.* The relative improvement of the value of the resulting FSCs differs across individual models, similar to the trends in Q1 and Q2. When

**Table 3. Left (Q1)**: Experimental results on how a (quite sub-optimal) FSC F<sup>I</sup> computed by Paynt within 10s impacts Storm. (For Drone-8-2, the largest model in our benchmark, we use 30s.) The "Paynt" column indicates the value of F<sup>I</sup> and its run time. The "Short Storm" column runs Storm for 1s and compares the value of the FSC F<sup>B</sup> found by Storm alone to Storm using F<sup>I</sup>. The "Long Storm" column is analogous, but with a 300s timeout for Storm. In the last row, \* indicates that clipping was used. **Right (Q2)**: Experimental results on how an FSC F<sup>B</sup> obtained by a shallow exploration of the belief MDP impacts the inductive synthesis by Paynt. The "Storm" column reports the value of F<sup>B</sup> computed within 1s. The "Paynt" column compares the values of the FSCs F<sup>I</sup> obtained by Paynt itself to Paynt using the FSCs F<sup>B</sup>, within a 300s timeout.


comparing the best FSC found by Storm or Paynt alone with the best FSC found by Saynt, the improvement ranges from negligible (4 × 3-95) to around 3%-7% (Netw-3-8-20, Milos-97, Query-s3) and sometimes goes over 40% (Refuel-20, Lanes+). We note that the distance to the (unknown) optimal values remains unclear. The FSC value never decreases but sometimes also does not increase, as indicated by Hallway and Rocks-12 (see also Q2). Our experiments (see Table 5) also indicate that the improvement over the baseline algorithms is typically more significant for the larger variants of the models. Furthermore, the plots in Fig. 4 also include the FSC value of the one-shot combination of Storm and Paynt. We see that Saynt *can improve the FSC value over the one-shot combination.* This is illustrated, e.g., by the 4 × 3-95 and Lanes+ benchmarks, see the 1st and 3rd plots in Fig. 4 (left).

**Fig. 4.** Value of the generated FSCs over time. The last graph shows the average memory usage of Storm and Saynt. The lines ending before the timeout indicate that the 64GB memory limit was hit. • indicates that Paynt and Saynt synthesised posterior-aware FSCs. indicates that Saynt ran with t<sup>I</sup> =90s. (Color figure online)

*Total Synthesis Time.* Saynt initially needs some time for the first iteration (one inductive and one belief phase) in Algorithm 1 and thus during the beginning of the synthesis process, the standalone tools may provide FSCs of a certain value faster. *After the first iteration, however,* Saynt *typically provides better FSCs in a shorter time.* For instance, for the Refuel-20 benchmark Saynt swiftly overtakes Storm after the first iteration. The only exception is Rocks-12 (discussed before), where Saynt with the default settings needs significantly more time than Storm to obtain an FSC of the same value.

**Table 4.** Trade-offs between the value and size in the resulting FSCs F<sup>I</sup> and F<sup>B</sup> found by Saynt. Each cell reports value/size. The first three models have a minimising objective. indicates that Saynt ran with t<sup>I</sup> =90s.


*Memory Footprint.* Belief exploration typically has a large memory footprint: Storm quickly hits the 64GB memory limit while exploring the belief MDP. Saynt *reduces the memory footprint of standalone* Storm *by a factor of 3 to 4*, see the bottom-right plot of Fig. 4. The average memory footprint of running Paynt standalone quickly stabilises around 700MB. The memory footprint of Saynt is thus dominated by the restricted exploration of the belief MDP.

*The Size of the Synthesised FSCs.* For selected models, Table 4 shows the trade-offs between the value and size of the resulting FSCs F<sup>I</sup> and F<sup>B</sup> found by Saynt. The experiments show that *the FSCs* F<sup>I</sup> *provided by inductive synthesis are typically about one to two orders of magnitude smaller than the belief-based FSCs* F<sup>B</sup>*, with only a small penalty in their values*. There are models (e.g. Refuel-06) where a very small F<sup>B</sup>, even slightly smaller than F<sup>I</sup>, does exist. The integration mostly reduces the size of F<sup>B</sup>, due to the better approximation of the belief MDP, by up to a factor of two. This reduction has a negligible effect on the size of F<sup>I</sup>. This observation further strengthens the usefulness of Saynt, which jointly improves the value of F<sup>I</sup> and F<sup>B</sup>. Hence, Saynt gives users a unique opportunity to run a single, time-efficient synthesis and select the FSC according to the trade-off between its value and size.

*Customising the* Saynt *Setup.* In contrast to the standalone approaches as well as to the one-way integrations presented in Q1 and Q2, Saynt *provides a single synthesis method that is efficient for a general class of models without tuning its parameters*. Naturally, adjusting the parameters to individual benchmarks can further improve the quality of the computed controllers: captions of Fig. 4 and Table 4 describe which non-default settings were used for selected models.

#### **Additional Results**

In Table 5, we compare the values and sizes of FSCs synthesised by the particular methods on a broader range of benchmarks. We can see that the FSCs F<sup>I</sup> obtained by Saynt achieve better values than the controllers computed by Paynt; size-wise, these better FSCs of Saynt are similar or only slightly bigger. Meanwhile, for FSCs F<sup>B</sup> obtained by Saynt, we sometimes observe a significant size reduction while still improving the value compared to the FSCs produced by Storm. Two models are notable: on Drone-8-2, Saynt obtains a 50% smaller F<sup>B</sup> while having a 41% better value. On Network-3-8-20, the size of F<sup>B</sup> is reduced by 40% while again providing a better value.

**Table 5.** The quality and size of resulting FSCs provided by Paynt, Storm, and Saynt within the 15-min timeout. The run times indicate the time needed to find the best FSC. Non-default settings: ∗ marks experiments where clipping was enabled, • marks experiments where PAYNT synthesised posterior-aware FSCs, marks experiments where integration parameter t<sup>I</sup> was set to 90 s.


In the following, we further discuss the impact of non-default settings for selected benchmarks, as presented in Table 5. For instance, using posterior-aware FSCs generally slows down the synthesis process significantly; however, for Network and 4 × 3-95, it helps improve the value of the default posterior-unaware FSCs by 2% and 4%, respectively. For the former model, a better F<sup>I</sup> also improves F<sup>B</sup> by about the same amount. In some cases, e.g. for Query-s3, it is beneficial to increase the parameter t<sup>I</sup>, giving Paynt enough time to search for a good FSC F<sup>I</sup> (the relative improvement is 6%), which also improves the value of the resulting FSC F<sup>B</sup> by about the same amount. Tuning t<sup>I</sup> and t<sup>B</sup> can also have an impact on the value-size trade-off, as seen in the Milos-97 model, where setting a longer timeout t<sup>I</sup> results in finding a 2% better F<sup>B</sup> at a 130% size increase. A detailed analysis of the experimental results suggests that it is usually more beneficial to invest time into searching for a good F<sup>I</sup> that is used to compute better cut-off values, rather than into a deeper exploration of the belief MDP. However, the timeouts still need to allow for multiple subsequent iterations of the algorithm in order to utilise the full potential of the symbiosis.

#### **8 Conclusion and Future Work**

We proposed Saynt, a symbiotic integration of the two main approaches for controller synthesis in POMDPs. Using a wide class of models, we demonstrated that Saynt substantially improves the value of the resulting controllers and provides an any-time, push-button synthesis algorithm allowing users to select the controller based on the trade-off between its value and size, and the synthesis time.

In future work, we plan to explore whether the inductive policy synthesis can also be successfully combined with point-based approximation methods, such as SARSOP, and with discounted reward properties. A preliminary comparison on discounting properties provides two interesting observations: 1) For models with large reachable belief spaces and discount factors (very) close to one, SARSOP typically fails to update its initial *alpha-vectors* and thus produces low-quality controllers. In these cases, Saynt outperforms SARSOP. 2) For common discount factors, SARSOP beats Saynt on the majority of benchmarks. This is not surprising, as the MDP engine underlying Saynt does not natively support discounting and instead computes a much harder fixed point. See [15] for a recent discussion of the differences between discounting and not discounting.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Security and Quantum Systems**

## AutoQ: An Automata-Based Quantum Circuit Verifier

Yu-Fang Chen<sup>1</sup>(B), Kai-Min Chung<sup>1</sup>, Ondřej Lengál<sup>2</sup>(B), Jyun-Ao Lin<sup>1</sup>, and Wei-Lun Tsai<sup>1</sup>(B)

<sup>1</sup> Institute of Information Science, Academia Sinica, Taipei, Taiwan yfc@iis.sinica.edu.tw, alan23273850@gmail.com <sup>2</sup> Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic lengal@fit.vutbr.cz

Abstract. We present a specification language and a fully automated tool named AutoQ for verifying quantum circuits symbolically. The tool implements the automata-based algorithm from [14] and extends it with capabilities for symbolic reasoning. The extension makes it possible to specify *relational* properties, i.e., relationships between states before and after executing a circuit. We present a number of use cases where we used AutoQ to fully automatically verify crucial properties of several quantum circuits which have, to the best of our knowledge, so far been proved only with human help.

### 1 Introduction

Recently, quantum computing has received much attention, driven by several technological breakthroughs [7] and increasing investments. Prototype quantum computers are already available. The opportunities for the general public, particularly students, researchers, and technology enthusiasts, to access quantum computing devices are rapidly increasing, e.g., through cloud services such as Amazon Braket [1] or IBM Quantum [2]. Due to the complexity and probabilistic nature of quantum computing, the chance of errors in quantum programs is much higher than in traditional programs, and conventional means of correctness assurance, such as testing, are much less applicable in the quantum world. Quantum programmers need better tools to help them write correct programs. Therefore, researchers anticipate that formal verification will play a crucial role in quantum software quality assurance and have, in recent years, invested significant effort in this direction [5,11,21,41–43,45,46]. Nevertheless, practical tools for automated quantum program/circuit verification are still missing.

This paper introduces AutoQ<sup>1</sup>, a fully automated tool for quantum circuit verification based on the approach proposed in [14]. In particular, AutoQ checks the validity of a Hoare-style specification {Pre} <sup>C</sup> {Post}, where <sup>C</sup> is a quantum circuit (a sequence of quantum gates) in the OpenQASM format [17] and the

© The Author(s) 2023

<sup>1</sup> Available at https://github.com/alan23273850/AutoQ.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 139–153, 2023. https://doi.org/10.1007/978-3-031-37709-9\_7

precondition Pre and postcondition Post represent sets of (pure) quantum states. The check is done by executing the circuit with all quantum states satisfying Pre (using a symbolic representation) and testing that all resulting quantum states are in the set denoted by Post.

AutoQ combines two main techniques to efficiently and effectively represent and reason about (potentially infinite) sets of quantum states:


By combining these two techniques, i.e., using TAs with symbolic variables in leaves, we can represent all n-qubit quantum states in which an arbitrary basis state has a strictly larger amplitude than the other basis states using O(n) states and transitions.

Using such a symbolic encoding is essential to allow us to describe *relational specifications*, e.g., it allows us to express properties like "the probability amplitude of the basis state |000⟩ is increased after executing the circuit C" (for this, in the postcondition, we use TAs accepting trees with *predicates* in leaves, a subclass of the symbolic tree automata of [36]). Such a property can then be verified by executing the quantum circuit *symbolically* in the spirit of symbolic execution [27] (i.e., such that the values of amplitudes are not complex numbers but, instead, *symbolic terms*) and checking whether all trees in the language of the resulting TA satisfy the desired property (using a modified antichain-based algorithm for testing TA language inclusion [4,10]). Combining TAs and symbolic variables as the language for quantum predicates allows full automation and can be used to express many crucial properties of quantum circuits, as we will demonstrate later. AutoQ is the first tool implementing this approach.

*Related Work.* Our work belongs to the line of *Hoare-style verification* of quantum programs, which has been widely discussed in the past [22,29,35,40,44]. This family of approaches follows D'Hondt and Panangaden's suggestion of using various Hermitian operators as quantum predicates, resulting in a very powerful yet complete proof system [20]. However, specifying properties using Hermitian operators is often not intuitive and is inconvenient for automation due to their enormous matrix sizes. Therefore, often these approaches are implemented on top of proof assistants such as Coq [9] and Isabelle [37] and require significant manual work in proof search. The Qbricks [12] approach alleviates the difficulty of the proof search by combining state-of-the-art theorem provers with decision procedures building on top of the Why3 platform [24]. The approach, however, still requires a significant amount of human intervention.

Regarding other quantum program/circuit/protocol verification tools, *circuit equivalence checkers* [5,11,15,26,39] are often quite efficient but less flexible in specifying the desired property (only equivalence). They are particularly useful in *compiler validation*; notable tools include Qcec [11], and Feynman [5]. *Quantum model checking* supports a rich specification language (flavors of temporal logic [23,30,38]) and is more suitable for *verifying high-level protocols* due to the quite limited scalability [6]. One notable tool in this category is QPMC [23]. *Quantum abstract interpretation* [32,43] is particularly efficient in processing large-scale circuits, but it grossly over-approximates the state space (it cannot verify basic properties of, e.g., Grover's algorithm) and cannot conclude anything when verification fails. In contrast, AutoQ can be conveniently used for quantum program development and debugging since it automatically computes the exact set of reachable states<sup>2</sup>. The mentioned tools are fully automated but have different goals or address different parts of the software development cycle than AutoQ.

*Contributions.* AutoQ evolved from a simple prototype used for performance evaluation in [14] into a robust tool. In addition, we added the following major extensions:


These improvements are pushing the capabilities of AutoQ, and also of practical quantum circuit verification itself, much further.

*Outline.* In Sect. 2, we describe our approach to TA-based specification and verification of quantum circuits. In Sect. 3, we discuss the new entailment-checking algorithm for the symbolic TA representation. We discuss the architecture of AutoQ in Sect. 4 and demonstrate the use of the specification language and AutoQ for automated verification of several case studies in Sect. 5.

<sup>2</sup> A predecessor of the presented version of AutoQ has already caught a bug in Qcec, cf. [3].

Fig. 1. Verification of a circuit C amplifying the amplitude of |00⟩ w.r.t. the specification {P, ϕ} C {Q} with ϕ: |v<sub>h</sub> + 3v<sub>ℓ</sub>| > |2v<sub>h</sub>|. R is the TA obtained by executing P on C.

#### 2 Tree Automata-Based Verification of Quantum Circuits

We will begin with minimal formal definitions of the TA-based specification and demonstrate how to use them to verify quantum circuits in AutoQ with examples. We assume a basic knowledge of quantum computation (see, e.g., the classical textbook [31]).

Let us fix a finite set of *quantum variables* X = {x<sub>1</sub>,...,x<sub>n</sub>} with a linear ordering (we assume x<sub>1</sub> < ... < x<sub>n</sub>) and a disjoint non-empty leaf alphabet Σ. We will, in particular, work with Σ = Σ<sub>t</sub> ⊎ Σ<sub>p</sub>, where Σ<sub>t</sub> is the alphabet of *terms* and Σ<sub>p</sub> is the alphabet of *predicates* in a suitable first-order theory (discussed later).

We use {0, 1}<sup>≤n</sup> to denote ⋃<sub>0≤i≤n</sub> {0, 1}<sup>i</sup>. A *(symbolic binary decision) tree* over X and Σ is a function τ: {0, 1}<sup>≤n</sup> → (X ∪ Σ) such that for all positions p ∈ {0, 1}<sup>i</sup> with i < n, we have τ(p) = x<sub>i+1</sub>, and for all positions p ∈ {0, 1}<sup>n</sup>, we have τ(p) ∈ Σ. An example of a tree τ can be found in Fig. 1b, where Σ = {v<sub>h</sub>, v<sub>ℓ</sub>}, τ(ε) = x<sub>1</sub>, τ(0) = τ(1) = x<sub>2</sub>, τ(00) = v<sub>h</sub>, and τ(p) = v<sub>ℓ</sub> for p ∈ {0, 1}<sup>2</sup> \ {00}.

A *(symbolic) tree automaton* (TA) is a tuple A = (S, Δ, F) where S is a finite set of *states*, Δ ⊆ (S × X × S × S) ∪ (S × Σ) is a *transition relation*, and F ⊆ S is the set of *root (final) states*. We denote transitions from Δ as s −x<sub>i</sub>→ (s<sub>0</sub>, s<sub>1</sub>) and s −a→ (), respectively. An example of a TA with the set of root states {s} can be found in Fig. 1a.

A *run* of A on τ is a function ρ: {0, 1}<sup>≤n</sup> → S s.t. for all positions p ∈ {0, 1}<sup>i</sup> with i < n, it holds that ρ(p) −τ(p)→ (ρ(p.0), ρ(p.1)) ∈ Δ, and for all positions p ∈ {0, 1}<sup>n</sup>, it holds that ρ(p) −τ(p)→ () ∈ Δ. The run ρ is *accepting* iff ρ(ε) ∈ F, and the *language* of A is L(A) = {τ | A has an accepting run on τ}. Observe that the tree in Fig. 1b is in the language of the TA P in Fig. 1a with the run ρ such that ρ(ε) = s, ρ(0) = s<sub>1</sub>, ρ(1) = s<sub>0</sub>, ρ(00) = s<sub>3</sub>, and ρ(p) = s<sub>2</sub> for p ∈ {0, 1}<sup>2</sup> \ {00}.
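The definitions above can be made concrete in a small sketch (a hypothetical Python encoding for illustration, not AutoQ's internal representation): a tree is a map from binary positions to symbols, a TA is a triple of inner transitions, leaf transitions, and root states, and acceptance asks whether some run labels the root with a root state. The TA `P` and the tree below mirror Fig. 1a/1b, with `vh`/`vl` standing for v<sub>h</sub>/v<sub>ℓ</sub>.

```python
def runs_set(ta, tree, pos=""):
    """Set of states that some run of `ta` can assign to position `pos`."""
    inner, leaves, _ = ta
    sym = tree[pos]
    if pos + "0" in tree:   # internal position, labelled by a variable x_i
        return {s for s, x, s0, s1 in inner
                if x == sym
                and s0 in runs_set(ta, tree, pos + "0")
                and s1 in runs_set(ta, tree, pos + "1")}
    return {s for s, a in leaves if a == sym}   # leaf labelled by Sigma

def accepts(ta, tree):
    """A tree is accepted iff some run labels the root with a root state."""
    return bool(runs_set(ta, tree) & ta[2])

# TA P and the tree of Fig. 1a/1b: vh at position 00, vl elsewhere.
P = ([("s", "x1", "s1", "s0"), ("s1", "x2", "s3", "s2"), ("s0", "x2", "s2", "s2")],
     [("s3", "vh"), ("s2", "vl")], {"s"})
tree = {"": "x1", "0": "x2", "1": "x2",
        "00": "vh", "01": "vl", "10": "vl", "11": "vl"}
bad = dict(tree)
bad["00"], bad["01"] = "vl", "vh"   # vh moved to position 01
```

Here `accepts(P, tree)` succeeds, while the mutated tree `bad` (with v<sub>h</sub> at position 01) is rejected, since no run through the s<sub>1</sub>/s<sub>0</sub> transitions matches it.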

Now we are ready to demonstrate how to write specifications of quantum circuits with TAs using a running example. We assume that C is a 2-qubit circuit that amplifies the amplitude of the basis state |00⟩ (under some constraint ϕ over input states) and reduces the amplitudes of the other basis states. We first prepare the precondition of C, which consists of a pair (P, ϕ), where P is a TA with the root state s, a set of terms Σ<sub>t</sub> as the leaf alphabet, and the set of transitions from Fig. 1a, and ϕ is a first-order constraint over the variables used in Σ<sub>t</sub>. In Σ<sub>t</sub>, we use two variables over complex numbers, v<sub>ℓ</sub> and v<sub>h</sub>, to denote the corresponding amplitudes (*low* and *high*). The constraint ϕ states that |v<sub>h</sub> + 3v<sub>ℓ</sub>| > |2v<sub>h</sub>| (required by this circuit C, cf. Sect. 5.4). Recall that the TA P from Fig. 1a accepts the tree from Fig. 1b, which in turn represents the quantum state

$$s = v\_h \left| 00 \right> + v\_\ell \left| 01 \right> + v\_\ell \left| 10 \right> + v\_\ell \left| 11 \right> . \tag{1}$$

AutoQ will execute the gates in <sup>C</sup> to transform the TA <sup>P</sup> to another TA <sup>R</sup> capturing the effect of executing <sup>C</sup> over all quantum states encoded in <sup>P</sup>. The algorithm for gate operations is almost the same as the one in [14], except that now the update of leaf symbols works symbolically (similarly to symbolic execution [27]: each leaf symbol is a term over v<sup>h</sup> and v and quantum gates change the terms by accumulating the operations that would be performed on them, potentially simplifying them). In this example, the TA R will accept only one tree representing the quantum state

$$s' = \left(\frac{v\_h + 3v\_\ell}{2}\right)|00\rangle + \left(\frac{v\_h - v\_\ell}{2}\right)|01\rangle + \left(\frac{v\_h - v\_\ell}{2}\right)|10\rangle + \left(\frac{v\_h - v\_\ell}{2}\right)|11\rangle.\tag{2}$$

Observe that under the precondition ϕ = |v<sub>h</sub> + 3v<sub>ℓ</sub>| > |2v<sub>h</sub>|, the probability of |00⟩ is indeed increased (|(v<sub>h</sub> + 3v<sub>ℓ</sub>)/2|<sup>2</sup> > |v<sub>h</sub>|<sup>2</sup>). The tree representation of s′ can be found in Fig. 1c. The TA Q of the postcondition can be found in Fig. 1d. The leaf alphabet of Q is the set of predicates Σ<sub>p</sub> = {|□| > |v<sub>h</sub>|, |□| < |v<sub>ℓ</sub>|}, where □ denotes a free variable. Observe that Q accepts the tree from Fig. 1e.
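The symbolic update of leaf terms can be illustrated with a simplified dense-vector sketch (hypothetical code, not the TA-based algorithm of [14]): amplitudes are kept as linear combinations of the symbolic variables with rational coefficients, and each H application contributes one factor 1/√2 that is tracked separately as an exponent k, mirroring the normalization-factor bookkeeping described later in Sect. 2.2.

```python
from fractions import Fraction

def h_on_qubit(state, qubit):
    """Apply H to `qubit` of a symbolic state; the global (1/sqrt(2))**k
    factor is tracked separately, so the caller increments k by one."""
    new = {}
    for b in state:
        b0 = b[:qubit] + "0" + b[qubit + 1:]
        b1 = b[:qubit] + "1" + b[qubit + 1:]
        sign = 1 if b[qubit] == "0" else -1   # H: |0> -> |0>+|1>, |1> -> |0>-|1>
        combined = {}
        for src, s in ((b0, 1), (b1, sign)):
            for sym, coef in state[src].items():
                combined[sym] = combined.get(sym, Fraction(0)) + s * coef
        new[b] = combined
    return new

# v_h |00> + v_l |01> + v_l |10> + v_l |11>, as in Eq. (1)
one = Fraction(1)
state = {"00": {"vh": one}, "01": {"vl": one},
         "10": {"vl": one}, "11": {"vl": one}}
state = h_on_qubit(state, 0)   # the exponent k goes from 0 to 1
# |00> now carries (vh + vl)/sqrt(2), |10> carries (vh - vl)/sqrt(2)
```

The terms stay exact symbolic expressions throughout, which is what lets the entailment check later compare them against predicates over v<sub>h</sub> and v<sub>ℓ</sub>.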

#### 2.1 High-Level Specification Language

In AutoQ, we provide a simple specification language that can be automatically translated to TAs. The language allows users to focus on the properties they want to express without the need to specify details of the TA structure. Our language is particularly suitable for describing sets of states with one high-probability branch and other branches with uniformly low or zero probability, a very common pattern in quantum circuits' correctness properties. For example, in the language, we can use (|00⟩: v<sub>h</sub>, |∗⟩: v<sub>ℓ</sub>), where "|∗⟩" denotes "other basis states," to define the tree language of the TA in Fig. 1a, which accepts a single tree representing the quantum state v<sub>h</sub> |00⟩ + v<sub>ℓ</sub> |01⟩ + v<sub>ℓ</sub> |10⟩ + v<sub>ℓ</sub> |11⟩ from Fig. 1b. Similarly, we can use (|00⟩: |□| > |v<sub>h</sub>|, |∗⟩: |□| < |v<sub>ℓ</sub>|) to represent the language of the TA in Fig. 1d. The set of all 2-qubit basis states {|i⟩ | i ∈ {0, 1}<sup>2</sup>} is expressed as ∃i ∈ {0, 1}<sup>2</sup>: (|i⟩: 1, |∗⟩: 0) (we can see it as a predicate that is satisfied by the described quantum states). We also allow the *tensor product* operator ⊗, which multiplies the amplitudes of the product basis states. For example, (|00⟩: 1, |∗⟩: 0) ⊗ (|00⟩: v<sub>h</sub>, |∗⟩: v<sub>ℓ</sub>) ⊗ (|00⟩: 1, |∗⟩: 0) compactly represents the (singleton) set of states {v<sub>h</sub> |000000⟩ + Σ<sub>j∈{01,10,11}</sub> v<sub>ℓ</sub> |00j00⟩}.
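A minimal sketch of how such specifications expand into explicit basis-to-term maps (hypothetical helper names; leaf terms are plain strings here): the ⊗ operator concatenates basis strings and multiplies amplitudes, with "0" annihilating and "1" acting as the unit.

```python
def expand(spec, n):
    """Expand (|c1>: t1, ..., |*>: t) into an explicit basis -> term map."""
    return {format(i, "0%db" % n): spec.get(format(i, "0%db" % n), spec["*"])
            for i in range(2 ** n)}

def mul(t1, t2):
    """Symbolic product of leaf terms: "0" annihilates, "1" is the unit."""
    if "0" in (t1, t2):
        return "0"
    if t1 == "1":
        return t2
    return t1 if t2 == "1" else t1 + "*" + t2

def tensor(s1, s2):
    """Tensor product: concatenate basis strings, multiply amplitudes."""
    return {b1 + b2: mul(a1, a2) for b1, a1 in s1.items()
            for b2, a2 in s2.items()}

# (|00>: 1, |*>: 0) (x) (|00>: vh, |*>: vl) (x) (|00>: 1, |*>: 0)
s = tensor(tensor(expand({"00": "1", "*": "0"}, 2),
                  expand({"00": "vh", "*": "vl"}, 2)),
           expand({"00": "1", "*": "0"}, 2))
nonzero = {b: t for b, t in s.items() if t != "0"}
# nonzero: vh at |000000> plus vl at |00j00> for j in {01, 10, 11}
```

The only surviving basis states are exactly those of the singleton set given in the text, since the outer blocks pin their qubits to 00.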

A more challenging example is to represent the set of states

$$\left\{ v\_h \left| ii000 \right\rangle + \sum\_{j \in \{0, 1\}^3 \wedge j \neq i} v\_\ell \left| ij000 \right\rangle \; \middle| \; i \in \{0, 1\}^3 \right\}. \tag{3}$$

Such a set can be described with the help of the ⊗ and ∃ operators as follows:

$$\exists i \in \{0, 1\}^3 \colon (|i\rangle \colon 1, |\*\rangle \colon 0) \otimes (|i\rangle \colon v\_h, |\*\rangle \colon v\_\ell) \otimes (|000\rangle \colon 1, |\*\rangle \colon 0). \tag{4}$$

Below is the grammar of specification spec:

$$\begin{aligned} \mathit{spec} &::= \mathit{state} \mid \exists i \in \{0, 1\}^n \colon \mathit{state} \mid \mathit{spec}, \mathit{state} \\ \mathit{state} &::= (|c\_1\rangle \colon t, \dots, |c\_k\rangle \colon t, |\*\rangle \colon t) \mid (|i\rangle \colon t, |\*\rangle \colon t) \mid \mathit{state} \otimes \mathit{state} \\ & \qquad t \in \Sigma, \ n \in \mathbb{N}, \text{ and } c\_1, \dots, c\_k \in \{0, 1\}^n \end{aligned}$$

A spec is ill-formed when a free variable i appears in a state, when some basis state is repeated in the rule (|c<sub>1</sub>⟩: t, ..., |c<sub>k</sub>⟩: t, |∗⟩: t), or when that rule contains two basis states of different lengths. If all basis states of the given length are specified in (|c<sub>1</sub>⟩: t, ..., |c<sub>k</sub>⟩: t, |∗⟩: t), the |∗⟩: t part is not required any more. The specification is then converted into a TA using a straightforward algorithm; in the following, we often identify a TA with its specification.

#### 2.2 Complex Number Representation

In a (pure) quantum state, the amplitude of a basis computational state is a *complex number*, and the corresponding probability is the square of the absolute value of the amplitude. For verification, we need an exact representation of complex numbers that can be used in computers. In AutoQ, we use a subset of complex numbers that can be expressed by the following algebraic encoding (cf. [14,34,46]):

$$(\frac{1}{\sqrt{2}})^k (a + b\omega + c\omega^2 + d\omega^3),\tag{5}$$

where a, b, c, d ∈ ℤ, k ∈ ℕ, and ω = e<sup>iπ/4</sup> = cos 45° + i sin 45° = √2/2 + i√2/2, the unit vector that makes an angle of 45° with the positive real axis in the complex plane. A complex number is then represented by a quadruple (a, b, c, d) of integers and a normalization factor k. Although the considered set of complex numbers is only a small subset of all complex numbers (it is countable, while the set of all complex numbers is uncountable), the subset is sufficient to describe various standard quantum gates. Currently, AutoQ supports the quantum gates X, H, Y, Z, S, T, Rx(π/2), Ry(π/2), CNOT, CZ, and Toffoli (cf. the list in [14]), which already include a set of universal quantum gates. By the Solovay-Kitaev theorem [18], gates performing rotations of π/2<sup>n</sup>, used, e.g., in Shor's algorithm [33] and the *quantum Fourier transform* (QFT) [16], can be approximated with an error rate ε by O(log<sup>3.97</sup>(1/ε))-many H, CNOT, and T gates. The algebraic representation is also sufficient to represent all reachable states in OpenQASM circuits with the set of supported gates, where the initial basis state is |0...0⟩.
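The closure of this representation under the supported gates can be illustrated with a small sketch (hypothetical code; `value` and `times_omega` are illustrative names): multiplication by ω only rotates the integer quadruple, using ω<sup>4</sup> = −1.

```python
import cmath

OMEGA = cmath.exp(1j * cmath.pi / 4)   # the 45-degree unit vector

def value(q, k):
    """Complex value of (1/sqrt(2))**k * (a + b*w + c*w**2 + d*w**3)."""
    a, b, c, d = q
    return (2 ** -0.5) ** k * (a + b * OMEGA + c * OMEGA**2 + d * OMEGA**3)

def times_omega(q):
    """Multiplication by w keeps quadruples integral, since w**4 == -1."""
    a, b, c, d = q
    return (-d, a, b, c)

# The T gate multiplies an amplitude by w; eight applications are the
# identity (w**8 == 1), and the quadruple stays integral throughout.
q = (1, 2, 3, 4)
r = q
for _ in range(8):
    r = times_omega(r)
```

Gates such as H and Rx(π/2) additionally scale by 1/√2, which is exactly what the normalization factor k absorbs.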

AutoQ operates on the introduced representation of complex numbers. More precisely, for a specification {P, ϕ} C {Q}, the leaf symbols of P are quadruples of integer terms (a, b, c, d). We assume that all leaf symbols of P share a common normalization factor k, so we do not store the value of k explicitly since it can be inferred from the fact that the probabilities over all basis states sum to one. Instead, we remember a constant natural number k<sub>c</sub>, the difference of the k value between P and R, and use it to normalize the amplitudes. Recall that R is the TA accepting all states after executing C from some state accepted by P. The initial value of k<sub>c</sub> is zero, and each application of an H, Rx(π/2), or Ry(π/2) gate increases it by one (cf. [14]). We normalize all quadruple leaf symbols (a, b, c, d) of R by multiplying them with (1/√2)<sup>k<sub>c</sub></sup> once R is computed.

Next, we show how to compose a specification of our running example from Fig. 1 using the algebraic representation. The specification can now be written as

$$\begin{aligned} \mathcal{P} &\colon (|00\rangle \colon (v\_h^a, v\_h^b, v\_h^c, v\_h^d), |\*\rangle \colon (v\_\ell^a, v\_\ell^b, v\_\ell^c, v\_\ell^d)), \\ \mathcal{Q} &\colon (|00\rangle \colon |(\square\_1, \square\_2, \square\_3, \square\_4)|^2 > |(v\_h^a, v\_h^b, v\_h^c, v\_h^d)|^2, \ |\*\rangle \colon |(\square\_1, \square\_2, \square\_3, \square\_4)|^2 < |(v\_\ell^a, v\_\ell^b, v\_\ell^c, v\_\ell^d)|^2), \end{aligned}$$

where $|(a, b, c, d)|^2 = |a + b\omega + c\omega^2 + d\omega^3|^2$

$$\begin{aligned} &= \left| a + b(\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}i) + ci + d(-\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}i) \right|^2 \\ &= (a + b\frac{\sqrt{2}}{2} - d\frac{\sqrt{2}}{2})^2 + (b\frac{\sqrt{2}}{2} + c + d\frac{\sqrt{2}}{2})^2 \end{aligned}$$
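A quick numerical sanity check of this expansion (hypothetical code; since ω<sup>2</sup> = i, the c term enters the imaginary part with a positive sign):

```python
import cmath
import random

OMEGA = cmath.exp(1j * cmath.pi / 4)
R = 2 ** 0.5 / 2   # sqrt(2) / 2

def sq_mag(a, b, c, d):
    """Closed form of |a + b*w + c*w**2 + d*w**3|**2: real part
    a + (b - d)*sqrt(2)/2, imaginary part (b + d)*sqrt(2)/2 + c."""
    return (a + b * R - d * R) ** 2 + (b * R + c + d * R) ** 2

# Compare against direct complex arithmetic on random integer quadruples.
random.seed(0)
for _ in range(100):
    a, b, c, d = (random.randint(-5, 5) for _ in range(4))
    direct = abs(a + b * OMEGA + c * OMEGA**2 + d * OMEGA**3) ** 2
    assert abs(direct - sq_mag(a, b, c, d)) < 1e-9
```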

#### 2.3 Precise Semantics of the Specification

As mentioned above, for verifying {P, ϕ} <sup>C</sup> {Q}, we start with a TA <sup>P</sup> representing the set of all quantum states satisfying the precondition and compute a TA <sup>R</sup> representing the set of states reachable after executing the circuit <sup>C</sup>. Then, we test whether <sup>R</sup> entails <sup>Q</sup> (w.r.t. <sup>ϕ</sup>), i.e., whether all reachable states satisfy the postcondition.

Formally, we say that a tree τ<sub>1</sub> is *entailed* by a tree τ<sub>2</sub> w.r.t. a first-order formula ϕ, denoted as τ<sub>1</sub> |=<sub>ϕ</sub> τ<sub>2</sub>, if for all positions p ∈ {0, 1}<sup>n</sup> it holds that either (i) τ<sub>1</sub>(p) = τ<sub>2</sub>(p), or (ii) τ<sub>1</sub>(p) = (t<sub>1</sub>,...,t<sub>k</sub>) ∈ Σ<sub>t</sub>, τ<sub>2</sub>(p) = ψ ∈ Σ<sub>p</sub>, and ϕ ⇒ ψ[t<sub>1</sub>/□<sub>1</sub>] ... [t<sub>k</sub>/□<sub>k</sub>]. We lift the entailment to TAs: A<sub>1</sub> |=<sub>ϕ</sub> A<sub>2</sub> iff for all trees τ<sub>1</sub> ∈ L(A<sub>1</sub>) there exists a tree τ<sub>2</sub> ∈ L(A<sub>2</sub>) s.t. τ<sub>1</sub> |=<sub>ϕ</sub> τ<sub>2</sub>.<sup>3</sup>

#### 3 Entailment Checking

We will now describe how we perform the entailment check R |=<sup>ϕ</sup> <sup>Q</sup>. Since we operate with trees and tree automata over symbolic values, we cannot establish entailment by running a classical TA language inclusion test based on complementing the automaton Q first. Instead, our algorithm for testing the entailment R |=<sup>ϕ</sup> <sup>Q</sup> is based on an on-the-fly TA inclusion checking algorithm [4,10],

<sup>3</sup> We never have a predicate from Σ<sup>p</sup> on the left-hand side of the entailment test, so we do not need to test implication between predicates, which would be needed for a complete procedure.

Algorithm 1: Checking R |=_ϕ Q

Input: a TA R = (S_r, Δ_r, F_r), a TA Q = (S_q, Δ_q, F_q), a formula ϕ
Output: *true* if R |=_ϕ Q, *false* otherwise

 1  Processed ← ∅;
 2  Worklist ← Min{(s_r, U_q) | s_r −t_r→ () ∈ Δ_r,
 3      U_q = {u_q ∈ S_q | u_q −t_r→ () ∈ Δ_q ∨ ∃ u_q −p_q→ () ∈ Δ_q : ϕ ⇒ p_q[t_r/□]}};
 4  while Worklist ≠ ∅ do
 5      (s_r, U_q) ← Worklist.pop();
 6      if s_r ∈ F_r ∧ U_q ∩ F_q = ∅ then return *false*;
 7      Processed ← Min(Processed ∪ {(s_r, U_q)});
 8      tmp ← ({(s_r, U_q)} × Processed) ∪ (Processed × {(s_r, U_q)});
 9      foreach ((s_r^1, U_q^1), (s_r^2, U_q^2)) ∈ tmp, α ∈ X do
10          H_r ← {s_r′ ∈ S_r | s_r′ −α→ (s_r^1, s_r^2) ∈ Δ_r};
11          U_q′ ← {s_q ∈ S_q | ∃ s_q^1 ∈ U_q^1, ∃ s_q^2 ∈ U_q^2 : s_q −α→ (s_q^1, s_q^2) ∈ Δ_q};
12          foreach s_r′ ∈ H_r s.t. (s_r′, U_q′) ∉ Processed ∪ Worklist do
13              Worklist ← Min(Worklist ∪ {(s_r′, U_q′)});
14  return *true*;

which avoids complementation. The on-the-fly inclusion-checking algorithm can be seen as an optimization of the classical construction, which would establish <sup>L</sup>(R)∩L(Q) ? = <sup>∅</sup> by first computing the complement <sup>Q</sup> of Q (using a bottom-up TA determinization), followed by computing the intersection A<sup>∩</sup> of Q and R, and, finally, checking language emptiness of A∩. In particular, the on-the-fly inclusion checking algorithm can be seen as doing all the operations at once. Furthermore, the algorithms in [4,10] also make use of the so-called *antichains* and TA *simulation* to prune the explored state space.

Our modification of the inclusion algorithm to test TA entailment, given in Algorithm 1, mainly differs from [4,10] in the way the initial set of state pairs is computed on Line 3. In particular, we match a state s_r that can perform a leaf transition over t_r in R with the set U_q of all states in Q that can perform a leaf transition either over t_r or over a predicate p_q such that ϕ ⇒ p_q[t_r/□] (we use p_q[t_r/□] for a tuple t_r to denote the substitution of the tuple's components into the corresponding free variables of p_q).

After that, the algorithm performs a simultaneous bottom-up traversal through R (represented by states s_r) and the determinized version of Q (represented by sets of states U_q). For each such pair (s_r, U_q), the algorithm first checks whether s_r is a root state while U_q does not contain any root state (cf. Line 6; this would mean that R accepts some tree that is not accepted by Q). If this does not hold, then the algorithm finds all already processed pairs that can make a transition together with (s_r, U_q) (cf. Line 8) and continues from all such pairs. Each bottom-up successor (s_r′, U_q′) is then added to *Worklist* in case it has not been seen previously (cf. Line 13).

The algorithm uses the function *Min* (cf. Lines 3, 7, and 13) to minimize the sets *Worklist* and *Processed* w.r.t. a subsumption relation, and the downward closure for *Processed* ∪ *Worklist* on Line 12 to prune the explored state space. Due to lack of space, we refer to the works [4,10] for more details about these optimizations.
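A simplified executable sketch of Algorithm 1 (a hypothetical Python encoding; it omits the *Min*/antichain and simulation optimizations, and the callable `holds` stands in for the SMT query ϕ ⇒ p_q[t_r/□] on Line 3):

```python
def entails(R, Q, holds):
    """Worklist-based check of R |= Q over explicit TAs.

    R and Q are triples (inner, leaves, roots); inner transitions are
    (state, var, left, right), leaf transitions are (state, symbol).
    """
    inner_r, leaves_r, roots_r = R
    inner_q, leaves_q, roots_q = Q
    worklist = set()
    for s_r, t_r in leaves_r:                                  # Lines 2-3
        u_q = frozenset(s for s, a in leaves_q if a == t_r or holds(t_r, a))
        worklist.add((s_r, u_q))
    processed = set()
    while worklist:                                            # Line 4
        pair = worklist.pop()                                  # Line 5
        s_r, u_q = pair
        if s_r in roots_r and not (u_q & roots_q):             # Line 6
            return False
        processed.add(pair)                                    # Line 7 (no Min)
        combos = [(p1, p2) for p1 in processed for p2 in processed
                  if pair in (p1, p2)]                         # Line 8
        for (s1, u1), (s2, u2) in combos:                      # Line 9
            for sp, x, left, right in inner_r:                 # Line 10
                if (left, right) != (s1, s2):
                    continue
                new_u = frozenset(sq for sq, y, c, d in inner_q
                                  if y == x and c in u1 and d in u2)  # Line 11
                if (sp, new_u) not in processed | worklist:    # Line 12
                    worklist.add((sp, new_u))                  # Line 13
    return True                                                # Line 14

# The running example of Fig. 1: R accepts the single tree with vh at
# position 00 and vl elsewhere; Q mirrors it with predicate leaves.
R = ([("s", "x1", "s1", "s0"), ("s1", "x2", "s3", "s2"), ("s0", "x2", "s2", "s2")],
     [("s3", "vh"), ("s2", "vl")], {"s"})
Q = ([("q", "x1", "q1", "q0"), ("q1", "x2", "q3", "q2"), ("q0", "x2", "q2", "q2")],
     [("q3", "PH"), ("q2", "PL")], {"q"})
Q_bad = (Q[0], [("q3", "PL"), ("q2", "PL")], {"q"})

def holds(term, leaf):
    # Stand-in for an entailment query: PH/PL are satisfied by vh/vl only.
    return (leaf, term) in {("PH", "vh"), ("PL", "vl")}
```

On this example, `entails(R, Q, holds)` succeeds, while `Q_bad` (whose leaf predicates reject v_h) is refuted by reaching a root pair (s, ∅) on Line 6.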

### 4 Architecture

We illustrate the architecture of AutoQ in Fig. 2. The tool is written in C++ and uses the following external tools: the TA library Vata [28] for efficient testing of TA inclusion (when the postcondition uses only the term alphabet Σt) and the SMT solver Z3 for entailment checking of leaf symbols in

Fig. 2. The architecture of AutoQ. The input verification problem is {P, ϕ} C {Q}.

Algorithm 1. We allow any theory solver supported by Z3; in our experiments, we use QF\_NIRA. AutoQ takes as input a quantum circuit in the OpenQASM format accompanied by a specification written either as tree automata (.aut files) or in the high-level specification language (.hsl files) introduced in Sect. 2.1.

*Preprocessor* reads the input files (.aut, .smt, .qasm, and .hsl files), translates specifications in the .hsl files into tree automata, and stores them using AutoQ's internal data structures. *Circuit Executor* then reads the circuit C and the TA P and generates another TA R obtained as the result after executing <sup>C</sup> from states in <sup>P</sup>, using the approach of [14] with the symbolic extension discussed in Sect. 2. AutoQ can also output the TA R for further analysis. Finally, *Entailment Checker* checks whether R |=<sup>ϕ</sup> <sup>Q</sup> and reports "verified" when the entailment holds and "bug found" otherwise.

#### 5 Use Cases

In this section, we describe several use cases of quantum algorithms and their important properties that we were able to verify using AutoQ fully automatically. We focus on the use of symbolic TA in this set of experiments and refer the readers to [14] for other experimental results. A selection of the obtained results is given in Table 1. An artifact that allows reproduction of the results is available as [13].

#### 5.1 Hadamard Square is Identity

Our first use case shows that the single-qubit circuit C that runs two consecutive H gates has the same effect as the identity matrix. We use the specification {P, ϕ} <sup>C</sup> {Q} with

$$\begin{aligned} \mathcal{P} &\colon (|0\rangle \colon (v\_a, v\_b, v\_c, v\_d), |1\rangle \colon (v'\_a, v'\_b, v'\_c, v'\_d)), & \varphi \colon true, \\ \mathcal{Q} &\colon (|0\rangle \colon (\Box\_a, \Box\_b, \Box\_c, \Box\_d) = (v\_a, v\_b, v\_c, v\_d), |1\rangle \colon (\Box\_a, \Box\_b, \Box\_c, \Box\_d) = (v'\_a, v'\_b, v'\_c, v'\_d)). \end{aligned}$$

In this simple example, the precondition P encodes an infinite number of quantum states, which is not expressible using the technique in [14]. We also included a buggy version by altering one of the H gates, and AutoQ managed to detect the injected bug. The results can be found in rows H<sup>2</sup> in Table 1.

#### 5.2 Zero Imaginary Part of Amplitudes

One property, which is shared by multiple algorithms, e.g., Bernstein-Vazirani's [8] and Grover's algorithm [25], is that the imaginary part of all amplitudes of the result is zero.

Let us focus on Bernstein-Vazirani's algorithm [8], which finds a secret bit-string s from an oracle using a single query. The algorithm begins with the quantum state |0<sup>n</sup>⟩, where n is the length of s, and ends with the quantum state |s⟩. The amplitudes of all basis states are either zero or one; the imaginary part of the amplitudes is therefore always zero. For a three-qubit circuit C implementing the algorithm, we can therefore use the specification {P, ϕ} C {Q} with

$$\mathcal{P} \colon (|000\rangle \colon (1,0,0,0), |\*\rangle \colon (0,0,0,0)), \qquad \varphi \colon true, \qquad \mathcal{Q} \colon (|\*\rangle \colon \psi\_{\text{Im}}),$$

where ψ<sub>Im</sub> ≡ (b = −d ∧ c = 0) (it will also be used later). In the definition of P, recall that we use the integer-quadruple representation of complex numbers (cf. Eq. (5)). In the postcondition Q, the free variables a, b, c, d are to be substituted by the corresponding terms of the obtained integer-term quadruple (a, b, c, d) in the entailment check. Note that (a, b, c, d) represents the complex number (a + b·√2/2 − d·√2/2) + i(b·√2/2 + c + d·√2/2) (obtained from Eq. (5)). Because a, b, c, d are all integers, for the imaginary part to be zero, it must hold that c = 0 and b = −d.

When we run C from P, we obtain a TA R encoding (|010⟩: (1, 0, 0, 0), |∗⟩: (0, 0, 0, 0)) and the entailment R ⊨<sub>ϕ</sub> Q holds. See the rows BV(n) in Table 1 for the results of verifying the algorithm for circuits with secrets of size n. As in the previous example, we also included a buggy version to demonstrate AutoQ's bug-finding capability. We can see that AutoQ could verify the algorithm for secrets of a quite large size.

#### 5.3 Probability of Measuring the Correct Answer

Grover's algorithm [25] assumes a Boolean function f over n bits with only one satisfying assignment s and an oracle that evaluates f for a given input. The algorithm finds s with a high probability, say > 0.9, using only O(√(2<sup>n</sup>)) oracle queries. The algorithm works iteratively, where each *Grover iteration* queries the oracle once and amplifies the amplitude of |s⟩.

Table 1. Results of verifying our use cases with AutoQ. The maximum peak memory consumption was 52 MiB for GroverAll(9). In most cases, the time of entailment checking was negligible, with the exception of the GroverAll circuits: GroverAll(8) takes 2 m 18 s for entailment checking (70% of the total time) and GroverAll(9) takes 21 m 36 s (85% of the total time).

First, let C be a 6-qubit circuit implementing Grover's search with the satisfying assignment s = 010, where the first three qubits of C are the work tape, and the following three are the ancillae. We use the following specification:

$$\begin{aligned} \mathcal{P} &\colon (|000000\rangle \colon \vec{1}, |\*\rangle \colon \vec{0}) \quad \text{where } \vec{1} = (1, 0, 0, 0) \text{ and } \vec{0} = (0, 0, 0, 0), \qquad \varphi \colon true, \\ \mathcal{Q} &\colon (|010\rangle \colon |\square\_a|^2 > 0.9 \wedge \psi\_{\text{Im}}, |\*\rangle \colon |\square\_a|^2 < 0.1 \wedge \psi\_{\text{Im}}) \otimes (|000\rangle \colon \vec{1}, |\*\rangle \colon \vec{0}). \end{aligned}$$

Note that the postcondition Q also checks that all amplitudes in the result of the algorithm have a zero imaginary part (using <sup>ψ</sup>Im). See rows GroverSingle(n) in Table 1 for the results on circuits for n-bit functions f and a single oracle.

Next, we also show the correctness of Grover's algorithm w.r.t. all possible 3-qubit oracles. Let C be a 9-qubit circuit implementing the algorithm, where the first three qubits are used for oracle generation, and the following six are the work tape and ancillae, similarly to GroverSingle. Our specification is now

$$\begin{aligned} \mathcal{P} &\colon \exists i \in \{0, 1\}^3 : (|i000000\rangle \colon \vec{1}, |\*\rangle \colon \vec{0}), \qquad \varphi \colon true, \\ \mathcal{Q} &\colon \exists i \in \{0, 1\}^3 : (|i\rangle \colon \vec{1}, |\*\rangle \colon \vec{0}) \otimes (|i\rangle \colon |\square\_a|^2 > 0.9 \wedge \psi\_{\text{Im}}, |\*\rangle \colon |\square\_a|^2 < 0.1 \wedge \psi\_{\text{Im}}) \\ &\qquad\quad \otimes (|000\rangle \colon \vec{1}, |\*\rangle \colon \vec{0}). \end{aligned}$$

Note that in the postcondition, we use i to relate the oracle value and the value on the work tape. The results are in rows GroverAll(n) in Table 1.

#### 5.4 Increasing Amplitude of the Correct Answer

Above, we showed that we are able to automatically verify moderate-sized circuits for Grover's algorithm for values of n up to 9 (for GroverAll) and 20 (for GroverSingle), but we have difficulties going beyond that: the size of the circuit is O(√(2<sup>n</sup>)), which grows quickly with n. Therefore, we also verify the algorithm w.r.t. a weaker property: that in one iteration, the amplitude of the correct answer increases.

Consider a function f over 2 bits with 01 being the only satisfying assignment and let C be a 4-qubit circuit encoding one Grover iteration, with two qubits as the work tape and two ancilla qubits. From Grover's correctness proof [25], we can derive that when v<sub>ℓ</sub> > 0 ∧ v<sub>h</sub> > 0 ∧ (2<sup>n</sup> − 1)v<sub>ℓ</sub> > v<sub>h</sub>, a correct implementation will increase the probability of |01⟩ and reduce the others. We specify the verification problem as follows:

$$\begin{aligned} &\mathcal{P}\colon(|01\rangle\colon(v\_h,0,0,0),|\*\rangle\colon(v\_\ell,0,0,0))\otimes(|00\rangle\colon\vec{1},|\*\rangle\colon\vec{0}),\\ &\varphi\colon v\_\ell>0\wedge v\_h>0\wedge(2^2-1)v\_\ell>v\_h,\\ &\mathcal{Q}\colon(|01\rangle\colon|\square\_a|>|v\_h|\wedge\psi\_{\rm Im},|\*\rangle\colon|\square\_a|<|v\_\ell|\wedge\psi\_{\rm Im})\otimes(|00\rangle\colon\vec{1},|\*\rangle\colon\vec{0}).\end{aligned}$$

The results can be found in rows GroverIter(n) in Table 1. We can see that verification of one Grover iteration w.r.t. the weaker (but still quite useful) property scales much better than verification of full Grover's circuits, scaling to sizes of n ≥ 100.
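The amplitude-amplification fact behind this weaker property can be checked numerically with a small simulation of one Grover iteration (phase flip on the target, then inversion about the mean). This is only a sketch of the arithmetic, not the TA-based reasoning AutoQ performs; `grover_iteration` is our own helper:

```python
def grover_iteration(amps, target):
    """One Grover iteration on a real amplitude vector:
    oracle phase flip on the target, then inversion about the mean."""
    amps = list(amps)
    amps[target] = -amps[target]
    mu = sum(amps) / len(amps)
    return [2 * mu - a for a in amps]

# Precondition from the text (n = 2, so 2^n - 1 = 3): v_l > 0, v_h > 0,
# and (2^n - 1) * v_l > v_h, where v_h is the amplitude of the correct
# answer |01> and v_l the amplitude of every other basis state.
n, target = 2, 0b01
for v_h, v_l in [(0.5, 0.5), (0.9, 0.4), (0.1, 0.7)]:
    assert v_l > 0 and v_h > 0 and (2**n - 1) * v_l > v_h
    amps = [v_l] * 2**n
    amps[target] = v_h
    out = grover_iteration(amps, target)
    # the amplitude of |01> grows, all others shrink in magnitude
    assert out[target] > v_h
    assert all(abs(a) < v_l for i, a in enumerate(out) if i != target)
```

For the uniform start (v<sub>h</sub> = v<sub>ℓ</sub> = 0.5 with n = 2), a single iteration already drives the target amplitude to 1, matching Grover's well-known behavior for N = 4.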

## 6 Conclusion

We presented a specification language for expressing useful properties of quantum circuits and a tool AutoQ that can establish the validity of such specifications using an approach combining the technique from [14] with symbolic execution. Using the tool, we were able to fully automatically verify several important properties of a selection of quantum circuits. To the best of our knowledge, for some of these properties, we are the first to verify them fully automatically.

Acknowledgements. We thank the reviewers for their useful remarks that helped us improve the quality of the paper. This work was supported by the Czech Ministry of Education, Youth and Sports project LL1908 of the ERC.CZ programme, the Czech Science Foundation project GA23-07565S, the FIT BUT internal project FIT-S-23- 8151, and the NSTC QC project under Grant no. NSTC 111-2119-M-001-004- and 112-2119-M-001-006-.

## References



## Bounded Verification for Finite-Field-Blasting in a Compiler for Zero Knowledge Proofs

Alex Ozdemir1(B) , Riad S. Wahby<sup>2</sup>, Fraser Brown<sup>2</sup>, and Clark Barrett<sup>1</sup>

> <sup>1</sup> Stanford University, Stanford, USA aozdemir@cs.stanford.edu <sup>2</sup> Carnegie Mellon University, Pittsburgh, USA

Abstract. Zero Knowledge Proofs (ZKPs) are cryptographic protocols by which a prover convinces a verifier of the truth of a statement without revealing any other information. Typically, statements are expressed in a high-level language and then compiled to a low-level representation on which the ZKP operates. Thus, *a bug in a ZKP compiler can compromise the statement that the ZK proof is supposed to establish.* This paper takes a step towards ZKP compiler correctness by partially verifying a *field-blasting* compiler pass, a pass that translates Boolean and bitvector logic into equivalent operations in a finite field. First, we define correctness for field-blasters and ZKP compilers more generally. Next, we describe the specific field-blaster using a set of encoding rules and define verification conditions for individual rules. Finally, we connect the rules and the correctness definition by showing that if our verification conditions hold, the field-blaster is correct. We have implemented our approach in the CirC ZKP compiler and have proved bounded versions of the corresponding verification conditions. We show that our partially verified field-blaster does not hurt the performance of the compiler or its output; we also report on four bugs uncovered during verification.

### 1 Introduction

Zero-Knowledge Proofs (ZKPs) are powerful tools for building privacy-preserving systems. They allow one entity, the *prover* P, to convince another, the *verifier* V, that some secret data satisfies a public property, *without revealing anything else about the data*. ZKPs underlie a large (and growing!) set of critical applications, from billion-dollar private cryptocurrencies, like Zcash [24,53] and Monero [2], to research into auditable sealed court orders [20], private gun registries [26], privacy-preserving middleboxes [23], and zero-knowledge proofs of exploitability [11]. This breadth of applications is possible because of the generality of ZKPs. In general, P knows a secret *witness* w, whereas V knows a *property* φ and a public *instance* x. P must show that φ(x, w) = ⊤. Typically, x and w are vectors of variables in a finite field F, and φ can be any system of equations over the variables, using operations + and ×. Because φ itself is an input to P and V, and because of the expressivity of field equations, a single implementation of P and V can serve many different purposes.

Humans find it difficult to express themselves directly with field equations, so they use *ZKP compilers*. A ZKP compiler converts a high-level predicate φ into an equivalent system of field equations φ′. In other words, a ZKP compiler *generalizes* a ZKP: by compiling φ to φ′ and then using a ZKP for φ′, one obtains a ZKP for φ. There are many industrial [3,5,6,14,21,45,55,66] and academic [4,18,28,29,46,48,50,54,63] ZKP compilers.

The correctness of a ZKP compiler is critical for security—a bug in the compiler could admit proofs of false statements—but verification is challenging for three reasons. First, the definition of correctness for a ZKP compiler is nontrivial; we discuss this later in this section. Second, ZKP compilers span multiple domains. The high-level predicate φ is typically expressed in a language with common types such as Booleans and fixed-width integers, while the output φ′ is over a large, prime-order field. Thus, any compiler correctness definition must span these domains. Third, ZKP compilers are evolving and performance-critical; verification must not inhibit future changes or degrade compiler performance.

In this work, we develop tools for automatically verifying the *field-blaster* of a ZKP compiler. A ZKP compiler's field-blaster is the pass that converts from a formula over Booleans, fixed-width integers, and finite-field elements, to a system of field equations; as a transformation from bit-like types to field equations, the field-blaster exemplifies the challenge of cross-domain verification.

Our paper makes three contributions. First, we formulate a precise correctness definition for a ZKP compiler. Our definition ensures that a correct compiler preserves the completeness and soundness of the underlying ZK proof system.<sup>1</sup> More specifically, given a ZK proof system where statements are specified in a low-level language L, and a compiler from a high-level language H to L, if the compiler is correct by our definition, it extends the ZK proof system's soundness and completeness properties to statements in H. Further, our definition is preserved under sequential composition, so proving the correctness of each compiler pass individually suffices to prove correctness of the compiler itself.

Second, we give an architecture for a verifiable field-blaster. In our architecture, a field-blaster is a set of "encoding rules." We give verification conditions (VCs) for these rules, and we show that if the VCs hold, then the field-blaster is correct. Our approach supports *automated* verification because (bounded versions of) the VCs can be checked automatically. This reduces both the up-front cost of verification and its maintenance cost.

Third, we do a case study. Using our architecture, we implement a new field-blaster for CirC [46] ("SIR-see"), an infrastructure used by state-of-the-art ZKP compilers. We verify bounded versions of our field-blaster's VCs using SMT-based finite-field reasoning [47], and show that our field-blaster does not compromise CirC's performance. We also report on four bugs that our verification effort uncovered, including a soundness bug that allowed the prover to "lie" about the results of certain bit-vector comparisons. We note that the utility of

<sup>1</sup> Roughly speaking, a ZK proof system is complete if it is possible to prove every true statement, and is sound if it is infeasible to prove false ones.

our techniques is not limited to CirC: most ZKP compilers include something like the field-blaster we describe here.

In the next sections, we discuss related work (Sect. 1.1), give background on ZKPs and CirC (Sect. 2), present a field-blasting example (Sect. 3), describe our architecture (Sect. 4), give our verification conditions (Sect. 5), and present the case study (Sect. 6).

#### 1.1 Related Work

*Verified Compilers.* There is a rich body of work on verifying the correctness of traditional compilers. We focus on compilation for ZKPs; this requires different correctness definitions that relate bit-like types to prime field elements. In the next paragraphs, we discuss more fine-grained differences.

Compiler verification efforts fall into two broad categories: *automated*—verification leveraging automated reasoning solvers—and *foundational*—manual verification using proof assistants (e.g., Coq [8] or Isabelle [44]). CompCert [36], for example, is a Coq-verified C compiler with verified optimization passes (e.g., [40]). Closest to our work is backend verification, which proves correct the translation from an intermediate representation to machine code. CompCert's lowering [37] is verified, as is CakeML's [31] lowering to different ISAs [19,57]. While such foundational verification offers strong guarantees, it imposes a heavy proof burden; creating CompCert, for example, took an expert team eight years [56], and any updates to compiler code require updates to proofs.

Automated verification, in contrast, does not require writing and maintaining manual proofs.<sup>2</sup> Cobalt [34], Rhodium [35], and PEC [32] are domain-specific languages (DSLs) for writing automatically-verified compiler optimizations and analyses. Most closely related to our work is Alive [39], a DSL for expressing verified peephole optimizations, local rewrites that transform snippets of LLVM IR [1] to better-performing ones. Alive addresses transformations over fixed types (while we address lowering to finite field equations) and formulates correctness in the presence of undefined behavior (while we formulate correctness for ZKPs). Beyond Alive, Alive2 [38] provides translation validation [41,51] for LLVM [33], and VeRA [10] verifies range analysis in the Firefox JavaScript engine.

There is also work on verified compilation for domains more closely related to ZKPs. The Porcupine [15] compiler automatically synthesizes representations for fully-homomorphic encryption [62], and Gillar [58] proves that optimization passes in the Qiskit [60] quantum compiler are semantics-preserving. While these works compile from high-level languages to circuit representations, the correctness definitions for their domains do not apply to ZKP compilers.

*Verified Compilation to Cryptographic Proofs.* Prior works on verified compilation for ZKPs (or similar) take the foundational approach (with attendant proof maintenance burdens), and they do not formulate a satisfactory definition of compiler correctness. PinocchioQ [18] builds on CompCert [36]. The

<sup>2</sup> Automated verification generally leverages solvers. This is a particularly appealing approach in our setting, since CirC (our compiler infrastructure of interest) already supports compilation to SMT formulas.

authors formulate a correctness definition that preserves the *existential soundness* of a ZKP but does not consider completeness, knowledge soundness, or zero-knowledge (see Sect. 2.2). Leo [14] is a ZKP compiler that produces (partial) ACL2 [27] proofs of correct compilation; work to emit proofs from its field-blaster is ongoing.

Recent work defines security for *reductions of knowledge* [30]. These let P convince V that it knows a witness for an instance of relation R<sub>1</sub> by proving it knows a witness for an instance of an easier-to-prove relation R<sub>2</sub>. Unlike ZKP compilers, P and V *interact* to derive R<sub>2</sub> using V's randomness (e.g., proving that two polynomials are nonzero w.h.p. by proving that a random linear combination of them is), whereas ZKP compilers run ahead of time and non-interactively.

Further afield, Ecne [65] is a tool that attempts to verify that the input to a ZKP encodes a *deterministic* computation. It does not consider any notion of a specification of the intended behavior. A different work [25] attempts to automatically verify that a "widget" given to a ZKP meets some specification. They consider widgets that could be constructed manually or with a compiler. Our focus is on verifying a compiler pass.

#### 2 Background

#### 2.1 Logic

We assume usual terminology for many-sorted first-order logic with equality ([17] gives a complete presentation). We assume every signature includes the sort Bool, constants True and False of sort Bool, and a symbol family ≈<sub>σ</sub> (abbreviated ≈) with sort σ × σ → Bool for each sort σ. We also assume a family of conditionals: symbols *ite*<sub>σ</sub> ("if-then-else", abbreviated *ite*) of sort Bool × σ × σ → σ.

A *theory* is a pair T = (Σ, **I**), where Σ is a signature and **I** is a class of Σ-interpretations. A Σ-*formula* is a term of sort Bool. A Σ-formula φ is *satisfiable* (resp., *unsatisfiable*) in T if it is satisfied by some (resp., no) interpretation in **I**. We focus on two theories. The first is T*BV*, the SMT-LIB theory of bit-vectors [52,61], with signature Σ*BV* including a bit-vector sort BV[n] for each n > 0, bit-vector constants c<sub>[n]</sub> of sort BV[n] for each c ∈ [0, 2<sup>n</sup> − 1], and operators including & and | (bit-wise AND, OR) and +<sub>[n]</sub> (addition modulo 2<sup>n</sup>). We write t[i] to refer to the i-th bit of bit-vector t, where t[0] is the least-significant bit. The other theory is T*F<sub>p</sub>*, which is the theory corresponding to the finite field of order p, for some prime p [47]. This theory has signature Σ*F<sub>p</sub>* containing the sort FF<sub>p</sub>, constant symbols 0, ..., p − 1, and operators + and ×.

In this paper, we assume all interpretations interpret sorts and symbols in the same way. We write dom(v) for the set interpreting the sort of a variable v. We assume that Bool, True, and False are interpreted as {⊤, ⊥}, ⊤, and ⊥, respectively; Σ*BV*-interpretations follow the SMT-LIB standard; and Σ*F<sub>p</sub>*-interpretations interpret symbols as the corresponding elements and operations in F<sub>p</sub>, a finite field of order p (for concreteness, this could be the integers modulo p). Note that only the values of variables can vary between two interpretations.

For a signature Σ, let t be a Σ-term of sort σ, with free variables x<sub>1</sub>, ..., x<sub>n</sub>, respectively of sort σ<sub>1</sub>, ..., σ<sub>n</sub>. We define the function t̂ : dom(x<sub>1</sub>) × ··· ×

$$\underbrace{\mathcal{P}(\phi, x, w)}\_{\mathsf{Prove}(\mathsf{pk},\, x,\, w) \,\to\, \pi} \xleftarrow{\ \mathsf{pk}\ } \mathsf{Setup}(\phi) \xrightarrow{\ \mathsf{vk}\ } \underbrace{\mathcal{V}(\phi, x)}\_{\mathsf{Verify}(\mathsf{vk},\, x,\, \pi)}$$

Fig. 1. The information flow for a zero-knowledge proof.

dom(x<sub>n</sub>) → dom(t) as follows. Let **x** ∈ dom(x<sub>1</sub>) × ··· × dom(x<sub>n</sub>). Let M be an interpretation that interprets each x<sub>i</sub> as x<sub>i</sub>. Then t̂(**x**) = t<sup>M</sup> (i.e., the interpretation of t in M). For example, the term t = a ∧ ¬a defines t̂ : Bool → Bool = λx.⊥. In the following, we follow the convention used above in using the standard font (e.g., x) for logical variables and a sans-serif font (e.g., x) to denote meta-variables standing for values (i.e., elements of σ<sup>M</sup> for some σ and M). Also, abusing notation, we'll conflate single variables (of both kinds) with vectors of variables when the distinction doesn't matter. Note that a formula φ is *satisfiable* if there exist values x such that φ̂(x) = ⊤. It is *valid* if for all values x, φ̂(x) = ⊤.

For terms s, t and variable x, t[x → s] denotes t with all occurrences of x replaced with s. For a sequence of variable-term pairs, S = (x<sup>1</sup> → s1,...,x<sup>n</sup> → sn), t[S] is defined to be t[x<sup>1</sup> → s1] ··· [x<sup>n</sup> → sn].

#### 2.2 Zero Knowledge Proofs

As mentioned above, Zero-knowledge proofs (ZKPs) make it possible to prove that some secret data satisfies a public property—without revealing the data itself. See [59] for a full presentation; we give a brief overview here, and then describe how general-purpose ZKPs are used.

*Overview and Definitions.* In a cryptographic proof system, there are two parties: a *verifier* V and a *prover* P. V knows a public *instance* x and asks P to show that it has knowledge of a secret *witness* w satisfying a public *predicate* φ(x, w) from a predicate class Φ (a set of formulas) (i.e., φ̂(x, w) = ⊤). Figure 1 illustrates the workflow. First, a trusted party runs an efficient (i.e., poly-time in an implicit security parameter λ) algorithm Setup(φ), which produces a *proving key* pk and a *verifying key* vk. Then, P runs an efficient algorithm Prove(pk, x, w) → π and sends the resulting *proof* π to V. Finally, V runs an efficient verification algorithm Verify(vk, x, π) → {⊤, ⊥} that accepts or rejects the proof. A zero-knowledge argument of knowledge for class Φ is a tuple Π = (Setup, Prove, Verify) with three informal properties for every φ ∈ Φ and every x ∈ dom(x), w ∈ dom(w):


Technically, the system is an "argument" rather than a "proof" because soundness only holds against efficient adversaries. Also note that knowledge soundness requires that an entity must "know" a valid w to produce a proof; it is not enough for a valid w to simply exist. We give more precise definitions in Appendix A.

*Representations for ZKPs.* As mentioned above, ZKP applications are manifold (Sect. 1)—from cryptocurrencies to private registries. This breadth of applications is possible because ZKPs support a broad class of predicates. Most commonly, these predicates are expressed as *rank-1 constraint systems* (R1CSs). Recall that F<sub>p</sub> is a prime-order finite field (also called a *prime field*). We will drop the subscript p when it is not important. In an R1CS, x and w are vectors of elements in F; let z ∈ F<sup>m</sup> be their concatenation. The function φ̂ can be defined by three matrices A, B, C ∈ F<sup>n×m</sup>; φ̂(x, w) holds when Az ◦ Bz = Cz, where ◦ is the element-wise product. Thus, φ can be viewed as n conjoined *constraints*, where each constraint i is of the form (Σ<sub>j</sub> a<sub>ij</sub> z<sub>j</sub>) × (Σ<sub>j</sub> b<sub>ij</sub> z<sub>j</sub>) ≈ (Σ<sub>j</sub> c<sub>ij</sub> z<sub>j</sub>) (where the a<sub>ij</sub>, b<sub>ij</sub>, and c<sub>ij</sub> are constant symbols from Σ*F<sub>p</sub>*, and the z<sub>j</sub> are a vector of variables of sort FF<sub>p</sub>). That is, each constraint enforces a single non-linear multiplication.
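The R1CS satisfaction condition Az ◦ Bz = Cz can be illustrated with a toy check over a small prime (illustrative only; real ZKP systems use cryptographically large fields, and the helper name is ours):

```python
p = 13  # a toy prime field; real ZKP fields use ~255-bit primes

def r1cs_holds(A, B, C, z, p):
    """Check Az o Bz = Cz (element-wise product) over F_p.
    Each row i encodes one constraint
    (sum_j a_ij z_j) * (sum_j b_ij z_j) = (sum_j c_ij z_j),
    i.e., a single non-linear multiplication."""
    dot = lambda row: sum(r * v for r, v in zip(row, z)) % p
    return all((dot(a) * dot(b)) % p == dot(c) for a, b, c in zip(A, B, C))

# one constraint enforcing z2 = z1 * z1, with z = (1, 3, 9)
A = [[0, 1, 0]]
B = [[0, 1, 0]]
C = [[0, 0, 1]]
assert r1cs_holds(A, B, C, [1, 3, 9], p)
assert not r1cs_holds(A, B, C, [1, 3, 8], p)
```

By convention, the first entry of z is often the constant 1, which lets linear combinations include constants.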

#### 2.3 Compilation Targeting Zero Knowledge Proofs

To write a ZKP about a high-level predicate φ, that predicate is first compiled to an R1CS. A *ZKP compiler* from class Φ (a set of Σ-formulas) to class Φ′ (a set of Σ′-formulas) is an efficient algorithm Compile(φ ∈ Φ) → (φ′ ∈ Φ′, Ext<sub>x</sub>, Ext<sub>w</sub>). Given a predicate φ(x, w), it returns a predicate φ′(x′, w′) as well as two efficient and deterministic algorithms, instance and witness *extenders*: Ext<sub>x</sub> : dom(x) → dom(x′) and Ext<sub>w</sub> : dom(x) × dom(w) → dom(w′).<sup>3</sup> For example, CirC [46] can compile a Boolean-returning C function (in a subset of C) to an R1CS.

At a high level, φ and φ′ should be "equisatisfiable", with Ext<sub>x</sub> and Ext<sub>w</sub> mapping satisfying values for φ to satisfying values for φ′. That is, for all x ∈ dom(x) and w ∈ dom(w) such that φ̂(x, w) = ⊤, if x′ = Ext<sub>x</sub>(x) and w′ = Ext<sub>w</sub>(x, w), then φ̂′(x′, w′) = ⊤. Furthermore, for any x, it should be impossible to (efficiently) find w′ satisfying φ̂′(Ext<sub>x</sub>(x), w′) = ⊤ without knowing a w satisfying φ̂(x, w) = ⊤. In Sect. 5.1, we precisely define correctness for a predicate compiler.

One can build a ZKP for class Φ from a compiler from Φ to Φ′ and a ZKP for Φ′. Essentially, one runs the compiler to get a predicate φ′ ∈ Φ′, as well as Ext<sub>x</sub> and Ext<sub>w</sub>. Then, one writes a ZKP to show that φ̂′(Ext<sub>x</sub>(x), Ext<sub>w</sub>(x, w)) = ⊤. In Appendix A, we give this construction in full and prove it is secure.

*Optimization.* The primary challenge when using ZKPs is cost: typically, Prove is at least three orders of magnitude slower than checking φ directly [64]. Since Prove's cost scales with n (the constraint count), it is *critical* for the compiler to minimize n. The space of optimizations is large and complex, for two reasons. First, the compiler can introduce fresh variables. Second, only equisatisfiability, not logical equivalence, is needed. Compilers in this space exploit equisatisfiability heavily to efficiently represent high-level constructs (e.g., Booleans, bit-vectors, arrays, ...) as an R1CS.

<sup>3</sup> For technical reasons, the runtime of Ext<sub>x</sub> and the size of its description must be poly(λ, |x|)—not just poly(λ) (Appendix A).

$$\mathsf{pgm} \xrightarrow{(1)\ \text{front-end}} \mathsf{IR} \xrightarrow{(2)\ \text{opt.}} \cdots \longrightarrow \mathsf{IR}[\Sigma\_{BV} \cup \Sigma\_{F}] \xrightarrow{(3)\ \text{lowering: field-blasting, flattening}} \mathsf{R1CS}$$

Fig. 2. The architecture of CirC

As a (simple!) example, consider the Boolean computation a ≈ c<sub>1</sub> ∨ ··· ∨ c<sub>k</sub>. Assume that c′<sub>1</sub>, ..., c′<sub>k</sub> are variables of sort FF and that we add constraints c′<sub>i</sub>(1 − c′<sub>i</sub>) ≈ 0 to ensure that c′<sub>i</sub> has to be 0 or 1 for each i. Assume further that (c′<sub>i</sub> ≈ 1) encodes c<sub>i</sub> for each i. How can one additionally ensure that a′ (also of sort FF) is forced to be equal to 0 or 1 and that (a′ ≈ 1) is a correct encoding of a? Given that there are k − 1 ORs, natural approaches use Θ(k) constraints. One clever approach is to introduce a variable x′ and enforce the constraints x′(Σ<sub>i</sub> c′<sub>i</sub>) ≈ a′ and (1 − a′)(Σ<sub>i</sub> c′<sub>i</sub>) ≈ 0. In any interpretation where any c<sub>i</sub> is true, the corresponding interpretation for a′ must be 1 to satisfy the second constraint; setting x′ to the sum's inverse satisfies the first. If all c<sub>i</sub> are false, the first constraint ensures a′ is 0. This technique assumes the sum does not overflow; since ZKP fields are typically large (e.g., with p on the order of 2<sup>255</sup>), this is usually a safe assumption.
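This OR gadget can be exercised concretely. A sketch in Python (the helper names are ours; the inverse witness is computed via Fermat's little theorem, which applies since p is prime):

```python
p = 2**61 - 1  # a Mersenne prime standing in for a large ZKP field

def or_gadget(cs, p):
    """Prover side of the OR gadget: given bit encodings c'_1..c'_k,
    choose a' and the fresh inverse witness x' so that
    x' * (sum c'_i) = a'  and  (1 - a') * (sum c'_i) = 0."""
    s = sum(cs) % p
    a = 1 if s != 0 else 0
    x = pow(s, p - 2, p) if s != 0 else 0  # s^(p-2) = s^-1 mod p by Fermat
    return a, x

def gadget_holds(cs, a, x, p):
    """Verifier side: check the two field constraints."""
    s = sum(cs) % p
    return (x * s) % p == a and ((1 - a) * s) % p == 0

for cs in [[0, 0, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]:
    a, x = or_gadget(cs, p)
    assert gadget_holds(cs, a, x, p)
    assert a == (1 if any(cs) else 0)  # a' encodes the Boolean OR
```

Note that the gadget uses only two constraints regardless of k, which is exactly why it beats the natural Θ(k) encodings.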

*CirC.* CirC [46] is an infrastructure for building compilers from high-level languages (e.g., a C subset), to R1CSs. It has been used in research projects [4,12], and in industrial R&D. Figure 2 shows the structure of an R1CS compiler built with CirC. First, the front-end of the compiler converts the source program into CirC-IR. CirC-IR is a term IR based on SMT-LIB that includes: Booleans, bit-vectors, fixed-size arrays, tuples, and prime fields.<sup>4</sup> Second, the compiler optimizes and simplifies the IR so that the only remaining sorts are Booleans, bit-vectors, and the target prime field. Third, the compiler lowers the simplified IR to an R1CS predicate over the target field. For ZKPs built with CirC, *the completeness, soundness, and zero-knowledge of the end-to-end system depend on the correctness of CirC itself*.

#### 3 Overview and Example

To start, we view CirC's lowering pass as two passes (Fig. 2). The first pass, "(finite-)field-blasting," converts a many-sorted IR (representable as a (Σ*BV* ∪ Σ*F*)-formula) to a conjunction of field equations (Σ*F*-equations). The second pass, "flattening," converts this conjunction of field equations to an R1CS.

Our focus is on verifying the first pass. We begin with a worked example of how to field-blast a small snippet of CirC-IR (Sect. 3.1). This example will illustrate four key ideas (Sect. 3.2) that inspire our field-blaster's architecture.

<sup>4</sup> We list all CirC-IR operators for Booleans, bit-vectors, and prime fields in Appendix C. Almost all are from SMT-LIB.


Table 1. New variables and assertions when compiling the example <sup>φ</sup>.

#### 3.1 An Example of Field-Blasting

We start with an example CirC-IR predicate expressed as a (Σ*BV* ∪Σ*F*)-formula:

$$\phi \triangleq (x\_0 \oplus w\_0) \land (w\_1 +\_{[4]} x\_1 \approx w\_1) \land (x\_2 \&\ w\_1 \approx x\_2) \land (x\_3 \approx w\_2 \times w\_2) \tag{1}$$

The predicate includes: the XOR of two Booleans ("⊕"), a bit-vector sum, a bit-vector AND, and a field product. x<sub>0</sub> and w<sub>0</sub> are of sort Bool; x<sub>1</sub>, x<sub>2</sub>, and w<sub>1</sub> are of sort BV[4]; and x<sub>3</sub> and w<sub>2</sub> are of sort FF<sub>p</sub>. We'll assume that p ≫ 2<sup>4</sup>. Table 1 summarizes the new variables and assertions we create during field-blasting; we describe the origin of each assertion and new variable in the next paragraphs.

*Lowering Clause One (Booleans).* We begin with the Boolean term (x<sub>0</sub> ⊕ w<sub>0</sub>). We will use 1 and 0 to represent ⊤ and ⊥. We introduce variables x′<sub>0</sub> and w′<sub>0</sub> of sort FF<sub>p</sub> to represent x<sub>0</sub> and w<sub>0</sub>, respectively. To ensure that w′<sub>0</sub> is 0 or 1, we assert: w′<sub>0</sub>(w′<sub>0</sub> − 1) ≈ 0.<sup>5</sup> x<sub>0</sub> ⊕ w<sub>0</sub> is then represented by the expression x′<sub>0</sub> + w′<sub>0</sub> − 2x′<sub>0</sub>w′<sub>0</sub>. Setting this equal to 1 enforces that x<sub>0</sub> ⊕ w<sub>0</sub> must be true. These new assertions and fresh variables are reflected in the first three rows of the table.
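Since the Boolean domain is tiny, this XOR encoding can be checked by exhaustive enumeration. A quick sketch (`xor_clause` is our own helper; we check the arithmetic form x′<sub>0</sub> + w′<sub>0</sub> − 2x′<sub>0</sub>w′<sub>0</sub> against the XOR truth table):

```python
p = 2**61 - 1  # stand-in for a large prime ZKP field

def xor_clause(x0p, w0p, p):
    """Field-blasted clause one: for x0', w0' in {0, 1},
    x0' + w0' - 2*x0'*w0' equals 1 exactly when x0 XOR w0 is true."""
    well_formed = (w0p * (w0p - 1)) % p == 0  # w0' must be 0 or 1
    xor_val = (x0p + w0p - 2 * x0p * w0p) % p
    return well_formed and xor_val == 1

# exhaustive truth-table check over the Boolean domain
for x0 in (0, 1):
    for w0 in (0, 1):
        assert xor_clause(x0, w0, p) == (x0 != w0)
```

The same polynomial pattern (a multilinear polynomial agreeing with the Boolean function on {0, 1} inputs) underlies the field encodings of the other Boolean operators.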

*Lowering Clause Two and Three (Bit-vectors).* Before describing how to bitblast the second and third clauses in φ, we discuss bit-vector representations in

<sup>5</sup> Later (Sect. 5), we will see that "well-formedness" constraints like this are unnecessary for instance variables, such as x<sub>0</sub>.

general. A bit-vector t can be viewed as a sequence of b bits or as a non-negative integer less than 2<sup>b</sup>. These two views suggest two natural representations in a prime-order field: first, as one field element t′<sub>u</sub>, whose unsigned value agrees with t (assuming the field's size is at least 2<sup>b</sup>); second, as b elements t′<sub>0</sub>, ..., t′<sub>b−1</sub>, that encode the bits of t as 0 or 1 (in our encoding, t′<sub>0</sub> is the least-significant bit and t′<sub>b−1</sub> is the most-significant bit). The first representation is simple, but with it, some field values (e.g., 2<sup>b</sup>) don't correspond to any possible bit-vector. With the second approach, by including the equations t′<sub>i</sub>(t′<sub>i</sub> − 1) ≈ 0 in our system, we ensure that any satisfying assignment corresponds to a valid bit-vector. However, the extra b equations increase the size of our compiler's output.

We represent φ's w1 bit-wise, as w′1,0, ..., w′1,3, and we represent the instance variable x1 as x′1,u.<sup>6</sup> For the constraint w1 +[4] x1 ≈ w1, we compute the sum in the field and bit-decompose the result to handle overflow. First, we introduce a new variable s′ and set it equal to x′1,u + Σ<sub>i=0</sub><sup>3</sup> 2<sup>i</sup>w′1,i. Then, we bit-decompose s′, requiring s′ ≈ Σ<sub>i=0</sub><sup>4</sup> 2<sup>i</sup>s′i and s′i(s′i − 1) ≈ 0 for i ∈ [0, 4]. Finally, we assert s′i ≈ w′1,i for i ∈ [0, 3]. This forces the lowest 4 bits of the sum to be equal to w1.
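The sum-then-bit-split idea can be sketched concretely (our toy code, with b = 4 and p = 97 assumed; the field is large enough that the 5-bit sum never wraps):

```python
# Lowering bit-vector addition: add in the field, bit-decompose into b+1
# bits, keep the low b bits. Toy parameters, not CirC's implementation.
p, b = 97, 4

def lower_bvadd(x_u, w_bits):
    """Field-sum x_u with the bit-sum of w, then bit-split into b+1 bits."""
    s = (x_u + sum((2 ** i) * wi for i, wi in enumerate(w_bits))) % p
    s_bits = [(s >> i) & 1 for i in range(b + 1)]
    # the decomposition equation s = sum_i 2^i * s_i holds
    assert s == sum((2 ** i) * si for i, si in enumerate(s_bits)) % p
    return s_bits[:b]  # the low b bits: the bit-vector sum

for x in range(2 ** b):
    for w in range(2 ** b):
        w_bits = [(w >> i) & 1 for i in range(b)]
        got = sum((2 ** i) * si for i, si in enumerate(lower_bvadd(x, w_bits)))
        assert got == (x + w) % (2 ** b)  # matches +[4] semantics
```

Because x, w < 2<sup>4</sup>, the field sum is below 2<sup>5</sup> < p, so decomposing into 5 bits captures it exactly; discarding the top bit implements the modular reduction.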

The constraint x2 & w1 ≈ x2 is more challenging. Since x2 is an instance variable, we initially encode it as x′2,u. Then, we consider the bit-wise AND. There is no obvious way to encode a bit-wise operation, other than bit-by-bit. So, we convert x′2,u to a bit-wise representation: we introduce witness variables x′2,0, ..., x′2,3 and equations x′2,i(x′2,i − 1) ≈ 0, as well as the equation x′2,u ≈ Σ<sub>i=0</sub><sup>3</sup> 2<sup>i</sup>x′2,i. Then, for each i, we require x′2,i w′1,i ≈ x′2,i.
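For 0/1 field values, a product computes a per-bit AND, which is what the bit-by-bit lowering relies on; a small sketch (our code, b = 4 assumed):

```python
# Bit-wise AND via per-bit field products: for bits xi, wi in {0,1},
# xi * wi equals xi AND wi. Toy sketch, width b = 4.
b = 4

def bit_split(u):
    return [(u >> i) & 1 for i in range(b)]

def lower_bvand(x_bits, w_bits):
    return [xi * wi for xi, wi in zip(x_bits, w_bits)]

for x in range(2 ** b):
    for w in range(2 ** b):
        and_bits = lower_bvand(bit_split(x), bit_split(w))
        got = sum((2 ** i) * ai for i, ai in enumerate(and_bits))
        assert got == x & w
        # the clause-three constraint x_i * w_i = x_i holds iff x & w == x
        assert (x & w == x) == all(
            xi * wi == xi for xi, wi in zip(bit_split(x), bit_split(w)))
```

The last assertion mirrors the lowered constraint: requiring x′2,i w′1,i ≈ x′2,i for every bit is equivalent to x2 & w1 ≈ x2.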

*Lowering the Final Clause (Field Elements).* Finally, we consider the field equation x3 ≈ w2 × w2. Our target is also field equations, so lowering this is straightforward: we simply introduce primed variables and copy the equation.

#### 3.2 Key Ideas

This example highlights four ideas that guide the design of our field-blaster:


<sup>6</sup> We represent w1 bit-wise so that we can ensure the representation is well-formed with constraints w′1,i(w′1,i − 1) ≈ 0. As previously noted, such well-formedness constraints are not needed for an instance variable like x1 (see footnote 5).


Table 2. Encodings for each term sort. Only bit-vectors have two encoding kinds.

### 4 Architecture

In this section, we present our field-blaster architecture. To compile a predicate φ to a system of field equations φ′, our architecture processes each term t in φ using a post-order traversal. Informally, it represents each t as an "encoding" in φ′: a term (or collection of terms) over variables in φ′. Each encoding is produced by a small algorithm called an "encoding rule".

Below, we define the type of encodings Enc (Sect. 4.1), the five different types of encoding rules (Sect. 4.2), and a calculus that iteratively applies these rules to compile all of φ (Sect. 4.3).

#### 4.1 Encodings

Table 2 presents our tagged union type Enc of possible term encodings. Each variant comprises the term being encoded, its tag (the *encoding kind*), and a sequence of field terms. The encoding kinds are bit (a Boolean as 0/1), uint (a bit-vector as an unsigned integer), bits (a bit-vector as a sequence of bits), and field (a field term trivially represented as a field term). Each encoding has an intended semantics: a condition under which the encoding is considered valid. For instance, a bit encoding of Boolean t is valid if the field term f is equal to *ite*(t, 1, 0).
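A minimal Python stand-in for the Enc type and one validity predicate may make the tagged-union structure concrete (field terms are modeled here as plain integers; names are ours, not CirC's):

```python
# Sketch of the Enc tagged union from Table 2. Each encoding carries the
# encoded term, its kind tag, and a sequence of field terms.
from dataclasses import dataclass
from typing import List

@dataclass
class Enc:
    term: str         # name of the term being encoded
    kind: str         # 'bit' | 'uint' | 'bits' | 'field'
    terms: List[int]  # the field terms of the encoding

def valid_bit(e: Enc, t: bool) -> bool:
    """A bit encoding of Boolean t is valid iff its single field term
    equals ite(t, 1, 0)."""
    return e.kind == 'bit' and e.terms == [1 if t else 0]

e = Enc('t', 'bit', [1])
assert valid_bit(e, True) and not valid_bit(e, False)
```

The other kinds (uint, bits, field) would each carry their own validity condition, as in Table 2.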

#### 4.2 Encoding Rules

An encoding rule is an algorithm that takes and/or returns encodings, in order to represent some part of the input predicate as field terms and equations.

Primitive Operations. A rule can perform two primitive operations: creating new variables and emitting assertions. In our pseudocode, the primitive function fresh(name, t′, isInst) → x′ creates a fresh variable. Argument isInst is a Boolean indicating whether x′ is an instance variable (as opposed to a witness). Argument t′ is a field term (over variables from φ and previously defined primed variables) that expresses how to compute a value for x′. For example, to create a field variable w′ that represents Boolean witness variable w, a rule can call fresh(w′, *ite*(w, 1, 0), ⊥). The compiler uses t′ to help create the Ext<sub>x</sub> and Ext<sub>w</sub> algorithms. A rule asserts a formula t′ (over primed variables) by calling assert(t′).
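The two primitives can be sketched as a small recording context (an illustrative model, not CirC's API): fresh logs a variable definition for later use in Ext<sub>x</sub>/Ext<sub>w</sub>, and assert logs a field equation.

```python
# Sketch of a rule's execution context. fresh() records (name, defining
# term, instance?) into the definitions sequence F; assert_() records a
# formula into the assertions conjunction A. Names are illustrative.
class RuleCtx:
    def __init__(self):
        self.defs = []        # F: (name, defining term, is_instance)
        self.assertions = []  # A: asserted formulas

    def fresh(self, name, defining_term, is_inst):
        self.defs.append((name, defining_term, is_inst))
        return name

    def assert_(self, formula):
        self.assertions.append(formula)

ctx = RuleCtx()
w = ctx.fresh("w'", "ite(w, 1, 0)", False)  # field var for Boolean witness w
ctx.assert_(f"{w} * ({w} - 1) = 0")          # well-formedness assertion
assert len(ctx.defs) == 1 and len(ctx.assertions) == 1
```

Keeping definitions separate from assertions mirrors the calculus state (A, F) introduced in Sect. 4.3: the definitions drive the extension functions, while the assertions become the output equation system.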

Fig. 3. Pseudocode for some bit-vector rules: variable uses a uint encoding for instances and bit-splits witnesses to ensure they're well-formed, const bit-splits the constant it's given, assertEq asserts unsigned or bit-wise equality, and convert either does a bit-sum or bit-split.

Rule Types. There are five types of rules: (1) Variable rules variable(t, isInst) → e take a variable t and its instance/witness status and return an encoding of that variable made up of fresh variables. (2) Constant rules const(t) → e take a constant term t and produce an encoding of t comprising terms that depend only on t. Since t is a constant, the terms in e can be evaluated to field constants (see the calculus in Sect. 4.3).<sup>7</sup> The const rule cannot call fresh or assert. (3) Equality rules assertEq(e, e′) take two encodings of the same kind and emit assertions that equate the underlying terms. (4) Conversion rules convert(e, kind′) → e′ take an encoding and convert it to an encoding of a different kind. Conversions are only non-trivial for bit-vectors, which have two encoding kinds: uint and bits. (5) Operator rules apply to terms t of the form o(t1, ..., tn). Each operator rule takes t, o, and encodings of the child terms ti, and returns an encoding of t. Some operator rules require specific kinds of encodings; before using such an operator rule, our calculus (Sect. 4.3) calls the convert rule to ensure the input encodings are the correct kind. Figure 3 gives pseudocode for the first four rule types, as applied to bit-vectors. Figure 4 gives pseudocode for two bit-vector operator encoding rules. A field-blaster uses many operator rules: in our case study (Sect. 6) there are 46.

<sup>7</sup> Having const(t) return terms that depend on t (rather than directly returning constants) is useful for constructing verification conditions for const.

Fig. 4. Pseudocode for some bit-vector operator rules. bvZeroExt zero-extends a bit-vector; for bit-wise encodings, it adds zero bits, and for unsigned encodings, it simply copies the original encoding. bvMulUint multiplies bit-vectors, all assumed to be unsigned encodings. We show only the case where the multiplication cannot overflow in the field: in this case the rule performs the multiplication in the field and bit-splits the result to implement reduction modulo 2<sup>b</sup>. The rules use ff2bv, which converts from a field element to a bit-vector (discussed in Sect. 6.1).

#### 4.3 Calculus

We now give a non-deterministic calculus describing how our field-blaster applies rules to compile a predicate φ(x, w) into a system of field equations.

A calculus state is a tuple of three items: (E, A, F). The *encoding store* E is a (multi-)map from terms to sets of encodings. The *assertions formula* A is a conjunction of all field equations asserted via assert. The *fresh variable definitions sequence* F is a sequence of pairs, where each pair (v′, t′) records a single call to fresh(v′, t′, ...).

Figure 5 shows the transitions of our calculus. We denote the result of a rule as A′, F′, e′ ← r(...), where A′ is a formula capturing any new assertions, F′ is a sequence of pairs capturing any new variable definitions, and e′ is the rule's return value. We may omit one or more results if they are always absent for a particular rule. For encoding store E, E ∪ (t → e) denotes the store with e added to t's encoding set.

There are five kinds of transitions. The Const transition adds an encoding for a constant term. The const rule returns an encoding e whose terms depend on the constant c; e′ is a new encoding identical to e, except that each of its terms has been evaluated to obtain a field constant. The Var transition adds an encoding for a variable term. The Conv transition takes a term that is already encoded and re-encodes it with a new encoding kind. The kinds operator returns all legal values of kind for encodings of a given sort. The Op<sub>r</sub> transition applies operator rule r. This transition is only possible if r's operator kind agrees with o, and if its input encoding kinds agree with e. The Finish transition applies when φ has been encoded. It uses const and assertEq to build assertions that hold when φ = ⊤. Rather than producing a new calculus state, it returns the outputs of the calculus: the assertions and the variable definitions.

To meet the requirements of the ZKP compiler, our calculus must return two extension functions: Ext<sub>x</sub> and Ext<sub>w</sub> (Sect. 2.2). Both can be constructed from the fresh variable definitions F. One subtlety is that Ext<sub>x</sub>(x) (which assigns values to fresh instance variables) is a function of x only; it cannot depend on the witness variables of φ. We ensure this by allowing fresh instance variables to be created only by the variable rule, and only when it is called with isInst = ⊤.

*Strategy.* Our calculus is non-deterministic: multiple transitions are possible in some situations; for example, some conversion is almost always applicable. The strategy that decides which transition to apply affects field-blaster performance (Appendix D) but *not* correctness.

## 5 Verification Conditions

In this section, we first define correctness for a ZKP compiler (Sect. 5.1). Then, we give verification conditions (VCs) for each type of encoding rule (Sect. 5.2). Finally, we show that if these VCs hold, our calculus is a correct ZKP compiler (Sect. 5.3).

#### 5.1 Correctness Definition

Definition 1 (Correctness). *A ZKP compiler* Compile(φ) → (φ′, Ext<sub>x</sub>, Ext<sub>w</sub>) *is correct if it is demonstrably complete and demonstrably sound.*

• *demonstrable completeness*: For all x ∈ dom(x), w ∈ dom(w) such that φ̂(x, w) = ⊤,

$$\hat{\phi}'(\mathsf{Ext}\_x(\mathsf{x}), \mathsf{Ext}\_w(\mathsf{x}, \mathsf{w})) = \top$$

• *demonstrable soundness*: There exists an efficient algorithm Inv(x′, w′) → w such that for all x ∈ dom(x), w′ ∈ dom(w′) such that φ̂′(Ext<sub>x</sub>(x), w′) = ⊤,

$$\hat{\phi}(\mathsf{x}, \mathsf{Inv}(\mathsf{Ext}\_x(\mathsf{x}), \mathsf{w}')) = \top$$

Demonstrable completeness (respectively, soundness) requires the existence of a witness for φ′ (resp., φ) when a witness exists for φ (resp., φ′); this existence is *demonstrated* by an efficient algorithm Ext<sub>w</sub> (resp., Inv) that computes the witness.

Correct ZKP compilers are important for two reasons. First, since sequential composition preserves correctness, one can prove a multi-pass compiler correct pass-by-pass. Second, a correct ZKP compiler from Φ to Φ′ can be used to generalize a ZKP for Φ′ to one for Φ. We prove both properties in Appendix A.

Theorem 1 (Compiler Composition). *If* Compile *and* Compile′ *are correct, then the compiler* Compose(Compile′, Compile) *(Appendix A) is correct.*

Theorem 2 (ZKP Generalization). *(informal) Given a correct ZKP compiler* Compile *from* Φ *to* Φ′ *and a ZKP for* Φ′*, we can construct a ZKP for* Φ*.*

#### 5.2 Rule VCs

Recall (Sect. 4) that our language manipulates encodings through five types of encoding rules. We give verification conditions for each type of rule. Intuitively, these capture the correctness of each rule in isolation. Next, we'll show that they imply the correctness of a ZKP compiler that follows our calculus.

Our VCs quantify over valid encodings. That is, they have the form: "for any valid encoding e of term t, ...". We can quantify over an encoding e by making each ti ∈ terms(e) a fresh variable and quantifying over the ti. Encoding validity is captured by a predicate *valid*(e, t), which is defined to be the validity condition in Table 2. Each VC containing encoding variables *e* implicitly represents a conjunction of instances of that VC, one for each possible tuple of kinds of *e*, which is fixed for each instance. If a VC contains *valid*(e, t), the sort of t is constrained to be *compatible* with kind(e). For a kind and a sort to be compatible, they must occur in the same row of Table 2. We define the equality predicate *equal*(e, e′) as ⋀<sub>i</sub> terms(e)[i] ≈ terms(e′)[i].

*Encoding Uniqueness.* First, we require the uniqueness of valid encodings, for any fixed encoding kind. Table 3 shows the VCs that ensure this. Each row is a formula that must be valid for all compatible encodings and terms. The first two rows ensure that there is a bijection from terms to their valid encodings (in the first row, we consider only instances for which kind(e) = kind(e′)). The function *fromTerm*(t, kind) → e maps a term and an encoding kind to a valid encoding of that kind, and the function *toTerm*(e) → t maps a valid encoding to its encoded term. The third and fourth rows ensure that *fromTerm* and *toTerm* are correctly defined. We will use *toTerm* in our proof of calculus soundness (Appendix B), and we will use *fromTerm* to optimize VCs for faster verification (Sect. 6.1).


Table 3. VCs related to encoding uniqueness.

Table 4. VCs for encoding rules.


For an example of the *valid*, *fromTerm*, and *toTerm* functions, consider a Boolean b encoded as an encoding e with kind bit and whose terms consist of a single field element f. Validity is defined as *valid*(e, b) = (f ≈ *ite*(b, 1, 0)), *toTerm*(e) is defined as (f ≈ 1), and *fromTerm*(b, bit) is (b, bit, *ite*(b, 1, 0)).

*VCs for Encoding Rules.* Table 4 shows our VCs for the rules of Fig. 5. For each rule application, A and F denote, respectively, the assertions and the variable declarations generated when that rule is applied. We explain some of the VCs in detail.

First, consider a rule r<sub>o</sub> for operator o applied to inputs t1, ..., tk. The rule takes input encodings e1, ..., ek and returns an output e′. It is sound if the validity of its inputs and its assertions imply the validity of its output. It is complete if the validity of its inputs implies its assertions and the validity of its output, after substituting fresh variable definitions.

Second, consider a variable rule. Its input is a variable term t, and it returns e′, a putative encoding thereof. Note that e′ does not actually contain t, though the substitutions in F may bind the fresh variables of e′ to functions of t. For the rule to be sound when t is a witness variable (t ∈ w), the assertions must imply that e′ is valid for *some* term t′. For the rule to be sound when t is an instance variable (t ∈ x), the assertions must imply that e′ is valid for t when the instance variables in e′ are replaced with their definitions (F<sub>x</sub> denotes F, restricted to its declarations of instance variables).<sup>8</sup> For the variable rule to be complete (for an instance or a witness), the assertions and the validity of e′ for t must follow from F.

Third, consider a constant rule. Its input is a constant term t, and it returns an encoding e. Recall that the terms of e are always evaluated, yielding e′, which contains only constant terms. Thus, correctness depends only on the fact that e′ is always a valid encoding of the input t. This can be captured with a single VC.

#### 5.3 A Correct Field-Blasting Calculus

Given rules that satisfy these verification conditions, we show that the calculus of Sect. 4.3 is a correct ZKP compiler. The proof is in Appendix B.

Theorem 3 (Correctness). *With rules that satisfy the conditions of Sect. 5.2, the calculus of Sect. 4.3 is demonstrably complete and sound (Def. 1).*

## 6 Case Study: A Verifiable Field-Blaster for CirC

We implemented and partially verified a field-blaster for CirC [46]. Our implementation is based on a refactoring of CirC's original field blaster to conform to our encoding rules (Sect. 4.2) and consists of <sup>≈</sup>850 lines of code (LOC).<sup>9</sup> As described below, we have (partially) verified our encoding rules, but trust our calculus (Sect. 4.3, ≈150 LOC) and our flattening implementations (Fig. 2, ≈160 LOC).

While porting rules, we found 4 bugs in CirC's original field-blaster (see Appendix G), including a severe soundness bug. Given a ZKP compiled with CirC, the bug allowed a prover to incorrectly compare bit-vectors. The prover, for example, could claim that the unsigned value of 0010 is greater than *or less than* that of 0001. A patch to fix all 4 bugs (in the original field blaster) has been upstreamed, and we are in the process of upstreaming our new field blaster implementation into CirC.

#### 6.1 Verification Evaluation

Our implementation constructs the VCs from Sect. 5.2 and emits them as SMT-LIB (extended with a theory of finite fields [47]). We verify them with cvc5, because it can solve formulas over bit-vectors and prime fields [47]. The verification is partial in that it is bounded in two ways. We set <sup>b</sup> <sup>∈</sup> <sup>N</sup> to be the maximum bit-width of any bit-vector and <sup>a</sup> <sup>∈</sup> <sup>N</sup> to be the maximum number of arguments to any n-ary operator. In our evaluation, we used a = 4 and b = 4. These bounds are small, but they were sufficient to find the bugs mentioned above.

<sup>8</sup> The different soundness conditions for instance and witness variables play a key role in the proof of Theorem 3. Essentially: since the condition for instances replaces variables with their definitions, the validity of the encodings of instance variables need not be explicitly enforced in A. This is why some constraints could be omitted in our field-blasting example (see footnote 5).

<sup>9</sup> Our implementation is in Rust, as is CirC.

*Optimizing Completeness VCs.* Generally, cvc5 verifies soundness VCs more quickly than completeness VCs. This is surprising at first glance. To see why, consider the soundness (S) and completeness (C) conditions for a conversion rule from e to e′ that generates assertions A and definitions F:

$$S \stackrel{\Delta}{=} (A \land valid(e, t)) \to valid(e', t) \qquad C \stackrel{\Delta}{=} (valid(e, t) \to (A \land valid(e', t)))[F]$$

In both, t is a variable, e contains variables, and there are variables in e′ and A that are defined by F. In C, though, some variables are replaced by their definitions in F, which makes the number of variables (and thus the search space) seem smaller for C than for S. Yet, cvc5 is slower on C.

The problem is that, while the field operations in A are standard (e.g., +, ×, and =), the definitions in F use a CirC-IR operator that (once embedded into SMT-LIB) is hard for cvc5 to reason about. That operator, (ff2bv b), takes a prime field element x and returns a bit-vector v. If x's integer representative is less than 2<sup>b</sup>, then v's unsigned value is equal to x; otherwise, v is zero.
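The semantics of (ff2bv b) just described is trivial to state as code (our sketch of the described behavior, with the field element modeled as its integer representative):

```python
# Sketch of the (ff2bv b) semantics: a field element whose integer
# representative is below 2^b maps to that value as a b-bit vector;
# anything else maps to zero.
def ff2bv(x, b):
    return x if x < 2 ** b else 0

assert ff2bv(5, 4) == 5
assert ff2bv(16, 4) == 0  # representative >= 2^b maps to zero
```

As the text explains, the difficulty is not evaluating this function but embedding its graph into SMT-LIB using only field +, ×, and =.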

The ff2bv operator is trivial to evaluate but hard to embed. cvc5's SMT-LIB extension for prime fields only supports +, ×, and =, so no operator can directly relate x to v. Instead, we encode the relationship through b Booleans that represent the bits of v. To test whether x < 2<sup>b</sup>, we use the polynomial f(x) = ∏<sub>i=0</sub><sup>2<sup>b</sup>−1</sup>(x − i), which is zero only on [0, 2<sup>b</sup> − 1]. The bit-splitting essentially forces cvc5 to guess v's value; further, f's high degree slows down the Gröbner basis computations that form the foundation of cvc5's field solver.
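The range-test polynomial can be checked directly on a toy field (p = 97 and b = 2 are our illustrative choices):

```python
# The range-test polynomial f(x) = prod_{i=0}^{2^b - 1} (x - i) over a toy
# prime field: it vanishes exactly on [0, 2^b - 1].
p, b = 97, 2

def f(x):
    out = 1
    for i in range(2 ** b):
        out = out * (x - i) % p
    return out

assert all(f(x) == 0 for x in range(2 ** b))       # zero on the range
assert all(f(x) != 0 for x in range(2 ** b, p))    # nonzero elsewhere
```

Each factor (x − i) is nonzero outside the range, and a product of nonzero field elements is nonzero, which is why f characterizes the interval; its degree 2<sup>b</sup> is what burdens the Gröbner-basis machinery.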

To optimize verification of the completeness VCs, we reason about CirC-IR directly. First, we use the uniqueness of valid encodings and the *fromTerm* function. Since the VC assumes *valid*(e, t), we know e is equal to *fromTerm*(t, kind(e)). We use this equality to eliminate e from the completeness VC, leaving:

$$(A \land valid(e', t))[F][e \mapsto fromTerm(t, \mathsf{kind}(e))]$$

Since F defines all variables in A and e′, the only variable after substitution is t. So, when t is a Boolean or small bit-vector, an exhaustive search is very effective;<sup>10</sup> we implemented such a solver in 56 LOC, using CirC's IR as a library.

For soundness VCs, this approach is less effective. The *fromTerm* substitution still applies, but if F introduces fresh field variables, they are not eliminated; the final formula thus contains field variables, so exhaustion is infeasible.

*Verification Results.* We ran our VC verification on machines with Intel Xeon E5-2637 v4 CPUs.<sup>11</sup> Each attempt is limited to one physical core, 8 GB memory, and 30 min. Figure 6 shows the number of VCs verified by cvc5 and our exhaustive solver. As expected, the exhaustive solver is effective on completeness VCs for Boolean and bit-vector rules, but ineffective on soundness VCs for rules that introduce fresh field variables. There are four VCs that neither solver verifies

<sup>10</sup> So long as the exhaustive solver reasons directly about all CirC-IR operators.

<sup>11</sup> We omit the completeness VCs for ff2bv. See Appendix C.


Fig. 6. VCs verified by different solvers. 'uniq' denotes the VCs of Table 3; others are from Table 4. 'C' denotes completeness; 'S': soundness.

Fig. 7. The performance of CirC with the verified and unverified field-blaster. Metrics are summed over the 61 functions in the Z# standard library.

within 30 min: bvadd with (b = 4, a = 4), and bvmul with (b = 3, a = 4) and (b = 4, a ≥ 3). Most other VCs verify instantly. In Appendix E, we analyze how VC verification time depends on a and b.

#### 6.2 Performance and Output Quality Evaluation

We compare CirC with our field-blaster ("Verified") against CirC with its original field-blaster ("Unverified")<sup>12</sup> on three metrics: compiler runtime, memory usage, and the final R1CS constraint count. Our benchmark set is the standard library for CirC's Z# input language (which extends ZoKrates [16,68] v0.6.2). Our testbed runs Linux with 32 GB memory and an AMD Ryzen 2700.

There is no difference in constraint count, but the verified field-blaster slightly improves compiler performance: −8% time and −2% memory (Fig. 7). We think this small improvement is unrelated to the fact that the new field-blaster is verified. In Appendix E, we discuss compiler performance further.

#### 7 Discussion

In this work, we present the first automatically verifiable field-blaster. We view the field-blaster as a set of rules; if some (automatically verifiable) conditions hold for each rule, then the field-blaster is correct. We implemented a performant and partially verified field-blaster for CirC, finding 4 bugs along the way.

Our approach has limitations. First, we require the field-blaster to be written as a set of encoding rules. Second, we only verify our rules for bit-vectors of bounded size and operators of bounded arity. Third, we assume that each rule is a pure function: for example, it doesn't return different results depending on

<sup>12</sup> After fixing the bugs we found. See Sect. 6.

the time. Future work might avoid the last two limitations through bit-width-independent reasoning [42,43,67] and a DSL (and compiler) for encoding rules. It would also be interesting to extend our approach to: a ZKP with a non-prime field [7,13], a compiler IR with partial or non-deterministic semantics, or a compiler whose correctness depends on computational assumptions.

Acknowledgements. We appreciate the help and guidance of Andres Nötzli, Dan Boneh, and Evan Laufer.

This material is in part based upon work supported by the DARPA SIEVE program and the Simons foundation. Any opinions, findings, and conclusions or recommendations expressed in this report are those of the author(s) and do not necessarily reflect the views of DARPA. It is also funded in part by NSF grant number 2110397 and the Stanford Center for Automated Reasoning.

## A Zero-Knowledge Proofs and Compilers

This appendix is available in the full version of the paper [49].

## B Compiler Correctness Proofs

This appendix is available in the full version of the paper [49].

## C CirC-IR

This appendix is available in the full version of the paper [49].

## D Optimizations to the CirC Field-Blaster

This appendix is available in the full version of the paper [49].

## E Verified Field-Blaster Performance Details

This appendix is available in the full version of the paper [49].

## F Verifier Performance Details

This appendix is available in the full version of the paper [49].

## G Bugs Found in the CirC Field Blaster

This appendix is available in the full version of the paper [49].

#### References



## Formally Verified EVM Block-Optimizations

Elvira Albert<sup>1</sup> , Samir Genaim<sup>1</sup> , Daniel Kirchner2,3 , and Enrique Martin-Martin1(B)

<sup>1</sup> Complutense University of Madrid, Madrid, Spain {elvira,samir.genaim}@fdi.ucm.es, emartinm@ucm.es <sup>2</sup> Ethereum Foundation, Zug, Switzerland daniel.kirchner@ethereum.org <sup>3</sup> University of Bamberg, Bamberg, Germany

Abstract. The efficiency and the security of *smart contracts* are their two fundamental properties, but might come at odds: the use of optimizers to enhance efficiency may introduce bugs and compromise security. Our focus is on EVM (Ethereum Virtual Machine) *block-optimizations*, which enhance the efficiency of jump-free blocks of opcodes by eliminating, reordering and even changing the original opcodes. We reconcile efficiency and security by providing the verification technology to formally prove the correctness of EVM block-optimizations on smart contracts using the Coq proof assistant. This amounts to the challenging problem of proving semantic equivalence of two blocks of EVM instructions, which is realized by means of three novel Coq components: a symbolic execution engine which can execute an EVM block and produce a symbolic state; a number of simplification lemmas which transform a symbolic state into an equivalent one; and a checker of symbolic states to compare the symbolic states produced for the two EVM blocks under comparison.

Artifact: https://doi.org/10.5281/zenodo.7863483

Keywords: Coq · Ethereum Virtual Machine · Smart Contracts · Optimization · Theorem Proving

## 1 Introduction

In many contexts, security requirements are critical, and formal verification today plays an essential role in verifying/certifying these requirements. One such context is the blockchain, in which software bugs in *smart contracts* have already caused several high-profile attacks (e.g., [14–17,30,37]). There is hence huge interest and investment in guaranteeing their correctness: e.g., Certora [1], Veridise [2], apriorit [3], Consensys [4], and Dedaub [5] are companies that offer smart-contract audits using formal-methods technology. In this context, efficiency is highly relevant as well, as deploying and executing smart contracts has a cost (in the corresponding cryptocurrency). Hence, optimization tools for smart contracts have

This work was funded partially by the Ethereum Foundation under Grant ID FY22- 0698 and the Spanish MCI, AEI and FEDER (EU) project PID2021-122830OB-C41.

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 176–189, 2023. https://doi.org/10.1007/978-3-031-37709-9_9

emerged in the last few years (e.g., ebso [29], SYRUP [12], GASOL [11], the solc optimizer [9]). Unfortunately, there is a dichotomy between efficiency and correctness: as optimizers can be rather complex tools (not formally verified), they might introduce bugs, and potential users might be reluctant to optimize their code. This has a number of disruptive consequences: owners will pay more to deploy (non-optimized) smart contracts; clients will pay more to run transactions every time they are executed; and the blockchain will accept fewer transactions, as they are more costly. Rather than accepting such a dichotomy, our work tries to overturn it by developing a fully automated formal verification tool for proving the correctness of the optimized code.

The general problem addressed by this paper is formally verifying the semantic equivalence of two bytecode programs, an initial code I and an optimization of it O, which is considered a great challenge in formal verification. For our purposes, we narrow down the problem by (1) considering fragments of code that are *jump-free* (i.e., they have no loops or branching), and by (2) considering only stack EVM operations (memory/storage opcodes and other blockchain-specific opcodes are not considered). These assumptions are realistic, as working on jump-free blocks still allows proving correctness for optimizers that work at the level of the blocks of the CFG (e.g., super-optimizers [11,12,29] and many rule-based optimizations performed by the Solidity compiler [9]). Considering only stack optimizations, and leaving out memory and storage simplifications and blockchain-specific bytecodes, does not restrict the considered programs, as we work at the smaller block partitions induced by the unhandled operations found in the block (splitting it into the blocks before and after them). Even in our narrowed setting, the problem is challenging, as block-optimizations can include any elimination, reordering, and even change of the original bytecodes.

Consider the next block I, taken from a real smart contract [8]. The GASOL optimizer [11], relying on the commutativity of OR and AND, optimizes it to O:

This saves 11 bytes because (1) the expression SUB(SHL(168,1),256) – which corresponds to "PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB" – is computed twice, but it can instead be duplicated if the stack operations are arranged properly, saving 8 bytes; and (2) two SWAPs are needed instead of 5, saving 3 more bytes.

This paper proposes a technique, and a corresponding tool, to automatically verify the correctness of EVM block-optimizations (as those above) on smart contracts using the Coq proof assistant. This amounts to the challenging problem of proving semantic equivalence of two blocks of EVM instructions, which is realized by means of three main components which constitute our main contributions (all formalized and proven correct in Coq): (1) a symbolic interpreter in Coq to symbolically execute the EVM blocks I and O and produce resulting symbolic states S<sup>I</sup> and SO, (2) a series of simplification rules, which transform S<sup>I</sup> and S<sup>O</sup> into

I: PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB NOT SWAP1 SWAP2 AND PUSH1 0x8 SWAP2 SWAP1

SWAP2 SHL PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB AND OR PUSH1 0x5

O: PUSH2 0x100 PUSH1 0x1 PUSH1 0xa8 SHL SUB DUP1 NOT SWAP2 PUSH1 0x8 SHL AND

SWAP2 AND OR PUSH1 0x5

equivalent ones S′<sub>I</sub> and S′<sub>O</sub>, and (3) a checker of symbolic states in Coq to decide whether two symbolic states S′<sub>I</sub> and S′<sub>O</sub> are semantically equivalent.

## 2 Background

The Ethereum VM (EVM) [38] is a stack-based VM with a word size of 256 bits that is used to run the smart contracts on the Ethereum blockchain. The EVM has the following categories of bytecodes: *(1)* stack operations; *(2)* arithmetic operations; *(3)* comparison and bitwise logic operations; *(4)* memory and storage manipulation; *(5)* control-flow operations; *(6)* blockchain-specific opcodes, e.g., block and transaction environment information, hash computation, calls, etc. The first three types of opcodes are handled within our verifier; handling optimizations on opcodes of types 4–6 is discussed in Sect. 6.

The focus of our work is on optimizers that perform optimizations only at the level of the blocks of the CFG (i.e., intra-block optimizations). A well-known example is the technique called *super-optimization* [26] which, given a loop-free sequence of instructions, searches for a semantically equivalent sequence of instructions with optimal cost (for the considered criterion). This technique dates back to 1987 and has had a revival [25,31] thanks to the availability of SMT solvers that can perform the search efficiently. We distinguish two types of possible intra-block optimizations: *(i)* rule-based optimizations, which consist in applying arithmetic/bitwise simplifications like ADD(X,0)=X or NOT(NOT(X))=X (see a complete list of these rules in App. A in [10]); and *(ii)* stack-data optimizations, which consist in searching for alternative stack operations that lead to an output stack with exactly the same data.

*Example 1 (Intra-block optimizations).* The rule-based optimization (*i*) X+0 → X simplifies the block "PUSH1 0x5, PUSH1 0x0, ADD" to "PUSH1 0x5". On the other hand, a stack-data optimization (*ii*) can optimize the block "DUP2, DUP2, ADD, SWAP2, ADD" to "ADD, DUP1", as duplicating the operands and repeating the ADD operation is the same as duplicating the result. Unlike rule-based optimizations, stack-data optimizations cannot be expressed as simple patterns that can be easily recognized.
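The stack-data optimization of Example 1 can be checked by brute force on a toy stack machine. The following Python sketch is our own illustration (it is not the paper's Coq development); it encodes only the three opcode families the example needs:

```python
# Toy stack machine to test the stack-data optimization of Example 1
# by exhaustive comparison on small operand values.
MOD = 2 ** 256  # EVM words are 256-bit

def step(op, stk):
    # stk[0] is the top of the stack
    if op == "ADD":
        a, b, *rest = stk
        return [(a + b) % MOD] + rest
    if op.startswith("DUP"):           # DUPn pushes a copy of the n-th element
        n = int(op[3:])
        return [stk[n - 1]] + stk
    if op.startswith("SWAP"):          # SWAPn swaps the top with the (n+1)-th element
        n = int(op[4:])
        stk = stk[:]
        stk[0], stk[n] = stk[n], stk[0]
        return stk
    raise ValueError(op)

def run(block, stk):
    for op in block:
        stk = step(op, stk)
    return stk

original  = ["DUP2", "DUP2", "ADD", "SWAP2", "ADD"]
optimized = ["ADD", "DUP1"]
for a in range(4):
    for b in range(4):
        assert run(original, [a, b]) == run(optimized, [a, b])
```

Such testing only samples the input space; the point of the paper is to replace it with a proof that holds for all 256-bit stacks.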

The first type of optimization is applied by the optimizer integrated in the Solidity compiler [9] as rule transformations, and it is also applied by EVM optimizers in different ways. ebso [29] encodes the semantics of arithmetic and bitwise operations in its SMT encoding so that the SMT solver searches for these optimizations together with those of type (ii). Instead, SYRUP [12] and GASOL [11] apply rule-based optimizations in a pre-phase and leave to the SMT solver only the search for the second type of optimization. This classification of optimizations is also relevant for our approach, as (i) requires integrating and proving correct all simplification rules (Sect. 4.2), while (ii) is implicit within the symbolic execution (Sect. 4.1). A block of EVM code that has been subject to optimizations of the two types above is in principle "provable" using our tool.

There is not much work yet on formalizing the EVM semantics in Coq. One of the most developed approaches is [22], a definition of the EVM semantics in the Lem [28] language that can be exported to interactive theorem provers like Isabelle/HOL or Coq. According to the comparison in [21], this implementation of the EVM "is executable and passes all of the VM tests except for those dealing with more complicated intercontract execution". However, we have decided not to use it for our checker for three reasons: *(a)* the Coq code generated from Lem definitions is not "necessarily idiomatic", and would thus yield a very complex EVM formalization in Coq that makes theorems harder to state and prove; *(b)* the author of the Lem definition states that "the Coq version of the EVM definition is highly experimental"; and *(c)* it is not kept up to date.

The other most developed implementation of the EVM semantics in Coq that we have found is [23]. It supports all the basic EVM bytecodes we consider in our checker, and looked promising as our starting point. The implementation uses *Bedrock Bit Vectors (bbv)* [7] for representing the EVM 256-bit values, which we use as well. It is not a full formalization of the EVM because it does not support calling or creating smart contracts, but it provides a function that simulates the successive application of opcodes to a given execution state, call info, and Ethereum state mocks. The latter two pieces of information would add complexity and are not needed for our purpose. Therefore, we decided to develop our own EVM formalization in Coq (presented in Sect. 3), which builds upon some ideas of [23] but introduces only the minimal elements we need to handle the instructions supported by the checker. This way the proofs are simpler and more concise.

## 3 EVM Semantics in Coq

Our EVM formalization is a concrete interpreter that executes a block of EVM instructions. For representing EVM words we use EVMWord, which stands for the type "word 256" of the bbv library [7]. For representing instructions we use:


Type stack\_op\_instr defines instructions that operate only on the stack, i.e., each pops a fixed number of elements and pushes a single value back (see App. B in [10] for the full list). Type instr encapsulates this category together with the stack manipulation instructions (PUSH, etc.). The type block stands for "list instr".

To keep the framework general, and simplify the proofs, the actual implementation of instructions from stack\_op\_instr are provided to the interpreter as input. For this, we use a map that associates instructions to implementations:

```
Inductive stack_operation :=
 | StackOp (comm: bool) (n : nat) (f : list EVMWord → option EVMWord).
Definition stack_op_map := map stack_op_instr stack_operation.
```

The type stack\_operation defines an implementation for a given operation: comm indicates if the operation is commutative; n is the number of stack elements to be removed and passed to the operation; and f is the actual implementation. The type stack\_op\_map maps keys of type stack\_op\_instr to values of type stack\_operation. Assuming evm\_add and evm\_mul are implementations of ADD and MUL (see App. C in [10]), the actual stack operations map is constructed as:

```
Definition evm_stack_opm : stack_op_map :=
 ADD |→i StackOp true 2 evm_add; MUL |→i StackOp true 2 evm_mul; ...
```
In addition, we require the operations in the map to be valid with respect to the properties that they claim to satisfy (e.g., commutativity), and to succeed (i.e., not return None) when applied to the right number of arguments. We refer to this property as valid\_stack\_op\_map.

An execution state (or simply state) currently includes only a stack (we support only stack operations), which is a list of EVMWord. The interpreter is a function that takes a block, an initial state, and a stack operations map, and iteratively executes each of the block's instructions:

```
Definition stack := list EVMWord.
Inductive state :=
 | ExState (stk: stack).
Fixpoint concr_int (p: block) (st: state) (ops: stack_op_map): option state := ...
```
The result is either Some st, or None in case of an error, which can only be caused by stack overflow. In particular, we are currently not taking into account the amount of *gas* needed to execute the block. Our implementation follows the EVM semantics [38]; given the simplicity of the supported operations, the concrete interpreter is a minimal trusted computing base. In the future, we plan to test it using the EVM test suite.
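To make the shapes of concr_int and the stack operations map concrete, the following Python analogue may help; the encoding (tuples for PUSH, a dict mirroring stack_op_map's commutativity flag, arity, and implementation) is our own illustration, not the Coq code, with None playing the role of the option type's failure case:

```python
# Rough Python analogue of concr_int: ALU implementations are supplied
# as a map, mirroring stack_op_map; errors yield None.
MOD = 2 ** 256
MAX_STACK = 1024

# op -> (commutative?, arity, implementation), like "StackOp comm n f"
EVM_STACK_OPM = {
    "ADD": (True, 2, lambda a, b: (a + b) % MOD),
    "MUL": (True, 2, lambda a, b: (a * b) % MOD),
    "SUB": (False, 2, lambda a, b: (a - b) % MOD),
}

def concr_int(block, stk, ops=EVM_STACK_OPM):
    """Run a block over a concrete stack; None models an execution error."""
    for instr in block:
        if isinstance(instr, tuple) and instr[0] == "PUSH":
            if len(stk) >= MAX_STACK:
                return None                      # stack overflow
            stk = [instr[1] % MOD] + stk
        elif instr in ops:
            _, n, f = ops[instr]
            if len(stk) < n:
                return None                      # stack underflow
            stk = [f(*stk[:n])] + stk[n:]
        else:
            return None                          # unsupported opcode
    return stk

# "PUSH1 0x5, PUSH1 0x0, ADD" evaluates to the same stack as "PUSH1 0x5"
assert concr_int([("PUSH", 5), ("PUSH", 0), "ADD"], []) == concr_int([("PUSH", 5)], [])
```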

## 4 Formal Verification of EVM-Optimizations in Coq

Two jump-free blocks p1 and p2 are equivalent *wrt.* an initial stack size k if, for any initial stack of size k, the executions of p1 and p2 succeed and lead to the same state. Formally:

```
Definition sem_eq_blocks (p1 p2: block) (k: nat) (ops: stack_op_map) : Prop :=
 ∀ (in_st: state) (in_stk: stack),
   get_stack in_st = in_stk → length in_stk = k →
       ∃ (out_st : state), concr_int p1 in_st ops = Some out_st ∧
                                concr_int p2 in_st ops = Some out_st
```
Note that when concr\_int returns None for both p1 and p2, they are not considered equivalent because in the general case they can fail for different reasons. Note also that EVM operations are deterministic, so if concr\_int evaluates to a successful final state out\_st, it is unique.

An EVM block equivalence checker is a function that takes two blocks and the size of the initial stack, and returns true/false. Providing the size k of the initial stack is not a limitation of the checker, as this information is statically known in advance. Note that the maximum stack size in the EVM is bounded by 1024, and that if the execution (of one or both blocks) *wrt.* this concrete initial stack size leads to a stack underflow/overflow, the blocks cannot be reported equivalent. The soundness of the equivalence checker is stated as follows:

```
Definition eq_block_chkr_snd (chkr : block → block → nat → bool) : Prop :=
 ∀ (p1 p2: block) (k: nat),
      chkr p1 p2 k = true → sem_eq_blocks p1 p2 k evm_stack_opm
```
Given two blocks *p*<sub>1</sub> and *p*<sub>2</sub>, checking their equivalence (in Coq) has the following components: *(i) Symbolic Execution (Sect. 4.1):* it is based on an interpreter that symbolically executes a block, *wrt.* an initial symbolic stack of size k, and generates a final symbolic stack. It is applied to both *p*<sub>1</sub> and *p*<sub>2</sub> to generate their corresponding symbolic output states S<sub>1</sub> and S<sub>2</sub>. *(ii) Rule optimizations (Sect. 4.2):* it is based on simplification rules that are often applied by program optimizers, which rewrite symbolic states to equivalent "simpler" ones. This step simplifies S<sub>1</sub> and S<sub>2</sub> to S′<sub>1</sub> and S′<sub>2</sub>. *(iii) Equivalence Checker (Sect. 4.3):* it receives the simplified symbolic states and determines if they are equivalent for any concrete instantiation of the symbolic input stack. It takes into account, for example, the fact that some stack operations are commutative.

#### 4.1 EVM Symbolic Execution in Coq

Symbolic execution takes an initial symbolic state (i.e., stack) [*s*<sub>0</sub>,...,*s*<sub>k−1</sub>], a block, and a map of stack operations, and generates a final symbolic state (i.e., stack) with symbolic expressions, e.g., [5+*s*<sub>0</sub>, *s*<sub>1</sub>, *s*<sub>2</sub>], representing the corresponding computations. In order to incorporate rule-based optimizations in a simple and efficient way, we want to avoid compound expressions such as 5+(*s*<sub>0</sub> ∗ *s*<sub>1</sub>), and instead use fresh temporary variables together with a corresponding map that assigns them to simpler expressions. E.g., the stack [5 + (*s*<sub>0</sub> ∗ *s*<sub>1</sub>), *s*<sub>2</sub>] would be represented as the tuple ([*e*<sub>1</sub>, *s*<sub>2</sub>], {*e*<sub>1</sub> → 5 + *e*<sub>0</sub>, *e*<sub>0</sub> → *s*<sub>0</sub> ∗ *s*<sub>1</sub>}), where the *e<sub>i</sub>* are fresh variables. To achieve this, we define the *symbolic stack* as a list of elements that can be numeric constant values, initial stack variables, or fresh variables:

```
Inductive sstack_val : Type :=
 | Val (val: EVMWord) | InStackVar (var: nat) | FreshVar (var: nat).
Definition sstack := list sstack_val.
```
and the map that assigns meaning to fresh variables is a list that maps each fresh variable to a sstack\_val, or to a compound expression:

```
Inductive smap_val : Type :=
 | SymBasicVal (val: sstack_val)
 | SymOp (opcode : stack_op_instr) (args : list sstack_val).
Definition smap := list (nat∗smap_val).
```
Finally, a symbolic state is defined as a SymState term, where k is the size of the initial stack, maxid is the maximum id used for fresh variables (kept for efficiency), sstk is a symbolic stack, and m is the map of fresh variables:

```
Inductive sstate :=
 | SymState (k maxid: nat) (sstk: sstack) (m: smap).
```
*Example 2 (Symbolic execution).* Given *p*<sub>1</sub> ≡ "PUSH1 0x5 SWAP2 MUL ADD" and *p*<sub>2</sub> ≡ "PUSH1 0x0 ADD MUL PUSH1 0x5 ADD", symbolically executing them with k=3 we obtain the symbolic states represented by sst1 ≡ ([*e*<sub>1</sub>, *s*<sub>2</sub>], {*e*<sub>1</sub> → *e*<sub>0</sub> + 5, *e*<sub>0</sub> → *s*<sub>1</sub> ∗ *s*<sub>0</sub>}) and sst2 ≡ ([*e*<sub>2</sub>, *s*<sub>2</sub>], {*e*<sub>2</sub> → 5 + *e*<sub>1</sub>, *e*<sub>1</sub> → *e*<sub>0</sub> ∗ *s*<sub>1</sub>, *e*<sub>0</sub> → 0 + *s*<sub>0</sub>}).
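The construction in Example 2 can be mimicked by a small Python sketch of sym_exec; the tagged-tuple encoding of Val/InStackVar/FreshVar and the dict used as the smap are our own illustration, not the paper's Coq code:

```python
# Sketch of symbolic execution for a few opcodes: every ALU result is
# bound to a fresh variable e_i, so the stack stays flat.
def sym_exec(block, k):
    stk = [("s", i) for i in range(k)]     # initial symbolic stack s0..s_{k-1}
    smap, fresh = {}, 0
    for instr in block:
        if isinstance(instr, tuple) and instr[0] == "PUSH":
            stk = [("val", instr[1])] + stk
        elif instr.startswith("SWAP"):      # SWAPn: swap top and (n+1)-th
            n = int(instr[4:])
            stk = stk[:]
            stk[0], stk[n] = stk[n], stk[0]
        elif instr in ("ADD", "MUL"):       # bind a fresh variable to a
            args, stk = stk[:2], stk[2:]    # flat (non-nested) expression
            smap[fresh] = (instr, args)
            stk = [("e", fresh)] + stk
            fresh += 1
        else:
            return None
    return stk, smap

# p2 of Example 2: final stack [e2, s2] with e2 -> ADD(5, e1),
# e1 -> MUL(e0, s1), e0 -> ADD(0, s0)
stk, smap = sym_exec([("PUSH", 0), "ADD", "MUL", ("PUSH", 5), "ADD"], 3)
assert stk == [("e", 2), ("s", 2)]
```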

Note that we impose some requirements on symbolic states to be valid. E.g., for any element *i* → *v* of the fresh variables map, all fresh variables that appear in *v* have smaller indices than *i*. We refer to these requirements as valid\_sstate.

Given a symbolic (final) state and a concrete initial state, we can convert the symbolic state into a concrete one by replacing each *s<sub>i</sub>* by its corresponding value and evaluating the corresponding expressions (following their definition in the stack operations map). We have a function to perform this evaluation that takes the stack operations map as input:

```
Definition eval_sstate (in_st: state) (sst: sstate) (ops : stack_op_map)
 : option state := ...
```
Our symbolic execution engine is a function that takes the size of the initial stack, a block, a map of stack operations, and generates a symbolic final state:

Definition sym\_exec (p: block) (k: nat) (ops: stack\_op\_map) : option sstate := ...

Note that we do not pass an initial symbolic state; rather, we construct it inside using k. Also, the result can be None in case of failure (the causes are the same as those of concr\_int).

Soundness of sym\_exec means that whenever it generates a symbolic state as a result, then the concrete execution from any stack of size k will succeed and produce a final state that agrees with the generated symbolic state:

```
Theorem sym_exec_snd:
 ∀ (p: block) (k: nat) (ops: stack_op_map) (sst: sstate),
     valid_stack_op_map ops →
     sym_exec p k ops = Some sst →
     valid_sstate sst ∧
     ∀ (in_st : state) (in_stk : stack),
       get_stack in_st = in_stk →
       length in_stk = k →
         ∃ (out_st : state),
           concr_int p in_st ops = Some out_st ∧
           eval_sstate in_st sst ops = Some out_st
```
#### 4.2 Simplification Rules

To capture the equivalence of programs that have been optimized according to "rule simplifications" (type (i) in Sect. 2), we need to include the same type of simplifications (see App. A in [10]) in our framework. Without this, we would capture EVM-block equivalence only for "data-stack equivalence optimizations" (type (ii) in Sect. 2).

An *optimization function* takes as input a symbolic state and tries to simplify it to an equivalent state. E.g., if a symbolic state includes *e<sub>i</sub>* → *s*<sub>3</sub> + 0, we can replace it by *e<sub>i</sub>* → *s*<sub>3</sub>. The following is the type used for optimization functions:

Definition optim := sstate → sstate∗bool.

Optimization functions never fail, i.e., in the worst case they return the same symbolic state. This is why the returned value includes a Boolean to indicate if any optimization has been applied, which is useful when composing optimizations later. The soundness of an optimization function can be stated as follows:

```
Definition optim_snd (opt: optim) : Prop :=
 forall (sst: sstate) (sst': sstate) (b: bool),
   valid_sstate sst → opt sst = (sst', b) →
     (valid_sstate sst' ∧
        forall (st st': state), eval_sstate st sst evm_stack_opm = Some st' →
                                eval_sstate st sst' evm_stack_opm = Some st').
```
We have implemented and proven correct the most-used simplification rules (see App. A in [10]). E.g., there is an optimization function optimize\_add\_zero that rewrites expressions of the form *E* + 0 or 0 + *E* to *E*, and its soundness theorem is:

Theorem optimize\_add\_zero\_snd: optim\_snd optimize\_add\_zero.

*Example 3.* Consider again the blocks of Example 2. Using optimize\_add\_zero we can rewrite sst2 to sst2′ ≡ ([*e*<sub>2</sub>, *s*<sub>2</sub>], {*e*<sub>2</sub> → 5 + *e*<sub>1</sub>, *e*<sub>1</sub> → *e*<sub>0</sub> ∗ *s*<sub>1</sub>, *e*<sub>0</sub> → *s*<sub>0</sub>}), by replacing *e*<sub>0</sub> → 0 + *s*<sub>0</sub> with *e*<sub>0</sub> → *s*<sub>0</sub>.
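The rewrite of Example 3 is an instance of an optimization function of the optim type: it returns the (possibly unchanged) state plus a change flag. A Python sketch, where the tagged-tuple encoding of symbolic states is our own illustration rather than the paper's code:

```python
# Sketch of an add-zero optimization function: it rewrites every binding
# e_i -> ADD(E, 0) or ADD(0, E) to a direct binding e_i -> E, and reports
# whether anything changed.
def optimize_add_zero(sst):
    stk, smap = sst
    new_smap, changed = {}, False
    for i, rhs in smap.items():
        if rhs[0] == "ADD" and ("val", 0) in rhs[1]:
            a, b = rhs[1]
            kept = b if a == ("val", 0) else a
            new_smap[i] = ("basic", kept)   # e_i now denotes a single value
            changed = True
        else:
            new_smap[i] = rhs
    return (stk, new_smap), changed

# Example 3: e0 -> ADD(0, s0) is replaced by e0 -> s0
sst2 = ([("e", 2), ("s", 2)],
        {2: ("ADD", [("val", 5), ("e", 1)]),
         1: ("MUL", [("e", 0), ("s", 1)]),
         0: ("ADD", [("val", 0), ("s", 0)])})
sst2p, changed = optimize_add_zero(sst2)
assert changed and sst2p[1][0] == ("basic", ("s", 0))
```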

Note that the checker can be easily extended with new optimization functions, simply by providing a corresponding implementation and a soundness proof. Optimization functions can be combined to define *simplification strategies*, which are also functions of type optim. E.g., assuming that we have *basic* optimization functions f<sub>1</sub>,...,f<sub>n</sub>: *(1)* apply f<sub>1</sub>,...,f<sub>n</sub> iteratively such that in iteration *i* function f<sub>i</sub> is applied as many times as it can be; *(2)* apply each f<sub>i</sub> once in some order and repeat the process as long as some function can still be applied; *(3)* use the simplifications that were used by the optimizer (the optimizer needs to pass these hints).
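Strategy (1) can be sketched as a combinator over functions of the optim type. The following toy Python illustration (with made-up "optimization" functions over plain integers, purely to show the composition and the change flag) is our own, not the paper's code:

```python
# Strategy (1): apply each optimization function to a fixpoint before
# moving on to the next one; every function has type "state -> (state, bool)".
def apply_to_fixpoint(f, sst):
    changed_any = False
    while True:
        sst, changed = f(sst)
        if not changed:
            return sst, changed_any
        changed_any = True

def strategy(fs):
    def run(sst):
        changed_any = False
        for f in fs:
            sst, c = apply_to_fixpoint(f, sst)
            changed_any = changed_any or c
        return sst, changed_any
    return run

# Toy "optimizations" over an integer state, just to exercise the combinator:
halve = lambda n: (n // 2, True) if n % 2 == 0 else (n, False)
dec9  = lambda n: (n - 9, True) if n > 9 else (n, False)
opt = strategy([halve, dec9])
assert opt(40) == (5, True)    # 40 -> 20 -> 10 -> 5, then dec9 does nothing
```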

#### 4.3 Stacks Equivalence Modulo Commutativity

We say that two symbolic states sst1 and sst2 are equivalent if for every possible initial concrete state st they evaluate to the same state. Formally:

Definition eq\_sstate (sst1 sst2: sstate) (ops : stack\_op\_map) : Prop := ∀ (st: state), eval\_sstate st sst1 ops = eval\_sstate st sst2 ops.

However, this notion of semantic equivalence is not computable in general, and thus we provide an effective procedure to determine such equivalence by checking that at every position of the stack both contain "similar" expressions:

Definition eq\_sstate\_chkr (sst1 sst2: sstate) (ops : stack\_op\_map) : bool := ...

To determine if two stack elements are similar, we follow their definition in the map, if needed, until we obtain a value that is not a fresh variable, and then check that either (1) both are equal constant values; (2) both are equal initial stack variables; or (3) both correspond to the same instruction and their arguments are (recursively) equivalent (taking into account the commutativity of operations). E.g., the stack elements (viewed as terms) DIV(MUL(*s*<sub>0</sub>,ADD(*s*<sub>1</sub>,*s*<sub>2</sub>)),0x16) and DIV(MUL(ADD(*s*<sub>2</sub>,*s*<sub>1</sub>),*s*<sub>0</sub>),0x16) are considered equivalent because the operations ADD and MUL are commutative.

*Example 4.* eq\_sstate\_chkr fails to prove the equivalence of sst1 and sst2 of Example 2 because, when comparing *e*<sub>2</sub> and *e*<sub>1</sub>, it will eventually check whether 0+*s*<sub>0</sub> and *s*<sub>0</sub> are equivalent, and it fails there because the comparison is rather "syntactic". However, it succeeds when comparing sst1 and sst2′ (Example 3), which is a simplification of sst2.
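The core of the comparison procedure (follow fresh-variable bindings, then compare structurally, trying both argument orders for commutative operations) might look as follows; the encoding is our own Python illustration, not the Coq checker:

```python
# Sketch of structural equivalence of symbolic stack elements modulo
# commutativity. An smap binds e_i either to ("basic", value) or to
# (opcode, args).
COMMUTATIVE = {"ADD", "MUL"}

def resolve(v, smap):
    # follow FreshVar bindings until a value or a compound expression
    while v[0] == "e" and smap.get(v[1], ("",))[0] == "basic":
        v = smap[v[1]][1]
    return v

def eq_elem(v1, smap1, v2, smap2):
    v1, v2 = resolve(v1, smap1), resolve(v2, smap2)
    if v1[0] in ("val", "s") or v2[0] in ("val", "s"):
        return v1 == v2                 # equal constants / equal stack vars
    op1, args1 = smap1[v1[1]]
    op2, args2 = smap2[v2[1]]
    if op1 != op2 or len(args1) != len(args2):
        return False
    same = lambda xs: all(eq_elem(a, smap1, b, smap2) for a, b in zip(xs, args2))
    if same(args1):
        return True
    return op1 in COMMUTATIVE and len(args1) == 2 and same(args1[::-1])

# DIV(MUL(s0, ADD(s1, s2)), 0x16) vs DIV(MUL(ADD(s2, s1), s0), 0x16)
smap1 = {0: ("ADD", [("s", 1), ("s", 2)]),
         1: ("MUL", [("s", 0), ("e", 0)]),
         2: ("DIV", [("e", 1), ("val", 0x16)])}
smap2 = {0: ("ADD", [("s", 2), ("s", 1)]),
         1: ("MUL", [("e", 0), ("s", 0)]),
         2: ("DIV", [("e", 1), ("val", 0x16)])}
assert eq_elem(("e", 2), smap1, ("e", 2), smap2)
```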

This procedure is an approximation of the semantic equivalence, and it can produce false negatives if two symbolic states are equivalent but are expressed with different syntactic constructions. However, it is sound:

Theorem eq\_sstate\_chkr\_snd: ∀ (sst1 sst2: sstate) (ops : stack\_op\_map), valid\_stack\_op\_map ops → valid\_sstate sst1 → valid\_sstate sst2 → eq\_sstate\_chkr sst1 sst2 ops = true → eq\_sstate sst1 sst2 ops.

Note that we require the stack operations map to be valid in order to guarantee that the operations declared commutative in ops are indeed commutative. To reduce the number of false negatives, the simplification rules presented in Sect. 4.2 are very important, as they rewrite symbolic states into closer syntactic shapes that can be detected by eq\_sstate\_chkr.

Finally, given all the pieces developed above, we can now define the block equivalence checker as follows:

```
Definition evm_eq_block_chkr (opt: optim) (p1 p2: block) (k: nat) : bool :=
match sym_exec p1 k evm_stack_opm with
| None ⇒ false
| Some sst1 ⇒
   match sym_exec p2 k evm_stack_opm with
   | None ⇒ false
   | Some sst2 ⇒ let (sst1', _) := opt sst1 in
                    let (sst2', _) := opt sst2 in
                      eq_sstate_chkr sst1' sst2' evm_stack_opm
   end
end.
```
It symbolically executes p1 and p2, simplifies the resulting symbolic states by applying the optimization opt, and finally calls eq\_sstate\_chkr to check if the states are equivalent. Note that it is important to apply the optimization rules to both blocks, as the checker might apply optimization rules that were not applied by the external optimizer; otherwise, this would lead to equivalent symbolic states with different shapes that would not be detected by the symbolic state equivalence checker.


Table 1. Summary of experiments using GASOL.


The above checker is sound when opt is sound:

```
Theorem evm_eq_block_chkr_snd:
 ∀ (opt: optim), optim_snd opt → eq_block_chkr_snd (evm_eq_block_chkr opt)
```
## 5 Implementation and Experimental Evaluation

The different components of the tool have been implemented in Coq v8.15.2, together with complete proofs of all the theoretical results (more than 180 proofs in ∼7000 lines of Coq code). The source code, executables and benchmarks can be found at https://github.com/costa-group/forves/tree/stack-only and the artifact at https://doi.org/10.5281/zenodo.7863483. The tool currently includes 15 simplification rules (see App. A in [10]). We have tried our implementation on the outcome of two optimization tools: (1) the standalone GASOL optimizer and (2) the optimizer integrated within the official Solidity compiler solc. For (1), we have fully automated the communication between the optimizer and the checker and have been able to perform a thorough experimental evaluation. For (2), the communication is more difficult to automate because the CFG of the original program can change after optimization, i.e., the optimizer can perform cross-block optimization. Hence, in this case, we have needed human intervention to disable intra-block optimizations and obtain the blocks for the comparison (we plan to automate this usage in the future). For evaluating (2) we have used as benchmarks 1,280 blocks extracted from the smart contracts in the semantic test suite of the solc compiler [6], succeeding in proving equivalence on 1,045 of them. We have checked that the failures are due to the use of optimization rules not yet implemented by us. As these blocks are obtained from the test suite of the official solc Solidity compiler and optimized using the solc optimizer, the good results on this set suggest that they generalize to other optimizers. We now describe in detail the experimental evaluation on (1), for which we have used as benchmarks 147,798 blocks belonging to 96 smart contracts (see App. D in [10]).

GASOL allows enabling/disabling the application of simplification rules and choosing an optimization criterion: GAS consumption or bytes SIZE (of the code) [11]; combining these parameters we obtain 4 different sets of pairs of blocks to be verified by our tool. From these blocks, we consider only those that were actually optimized by GASOL, i.e., those whose optimized version is syntactically different from the original one. In all cases, the average block size is 8 instructions. Table 1 summarizes our results, where each row corresponds to one of the 4 settings mentioned above: *Column 1* gives the optimization criterion; *Column 2* indicates if rule simplifications were applied by GASOL; *Column 3* indicates how many pairs of blocks were checked; *Columns 4–7* report the results of applying 2 versions of the checker, namely CHKR, which only compares symbolic states, and CHKR*<sup>s</sup>*, which also applies all the implemented rule optimizations iteratively as much as they can be applied (see Sect. 4.2). For each version we report the number of instances it proved equivalent and the total runtime in seconds. The experiments have been performed on a machine with an Intel i7-4790 at 3.60 GHz and 16 GB of RAM.

For the sets in which GASOL does not apply simplification rules (marked with ×), both CHKR and CHKR*<sup>s</sup>* succeed in proving the equivalence of all blocks. When simplifications are applied (marked with -), CHKR*<sup>s</sup>* succeeds on 99% of the blocks, while CHKR ranges from 63% for GAS to 99% for SIZE. This difference is due to the fact that GASOL requires the application of rules to optimize more blocks *wrt.* GAS (∼37% of the total) than *wrt.* SIZE (∼1%). Moreover, all the blocks that CHKR*<sup>s</sup>* cannot prove equivalent have been optimized by GASOL using rules which are not currently implemented in the checker, so we predict a success rate of 100% once all the rules in App. A in [10] are integrated. Regarding time, CHKR*<sup>s</sup>* is 3–5 times slower than CHKR because of the overhead of applying rule optimizations, but it is still very efficient (all 147,798 instances are checked in 50.61 seconds). As a final comment, thanks to the checker we found a bug in the parsing component of GASOL, which treated the SGT bytecode as GT. The bug was reported to the GASOL developers and is already fixed [19].

## 6 Conclusions, Related and Future Work

Our work provides the first tool able to formally verify the equivalence of jump-free EVM blocks, and has required the development of all components within the verification framework. The implementation is not tied to any specific tool and could be easily integrated within any optimization tool. Ongoing work focuses on handling memory and storage optimizations. This extension needs to support the execution of memory/storage operations at the level of the concrete interpreter, and to provide an efficient data structure to represent symbolic memory/storage. Full handling of blockchain-specific opcodes is straightforward: it only requires adding the corresponding implementations to the stack operations map evm\_stack\_opm. A more ambitious direction for future work is to handle cross-block optimizations.

There are two approaches to verifying program optimizations: (1) verify the correctness of the optimizations themselves and develop a *verified tool*, as is the case for the optimizations within the CompCert certified compiler [24] and a good number of optimizations that have been formally verified in Coq [13,18,27,32,33]; or (2) use a *translation validation* approach [20,34–36] in which, rather than verifying the tool, each compiled/optimized program is formally checked to be correct using a verified checker. We argue that translation validation [34] is the most appropriate approach for verifying EVM optimizations because: (i) EVM compilers (together with their built-in optimizers) are continuously evolving to adjust to modifications in the rather new blockchain programming languages; (ii) existing EVM optimizers use external components such as SMT solvers to search for the optimized code, and verifying an SMT solver would require a daunting effort; (iii) we aim at generality of our tool rather than restricting ourselves to a specific optimizer and, as already explained, the design of our checker has been done with generality and extensibility in mind, so that new optimizations can be easily incorporated. Finally, it is worth mentioning the KEVM framework [21], which in principle could also be the basis for verifying optimizations. However, we have chosen to develop our framework in Coq due to its maturity.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## SR-SFLL: Structurally Robust Stripped Functionality Logic Locking

Gourav Takhar(B) and Subhajit Roy

Indian Institute of Technology Kanpur, Kanpur, India {tgourav,subhajit}@cse.iitk.ac.in

Abstract. Logic locking was designed to be a formidable barrier to IP piracy: given a logic design, logic locking modifies the logic design such that the circuit operates correctly only if operated with the "correct" *secret* key. However, strong attacks (like SAT-based attacks) soon exposed the weakness of this defense. *Stripped functionality logic locking* (SFLL) was recently proposed as a strong variant of logic locking. SFLL was designed to be resilient against SAT attacks, which were the bane of conventional logic locking techniques. However, all SFLL-protected designs share certain "circuit patterns" that expose them to new attacks that employ *structural analysis* of the locked circuits.

In this work, we propose a new methodology—*Structurally Robust SFLL* (SR-SFLL)—that uses the power of modern satisfiability and synthesis engines to produce semantically equivalent circuits that are resilient against such structural attacks. On our benchmarks, SR-SFLL was able to defend all circuit instances against both structural and SAT attacks, while all of them were broken when defended using SFLL. Further, we show that designing such defenses is challenging: we design a variant of our proposal, SR-SFLL(0), that is also robust against existing structural attacks but succumbs to a new attack, SyntAk (also proposed in this work). SyntAk uses synthesis technology to compile SR-SFLL(0) locked circuits into semantically equivalent variants that have structural vulnerabilities. SR-SFLL, however, remains resilient to SyntAk.

Keywords: Logic Locking · SFLL · Program Synthesis

## 1 Introduction

Semiconductor design houses often outsource the fabrication of the integrated circuits (IC) to third-party foundries [17]. This allows effective use of the fabrication equipment and facilities at the foundry, while the design houses can concentrate solely on the design. Though this separation of concerns provides attractive cost benefits, it also opens up certain threats: malicious agents at a foundry may now fabricate illegal copies of the ICs that can be sold in the gray market leading to serious loss in revenue for the design house.

Logic locking was proposed as an effective mechanism to combat such intellectual property (IP) threats. Logic locking modifies the original IC in a manner

Fig. 1. SFLL-HD locked circuit (ϕ); <sup>C</sup> is the (unprotected) original circuit. (Color figure online)

that the circuit operates correctly only after it is activated with a *secret* key. This *secret* key is loaded into tamperproof memory by the design house post-fabrication. However, soon powerful attacks, especially those involving SAT solvers [23,26,34], were invented to thwart this defense. Since then, more powerful defenses have been proposed that are resistant to such SAT attacks. One such SAT-resilient defense that has gained a lot of popularity is Stripped Functionality Logic Locking (SFLL) [44].

SFLL operates by using the *secret* key to identify a set of inputs as *protected patterns*—the circuit is forced to produce incorrect results if the input matches any of these protected patterns. The *cube stripping circuit* (see Fig. 1) is responsible for matching the inputs to the protected patterns. An additional *restore circuit* is used to restore the correct functionality for the protected patterns. The circuit does not operate correctly with an incorrect key, as the restore circuit then identifies a different set of patterns to be restored. Though SFLL is quite potent against SAT attacks, attackers soon identified certain unique structural patterns in its design that could be leveraged to build attacks via *structural analysis* [4,32,40].
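The interplay between the cube-stripping and restore units can be illustrated with a heavily simplified model of the SFLL-HD variant on a single-output function. This toy model is our own (real SFLL operates on multi-output gate-level netlists); it only shows why the correct key cancels the hard-wired perturbation:

```python
# Toy model of SFLL-HD: the cube-stripping unit flips the output on inputs
# at Hamming distance h from the secret key (hard-wired), and the restore
# unit flips it back when the loaded key equals the secret key.
def hd(a, b):
    return bin(a ^ b).count("1")

def sfll_hd(original, secret_key, h):
    def locked(x, loaded_key):
        stripped = original(x) ^ (hd(x, secret_key) == h)   # hard-wired flip
        restore  = hd(x, loaded_key) == h                   # key-dependent flip
        return stripped ^ restore
    return locked

original = lambda x: x & 1                 # toy "design": last bit of the input
locked = sfll_hd(original, secret_key=0b1011, h=1)

# correct key restores the design on every 4-bit input
assert all(locked(x, 0b1011) == original(x) for x in range(16))
# a wrong key corrupts some inputs
assert any(locked(x, 0b0000) != original(x) for x in range(16))
```

The structural attacks mentioned above work precisely because the hard-wired comparison against the secret key leaves a recognizable pattern in the netlist.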

In this work, we propose a scheme, *Structurally Robust Stripped Functionality Logic Locking* (SR-SFLL), to defend against such structural analysis. SR-SFLL uses efficient *synthesis* [33] machinery powered by modern SAT solvers to ensure that certain structural security constraints are met, which ensures its resilience against the structural attacks.

SR-SFLL operates as follows: (1) identify a "cut" of the original design <sup>C</sup> to break the design into two segments C<sup>1</sup> and C<sup>2</sup> (see Fig. 1), and (2) introduce a carefully synthesized perturbation unit Q between C<sup>1</sup> and C<sup>2</sup> (see Fig. 2b). As the perturbation unit does not have any specific structural signature and is hidden deep within the original design, our scheme is no more vulnerable to

Fig. 2. Transformation of SFLL locked circuit to SR-SFLL locked circuit.

Table 1. Attack resilience of logic locking techniques: ✔ (resp. ✖) represents resilience (resp. vulnerability) to attacks.


attacks by structural analysis. Further, the location of the "cut" is unknown to the attacker, and the perturbation unit lacks any telltale structural pattern, making it challenging to apply other attacks like the removal attack [43].

We argue that designing such a defense scheme is non-trivial: we show a version of SR-SFLL, called SR-SFLL(0), that is also resistant to existing structural attacks. However, we could design a novel structural attack algorithm, SyntAk, that breaks SR-SFLL(0): in our experiments, SyntAk breaks 71.25% of the SR-SFLL(0) locked benchmark instances. SyntAk also uses synthesis machinery, compiling an existing circuit into a semantically equivalent one that is amenable to structural analysis. SR-SFLL, however, is robust against SyntAk.

Table 1 summarizes the resilience of various logic locking techniques, with the attacks listed in the rows and the defenses in the columns. For a table cell (A, D), we use ✖ to show that attack A breaks defense D (in most cases); the mark ✔ shows that defense D is robust against attack A. The attack and defense techniques marked with a red background are proposed in this paper. As SR-SFLL locked circuits remain semantically equivalent to the SFLL locked circuits, SR-SFLL provides the same security against the SAT-based [34], removal [43], and AppSAT [30] attacks.

We evaluated SR-SFLL(0), SR-SFLL, and SyntAk on 80 benchmarks from the *ISCAS'85* and *MCNC* benchmark suites with different numbers of key inputs and cube stripping functionalities. Our experiments showed that circuits locked by SR-SFLL are robust to structural attacks—none of the SR-SFLL locked designs could be broken by the existing structural attacks (SFLLUnlock, FALL, and GNNUnlock) or by SyntAk (also proposed in this work). While the existing structural attacks failed to recover the structural patterns altogether, SyntAk could not break SR-SFLL even in two days for circuits that were locked in less than an hour.

SR-SFLL provides an asymmetric advantage to the defender over the attacker on multiple counts: the secret key K used to lock the circuit, knowledge of the *secret cut* where the FSC is partitioned, and a much harder synthesis problem (for attacks using SyntAk).

We make the following contributions in this work:


#### 2 Background

#### 2.1 Stripped Functionality Logic Locking (SFLL)

Figure 1 shows a stripped functionality logic locked circuit, ϕ. The original circuit (C) takes a set of input bits, X, and produces an output bit, o1. The SFLL locked design, ϕ, consumes the input bits X and the *secret key* bits K to produce the output y.

The core idea of SFLL is to create a *functionality stripped circuit* (FSC) that produces incorrect output for certain *protected patterns*. The *cube stripping circuit* (S) recognizes the protected patterns and raises the signal sfo if any of them is encountered; an XOR gate flips the output of the original circuit (o1) for these protected patterns (i.e. when sfo is high). Hence, o2 is the correct output for inputs not in the protected patterns, but its complement for the protected patterns.

The correct functionality is re-established using the *restore circuit* (R). The restore circuit accepts the (secret) key bits K along with the input X to produce the signal sro; if the correct key K is supplied, sro is high if and only if the input is amongst the protected patterns. The cube stripping circuit (S) is functionally equivalent to R but uses a hardcoded key value. The restore unit thus restores the correct output for the protected inputs (via an XOR of o2 and sro). Hence, the locked circuit works correctly only if the correct key is applied (i.e. the key K supplied to R matches the key hardcoded in S).
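The restore mechanism above can be sketched as a small behavioral model (a toy sketch, not the paper's netlist; the key, distance, and circuit below are made-up values):

```python
# Toy behavioral sketch of SFLL: S (cube stripper, key hardcoded) and
# R (restore circuit, key supplied) compute the same predicate, so the
# two XORs cancel exactly when the correct key is applied.
SECRET_KEY = 0b1011   # hypothetical 4-bit secret key hardcoded in S
H = 1                 # Hamming distance used by SFLL-HD

def hamming(a, b):
    return bin(a ^ b).count("1")

def C(x):             # stand-in for the original circuit (1 output bit)
    return (x ^ (x >> 1)) & 1

def S(x):             # cube stripping circuit; its output is sfo
    return int(hamming(x, SECRET_KEY) == H)

def R(x, k):          # restore circuit
    return int(hamming(x, k) == H)

def locked(x, k):
    o2 = C(x) ^ S(x)  # functionality-stripped output
    return o2 ^ R(x, k)

# correct key restores C on every input; a wrong key corrupts some inputs
assert all(locked(x, SECRET_KEY) == C(x) for x in range(16))
assert any(locked(x, 0b0000) != C(x) for x in range(16))
```

With the correct key, R cancels S everywhere; with a wrong key, the two predicates disagree on some inputs, so the output is corrupted exactly there.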

While many possible choices exist for a function that identifies protected patterns of inputs based on a key, the *Hamming distance* was found to be an interesting choice [44]. The corresponding variant of SFLL, known as SFLL-HD, identifies an input (X) as a protected pattern if it is at a given Hamming distance (h) from the key (K).
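Concretely, for SFLL-HD the protected patterns are exactly the inputs at Hamming distance h from K; a small enumeration sketch (key width and values here are illustrative):

```python
from itertools import combinations

def protected_patterns(key, n_bits, h):
    """All inputs at Hamming distance exactly h from the key."""
    pats = []
    for flips in combinations(range(n_bits), h):
        x = key
        for b in flips:
            x ^= 1 << b     # flip the h chosen bit positions
        pats.append(x)
    return pats

pats = protected_patterns(0b1011, 4, 2)
assert len(pats) == 6       # C(4,2) = 6 protected patterns
assert all(bin(p ^ 0b1011).count("1") == 2 for p in pats)
```

The number of protected patterns thus grows combinatorially (n choose h) with the key width.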

#### 2.2 SFLL Attacks

SFLL is robust to all known attacks on (conventional) logic locking [4]. However, several structural attacks were subsequently proposed that break SFLL. These attacks use one or more of the structural properties exhibited by SFLL implementations [4,40]:


All the following attacks assume knowledge of the Hamming distance (h) used to lock the circuit.

SFLLUnlock. SFLLUnlock [40] uses the first and second structural properties (see above) to identify a few signals that may be sfo (referred to as *candidate* signals). Next, for each of the candidates, the attack uses a SAT solver to extract an input for which the candidate signal is 1; if the candidate signal is indeed sfo, this input must be a protected pattern (which is at Hamming distance h from the correct key). Then, it attempts to identify the correct key as follows:


The inferred key is finally validated using a working circuit as an oracle.

Functional Analysis Attacks on Logic Locking (FALL). The first step of FALL [32] is to identify a set of candidate signals that may be the output of the cube stripping circuit, i.e. sfo. FALL achieves this by exploiting the first and second vulnerabilities of SFLL. To decide whether a candidate signal is indeed sfo, FALL derives a set of lemmas that exploit the functional properties of the Hamming distance. FALL proposes three algorithms based on these lemmas, each applicable to a specific range of Hamming distance values: AnalyzeUnateness is only applicable when h = 0, Hamming2D when h ≤ |K|/4, and SlidingWindow for larger Hamming distances.

GNNUnlock. GNNUnlock [4] automates the removal of the cube stripping circuit and the restore circuit from the locked circuit to obtain the original circuit. For the analysis, the circuit is transformed into a graph representation where the nodes represent the gates and the edges represent the wires. Each node is associated with a feature vector describing its characteristics (in-degree, out-degree, type of gate, whether the node is connected to a key input (K), a circuit input (X), or a circuit output (Y), the types of gates appearing in the neighborhood of the node, etc.).
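A simplified sketch of such a per-node feature vector (the field names and the toy netlist are our own, not GNNUnlock's exact encoding):

```python
# toy netlist: node -> kind, fan-in, fan-out; "k*" names are key inputs
g = {
    "k0": {"kind": "in",  "fanin": [],           "fanout": ["r1"]},
    "x0": {"kind": "in",  "fanin": [],           "fanout": ["r1", "c1"]},
    "r1": {"kind": "and", "fanin": ["k0", "x0"], "fanout": []},
    "c1": {"kind": "not", "fanin": ["x0"],       "fanout": []},
}

def node_features(g, n):
    node = g[n]
    return {
        "in_degree":  len(node["fanin"]),
        "out_degree": len(node["fanout"]),
        "is_and":     int(node["kind"] == "and"),
        "is_not":     int(node["kind"] == "not"),
        # connectivity to key inputs is a strong hint for the restore unit
        "touches_key": int(any(p.startswith("k") for p in node["fanin"])),
    }

feat = node_features(g, "r1")
assert feat == {"in_degree": 2, "out_degree": 0, "is_and": 1,
                "is_not": 0, "touches_key": 1}
```

A graph neural network then trains on such vectors to label each node as belonging to C, S, or R.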

GNNUnlock uses graph neural networks [45] to classify the nodes of the graph as belonging to the original circuit (C), the cube stripping circuit (S), or the restore circuit (R). The final step removes the nodes classified as part of S and R from the locked circuit, recovering the original circuit C.

#### 2.3 Analysis of the Structural Attacks on SFLL

FALL and SFLLUnlock depend on finding the output of the cube stripping circuit, sfo. Hence, hiding or removing sfo from the locked circuit ensures robustness against such attacks. GNNUnlock works by removing the cube stripping circuit and the restore circuit from the SFLL locked circuit. Hence, removing or hiding part or all of the cube stripping circuit from the locked design makes it robust to such attacks.

#### 3 Overview

#### 3.1 Preliminaries

Attack Model. We assume that the attacker has access to a functional circuit (which can be used as an *oracle*) and knows the Hamming distance (h).

Graph Representation of Circuit. We work with the circuit in And-Inverter Graph (AIG) format. An AIG consists of two-input AND gates and NOT gates. We construct a graph G from the circuit in AIG format as follows: the gates in the circuit map to nodes in G; a wire (or signal) connecting gates maps to an edge in the graph. The input and output signals are marked as special nodes.

Fig. 3. An example of circuit <sup>C</sup> and its corresponding graph representation.

If not otherwise specified, we construct the graph of a circuit with the node representing the final output signal as the start node (we assume a single output bit in this paper for simplicity). Figure 3b shows the graph of the circuit in Fig. 3a.
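The construction can be sketched on a made-up netlist (the gates below are illustrative, not the circuit of Fig. 3):

```python
# each gate maps to a node; each wire (fan-in connection) to an edge
netlist = {                      # name -> (kind, fan-in names)
    "x1": ("in", []), "x2": ("in", []), "x3": ("in", []),
    "g1": ("and", ["x1", "x2"]),
    "a":  ("not", ["g1"]),
    "b":  ("and", ["x2", "x3"]),
    "y":  ("and", ["a", "b"]),   # final output: the start node
}

edges = [(src, dst) for dst, (_, fanin) in netlist.items() for src in fanin]
inputs  = [n for n, (kind, _) in netlist.items() if kind == "in"]
outputs = ["y"]                  # marked as a special (start) node

assert ("g1", "a") in edges and ("a", "y") in edges
assert inputs == ["x1", "x2", "x3"]
```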

The *distance* between two nodes (say g1 and y in Fig. 3b) is the (minimum) number of edges on a path from g1 to y (which is 3 in this case). We define the *depth*, d, of a node n as the maximum distance from the start node (y in Fig. 3b) of the graph to n.

We define a *cut* on a graph G as a partitioning of its nodes into two disjoint (connected) subsets such that the inputs and the output belong to distinct partitions. A cut is defined by a *cut-set*: a selection of edges (said to *cross* the cut) whose endpoints lie in distinct partitions. We define the *depth of a cut* as the maximum amongst the depths of the nodes in the subset containing the start node. In the rest of the paper, a *cut on a circuit* refers to the cut on the underlying graph. The dotted red lines show a cut at depth 3 in Fig. 3.
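Depth computation and depth-d cut selection can be sketched on a toy fan-in graph (the node names are made up; Fig. 3 uses depth 3, here we use depth 1 for brevity):

```python
from functools import lru_cache

# toy circuit as a fan-in map: node -> nodes feeding it; "y" is the output
fanin = {
    "y": ["a", "b"], "a": ["g1"], "b": ["x2", "x3"],
    "g1": ["x1", "x2"], "x1": [], "x2": [], "x3": [],
}
fanout = {n: [] for n in fanin}
for gate, ins in fanin.items():
    for i in ins:
        fanout[i].append(gate)

@lru_cache(maxsize=None)
def depth(n):
    # maximum distance from the start node "y" to n
    return 0 if n == "y" else 1 + max(depth(c) for c in fanout[n])

d = 1                                            # chosen depth of the cut
top = {n for n in fanin if depth(n) <= d}        # partition with start node
cut_set = [(s, t) for t in top for s in fanin[t] if s not in top]

assert depth("g1") == 2 and depth("x1") == 3
assert sorted(cut_set) == [("g1", "a"), ("x2", "b"), ("x3", "b")]
```

The cut-set is exactly the edges crossing from the input-side partition into the partition holding the output.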

Notations. We represent combinational circuits with n inputs X and m outputs Y as boolean functions Y ↔ C(X), where X is an n-bit vector (x1, x2, ..., xn) and Y is an m-bit vector (y1, y2, ..., ym). We also use the functional notation C(X) to denote the output of the circuit C, i.e. the signal Y. We use capital letters to denote bit-vectors and small letters to denote individual bits. We use ⊕ to denote the XOR gate and ◦ to denote function (or circuit) composition.

We use blackboard-bold capital letters for circuits (like C). We use ϕ for complete SFLL locked designs and ϕ̂ for complete SR-SFLL(0) or SR-SFLL locked designs. We use subscripts to denote sub-parts of a circuit. For example, if C denotes the circuit shown in Fig. 3a, we use Ca and Cb to denote the subcircuits with outputs a (red block) and b (blue block).

#### 3.2 Approach

Recall that the known structural attacks on SFLL exploit the structural characteristics of sfo (see Sect. 2.2). Our defense techniques attempt to synthesize

Fig. 4. Transformation of SFLL locked circuit to SR-SFLL(0) locked circuit. (Color figure online)

a circuit that is semantically equivalent to the original circuit but misses these prominent structural characteristics that make structural attacks feasible.

*SR*-SFLL(0). SR-SFLL(0) identifies a cut on the FSC, through both the original circuit (C) and the cube stripping circuit (S), as shown by the red dotted line in Fig. 1, separating the inputs (X) from the output (o2) of the FSC. The cut-set is marked by the wires {A, V} (as shown in Fig. 4a).

Next, it synthesizes a perturbation unit Q (as shown in Fig. 4b) that ensures the following conditions:

– Q is semantically equivalent to the removed circuit, i.e. C2 ⊕ S2;

– No wire in Q is semantically equivalent to the output of S2 (i.e. sfo).

The first condition ensures *soundness*: the functioning of the new circuit is the same as that of the SFLL locked circuit. The second condition ensures *security*: as sfo is not present in the new design, the new design lacks all the structural characteristics (see Sect. 2.2) that made SFLL vulnerable to attacks.
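For tiny bit-widths, both conditions can be checked by exhaustive enumeration; the circuits C2, S2 and the candidate Q below are toy stand-ins, not the paper's benchmarks:

```python
BITS = [(b0, b1) for b0 in (0, 1) for b1 in (0, 1)]

def C2(a):                       # toy removed segment of the original circuit
    return a[0] & a[1]

def S2(v):                       # toy tail of the cube stripper (output: sfo)
    return v[0] ^ v[1]

def q_out(a, v):                 # candidate Q, meant to equal C2(A) ⊕ S2(V)
    return ((a[0] & a[1]) ^ v[0]) ^ v[1]

q_signals = [                    # every signal computed inside this Q
    lambda a, v: a[0] & a[1],
    lambda a, v: (a[0] & a[1]) ^ v[0],
    q_out,
]

# soundness: Q is semantically equivalent to C2 ⊕ S2
assert all(q_out(a, v) == (C2(a) ^ S2(v)) for a in BITS for v in BITS)
# security: no wire in Q is semantically equivalent to sfo = S2(V)
assert all(any(s(a, v) != S2(v) for a in BITS for v in BITS)
           for s in q_signals)
```

Note how this Q never computes v[0] ^ v[1] directly: it folds v[0] and v[1] into the running XOR one at a time, which is exactly why no internal wire matches sfo.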

SyntAk. SR-SFLL(0) is robust to existing attacks, as reverse engineering using the sfo signal is no longer possible. However, in contrast to existing attacks that attempt to reverse engineer a given locked circuit, what if we synthesize an alternate, semantically equivalent circuit that has a structure amenable to reverse engineering? Our novel attack employs exactly this strategy.

The attack attempts to recover an alternate locked design that exposes the XOR gate Gxor (as shown in Fig. 5b), in which case it becomes easy to identify the sfo signal—it must be one of i or j. SyntAk thus side-steps the challenge of reverse engineering the SR-SFLL(0) locked circuit with its missing sfo by, instead, resynthesizing *another* locked circuit that has an easily identifiable sfo signal.

This algorithm proceeds as follows:

– cut the FSC of the SR-SFLL(0) locked circuit into FSC1 and FSC2;

Fig. 5. SyntAk on SR-SFLL(0) locked circuit.

– synthesize a new circuit Pi ⊕ Pj that is semantically equivalent to FSC2.

With sfo clearly identifiable, the existing SFLL attacks now become feasible.

However, this attack can only succeed if the identified cut is such that FSC2 contains the whole of Q. Hence, the attacker may have to "guess" different cuts (e.g. by progressively increasing the depth of the cut) till the attack succeeds. We say that the attack succeeds if any of the existing attacks is able to break the defense using the sfo signal identified in the resynthesized circuit.

The attack is made easier by the fact that it is not required to select a cut that *exactly* isolates Q. The attack will still succeed even if some portion of C1 and S1 enters FSC2 (see Fig. 7b). However, the synthesis of Pi ⊕ Pj becomes increasingly expensive with the increasing size of FSC2.

Further, even with the "right" cut, not all synthesis candidates may yield a signal semantically equivalent to sfo. Hence, the attacker needs to correctly guess the cut as well as the correct synthesis candidate for a successful attack. However, our experiments demonstrate that even with these uncertainties, SyntAk is able to break SR-SFLL(0) in 71.25% of cases.

*SR*-SFLL. The primary reason why SyntAk breaks SR-SFLL(0) is that it can synthesize a new circuit Pi ⊕ Pj with two XOR gates at the *end of the circuit*. If, instead, the synthesized circuit introduces the functionality of S2 in the *middle* of C, SyntAk is no longer feasible.

Figure 2b shows our improved design for the SR-SFLL locked circuit. Instead of resynthesizing the circuits C2 and S2, we place a *perturbation unit* (Q) between C1 and C2. The perturbation unit makes the circuit operate semantically equivalent to the original SFLL locked circuit. The shaded portion, consisting of S2 (which produces sfo) and one of the XOR gates, is eliminated from the design.

As the attacker is unaware of the location of the perturbation unit, and as the perturbation unit is *not at the end* of the circuit, the attacker's task gets more challenging: the attacker needs to synthesize a new circuit at the end of the

Fig. 6. Illustration of SFLL, SR-SFLL(0), and SR-SFLL locked circuits along with SyntAk on SR-SFLL(0)

design with an XOR gate (that would provide access to sfo) that re-establishes the functionality of *both* S2 and C2. On the other hand, the defender only has to synthesize Q to re-establish the functionality of S2.

SR-SFLL is scalable to large circuits. Its scalability depends only on the depth of the cut, as the complexity of our synthesis problem depends only on the sub-circuits subjected to (semantically equivalent) rewriting (C2 and S2 in Fig. 2b). Hence, the size of the base circuit has no impact on the scalability of SR-SFLL.

Example. Figure 6a shows the SFLL locked version of a circuit. The SR-SFLL(0) locked version is shown in Fig. 6b: the sfo signal (available in the SFLL locked circuit) is no longer available in the SR-SFLL(0) locked circuit; hence, it is robust to structural attacks. After applying SyntAk (Fig. 6c), the sfo signal is recovered in the synthesized circuit. Finally, Fig. 6d shows the SR-SFLL locked circuit: it is structurally robust (does not include sfo) and does not succumb to SyntAk.

SR-SFLL provides a stronger asymmetric advantage than SR-SFLL(0): in SR-SFLL(0), both the attack and the defense need to resynthesize the functionalities of C2 and S2 within Q. This prevents the defense from taking deep cuts for FSC2, to keep the task of synthesizing Q feasible. Hence, SR-SFLL(0) only holds the advantage of knowing the secret "cut". On the other hand, SR-SFLL only needs to synthesize the functionality of S2, while the attacker would need to resynthesize the functionalities of *both* C2 and S2 to recover sfo, making the attacker's synthesis task overly challenging. This gives SR-SFLL the dual advantage of knowledge of the secret cut and an asymmetric advantage in the synthesis task.

## 4 *SR*-SFLL

#### 4.1 Problem Statement

Given an SFLL locked circuit ϕ(X, K) (where X is the input to the circuit and K is the key-bits used in the circuit), synthesize a structurally robust locked circuit ϕ̂(X, K) such that:

(correctness) The altered circuit is semantically equivalent to the original SFLL locked circuit, that is,

$$\forall X. \; \forall K. \; \varphi(X, K) = \hat{\varphi}(X, K) \tag{1}$$

(security) There does not exist any signal z in the altered circuit that is equivalent to sfo in ϕ; that is,

$$\forall z.\ \exists X.\varphi\_{\mathsf{sfo}}(X)\neq\widehat{\varphi}\_z(X)\tag{2}$$

The first condition ensures that functionality is preserved, that is, the synthesized circuit ϕ̂ preserves the properties of the input SFLL locked circuit ϕ. The second condition ensures that the structural patterns that were available to attackers in SFLL, exposed through the sfo signal, are not available in ϕ̂.

## 4.2 Intuition: *SR*-SFLL

The current synthesis tools do not scale to the above synthesis task for the whole locked circuit ϕ̂ (unless the locked circuit is very small). Hence, a straightforward implementation of the above equations is not feasible. Instead, we construct the circuit ϕ̂ by synthesizing a "small" circuit Q that can be introduced *within* the original circuit C, with C2 ◦ Q preserving the functionality of C2 ⊕ S2.

We use the following (simplified) description to provide the necessary intuition. Let the functionality of the original circuit (i.e. C) be denoted as f(X), where X are the circuit inputs. Then, let the stripped functionality circuit (i.e. S in Fig. 1) be denoted as g, where g is a boolean function that returns true if and only if it detects a protected input pattern. The functionality of the circuit ϕo2 in Fig. 1 can then be represented as:

$$\varphi\_{o\_2} = (f \oplus g)(X) = \begin{cases} f(X) & \text{if } g(X) = 0 \\ \neg f(X) & \text{if } g(X) = 1 \end{cases} \tag{3}$$

We "cut" (or partition) f into two functions f1 and f2, such that f = f1 ◦ f2. Then, we synthesize a perturbation unit (Q), with functional definition h, such that:

Algorithm 1: SR-SFLL

```
1  Input: C, λ
2  S, R ← SFLL(C)
3  C1, C2, S1, S2 ← Cut({C, S}, λ)
4  Q ← Synthesize(C2, S2)
5  if Q == ⊥ then
6      return ⊥
7  end
8  ϕ̂ ← Assemble(C1, C2, S1, R, Q)
9  return ϕ̂(X, K)
```

$$(f\_1 \circ h \circ f\_2)(X) = \varphi\_{o\_2} = (f \oplus g)(X) = \begin{cases} f(X) & \text{if } g(X) = 0 \\ \neg f(X) & \text{if } g(X) = 1 \end{cases} \tag{4}$$

We use the definition of g (detector for protected patterns) as used in Eq. 3.

Now, we need to ensure the equivalence of (f ⊕ g), i.e. (f1 ◦ (f2 ⊕ g)), with (f1 ◦ h ◦ f2). This can be ensured by simply checking the equivalence of (f2 ⊕ g) with (h ◦ f2). If the selected f2 is "small", the task of synthesizing h becomes feasible.

For simplicity, we do not assume the splitting of g in the above discussion, but our approach allows that.

## 4.3 Methodology: *SR*-SFLL

Algorithm 1 takes the original circuit (C) and a choice for the cut (λ). It first generates an SFLL locked circuit (Line 2), thereby generating the stripped functionality circuit (S(X)) and the restore unit circuit (R(X, K)).

*Identify Cut.* The circuit is "cut" (according to λ) to partition the original circuit C into segments C1 and C2 (Line 3). Similarly, the cube stripping circuit S is partitioned into S1 and S2 (Line 3).


*Synthesize Perturbation Unit Q.* We introduce a perturbation unit Q between C1 and C2 such that the modified circuit (see Fig. 2b) satisfies the correctness and security properties (see Sect. 4.1).

Accordingly, we pose the synthesis conditions for Q as follows:

$$\forall A \; \forall V. \; (\mathbb{C}\_2(A) \oplus \mathbb{S}\_2(V)) = \mathbb{C}\_2(\mathbb{Q}(A, V)) \tag{5}$$

$$\forall z\ \exists A\ \exists V.\ \mathbb{Q}\_z(A, V) \neq \mathbb{S}\_2(V) \tag{6}$$

Equation 5 imposes the soundness constraint that introducing Q should reinstate the stripped functionality (C2(A) ⊕ S2(V)). Equation 6 is the security constraint against structural attacks: it ensures that no signal (z) in Q is equivalent to sfo.

Our algorithm is not complete, that is, our synthesis conditions are stronger than necessary: the signals A and V are universally quantified over all possibilities, while Q needs to satisfy these conditions only on the feasible outputs of C1 and S1, respectively. Our formulation trades off completeness for scalability.
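For intuition, Eq. 5 and Eq. 6 can be validated by enumeration on a toy instance (all circuits below are illustrative stand-ins; real instances discharge these conditions via QBF solving):

```python
BITS = [(b0, b1) for b0 in (0, 1) for b1 in (0, 1)]

def C2(a):                    # toy tail segment: XOR of its two input wires
    return a[0] ^ a[1]

def S2(v):                    # toy cube-stripper tail; its output is sfo
    return v[0] & v[1]

def Q(a, v):
    # perturb a[0] so that C2 absorbs S2's flip; built as a MUX on v[0]
    # to avoid ever computing v[0] & v[1] (= sfo) as an internal wire
    bit0 = (a[0] ^ v[1]) if v[0] else a[0]
    return (bit0, a[1])

q_signals = [                 # signals computed inside Q
    lambda a, v: a[0] ^ v[1],
    lambda a, v: Q(a, v)[0],
]

# Eq. 5 (soundness): C2(Q(A, V)) == C2(A) ⊕ S2(V) for all A, V
assert all(C2(Q(a, v)) == (C2(a) ^ S2(v)) for a in BITS for v in BITS)
# Eq. 6 (security): no signal inside Q equals sfo on all inputs
assert all(any(s(a, v) != S2(v) for a in BITS for v in BITS)
           for s in q_signals)
```

Unlike the SR-SFLL(0) setting, Q here feeds C2 rather than replacing it, so only S2's functionality has to be re-established inside Q.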

If Algorithm 1 fails to synthesize a locked circuit (i.e., the algorithm returns ⊥), the algorithm is run again with a different choice for the cut (λ).

Theorem 1. *If Algorithm 1 succeeds (that is, does not return* ⊥*), the returned locked circuit* ϕ̂(X, K) *is both correct and secure.*

*Proof.* For Algorithm 1 to succeed, the Synthesize function must succeed. Synthesize succeeds only if the synthesized Q satisfies Eq. 5 and Eq. 6.


SR-SFLL(0). In the case of SR-SFLL(0), we only attempt to synthesize Q to replace the circuits C2 and S2 (Fig. 4b) instead of synthesizing a new circuit between C1 and C2. Hence, in this case, the synthesis condition reduces to:

$$\exists \mathbb{Q} \; \forall A \; \forall V. \; \left(\mathbb{C}\_2(A) \oplus \mathbb{S}\_2(V)\right) = \mathbb{Q}(A, V) \tag{7}$$

Circuit Optimization. The circuit may be subjected to optimizations (e.g. using *berkeley-abc* [1]); however, in that case, the security check (Eq. 2) needs to be repeated on the optimized circuit to ensure that the optimizations did not restore the sfo signal. In our experiments, we did perform optimizations on our circuits, and in no case did the security check fail post-optimization.

## 5 SyntAk

Algorithm 2 accepts the locked circuit ϕ̂ and returns the secret key, Kc. We use two hyperparameters bounding the number of attempts at creating cuts (ncuts) and at enumerating synthesis candidates (nsynth).

At Line 3, the algorithm uses structural analysis to identify the functionality stripped circuit FSC and the restore unit R. Identifying R is reasonably simple, as it is the only part of the locked circuit that uses the key bits K. Hence, one can perform a dependency analysis from the key bits to identify R (as also done in prior work [32,40]).
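The dependency analysis can be sketched as a forward reachability pass from the key inputs over a toy netlist (all names here are ours, for illustration):

```python
# toy fan-out map; "k*" are key inputs, "x*" circuit inputs
fanout = {
    "k0": ["r1"], "k1": ["r1"],
    "x0": ["c1", "r2"],
    "r1": ["r2"], "c1": ["y"], "r2": ["y"], "y": [],
}

def reachable(sources):
    seen, stack = set(), list(sources)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(fanout[n])
    return seen

# gates in the transitive fan-out of the key bits form the restore cone
# (up to and including the final restore XOR feeding the output)
restore = reachable(["k0", "k1"]) - {"k0", "k1"}
assert restore == {"r1", "r2", "y"}
```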

Next, the algorithm enters a loop to guess a suitable cut (Line 5). If a new cut (different from the cuts obtained so far, accumulated in cuts (Line 9)) is found, it attempts to enumerate synthesis candidates. For every synthesis candidate P (Line 12), the algorithm assembles the complete circuit (Line 17) as per Fig. 5b.

## Algorithm 2: SyntAk

```
1  Input: ϕ̂, ncuts, nsynth
2  cuts ← ∅
3  FSC, R ← StructAnalyse(ϕ̂)
4  while |cuts| < ncuts do
5      FSC1, FSC2 ← Cut(FSC, cuts)
6      if FSC2 == ⊥ then
7          break
8      end
9      cuts ← cuts ∪ {(FSC1, FSC2)}
10     synths ← ∅
11     while |synths| < nsynth do
12         P ← Synthesize(FSC2, synths)
13         if P == ⊥ then
14             break
15         end
16         synths ← synths ∪ {P}
17         ϕ ← Assemble(FSC1, P, R)
18         Kc ← AttackWithSFO(ϕ, {i, j})
19         if Kc ≠ ⊥ then
20             return Kc
21         end
22     end
23 end
24 return ⊥
```
Fig. 7. SyntAk will not succeed with (a) but may succeed with (b) (cuts shown by blue boxes). (Color figure online)

Then, it launches an existing structural attack (like FALL or SFLLUnlock) with the signals {i, j} as potential candidates for the sfo signal (Line 18). If an existing attack succeeds, the corresponding key Kc is returned.

The Synthesize procedure synthesizes (i, j) ↔ P(A), such that:

$$\forall A. \; \mathbb{FSC}\_2(A) = (\mathbb{P}\_i(A) \oplus \mathbb{P}\_j(A))\tag{8}$$

That is, it searches for a circuit P that is semantically equivalent to FSC2 such that it exposes the sfo signal. This structure is imposed because a new XOR gate, Gxor (the circled XOR gate in Fig. 5b), is forced on the output of P; this is an attempt to make the new circuit resemble the SFLL circuit in Fig. 1, on which the existing structural attacks are potent.
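A toy illustration of Eq. 8 (the functions below are made up): decompose FSC2 into Pi ⊕ Pj so that the resynthesized circuit ends in an explicit XOR gate.

```python
def FSC2(a):              # toy functionality-stripped tail segment
    return (a[0] & a[1]) ^ a[2]

def P_i(a):               # one synthesis candidate for the decomposition
    return a[0] & a[1]

def P_j(a):
    return a[2]

inputs = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
# Eq. 8: FSC2(A) == P_i(A) ⊕ P_j(A) for all A
assert all(FSC2(a) == (P_i(a) ^ P_j(a)) for a in inputs)
# the exposed XOR makes i and j candidate sfo signals, on which the
# existing attacks (FALL, SFLLUnlock) can then be retried
```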

However, the algorithm is not complete due to multiple factors:


Even with the above sources of incompleteness, SyntAk is quite effective in practice: in our experiments, SyntAk breaks 71.25% of the SR-SFLL(0) locked circuits. Our experiments use an incremental approach to guessing cuts: in each round, the depth (d) of the cut is increased, and all nodes at distance at most d from the output are included in FSC2. However, other schemes (including randomized ones) are also possible.

### 6 Evaluation

Benchmarks and Setup. We have used 10 circuits from ISCAS'85 [2] and 10 circuits from MCNC [41]. These benchmarks were used for evaluation in most of the recent work, including SFLL [44], FALL [32], SFLLUnlock [40], and GNNUnlock [4]. Each of these designs was locked under four different configurations to produce SFLL-HD locked versions: 16 key-bits with Hamming distances 2 and 4, and 32 key-bits with Hamming distances 4 and 8. So, overall, we perform our evaluations on a benchmark suite of 80 circuit instances.

ISCAS'85 benchmarks are available in the *bench* format and MCNC benchmarks in the *blif* (Berkeley Logic Interchange Format) format. We used *Berkeley-abc* to convert *blif* to the *bench* format for use by our framework.

Table 2. Summary of results on all our benchmarks: FL, SU, GU, and SA represent FALL, SFLLUnlock, GNNUnlock, and SyntAk respectively. Under *Robustness*, each cell in the table shows the number of locked circuits successfully broken by the respective attack (smaller is better). Under *Overhd.*, we show the average (AVG) and the standard deviation (STD) of the percentage increase in the number of AND gates in the AIG w.r.t. the SFLL-HD locked design (smaller is better).


We have used the popular SFLL-HD variant of SFLL, where the cube stripping function is based on the Hamming distance between the input and the key bits.

We use a "cut" at depth 4 for selecting C2, both for SR-SFLL(0) and for SR-SFLL. For SyntAk, we progressively increase the depth from one till the attack is successful; we use FALL and SFLLUnlock as the existing attacks on the circuit resynthesized using SyntAk. We use a timeout of 1 h for both SR-SFLL(0) and SR-SFLL; SyntAk uses a time limit of 2 days.

We built our synthesis engine using *Berkeley-abc* [1] and the *Sketch* [33] synthesizer. Sketch is primarily designed for program synthesis. It discharges quantified boolean formulas (QBF) at the backend to be solved using *Berkeley-abc* or MiniSat [12]. We found Sketch to be quite an effective tool for our problem.

The rest of our framework is implemented in Python. We use open-source implementations of *SFLLUnlock* [40], *FALL* [32], and *GNNUnlock* [4] that were made available by the authors of these tools.

We conduct our experiments on a machine with a 12-core Intel(R) Xeon(R) Silver E5-2620 CPU @ 2.00 GHz and 32 GB RAM.

*Research Questions.* Our experiments were designed to answer the following research questions:


Both SR-SFLL(0) and SR-SFLL were able to defend against the existing attacks: SAT, FALL, SFLLUnlock, and GNNUnlock. However, 71.25% of the benchmarks locked using SR-SFLL(0) were broken by SyntAk, while no instance defended by SR-SFLL could be broken by SyntAk.

From the AIG of the circuits, we infer that SR-SFLL uses 0.18% (on average) more AND gates than SFLL-HD locked circuits.

Table 3. Robustness and overhead of SR-SFLL(0) and SR-SFLL with respect to SFLL-HD locked circuits on a subset of our benchmarks. Benchmark names starting with "C" are part of ISCAS, rest are part of MCNC benchmarks. The mark ✖ indicates the attack is successful, ✔ indicates attack is not successful.


## 6.1 Robustness of *SR*-SFLL(0) and *SR*-SFLL to Existing Attacks

Table 2 provides a summary of the performance of SFLL-HD, SR-SFLL(0), and SR-SFLL against existing structural attacks (FALL, SFLLUnlock, and GNNUnlock) on a representative set of benchmarks: the table shows the number of instances where the respective attack breaks the defense. While the structural attacks (FALL, SFLLUnlock, and GNNUnlock) are able to break *all* of these instances for SFLL locked circuits, our structurally robust proposals (SR-SFLL(0) and SR-SFLL) are resilient against these attacks.

Table 3 shows the results on a representative subset of our benchmarks: ✖ indicates that the locked circuit is broken by the respective attack, and ✔ indicates that the respective


Table 4. Overhead of SR-SFLL(0) and SR-SFLL vs SFLL. Overhead calculated over SFLL-HD locked circuits shown in Table 3. Benchmark names starting with "C" are part of ISCAS while the rest are part of MCNC benchmarks.

defense successfully defends against the attack. As the primary purpose of the design of SFLL was resilience against SAT attacks, it is not surprising that the SAT attack times out on all instances of the SFLL locked designs. As SR-SFLL(0) and SR-SFLL are functionally equivalent to SFLL, they too are resilient to SAT attacks.

We also conducted experiments with impractically small key sizes of 5 key bits (with hamming distance 2). None of the structural analysis based attacks (FALL, SFLLUnlock, and GNNUnlock) could break either SR-SFLL(0) or SR-SFLL locked circuits even for these small key sizes.

## 6.2 Robustness of *SR*-SFLL(0) and *SR*-SFLL to SyntAk

We apply SyntAk on SR-SFLL(0) and SR-SFLL locked circuits to evaluate their robustness against this attack. We "guess" the cut for SyntAk starting at a depth of 1; if the synthesis phase in SyntAk or the subsequent structural attack (FALL and SFLLUnlock) fails, we reattempt the attack with the depth increased by one. We use a timeout of 2 days for SyntAk.

Under our novel SyntAk attack, SR-SFLL(0) succumbs in 71.25% of the cases, but SR-SFLL successfully defends against this attack on all instances. Table 3 shows the performance on some representative benchmarks and Table 2 summarizes the overall results.

## 6.3 Overhead of *SR*-SFLL(0) and *SR*-SFLL

Table 4 shows the overhead (in terms of the number of AND gates in the AIG) for SR-SFLL(0) and SR-SFLL over that of SFLL on the benchmarks shown in Table 3. Table 2 provides a summary of the overheads over all our benchmarks.

SR-SFLL(0) has almost no additional overhead (an average of about 0.14%), and SR-SFLL also has a very low overhead (an average of about 0.18%), over all our benchmarks. This is because while SR-SFLL(0) essentially rewrites a part of the circuit, SR-SFLL is required to insert additional machinery to substitute the functionality of S2 within C.

### 7 Related Work

Initial logic locking schemes [7,10,11] introduced additional logic and new inputs into the circuit design to obtain the locked circuit. These locked circuits work correctly only when the correct secret key is provided to the circuit post IC fabrication. These logic locking techniques are vulnerable to SAT-based attacks [23,26,34]. To overcome SAT-based attacks, Anti-SAT [39] and SARLock [42] were proposed. However, Anti-SAT was broken by the SPS [43] attack, and SARLock by the App-SAT [30] attack.

SFLL-HD [44] introduces a stripped-functionality approach to logic locking that defends against the above-mentioned attacks. However, it is vulnerable to the FALL [32], SFLLUnlock [40], and GNNUnlock [4] attacks.

HOLL [35] exploits the power of program synthesis tools to generate the locked circuit by using a "secret" program (using programmable logic like EEP-ROM) as the key. As the attacker has to synthesize the "secret" program, HOLL becomes challenging to break. However, the requirement of having an embedded programming chip makes the approach both complicated and expensive; further, every invocation of the circuit requires the program in the slow EEPROM memory to be executed. Our approach, instead, builds on the popular SFLL technique and does not need embedded programmable chips.

Program synthesis has seen significant growth in recent years. Program synthesis algorithms have powered the synthesis of bit-vector programs [16], heap manipulations [13,24,27], language parsers [21,31], semantic actions in attribute grammars [18], abstract transformers [19], automata [5], invariants [6,13,20,22], and even differentially private mechanisms [28]. Program synthesis has also been applied to synthesize bug corpora [29], as well as for debugging [8,9] and repairing buggy programs, e.g., fixing incorrect heap manipulations [37,38] or synthesizing relevant fences and/or atomic sections in concurrent programs under relaxed memory models [36].

There exist Boolean functional synthesis tools, such as CADET [25], Manthan [14,15], and BFSS [3], that could have been used for our synthesis task. However, none of these tools allows us to control the "structure" of the synthesized formula. Hence, we built our synthesis engine on the *Sketch* synthesizer, which is designed for program synthesis.

## 8 Conclusions

SR-SFLL provides security against structural analysis based attacks such as FALL, SFLLUnlock, and GNNUnlock. The core idea of SR-SFLL is to use modern synthesis engines to remove the structural patterns that can be exploited by existing structural analysis based attacks.

SR-SFLL provides an asymmetric advantage to the defender over the attacker on many counts:


As the perturbation unit resides within the original circuit at a location unknown to the attacker and has no specific structural signature, structural analysis of the SR-SFLL locked circuit becomes difficult. Also, as SR-SFLL locked circuits are functionally equivalent to the respective SFLL locked circuits (see Eqn 1), SR-SFLL retains all the theoretical robustness properties of SFLL.

Acknowledgements. We thank the anonymous reviewers for their valuable inputs. We are thankful to Google for supporting our research. We also thank Intel for their support via the Intel PhD Fellowship Program.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Symbolic Quantum Simulation with Quasimodo**

Meghana Sistla1(B) , Swarat Chaudhuri1, and Thomas Reps<sup>2</sup>

<sup>1</sup> The University of Texas at Austin, Austin, TX, USA mesistla@utexas.edu, swarat@cs.utexas.edu <sup>2</sup> University of Wisconsin-Madison, Madison, WI, USA reps@cs.wisc.edu

**Abstract.** The simulation of quantum circuits on classical computers is an important problem in quantum computing. Such simulation requires representations of distributions over very large sets of basis vectors, and recent work has used symbolic data-structures such as Binary Decision Diagrams (BDDs) for this purpose. In this tool paper, we present Quasimodo, an extensible, open-source Python library for *symbolic simulation* of quantum circuits. Quasimodo is specifically designed for easy extensibility to other backends. Quasimodo allows simulations of quantum circuits, checking properties of the outputs of quantum circuits, and debugging quantum circuits. It also allows the user to choose from among several symbolic data-structures—both unweighted and weighted BDDs, and a recent structure called Context-Free-Language Ordered Binary Decision Diagrams (CFLOBDDs)—and can be easily extended to support other symbolic data-structures.

## **1 Introduction**

Canonical, symbolic representations of Boolean functions—for example, Binary Decision Diagrams (BDDs) [5]—have a long history in automated system design and verification. More recently, such data-structures have found exciting new applications in *quantum simulation*. Quantum computers can theoretically solve certain problems much faster than traditional computers, but current quantum computers are error-prone and access to them is limited. The simulation of quantum algorithms on classical machines allows researchers to experiment with quantum algorithms even without access to reliable hardware.

Symbolic function representations are helpful in quantum simulation because a quantum system's state can be viewed as a distribution over an exponential-sized set of basis-vectors (each representing a "classical" state). Such a state, as well as the transformations that quantum algorithms typically apply to it, can often be efficiently represented using a symbolic data-structure. Simulating an algorithm then amounts to performing a sequence of symbolic operations.

Currently, there are a small number of open-source software systems that support such *symbolic quantum simulation* [1,6,8,13,16]. However, the underlying symbolic data-structure can have an enormous effect on simulation performance. In this tool paper, we present Quasimodo,<sup>1</sup> an extensible framework for symbolic quantum simulation. Quasimodo is specifically designed for easy extensibility to other backends, making it possible to experiment with a variety of symbolic data-structures. Quasimodo currently supports (i) BDDs [3,5,7], (ii) a weighted variant of BDDs [9,14], [19, Ch. 5], and (iii) Context-Free-Language Ordered Binary Decision Diagrams (CFLOBDDs) [11], a recent canonical representation of Boolean functions that has been shown to outperform BDDs in many quantum-simulation tasks. Quasimodo also has a clean interface for plugging in new symbolic data-structures, which helps to lower the barrier to entry for formal-methods researchers interested in this area.

Users access Quasimodo through a Python interface. They can define a quantum algorithm as a quantum circuit using 18 different kinds of quantum gates, such as Hadamard, CNOT, and Toffoli gates. They can simulate the algorithm using a symbolic data-structure of their own choosing. Users can sample outcomes from the probability distribution computed through simulation, and can query the simulator for the probability of a specific outcome of a quantum computation over a set of quantum bits (qubits). The system also allows for a form of correctness checking: users are allowed to ask for the set of *all* high-probability outcomes and to check that these satisfy a given assertion.

Along with Quasimodo, we are releasing a suite of 7 established quantum algorithms encoded in the input language of Quasimodo. We hope that these algorithms will serve as benchmarks for future research on symbolic simulation and verification of quantum algorithms.

*Organization.* Section 2 gives an overview of quantum simulation. Section 3 gives a user-level overview of Quasimodo. Section 4 provides background on the symbolic data-structures available in Quasimodo. Section 5 describes the programming model of Quasimodo, and presents experimental results. Section 6 concludes.

## **2 Background on Quantum Simulation**

Quantum algorithms on quantum computers can achieve polynomial to exponential speed-ups over classical algorithms on specific problems. However, because practical, scalable quantum computers do not yet exist, simulating quantum circuits on classical computers can help in understanding how quantum algorithms work and scale. A simulation of a quantum-circuit computation [1,6,8,11,13,19] uses a representation qs of a quantum state and performs operations on qs that correspond to quantum-circuit operations (gate applications and measurements on qs).

Simulating a quantum circuit can have advantages compared to executing the circuit on a quantum computer. For instance, some quantum algorithms perform

<sup>1</sup> Quasimodo is available at https://github.com/trishullab/Quasimodo.git.

**Fig. 1.** An example of a Quasimodo program that performs a quantum-circuit computation in which the final quantum state is a GHZ state with 4,096 qubits. The program verifies that a measurement of the final quantum state has a 50% chance of returning the all-ones basis-state.

multiple iterations of a particular quantum operator *Op* (e.g., k iterations, where k = 2<sup>j</sup>). A simulation can operate on *Op* itself [19, Ch. 6], using j steps of repeated squaring to create matrices for *Op*<sup>2</sup>, *Op*<sup>4</sup>, ..., *Op*<sup>2<sup>j</sup></sup> = *Op*<sup>k</sup>. In contrast, a physical device must apply *Op* sequentially, and thus performs *Op* k = 2<sup>j</sup> times.
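The repeated-squaring trick can be illustrated directly. The following sketch uses plain numpy on a small 2x2 rotation (not a Quasimodo operator): j squarings yield *Op*<sup>2<sup>j</sup></sup>, versus k = 2<sup>j</sup> sequential applications.

```python
import numpy as np

# Repeated squaring as described above: j squarings yield Op**(2**j),
# versus k = 2**j sequential applications on a physical device.
# Illustrated with a small 2x2 rotation (not a Quasimodo operator).
def power_by_squaring(op, j):
    """Return op raised to the power 2**j using j matrix squarings."""
    result = op
    for _ in range(j):
        result = result @ result
    return result

theta = 0.1
op = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
j = 5                                        # k = 2**5 = 32 applications
fast = power_by_squaring(op, j)              # 5 matrix multiplications
slow = np.linalg.matrix_power(op, 2 ** j)    # sequential baseline
print(np.allclose(fast, slow))               # → True
```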

Many quantum algorithms require multiple measurements on the final state. After a measurement on a quantum computer, the quantum state collapses to the measured state. Thus, every successive measurement requires re-running the quantum circuit. However, with a simulation, the quantum state can be preserved across measurements, and thus the quantum circuit need only be executed once.

## **3 Quasimodo's Programming and Analysis Interface**

This section presents an overview of Quasimodo from the perspective of a user of the Python API. A user can define a quantum-circuit computation and check the properties of the quantum state at various points in the computation. This section also explains how Quasimodo can be easily extended to include custom representations of the quantum state.

*Example.* Figure 1 shows an example of a quantum-circuit computation written using the Quasimodo API. To use the Quasimodo library, one needs to import the package, as shown in line 1. A user can then create a program that implements a quantum-circuit computation by


Note that queries on the quantum state do not have to be made only at the end of the program; they can also be interspersed throughout the circuit-simulation computation.
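Figure 1 itself is not reproduced here, but its effect can be illustrated backend-independently. The following sketch uses plain numpy (not the Quasimodo API) to build a small GHZ state with the same H-then-CNOT-chain circuit and confirms that the all-ones outcome has probability 0.5; the point of a symbolic backend is precisely to avoid the dense 2<sup>n</sup>-sized vector used below.

```python
import numpy as np

# Backend-independent illustration of the computation in Fig. 1: build an
# n-qubit GHZ state (H on qubit 0, then a CNOT chain) on a dense state
# vector, and check that the all-ones outcome has probability 0.5.
# Plain numpy, not the Quasimodo API.
n = 3
I = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def cnot(n, c):
    """Permutation matrix for CNOT with control c, target c+1 (qubit 0 = MSB)."""
    dim = 2 ** n
    m = np.zeros((dim, dim))
    for basis in range(dim):
        bits = [(basis >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[c] == 1:
            bits[c + 1] ^= 1                 # flip the target bit
        out = sum(b << (n - 1 - k) for k, b in enumerate(bits))
        m[out, basis] = 1.0
    return m

state = np.zeros(2 ** n)
state[0] = 1.0                               # |000>
state = np.kron(H, np.kron(I, I)) @ state    # Hadamard on qubit 0
for i in range(n - 1):
    state = cnot(n, i) @ state               # entangle the chain

probs = np.abs(state) ** 2
print(round(probs[0], 3), round(probs[-1], 3))   # → 0.5 0.5
```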

Quasimodo allows different backend data-structures to be used for representing quantum states. It comes with BDDs [3,5,7], a weighted variant of BDDs [9,14], [19, Ch. 5], and CFLOBDDs [11]. Quasimodo also provides an interface for new backend data-structures to be incorporated by users. All three of the standard backends provide compressed representations of quantum states and quantum gates, although—as with all variants of decision diagrams—state representations may blow up as a sequence of gate operations is performed.

*Quantum Simulation.* Quantum simulation problems can be implemented using Quasimodo by defining a quantum-circuit computation, and then invoking the API function measure to sample a basis-vector from the final quantum state. For instance, suppose that the final quantum state is [0.5, 0, 0.5, 0.5, 0, 0, 0.5, 0]. Then measure would return a string in the set {000, 010, 011, 110} with probability 0.25 for each of the four strings.

*Verification.* As shown in line 17 of Fig. 1, Quasimodo provides an API call to inquire about the probability of a specific outcome. The function prob takes as its argument a mapping from qubits to {0, 1}, which defines a basis-vector e of interest, and returns the probability that the state would be e if a measurement were carried out at that point. It can also be used to query the probability of a set of outcomes, using a mapping of just a subset S of the qubits, in which case prob returns the sum of the probabilities of obtaining a state that satisfies S. For example, if the quantum state computed by a 3-qubit circuit over q0, q1, q2 is [0.5, 0, 0.5, 0.5, 0, 0, 0.5, 0], the user can query the probability of states satisfying q1 = 1 ∧ q2 = 0 by calling prob({1: 1, 2: 0}), which would return 0.5 (= Pr(q0 = 0 ∧ q1 = 1 ∧ q2 = 0) + Pr(q0 = 1 ∧ q1 = 1 ∧ q2 = 0) = (0.5)<sup>2</sup> + (0.5)<sup>2</sup>).
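The worked example above can be reproduced with a direct (exponential-time) reference implementation over the dense state vector; this sketches the semantics of prob, not how the symbolic backends actually compute it.

```python
# Reference semantics of prob: sum |amplitude|**2 over all basis states
# consistent with the partial assignment (qubit 0 is the most-significant
# bit of the basis index). Exponential-time, for illustration only.
def prob(state, assignment):
    n = len(state).bit_length() - 1     # the state vector has 2**n entries
    total = 0.0
    for basis in range(len(state)):
        if all(((basis >> (n - 1 - q)) & 1) == v
               for q, v in assignment.items()):
            total += abs(state[basis]) ** 2
    return total

state = [0.5, 0, 0.5, 0.5, 0, 0, 0.5, 0]    # the 3-qubit state from the text
print(prob(state, {1: 1, 2: 0}))            # → 0.5  (basis states 010 and 110)
```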

Given a relational specification R(x, y) and a quantum circuit y = Q(x), this feature is useful for verifying properties of the form "P r[R(x, Q(x))] > θ," where θ is some desired probability threshold for the user's application.

*Debugging Quantum Circuits.* Quasimodo additionally provides a feature to query the number of outcomes for a given probability. This feature is especially helpful for debugging large quantum circuits—large in terms of qubit counts—when most outcomes have similar probabilities.

Consider the case of a quantum circuit whose final quantum state is intended to be (1/√6)[1, 1, 1, 0, 1, 1, 1, 0]. One can check whether the final quantum state is the one intended by querying the number of outcomes that have probability 1/6. If the returned value is 6, the user can then check whether states 011 and 111 have probability 0 by calling prob({0: 0, 1: 1, 2: 1}) and prob({0: 1, 1: 1, 2: 1}), respectively. The API function for querying the number of outcomes that have probability p ± ε is measurement_counts(p, ε). One can also query the number of outcomes that have probability ≥ p by invoking the function tail_counts(p).
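A dense reference implementation of these two queries conveys their semantics (the symbolic backends compute these counts by path counting rather than enumeration):

```python
# Dense reference semantics for the two debugging queries described above.
def measurement_counts(state, p, eps):
    """Number of basis states whose probability lies within p +/- eps."""
    return sum(1 for amp in state if abs(abs(amp) ** 2 - p) <= eps)

def tail_counts(state, p):
    """Number of basis states with probability >= p."""
    return sum(1 for amp in state if abs(amp) ** 2 >= p)

s = 1 / 6 ** 0.5
state = [s, s, s, 0, s, s, s, 0]                # intended state from the text
print(measurement_counts(state, 1 / 6, 1e-9))   # → 6
print(tail_counts(state, 0.16))                 # → 6
```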

Quasimodo's API provides the methods get_state() and most_frequent() to obtain the quantum state (as a pointer to the underlying data-structure) and the outcome with the highest probability, respectively.

#### **3.1 Extending Quasimodo**

The currently supported symbolic data-structures for representing quantum states and quantum gates are written in C++ with bindings for Python. All of the current representations implement an abstract C++ class that exposes (i) QuantumState, which returns a state object that represents a quantum state, (ii) eighteen quantum-gate operations, (iii) an operation for gate composition, (iv) an operation for applying a gate—either a primitive gate or the result of gate composition—to a quantum state, and (v) five query operations. Users can easily extend Quasimodo to add a replacement backend by providing an operation to create a state object, as well as implementations of the seventeen gate operations and three query operations. Currently, the easiest path is to implement the custom representation in C++ as an implementation of the abstract C++ class used by Quasimodo's standard backends.

#### **4 The Internals of Quasimodo**

In this section, we elaborate on the internals of Quasimodo. Specifically, we briefly summarize the BDD, WBDD, and CFLOBDD data-structures that Quasimodo currently supports, and illustrate how Quasimodo performs symbolic simulation using these data-structures. For brevity, we illustrate the way Quasimodo uses these data-structures using the example of the Hadamard gate,

a commonly used quantum gate, defined by the matrix H = (1/√2) [[1, 1], [1, −1]].

*Binary Decision Diagrams (BDDs).* Quasimodo provides an option to use Binary Decision Diagrams (BDDs) [3,5,7] as the underlying data-structure. A BDD is a data-structure used to efficiently represent a function from Boolean variables to some space of values (Boolean or non-Boolean). The extension of

**Fig. 2.** Three representations of the Hadamard matrix *H* = (1/√2) [[1, 1], [1, −1]]: (a) a BDD, (b) a CFLOBDD, and (c) a WBDD. The variable ordering is *x*<sub>0</sub>, *y*<sub>0</sub>, where *x*<sub>0</sub> is the row decision variable and *y*<sub>0</sub> is the column decision variable.

BDDs to support a non-Boolean range is called Multi-Terminal BDDs (MTBDDs) [7] or Algebraic DDs (ADDs) [3]. In this paper, we use "BDD" as a generic term for both BDDs proper and MTBDDs/ADDs. Each node in a BDD corresponds to a specific Boolean variable, and the node's outgoing edges represent a decision based on the variable's value (0 or 1). The leaves of the BDD represent the different outputs of the Boolean function. In the best case, BDDs provide an exponential compression in space compared to the size of the decision-tree representation of the function.<sup>2</sup> Figure 2(a) shows the BDD representation of the Hadamard matrix H with variable ordering x<sub>0</sub>, y<sub>0</sub>, where x<sub>0</sub> is the row decision variable and y<sub>0</sub> is the column decision variable.

We enhanced the CUDD library [12] by incorporating complex numbers at the leaf nodes and adding the ability to count paths.

*Context-Free-Language Ordered Binary Decision Diagrams (CFLOBDDs).* CFLOBDDs [11] are a form of binary decision diagram inspired by BDDs, but the two data-structures are based on different principles. A BDD is an acyclic finite-state machine (modulo ply-skipping), whereas a CFLOBDD is a particular kind of

<sup>2</sup> Technically, the BDD variant that, in the best case, is exponentially smaller than the corresponding decision tree, is called a *quasi-reduced BDD*. Quasi-reduced BDDs are BDDs in which variable ordering is respected, but don't-care nodes are *not* removed, and thus all paths from the root to a leaf have length *n*, where *n* is the number of variables. However, the size of a quasi-reduced BDD is at most a factor of *n*+1 larger than the size of the corresponding (reduced, ordered) BDD [15, Thm. 3.2.3]. Thus, although BDDs can give better-than-exponential compression compared to decision trees, at best, it is linear compression of exponential compression.

*single-entry, multi-exit, non-recursive, hierarchical finite-state machine* (HFSM) [2]. Whereas a BDD can be considered to be a special form of bounded-size, branching, but non-looping program, a CFLOBDD can be considered to be a bounded-size, branching, but non-looping program in which a certain form of *procedure call* is permitted.

CFLOBDDs can provide an exponential compression over BDDs and a double-exponential compression over the decision-tree representation. The additional compression of CFLOBDDs can be roughly attributed to the following reasons:


Such "procedure calls" allow additional sharing of structure beyond what is possible in BDDs: a BDD can share sub-DAGs, whereas a procedure call in a CFLOBDD shares the "middle of a DAG". The CFLOBDD for the Hadamard matrix H, shown in Fig. 2(b), illustrates this concept: the fork node (the node with a split) at the top right of Fig. 2(b) is shared twice—once during the red solid path (—) and again during the blue dashed path (−·−). The corresponding elements of the BDD for H are outlined in red and blue in Fig. 2(a). The cell entry H[1][1], which corresponds to the assignment {x<sub>0</sub> → 1, y<sub>0</sub> → 1}, is shown in Fig. 2(a) (BDD) and Fig. 2(b) (CFLOBDD) as the paths highlighted in bold that lead to the value −1/√2.

*Weighted Binary Decision Diagrams (WBDDs).* A Weighted Binary Decision Diagram (WBDD) [9,14], [19, Ch. 5] is similar to a BDD, but each decision (edge) in the diagram is assigned a weight. To evaluate the represented function f on a given input a (i.e., a is an assignment in {0, 1}<sup>n</sup>), the path for a is followed; the value of f(a) is the product of the weights encountered along the path. Consider how the WBDD in Fig. 2(c) represents the Hadamard matrix H. The variable ordering used is x<sub>0</sub>, y<sub>0</sub>, where x<sub>0</sub> is the row decision variable and y<sub>0</sub> is the column decision variable. Consider the assignment a = {x<sub>0</sub> → 1, y<sub>0</sub> → 1}. This assignment corresponds to the path shown in red in Fig. 2(c). The WBDD has a weight 1/√2 at the root, which is common to all paths. The weight corresponding to {x<sub>0</sub> → 1} is 1 and that for {y<sub>0</sub> → 1} is −1; consequently, a evaluates to (1/√2) · 1 · (−1) = −1/√2, which is equal to the value in cell H[1][1].
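The evaluation rule just described fits in a few lines. The node encoding below is a hand-built sketch of the H diagram in Fig. 2(c), intended only to convey the semantics; it is not the node layout used by the MQT DD package.

```python
import math

# WBDD evaluation as described above: f(a) is the product of the edge
# weights along the path selected by assignment a.
# node = (variable, (weight_for_0, child_for_0), (weight_for_1, child_for_1))
y_node = ("y0", (1, None), (-1, None))       # the y0 = 1 edge carries -1
x_node = ("x0", (1, y_node), (1, y_node))    # both row edges weight 1; child shared
root_weight = 1 / math.sqrt(2)               # common to all paths

def eval_wbdd(root_weight, node, assignment):
    w = root_weight
    while node is not None:
        var, zero_edge, one_edge = node
        weight, node = one_edge if assignment[var] else zero_edge
        w *= weight
    return w

# H[1][1] = -1/sqrt(2); the other three entries are +1/sqrt(2)
print(eval_wbdd(root_weight, x_node, {"x0": 1, "y0": 1}) < 0)   # → True
```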

WBDDs have been used in a variety of applications, such as verification and quantum simulation [19]. In the case of quantum simulation, the weights on the edges of a WBDD are complex numbers. Additionally, the weight on the left-hand edge at every decision node is normalized to 1; this invariant ensures that WBDDs provide a canonical representation of Boolean functions. We use the MQT DD package [19] for backend WBDD support. As distributed, MQT DD supports at most 128 qubits; we modified it to support up to 2<sup>31</sup> qubits.

*Symbolic Simulation.* A symbolic simulation of a quantum circuit-computation [11,13,19] uses a symbolic representation qs of a quantum state and performs operations on qs that correspond to quantum-circuit operations.


## **5 Experiments**

In this section, we present experimental results from using Quasimodo on seven quantum benchmarks—Greenberger-Horne-Zeilinger state creation (GHZ), the Bernstein-Vazirani algorithm (BV), the Deutsch-Jozsa algorithm (DJ), Simon's algorithm, Grover's algorithm, Shor's algorithm (the 2n + 3-qubit circuit of [4]), and application of the Quantum Fourier Transform (QFT) to a basis state—for different numbers of qubits. Columns 2–4 of Table 1 show the time taken for running the benchmarks with CFLOBDDs, BDDs (CUDD 3.0.0 [12]), and WBDDs (MQT DD v2.1.0 [17]). For each benchmark and number of qubits, we created 50 random oracles and report the average time taken across the 50 runs. For each run of each benchmark, we performed a measurement at the end of the circuit computation and checked whether the measured outcome is correct. We ran all of the experiments on AWS machines: t2.xlarge machines with 4 vCPUs, 16 GB of memory, and a stack size of 8192 KB, running Ubuntu.

One sees that CFLOBDDs scale better than BDDs and WBDDs for the GHZ, BV, and DJ benchmarks as the number of qubits increases. BDDs perform better than CFLOBDDs and are comparable to WBDDs for Simon's algorithm, whereas WBDDs perform better than BDDs and CFLOBDDs for QFT, Grover's algorithm, and Shor's algorithm.

We noticed that the BDD implementation suffers from precision issues; i.e., if an algorithm with a large number of qubits contains too many Hadamard gates, it can lead to extremely low probability values for each basis state, which are rounded to 0; this in turn causes leaves that should hold different minuscule values to be coalesced unsoundly, leading to incorrect results. To overcome this issue, one needs to increase the precision of the floating-point package used to represent BDD leaf values. We increased the precision at 512 qubits (∗) and again at 2048 qubits (∗∗).

Some of these results are similar to those reported in [11]; however, that paper did not use Quasimodo. The results of the present paper were obtained using Quasimodo, and we also report results for WBDDs, in addition to BDDs and CFLOBDDs (both of which were used in [11]). The numbers given in Table 1 are slightly different from those given in [11] because these quantum circuits exclusively use gate operations that are applied in sequence to the initial quantum state. One can instead rewrite the quantum circuit to first compute various gate-gate operations (either Kronecker-product or matrix-multiplication operations) and then apply the resultant gate to the initial quantum state. For example, consider a part of a circuit defined as follows:

```
for i in range(0, n):
    qc.cx(i, n)
```

Instead of applying CNOT (cx) sequentially for every i, one can construct a gate equivalent to cx_op = Π<sub>i=0</sub><sup>n−1</sup> cx(i, n) and then apply cx_op to the quantum state qc as follows:

```
cx_op = qc.create_cx(0, n)
for i in range(1, n):
   tmp = qc.create_cx(i, n)
   cx_op = qc.gate_gate_apply(cx_op, tmp)
qc.apply_gate(cx_op)
```
Quasimodo supports such operations as the Kronecker product and the matrix product of two gate matrices. [11] uses such computations both for oracle construction and as part of the quantum algorithm. Table 2 shows the results on the GHZ, BV, and DJ algorithms using the same circuit and oracle construction used in [11]. However, Simon's algorithm, Grover's algorithm, and Shor's algorithm in [11] use operations outside Quasimodo's computational model, and the results on these benchmarks differ from [11]. (Note that the results reported in Table 2 do not include the time taken for the construction of the oracle.)

We also compared Quasimodo with three other quantum-simulation tools: MQT DDSim [18], Quimb [8], and Google Tensor Network (GTN) [10]. MQT DDSim is based on WBDDs (using MQT DD), whereas Quimb and GTN are based on tensor networks. Their performance is shown in columns 6–8 of Table 1. Note that MQT DDSim does not support more than 128 qubits.

**Table 2.** Performance of CFLOBDDs, BDDs, and WBDDs using Quasimodo on an alternate circuit implementation of the GHZ, BV, and DJ algorithms

## **6 Conclusion**

In this paper, we presented Quasimodo, an extensible, open-source framework for quantum simulation using symbolic data-structures. Quasimodo supports CFLOBDDs and both unweighted and weighted BDDs as the underlying datastructures for representing quantum states and for performing quantum-circuit operations. Quasimodo is implemented as a Python library. It provides an API to commonly used quantum gates and quantum operations, and also supports operations for (i) computing the probability of a measurement leading to a given set of states, (ii) obtaining a representation of the set of states that would be observed with a given probability, and (iii) measuring an outcome from a quantum state.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Verifying the Verifier: eBPF Range Analysis Verification

Harishankar Vishwanathan(B) , Matan Shachnai, Srinivas Narayana, and Santosh Nagarakatte

Rutgers University, New Brunswick, USA {harishankar.vishwanathan,m.shachnai, srinivas.narayana,santosh.nagarakatte}@rutgers.edu

Abstract. This paper proposes an automated method to check the correctness of the range analysis used in the Linux kernel's eBPF verifier. We provide a specification of soundness for the range analysis performed by the eBPF verifier. We automatically generate verification conditions that encode the operation of the eBPF verifier directly from the Linux kernel's C source code and check them against our specification. When we discover instances where the eBPF verifier is unsound, we propose a method to generate an eBPF program that demonstrates the mismatch between the abstract and the concrete semantics. Our prototype automatically checks the soundness of 16 versions of the eBPF verifier in Linux kernel versions ranging from 4.14 to 5.19. In this process, we have discovered new bugs in older versions and proved the soundness of the range analysis in the latest version of the Linux kernel.

Keywords: Abstract interpretation · Program verification · Program synthesis · Kernel extensions · eBPF

## 1 Introduction

Extended Berkeley Packet Filter (eBPF) enables the Linux kernel to be extended with user-developed functionality. Historically, eBPF has its roots in a domain-specific language for efficient packet filtering [53], wherein a user can write a description of the packets that must be captured by the network stack. In its modern form, eBPF is an in-kernel register-based virtual machine with a custom 64-bit RISC instruction set. eBPF programs can be Just-in-Time (JIT) compiled to the native processor hardware with access to a subset of kernel functions and memory. Programs written in eBPF are widely used in industry, e.g., for load balancing [10], DDoS mitigation [38], and access control [12].

eBPF Verifier. A user should be able to attach expressive programs within the operating system, while ensuring that they are safe to run. For this purpose, Linux has a built-in eBPF verifier [11] which performs a static analysis of the eBPF program to check safety properties before allowing the program

c The Author(s) 2023

H. Vishwanathan and M. Shachnai—Equal contribution.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 226–251, 2023. https://doi.org/10.1007/978-3-031-37709-9\_12

Fig. 1. Agni's methodology for automatically checking the correctness of the eBPF verifier on each commit. When we find the kernel to be unsound, we generate an eBPF program (i.e., a POC) highlighting the mismatch between abstract and concrete semantics. When we are not able to generate a POC, the kernel requires manual verification.

to be loaded. Given that the verifier is executed in a production kernel, any bug in the verifier creates a huge attack surface for exploits [50,51,62,66] and vulnerabilities [1–9,23–26,35,43–45].

Abstract Interpretation in the Kernel. The verifier, among other things, tracks the values of program variables, which it subsequently uses to deem memory accesses to kernel data structures safe. The eBPF static analyzer employs abstract interpretation [33] with multiple abstract domains to track the types, liveness, and values of program variables across all executions. It uses five abstract domains to track the values of variables (i.e., value tracking); four of them are variants of interval domains, and the fifth is a bitwise domain named tnum [55,57,65,71]. The kernel implements abstract operators for each of these domains efficiently. Unlike the traditional sound composition of sound operators typically done with abstract interpretation (i.e., modular reduced products) [31], the abstract operators are composed in a non-modular fashion. Specifically, the kernel mixes the implementation of abstract operators in one domain with reduction operators that combine information across domains (Sect. 3, see Fig. 2(d)). Further, the Linux kernel does not provide any soundness guarantees for these operators. This makes the task of verification challenging because the correctness of each abstract domain individually does not necessarily imply the correctness of their composition. To the best of our knowledge, there are no existing sound reduction operators for the abstract domains in the kernel.

This Paper. We propose an automated verification approach to check the soundness of the eBPF verifier for value tracking. To perform soundness checks on every kernel commit, we automatically generate a formula representing the actions of the abstract operator from the verifier's C code rather than manually writing them (Sect. 5). Figure 1 illustrates our workflow. We develop a general correctness specification to determine when a non-modular abstract operator that combines multiple domains is sound (Sect. 4.1). When we checked the validity of the formula generated from recent versions of the verifier with the correctness specification, we found that the verifier is unsound. We discovered that the verifier avoids manifesting these soundness bugs through a shared reduction operator that preconditions the input abstract values (Sect. 4.2). Refining our correctness specification revealed that recent versions of the verifier are indeed sound.

When our refined soundness check fails, we generate a concrete eBPF program that demonstrates the mismatch between abstract values maintained by the verifier and the concrete execution of the eBPF program using program synthesis methods (Sect. 4.3). We call our approach differential synthesis because it generates programs that exercise the divergence between abstract verifier semantics and concrete eBPF semantics in unsound kernels.

Prototype and Results. We have used our prototype, Agni [18,72], to automatically check the soundness of 16 kernel versions, from 4.14 to 5.19. In this process, we have discovered 27 previously unknown bugs, which have since been fixed by unrelated patches. For each unsound verifier, we have generated an eBPF program with at most three instructions that exhibits the mismatch between the semantics in ≈ 97% of the cases. The eBPF programs highlighting the mismatch are smaller than previously known ones. We have also shown that the newer versions of the kernel verifier are sound with respect to value tracking. The source code for our prototype is publicly available [18,72].

## 2 Background on Abstract Interpretation

Abstract interpretation is a form of static analysis that uses *abstract values* from an abstract domain to represent sets of values of program variables. For example, in the interval domain, the abstract value $[x, y]$, with $x, y \in \mathbb{Z}$, $x \leq y$, tracks the set of concrete values $\{z \in \mathbb{Z} \mid x \leq z \leq y\}$. *Abstract operators* concisely represent the impact of the program's operations over its variables in the abstract domain.

Abstract Domains, Concretization, and Abstraction. Formally, concrete values form a partially ordered set (poset) with elements $C$ and ordering relation $\sqsubseteq_C$. The concrete poset is $C \triangleq 2^{\mathbb{Z}}$ (i.e., the power set of integers), with the ordering relation $\sqsubseteq_C$ being the subset relation $\subseteq$. An abstract domain is also a poset, with a set of elements $A$ and ordering relation $\sqsubseteq_A$. A *concretization function* $\gamma: A \to C$ takes an abstract value $a \in A$ and produces a concrete value $c \in C$. For example, the interval domain uses the abstract poset $A \triangleq \mathbb{Z} \times \mathbb{Z}$ with the ordering relation $[x, y] \sqsubseteq_A [a, b] \Leftrightarrow (a \leq x) \land (b \geq y)$.

An *abstraction function* $\alpha: C \to A$ takes a concrete value $c \in C$ and produces an abstract value $a \in A$. For example, in the interval domain, abstracting the concrete value $\{1, 4, 6\}$ produces $\alpha(\{1, 4, 6\}) = [1, 6]$. Concretizing $[1, 6]$ yields $\gamma([1, 6]) = \{1, 2, 3, 4, 5, 6\}$. As seen in this example, the abstraction of a concrete value may over-approximate it to maintain a concise representation in the abstract domain. A value $a \in A$ is a *sound abstraction* of $c \in C$ if $c \sqsubseteq_C \gamma(a)$. For a sound abstraction $a$ of $c$, the smaller the concrete value $\gamma(a)$, the higher the *precision* of the abstraction.
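As a quick illustration, the interval domain's α and γ over finite sets of integers can be sketched in a few lines. This is hypothetical Python for exposition, not part of the paper's tooling:

```python
def alpha(c):
    """Interval abstraction: finite set of integers -> (min, max) pair."""
    return (min(c), max(c))

def gamma(a):
    """Interval concretization: [x, y] -> the set {x, x+1, ..., y}."""
    x, y = a
    return set(range(x, y + 1))

# alpha({1, 4, 6}) = (1, 6); gamma((1, 6)) = {1, ..., 6}, an over-approximation
c = {1, 4, 6}
assert c <= gamma(alpha(c))  # soundness: c is contained in gamma(alpha(c))
```

The final assertion is exactly the extensiveness property of γ ∘ α from the Galois connection discussed below.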

Abstract Operators. Intuitively, abstract operators capture the computation of concrete operators over program variables in the abstract domain. For example, in the interval domain, the action of concrete unary negation $-_C(\cdot)$ may be abstracted by $-_A([x, y]) \triangleq [-y, -x]$. Consider a concrete operation $f: \mathbb{Z}_n \to \mathbb{Z}_n$ on a single program variable that is an $n$-bit value. We can lift $f$ point-wise to any set $c \in C$, where $f(c) \triangleq \{f(z) \mid z \in c\}$. An abstract operator $g: A \to A$ is a *sound abstraction* of $f$ if $\forall a \in A : f(\gamma(a)) \sqsubseteq_C \gamma(g(a))$.

Galois Connection. Abstraction and concretization functions $(\alpha, \gamma)$ are said to form a Galois connection if: (1) $\alpha$ is monotonic (i.e., $x \sqsubseteq_C y \Rightarrow \alpha(x) \sqsubseteq_A \alpha(y)$), (2) $\gamma$ is monotonic ($a \sqsubseteq_A b \Rightarrow \gamma(a) \sqsubseteq_C \gamma(b)$), (3) $\gamma \circ \alpha$ is extensive (i.e., $\forall c \in C : c \sqsubseteq_C \gamma(\alpha(c))$), and (4) $\alpha \circ \gamma$ is reductive (i.e., $\forall a \in A : \alpha(\gamma(a)) \sqsubseteq_A a$) [56].

The Galois connection is denoted as $(C, \sqsubseteq_C) \underset{\gamma}{\overset{\alpha}{\rightleftarrows}} (A, \sqsubseteq_A)$. The existence of a Galois connection enables reasoning about the soundness and the precision of any abstract operator. It is in principle possible to compute a sound and precise abstraction of any concrete operator $f$ through the composition $\alpha \circ f \circ \gamma$. However, this is computationally expensive, due to the evaluation of the concretization $\gamma$.

Combining Multiple Abstract Domains Through Cartesian Product [31]. Suppose we are given two abstract domains (sets $A_1$, $A_2$) with sound abstraction functions $\alpha_{A_1}, \alpha_{A_2}$ and concretization functions $\gamma_{A_1}, \gamma_{A_2}$. The Cartesian product abstract domain uses the set $P \triangleq A_1 \times A_2$, with the ordering relation applied separately to each domain: $(a_1 \sqsubseteq_{A_1} b_1) \land (a_2 \sqsubseteq_{A_2} b_2) \Leftrightarrow (a_1, a_2) \sqsubseteq_P (b_1, b_2)$. The concretization function intersects the results obtained from concretizing each element in its respective abstract domain: $\gamma_P(a_1, a_2) \triangleq \gamma_{A_1}(a_1) \cap \gamma_{A_2}(a_2)$. For a concrete value $c \in C$, the abstraction functions are applied domain-wise and combined: $\alpha_P(c) \triangleq (\alpha_{A_1}(c), \alpha_{A_2}(c))$. The Cartesian product domain enjoys a Galois connection $(C, \sqsubseteq_C) \underset{\gamma_P}{\overset{\alpha_P}{\rightleftarrows}} (P, \sqsubseteq_P)$ building on the Galois connections of its component abstract domains.

For example, consider the interval domain ($A_1$, $\sqsubseteq_{A_1}$ defined as above) and the parity domain ($A_2 \triangleq \{\bot, \mathsf{odd}, \mathsf{even}, \top\}$ with ordering relations $\bot \sqsubseteq_{A_2} \mathsf{odd}, \mathsf{even} \sqsubseteq_{A_2} \top$). Suppose at some point the two interpretations produce the abstract values $[3, 5]$ and $\mathsf{even}$ in the two domains. The concretization of the Cartesian product abstract value $([3, 5], \mathsf{even})$ produces the set $\{4\}$, which is smaller than the concretizations of either abstract value $[3, 5]$ or $\mathsf{even}$ in their respective domains. However, since the abstraction functions are applied domain-wise, such information cannot be propagated to the abstract values themselves. For example, it is desirable to propagate information from the abstract value $\mathsf{even}$ in $A_2$ to reduce the interval to $[4, 4]$ in $A_1$.
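The product-concretization step of this example can be replayed concretely. The sketch below is hypothetical Python, with the parity domain concretized over a small finite universe for illustration; it intersects component concretizations exactly as γ_P does:

```python
def gamma_interval(a):
    """Concretize an interval [x, y] as a set of integers."""
    x, y = a
    return set(range(x, y + 1))

def gamma_parity(p):
    """Concretize a parity value over a small universe (illustrative only)."""
    universe = set(range(0, 16))
    if p == "even":
        return {z for z in universe if z % 2 == 0}
    if p == "odd":
        return {z for z in universe if z % 2 == 1}
    return universe if p == "top" else set()

def gamma_product(a, p):
    """gamma_P intersects the component concretizations."""
    return gamma_interval(a) & gamma_parity(p)

print(gamma_product((3, 5), "even"))  # {4}
```

Note that `gamma_product((3, 5), "even")` is strictly smaller than either component concretization, matching the paper's observation.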

Reduced Products. Intuitively, we wish to make an abstract value in one domain more precise using information available in an abstract value in a different domain. Suppose we are given an abstract value $(a_1, a_2)$ from the Cartesian product domain. A *reduction operator* [34] attempts to find the smallest abstract value $(a'_1, a'_2)$ such that its concretization is the same as that of $(a_1, a_2)$, i.e., $\gamma_{A_1}(a_1) \cap \gamma_{A_2}(a_2)$. Formally, the reduction operator $\rho: P \to P$ is defined as the greatest lower bound of all abstract values whose concretization is larger than that of the given abstract value, i.e., $\rho(a_1, a_2) \triangleq \sqcap_P \{(a'_1, a'_2) \mid \gamma_P(a_1, a_2) \sqsubseteq_C \gamma_P(a'_1, a'_2)\}$. However, this definition is impractical to compute even on finite domains.

In general, more "relaxed" versions of reduction operators may be designed to improve precision with efficient computation. For example, Granger [40] introduces a pair of reduction operators $\rho_1, \rho_2$ that reduce each abstract domain in turn, using information from the other, until a fixed point is reached. The operator $\rho_1: A_1 \times A_2 \to A_1$ reduces the abstract value in domain $A_1$, while $\rho_2: A_1 \times A_2 \to A_2$ reduces that in $A_2$. The reduction using $\rho_1$ is sound if $\forall a_1 \in A_1, a_2 \in A_2 : \gamma_P(\rho_1(a_1, a_2), a_2) = \gamma_P(a_1, a_2)$ (preserve concrete values in the intersection) and $\rho_1(a_1, a_2) \sqsubseteq_{A_1} a_1$ (improve precision). Similarly, reduction using $\rho_2$ is sound if $\forall a_1 \in A_1, a_2 \in A_2 : \gamma_P(a_1, \rho_2(a_1, a_2)) = \gamma_P(a_1, a_2)$ and $\rho_2(a_1, a_2) \sqsubseteq_{A_2} a_2$.
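Granger-style iterated reduction can be sketched for the interval × parity product. This is hypothetical Python; `rho1`/`rho2` below are one sound choice of reduction operators, not the only one:

```python
def rho1(a, p):
    """Tighten the interval using parity: move each bound inward until
    it matches the required parity (sound: no concrete value is lost)."""
    x, y = a
    if p in ("even", "odd"):
        want = 0 if p == "even" else 1
        if x % 2 != want:
            x += 1
        if y % 2 != want:
            y -= 1
    return (x, y)

def rho2(a, p):
    """Tighten parity using the interval: a singleton fixes the parity."""
    x, y = a
    if x == y:
        return "even" if x % 2 == 0 else "odd"
    return p

def reduce_to_fixpoint(a, p):
    """Apply rho1 and rho2 in turn until neither changes anything."""
    while True:
        a2 = rho1(a, p)
        p2 = rho2(a2, p)
        if (a2, p2) == (a, p):
            return a, p
        a, p = a2, p2

# ([3, 5], even) reduces to ([4, 4], even), as desired in the text
```

Each step only shrinks an abstract value while preserving the product concretization, which is precisely Granger's soundness condition.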

### 3 Abstract Interpretation in the Linux Kernel

The Linux kernel implements abstract interpretation to check the safety of eBPF programs loaded into the kernel. The kernel's algorithms are encoded into a component called the *eBPF verifier*, which is a part of the pre-compiled operating system image. The Linux kernel uses several abstract domains to track the type, liveness, and values of registers and memory locations used by eBPF programs. Among these, the abstract domains used by the kernel to track values are critical, since they are used to guard statically against malicious programs that may access kernel memory. In Linux kernel v5.19 (latest as of this writing), these analyses constitute roughly 2100 lines of source code in the eBPF verifier. Implementing such analyses soundly in the kernel is challenging. This part of the verifier has been a source of several high-profile security vulnerabilities [1–9,23–26,35,43–45] and exploits [50,51,62,66].

The Linux kernel uses five abstract domains for value tracking: intervals in unsigned 64-bit (u64), unsigned 32-bit (u32), signed 64-bit (s64), and signed 32-bit (s32) arithmetic, and tri-state numbers (tnum [61,71]). The kernel does not provide a formal specification of their abstraction or concretization functions, or proofs of soundness of the abstract operators. Below, we illustrate the abstract domains used in the Linux kernel with the unsigned 64-bit interval domain (u64) and tri-state numbers (tnum).

The u64 Domain. The u64 abstract domain tracks an upper and a lower bound of a 64-bit register interpreted as an unsigned 64-bit value. The eBPF verifier maintains the abstract u64 value as part of its static state for each register. Figure 2(a) provides simplified C source code for abstract addition in the u64 domain. The operator takes two abstract values in1 and in2, with the two components of each abstract value denoted by the members u64\_min and u64\_max. The output abstract value is stored in out. Here, U64\_MAX is the largest 64-bit non-negative integer. The first if condition detects whether an integer overflow may occur as a result of the addition. If there is overflow, the analysis loses all precision, setting the 64-bit bounds of the result to the largest abstract value, [0, U64\_MAX]. If there is no overflow (else clause), out is set to the component-wise sum of the bounds of in1 and in2, similar to unbounded bit-width interval arithmetic [32].
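The overflow handling described above can be mirrored in a few lines. This is a Python sketch of the same logic, assuming the simplified structure of Fig. 2(a) rather than the kernel's exact code:

```python
U64_MAX = (1 << 64) - 1

def abs_add_u64(in1, in2):
    """Abstract u64 addition: component-wise sum of the bounds, unless a
    sum overflows 64 bits, in which case all precision is lost (top)."""
    lo = in1[0] + in2[0]
    hi = in1[1] + in2[1]
    if lo > U64_MAX or hi > U64_MAX:
        return (0, U64_MAX)  # possible overflow: widen to [0, U64_MAX]
    return (lo, hi)

# [1, 2] + [3, 4] = [4, 6]; adding anything positive to [0, U64_MAX] gives top
```

Falling back to top on overflow is what makes the operator sound despite wrapping concrete semantics, at the cost of precision.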

Formally, the abstract domain is $A_{u64} \triangleq \{[x, y] \mid (x, y \in \mathbb{Z}^+_{64}) \land (x \leq_{u64} y)\}$, where $\mathbb{Z}^+_{64}$ is the set of 64-bit non-negative integers, and $\leq_{u64}$ represents a 64-bit unsigned comparison. The ordering relation is $(x_1 \geq_{u64} x_2) \land (y_1 \leq_{u64} y_2) \Leftrightarrow [x_1, y_1] \sqsubseteq_{u64} [x_2, y_2]$. The *concretization function* is $\gamma_{u64}([x, y]) \triangleq \{z \mid (z \in \mathbb{Z}^+_{64}) \land (x \leq_{u64} z \leq_{u64} y)\}$. The *abstraction function* is $\alpha_{u64}(c) \triangleq [\min_{u64}(c), \max_{u64}(c)]$, where $c$ is a member of the powerset of $\mathbb{Z}^+_{64}$, and $\min_{u64}(\cdot)$ and $\max_{u64}(\cdot)$ compute the minimum and maximum over the finite set $c$, where each element of $c$ is interpreted as a 64-bit unsigned value.

Fig. 2. Excerpts (simplified) from the kernel's implementation of the abstract operators for (a) addition (from the function scalar\_min\_max\_add [14]), and (b) bitwise AND (from scalar\_min\_max\_and [15]). (c) Example of reduced product abstract interpretation, where one may use inductive assertions on abstract operators from each domain, along with the soundness of reduction operators, to reason about the correctness of the overall abstraction. The greyed boxes show modular reasoning about components within the boxes. (d) In the Linux kernel, it is challenging to reason modularly about the correctness of abstract operators in each domain independently from their pairwise reductions, since the implementation combines abstraction with reduction. Proving soundness requires one-shot reasoning about all operations together.

Tristate Numbers (tnums). This abstract domain in the Linux kernel tracks which bits of a variable are known to be 0, known to be 1, or unknown across executions of the program. This domain is similar to bitwise domains [55,57,65]. However, the kernel implements this abstract domain efficiently with a tuple of two unsigned integers $(v, m)$. If a particular bit of $m$ is 1, then the value of that bit is unknown; if a particular bit of $m$ is 0, then the value of that bit is given by the corresponding bit of $v$. More formally, the abstraction function $\alpha_t$ is written using two other functions defined as follows: $\alpha_{\&}(C) \triangleq \mathop{\&}_{c \in C} c$ and $\alpha_{|}(C) \triangleq \mathop{|}_{c \in C} c$ (the bitwise AND and OR, respectively, over all elements of $C$). Then, $\alpha_t(C) \triangleq (\alpha_{\&}(C),\ \alpha_{\&}(C) \oplus \alpha_{|}(C))$. The concretization function is written as $\gamma_t(P) = \gamma_t((P.v, P.m)) \triangleq \{c \in \mathbb{Z}^+_{64} \mid c \mathbin{\&} \overline{P.m} = P.v\}$ [71].
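Following the α_t and γ_t definitions above, a tnum can be sketched as a `(v, m)` pair. This is hypothetical Python for exposition; the kernel's own `struct tnum` holds the same two fields:

```python
MASK64 = (1 << 64) - 1

def alpha_tnum(C):
    """alpha_t: v is the AND of all values (bits set everywhere),
    m is AND xor OR (bits on which the values disagree are unknown)."""
    v_and, v_or = MASK64, 0
    for c in C:
        v_and &= c
        v_or |= c
    return (v_and, v_and ^ v_or)

def member_tnum(c, t):
    """gamma_t membership: the known bits of c must equal v."""
    v, m = t
    return (c & ~m) & MASK64 == v

# {2, 3} differ only in bit 0, so alpha_tnum({2, 3}) = (2, 1)
```

Note that `v` automatically has 0 in every unknown bit, since a disagreeing bit yields 0 under AND.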

Abstract Operators in the Linux Kernel and Challenges in Proving Their Correctness. The Linux kernel implements an abstract operator in each abstract domain for each arithmetic and logic (ALU) instruction and each jump instruction in the eBPF instruction set.<sup>1</sup> The kernel verifier also provides

<sup>1</sup> The ALU instructions include 32 and 64-bit add, sub, mul, div, or, and, lsh, rsh, neg, mod, xor, arsh and the jump instructions include 32 and 64-bit ja, jeq, jgt, jge, jlt, jle, jset, jne, jsgt, jsge, jslt, jsle [13].

functions to propagate information between the abstractions (reductions). However, it does not provide formal underpinnings, e.g. Galois connections. The overall analysis appears to be a Reduced Product abstract interpretation (Sect. 2).

However, the key challenge in proving soundness is that the kernel's operators combine abstraction with reduction. Consider the excerpt in Fig. 2(b) from the implementation of the bitwise AND operation in the u64 abstract domain in the kernel, simplified for clarity. As before, in1 and in2 correspond to the input abstract values, and out to the output abstract value. The members with names tnum.\* denote the components of the abstract tnum. Before the execution of these two lines, the tnum abstract output out.tnum.v has already been computed. In the first line, the lower bound of the u64 result, out.u64\_min is updated using the output abstract value in a different domain (out.tnum.v). Hence, the operation overall is not (merely) an abstract operator in the u64 domain. In the second line, the output abstract state out.u64\_max is updated using the abstract *inputs* in the u64 domain. Reduction operators consume abstract outputs, not inputs. Hence, the operation overall is not a reduction operator either.

These characteristics apply not just to the kernel's bitwise AND operation in the u64 domain. Figure 2(d) shows the structure of several of the kernel's abstract operators, compared against the typical structure of product domains and reduction operators (Fig. 2(c)). The kernel's algorithms combine abstraction with reduction, making it challenging to prove their soundness in a modular fashion. Instead, we must resort to a "one-shot" approach, which attempts to prove the soundness of the abstraction of an operator in one domain and the reductions across domains together. We call the kernel's abstract operators *abstraction/reduction operators* in the rest of this paper.

## 4 Automatic Verification of the Kernel's Algorithms

Given the non-modular structure of the kernel's abstract algorithms (Sect. 3), we cannot use traditional methods to prove their soundness, i.e. by showing the soundness of each domain and the reductions separately. Further, the kernel's algorithms have been evolving continuously with the inclusion of new features to the eBPF run-time environment. We want our methods to be applicable to every new update and commit to the Linux kernel.

Hence, our goal is to perform automatic verification using SMT solvers to prove the soundness of (or find bugs in) the C implementation of Linux's abstraction/reduction operators. We work with the input-output semantics of the kernel's abstraction/reduction operators in first-order logic extracted automatically from the kernel's C source code (details of the extraction deferred to Sect. 5).

*Overview of Our Approach.* We develop generic soundness specifications for the Linux kernel's abstraction/reduction operators, handling arithmetic, logic, and branching instructions (Sect. 4.1). We find that several kernel operators violate these soundness specifications. However, many of these violations flag latent bugs in the kernel's algorithms, i.e., bugs that are not necessarily manifested in concrete program executions. We observe that the kernel includes a shared "tail" of computation in all of its abstraction/reduction operators. We use this shared computation to refine our soundness specification by preconditioning the input abstract states (Sect. 4.2). This refinement enables proving the soundness of several of the kernel's operators. However, it still identifies many potential violations of soundness in the kernel. We present a method based on program synthesis to generate loop-free eBPF programs that manifest the bugs identified by the soundness specifications, automatically producing programs that have divergent concrete and abstract semantics. We call this method differential synthesis (Sect. 4.3).

Figure 1 illustrates our entire workflow. Starting from the Linux kernel source code, our techniques produce concrete eBPF programs that manifest soundness bugs in the kernel's algorithms. We have used this procedure to prove the soundness of multiple Linux kernel versions and to discover previously unknown soundness bugs (i.e., no CVEs assigned, to our knowledge), with validated proof-of-concept programs triggering those bugs.

#### 4.1 Soundness Specification for Abstraction/Reduction Operators

We present verification conditions that are *sufficient* to assert the soundness of abstraction/reduction operators in the Linux kernel.

Preliminaries. Encoding Soundness for a Single Abstract Domain in SMT. We describe how to encode the soundness condition for an abstract operator of two operands as an SMT formula, since most eBPF instructions take two operands. Suppose $f: C \times C \to C$ is a binary concrete operation (e.g., 64-bit addition) over the concrete domain (e.g., $C \triangleq 2^{\mathbb{Z}^+_{64}}$). Suppose the operator $g: A \times A \to A$ abstracts $f$. Operator $g$ is sound (Sect. 2) if $\forall a_1, a_2 \in A : f(\gamma(a_1), \gamma(a_2)) \sqsubseteq_C \gamma(g(a_1, a_2))$.

We can check soundness with an SMT query as follows. Suppose we have SMT variables denoting a bitvector $x \in C$ and an abstract value $a \in A$. We can use the concretization function $\gamma$ to represent the fact that $x$ is included in the concretization of $a$. For example, for the u64 domain, we may use the formula $mem_{u64}(x, a) \triangleq (a.min \leq_{u64} x) \land (x \leq_{u64} a.max)$ to assert that $x \in \gamma(a)$.

The input-output relationship of the abstract operator $g$ is available as a first-order logic formula extracted from the kernel source code (Sect. 5). We represent the resulting formula as $a^o = abs_g(a^i_1, a^i_2)$, where $a^i_1$ and $a^i_2$ are the input abstract values and $a^o$ is the output abstract value.

The concrete semantics of the eBPF instruction set determines the input-output relationship of the concrete operation $f$. For example, the bpf\_add64 instruction performs binary addition (with the possibility of overflow) of two 64-bit registers, denoted by $+_{64}$. The action of this instruction is encoded through the formula $x^o = conc_f(x^i_1, x^i_2)$; for bpf\_add64, $conc_f(x^i_1, x^i_2) \triangleq (x^i_1 +_{64} x^i_2)$.

The concrete ordering relation $\sqsubseteq_C$ is just the subset relation $\subseteq$ between two sets. For two sets $S_1, S_2$, we can encode the relationship $S_1 \subseteq S_2$ by asserting that $\forall x : x \in S_1 \Rightarrow x \in S_2$. Putting all this together, we can check the soundness of a single abstract operator $abs_g$ by using an SMT solver to check the validity of the following formula (i.e., by checking whether its negation is unsatisfiable).

$$\begin{aligned} \forall x_1^i, x_2^i \in \mathbb{C}, \ a_1^i, a_2^i \in \mathbb{A}: \quad mem_{\mathbb{A}}(x_1^i, a_1^i) \land mem_{\mathbb{A}}(x_2^i, a_2^i) \ \land \\ x^o = conc_f(x_1^i, x_2^i) \land a^o = abs_g(a_1^i, a_2^i) \Rightarrow mem_{\mathbb{A}}(x^o, a^o) \end{aligned} \tag{1}$$
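A check of this shape can be prototyped without an SMT solver by brute force at a tiny bit-width. The sketch below is hypothetical Python using 4-bit wrapping addition and the interval domain; the paper's actual checks run over 64-bit SMT bitvectors:

```python
W = 4
TOP = (1 << W) - 1  # largest 4-bit unsigned value

def conc_add(x1, x2):
    """Concrete semantics: wrapping 4-bit addition."""
    return (x1 + x2) & TOP

def abs_add(a1, a2):
    """Abstract semantics: sum the bounds; on possible overflow, go to top."""
    lo, hi = a1[0] + a2[0], a1[1] + a2[1]
    return (0, TOP) if hi > TOP else (lo, hi)

def mem(x, a):
    """Membership in the concretization of interval a."""
    return a[0] <= x <= a[1]

def check_soundness():
    """Enumerate all inputs satisfying the premises of the soundness
    formula and check the membership conclusion for each."""
    intervals = [(lo, hi) for lo in range(TOP + 1) for hi in range(lo, TOP + 1)]
    for a1 in intervals:
        for a2 in intervals:
            out = abs_add(a1, a2)
            for x1 in range(a1[0], a1[1] + 1):
                for x2 in range(a2[0], a2[1] + 1):
                    if not mem(conc_add(x1, x2), out):
                        return False
    return True
```

If the overflow fallback in `abs_add` were removed, this check would fail (e.g., for [15, 15] + [15, 15], whose wrapped concrete result 14 escapes the naive bounds), which is the kind of counterexample the SMT query surfaces.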

Generalizing Soundness to Abstraction/Reduction Operators Spanning Multiple Abstract Domains. For the abstraction/reduction operators in Linux (Sect. 3), we can no longer assert soundness for an abstract domain purely using abstract values from that domain. We show how to extend the reasoning to two abstract domains. Let us denote the two abstract domains by $\mathbb{A}_1$ and $\mathbb{A}_2$. An eBPF instruction has two inputs $(x_1^i, x_2^i)$, and each input has a corresponding abstract value in each abstract domain. Suppose $a_{11}^i$ and $a_{12}^i$ correspond to the abstract values for the first input from domains $\mathbb{A}_1$ and $\mathbb{A}_2$, respectively (similarly, $a_{21}^i$ and $a_{22}^i$ for the second input). Further, each concrete input must be in the intersection of the concretizations of all its abstract values. Hence, the formula $mem_{\mathbb{A}_1}(x_1^i, a_{11}^i) \land mem_{\mathbb{A}_2}(x_1^i, a_{12}^i) \land mem_{\mathbb{A}_1}(x_2^i, a_{21}^i) \land mem_{\mathbb{A}_2}(x_2^i, a_{22}^i)$ must hold.

We denote the kernel's abstraction/reduction operation, extracted from the C source code, as $\{a_1^o, a_2^o\} = abs_g(a_{11}^i, a_{12}^i, a_{21}^i, a_{22}^i)$. Note that the kernel's operation outputs a list of abstract values, one for each abstract domain (unlike Eq. 1). The concrete semantics dictates that $x^o = conc_f(x_1^i, x_2^i)$.

To establish the soundness of the abstraction/reduction operator, we ensure that the concrete output is included in the concretizations of the abstract outputs in each domain, i.e., $mem_{\mathbb{A}_1}(x^o, a_1^o) \land mem_{\mathbb{A}_2}(x^o, a_2^o)$. Putting it all together, we check the validity of the following SMT formula:

$$\begin{aligned} \forall x_1^i, x_2^i \in \mathbb{C}, \quad a_{11}^i, a_{21}^i \in \mathbb{A}_1, \ a_{12}^i, a_{22}^i \in \mathbb{A}_2: \\ mem_{\mathbb{A}_1}(x_1^i, a_{11}^i) \land mem_{\mathbb{A}_2}(x_1^i, a_{12}^i) \land mem_{\mathbb{A}_1}(x_2^i, a_{21}^i) \land mem_{\mathbb{A}_2}(x_2^i, a_{22}^i) \land \\ x^o = conc_f(x_1^i, x_2^i) \land \{a_1^o, a_2^o\} = abs_g(a_{11}^i, a_{12}^i, a_{21}^i, a_{22}^i) \\ \Rightarrow (mem_{\mathbb{A}_1}(x^o, a_1^o) \land mem_{\mathbb{A}_2}(x^o, a_2^o)) \end{aligned} \tag{2}$$

The kernel uses five abstract domains (Sect. 3). Extending from two domains to all five domains is straightforward. It involves the addition of membership queries for the inputs and the corresponding abstract values (i.e., mem predicate above). The encoding of each of the kernel's abstraction/reduction operators returns a list containing five abstract outputs (one for each domain). Finally, we check that the concrete output is included in the concretization of each abstract output.

Encoding Arithmetic and Logic (ALU) Instructions. Using the formulation above, we have encoded soundness specifications of abstraction/reduction operators for 16 eBPF ALU instructions, which include 32 and 64-bit add, sub, div, or, and, lsh, rsh, neg, mod, xor, arsh. Notably, we exclude the multiplication instruction mul, whose SMT formula involves a bitvector multiplication operation and a large unrolled loop, making it intractable in the bitvector theory.

Encoding Branch Instructions. We also encoded soundness specifications for conditional and unconditional branches (jeq, jlt, etc.) on both 64 and 32-bit register operands. These amount to 20 instructions, for a total of 36 instructions captured by our encodings. While the soundness of abstracting ALU instructions follows the general structure of Eq. 2, writing down the soundness conditions for branches is more involved. Branches do not concretely modify their input registers. However, the kernel learns new information in the abstract domains from the branch outcome (true vs. false). For example, in the u64 domain, consider two abstract registers [1, 5] and [3, 3]. Branching on an equality comparison shows that the first register can also be set to [3, 3] in the true case. Indeed, each conditional jump instruction produces *four* abstract outputs (rather than the usual *one* output for ALU instructions), corresponding to the updated abstract values for the two registers across the two branch outcomes.
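The [1, 5] / [3, 3] example can be made concrete: for an equality branch in the interval domain, the true-branch outputs are the intersection of the two inputs. This is a hypothetical Python sketch; the kernel's actual refinement also consults the other domains:

```python
def isect(a, b):
    """Interval intersection; None stands for bottom (branch unreachable)."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def abs_jeq(a1, a2):
    """Four abstract outputs: refined values of both inputs, per branch
    outcome. On equality, both registers must lie in a1 ∩ a2; on
    inequality, the interval domain alone cannot refine, in general."""
    t = isect(a1, a2)
    return {"a1_true": t, "a2_true": t,
            "a1_false": a1, "a2_false": a2}

# abs_jeq((1, 5), (3, 3)) refines the first register to (3, 3) in the true case
```

A `None` (bottom) true-branch output tells the verifier that the taken branch is dead code.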

We illustrate the encoding of the correctness condition for a jump instruction for a single abstract domain. Given two concrete operands $x_1^i$ and $x_2^i$, the concrete interpretation of the jump instruction returns whether the condition is true or false. When $x^o = conc_f(x_1^i, x_2^i)$, $x^o$ will be either true or false. The kernel's abstraction/reduction operator generates four output abstract values, $a_{1t}^o, a_{1f}^o, a_{2t}^o, a_{2f}^o$. There are two abstract outputs corresponding to each input. They reflect the updated abstract value for the true case (e.g., $a_{1t}^o$ is the updated abstract value of the first input when the branch condition is true), and similarly for the false case. We represent the kernel's abstraction/reduction operator for branch instructions by the formula $\{a_{1t}^o, a_{1f}^o, a_{2t}^o, a_{2f}^o\} = abs_g(a_1^i, a_2^i)$.

Our correctness condition for jumps requires that the inputs are present in the concretizations of the corresponding abstract value in both the true and false branch outcomes. The formula below specifies this correctness condition.

$$\begin{aligned} \forall x_1^i, x_2^i \in \mathbb{C}, \quad a_1^i, a_2^i \in \mathbb{A}: \quad mem_{\mathbb{A}}(x_1^i, a_1^i) \land mem_{\mathbb{A}}(x_2^i, a_2^i) \land \\ x^o = conc_f(x_1^i, x_2^i) \land \{a_{1t}^o, a_{1f}^o, a_{2t}^o, a_{2f}^o\} = abs_g(a_1^i, a_2^i) \Rightarrow \\ ((x^o \Rightarrow (mem_{\mathbb{A}}(x_1^i, a_{1t}^o) \land mem_{\mathbb{A}}(x_2^i, a_{2t}^o))) \land \\ (\neg x^o \Rightarrow (mem_{\mathbb{A}}(x_1^i, a_{1f}^o) \land mem_{\mathbb{A}}(x_2^i, a_{2f}^o)))) \end{aligned} \tag{3}$$

The above correctness condition can be extended to multiple domains in a manner similar to Eq. 2. The kernel's implementation of the abstraction/reduction operator for a single jump instruction produces 20 output abstract values (2 inputs × 2 branch outcomes × 5 domains).

#### 4.2 Refining Soundness Specification with Input Preconditioning

When we checked the soundness of the kernel's verifier using the soundness specifications in Sect. 4.1, we observed that many of the abstract operators are not sound. However, it is unclear whether these violations are latent unsound behaviors, or behaviors that could actually manifest with concrete eBPF programs. Specifically, the precondition in Eq. 2 is too general, including any combination of abstract values (across domains) as long as the intersection of their concretizations is non-empty. Indeed, the abstract operators in the Linux kernel are unsound if each instruction may start from any arbitrary abstract value across domains. However, these combinations of abstract values may never be encountered in any eBPF program. Our goal is to refine the soundness specifications from Sect. 4.1 to minimize reporting latent (but unmanifested) bugs.

Shared Suffix of Abstraction/Reduction Operators. Upon carefully analyzing the kernel's abstraction/reduction operators, we observed that the kernel performs certain common computations (a *shared suffix* of abstraction/reduction operations) right before producing each abstract output (Fig. 3(a)). As a concrete example, in kernel version 5.19, the function reg\_bounds\_sync is called at the end of each ALU operation [49], updating the signed domains using the unsigned domains and the u64 bounds using the u32 bounds and tnums, among other reductions [48].

Our key insight is that this shared suffix of abstraction/reduction has the effect of preconditioning the initial abstract values for any subsequent instruction, narrowing down the set of possible abstract values that a subsequent instruction may encounter as input. Further, all eBPF programs start executing from abstract values where each register in every domain is either $\top$ (any concrete value in the domain) or has a singleton concretization (a precisely known concrete value). We observe, and confirm using an SMT solver, that the shared suffix computation does not modify these initial abstract values.
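One reduction of the kind performed in this shared suffix can be sketched from the tnum semantics in Sect. 3. This is hypothetical Python, not the kernel's exact reg\_bounds\_sync code: every value in γ_t((v, m)) lies between v and v | m, so the u64 bounds can be clipped accordingly:

```python
MASK64 = (1 << 64) - 1

def sync_bounds_with_tnum(u64_min, u64_max, v, m):
    """Clip the u64 interval using the tnum: any c with c & ~m == v has
    all known-1 bits of v set (so c >= v) and no bits outside v | m
    (so c <= v | m)."""
    new_min = max(u64_min, v)
    new_max = min(u64_max, (v | m) & MASK64)
    return new_min, new_max

# With tnum (v=4, m=3), a fully unknown interval tightens to [4, 7]
```

This is the sense in which the shared suffix "preconditions" abstract inputs: after it runs, the u64 bounds and the tnum can no longer disagree in this particular way.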

Refined Soundness Specification by Preconditioning Input Abstract Values. We can leverage the shared suffix operations to refine our soundness specification as follows.

Fig. 3. (a) The structure of each abstraction/reduction operator in the kernel can be conceptualized as having a prefix that depends on the specific operator, generating an intermediate output, and a suffix that is shared across all the operators, resulting in the final abstract output. (b) We use a refined soundness specification that preconditions input abstract values *a* using the shared suffix *sro*(*.*) of the reduction operators used in the Linux kernel.

First, let sro(a) denote the abstract outputs of computing the shared suffix of the abstraction/reduction over the abstract inputs $a \in \mathbb{A}_1 \times \mathbb{A}_2 \times \cdots \times \mathbb{A}_5$. The SMT formula encoding sro(a) is extracted using our C to SMT encoder (Sect. 5). The main change from the specifications in Sect. 4.1 is that the shared suffix preconditions the input values to any abstract operator. Hence, for example, the soundness specification for two abstract domains from Eq. 2 is updated to use an input abstract value sro(a) as shown below:

$$\begin{aligned}
\forall x_1^i, x_2^i \in \mathbb{C},\ a_{11}^i, a_{21}^i \in \mathbb{A}_1,\ a_{12}^i, a_{22}^i \in \mathbb{A}_2: \\
(b_{11}^i, b_{12}^i) = sro(a_{11}^i, a_{12}^i) \land (b_{21}^i, b_{22}^i) = sro(a_{21}^i, a_{22}^i) \land \\
mem_{\mathbb{A}_1}(x_1^i, b_{11}^i) \land mem_{\mathbb{A}_2}(x_1^i, b_{12}^i) \land mem_{\mathbb{A}_1}(x_2^i, b_{21}^i) \land mem_{\mathbb{A}_2}(x_2^i, b_{22}^i) \land \\
x^o = conc_f(x_1^i, x_2^i) \land \{a_1^o, a_2^o\} = abs_g(b_{11}^i, b_{12}^i, b_{21}^i, b_{22}^i) \\
\Rightarrow (mem_{\mathbb{A}_1}(x^o, a_1^o) \land mem_{\mathbb{A}_2}(x^o, a_2^o))
\end{aligned} \tag{4}$$

It is straightforward to generalize this to multiple domains. The refinement eliminated most of the latent violations reported in Sect. 4.1. We found that the latest kernel versions are sound with respect to value tracking.

#### 4.3 Automatically Producing Programs Exercising Soundness Bugs

Even after refining the soundness specifications (Sect. 4.2), we still find a few violations of soundness. It is challenging to determine whether these violations are "real" (manifested in actual eBPF programs) or latent, since input abstract values preconditioned by sro still overapproximate the abstract values that may occur when analyzing actual eBPF programs (Fig. 3(b), Sect. 4.2).

We aim to automatically generate eBPF programs that manifest soundness bugs (uncovered by the techniques in Sect. 4.2) in an actual kernel verifier execution. Our problem is a form of *differential synthesis*: generating programs whose semantics diverge between the concrete execution and the abstract analysis. We propose a sound but incomplete approach to generate eBPF programs that demonstrate soundness violations. We enumerate loop-free programs up to a bounded length, using an SMT solver to identify concrete and abstract operands that manifest soundness violations.

Our approach is a combination of well-known existing techniques from enumerative [20,52,63] and deductive program synthesis [19,41,58,67]. However, unlike typical program synthesis problems, which have a ∀∃ formula structure (e.g., meet a specification on all inputs), our problem has a much more tractable ∃ structure, i.e., finding one concrete input and program that triggers a soundness violation. In this sense, it is more akin to property-directed reachability algorithms used in model checking [22,27].

Preliminaries. The eBPF run-time starts executing eBPF programs with all live registers holding values that are either precisely known at compile time (e.g. offsets into valid memory regions) or completely unknown (e.g. contents of packet memory). For an abstract value $a \in \mathbb{A}_1 \times \mathbb{A}_2 \times \cdots \times \mathbb{A}_5$, we say that init(a) holds if, in each domain $\mathbb{A}_i$, a is either a singleton (e.g. $[x, x]$ in u64 for some $x \in \mathbb{Z}_{64}^{+}$) or ⊤. We refer to such abstract values as *initial abstract values*. It is straightforward to write down an SMT formula for init(a) for the kernel's domains. We say an abstract value $b \in \mathbb{A}_1 \times \mathbb{A}_2 \times \cdots \times \mathbb{A}_5$ is *reachable* if there exists a sequence of eBPF instructions for which the abstract analysis can produce b for some register, starting from input registers whose abstract values all satisfy init(·).
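The init(·) predicate can be sketched concretely over a toy product of interval domains (hypothetical model: two unsigned interval domains stand in for the kernel's five domains):

```python
TOP_U64 = (0, (1 << 64) - 1)
TOP_U32 = (0, (1 << 32) - 1)
TOPS = (TOP_U64, TOP_U32)   # the top element of each modeled domain

def is_init(a):
    # a is a tuple of intervals, one per abstract domain: init holds if
    # each component is either top or a singleton interval
    return all(d == top or d[0] == d[1] for d, top in zip(a, TOPS))
```

For instance, a register that is exactly 5 in u64 but unknown in u32 satisfies init, while the interval [1, 3] (partially known) does not.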

Overview. Given an abstract operator that violates the soundness specification in Sect. 4.2, our algorithm finds an eBPF instruction sequence that shows that the violating input abstract values are reachable. For a bounded program length k, we enumerate all sequences of eBPF concrete operators (i.e. arithmetic, logic, and branching instructions) of length k − 1, with the kth instruction being the violating concrete operator. This enumeration produces the "skeleton" of the program, filling out the opcodes but leaving the operands as well as the data and control flow undetermined. For each skeleton, we discharge an SMT query that identifies the concrete and abstract operands for the k instructions with well-formed data and control flow. The first instruction consumes eBPF initial abstract values. Starting from k = 1, if we cannot find an eBPF program of length k that manifests the violation, we increment k and try again until a timeout.

Single Instruction Programs (k = 1). As the base case, we check whether initial abstract values along with suitable concrete values may already violate soundness (Sect. 4.2). For example, suppose our enumeration generated the one-instruction program v = bpf\_or(t, u). For simplicity, below we work with just one abstract domain. Building on Eq. (1), we discharge the SMT formula:

$$\begin{aligned}
t, u \in \mathbb{C},\ a_t, a_u \in \mathbb{A}: \quad
init(a_t) \land init(a_u) \land mem_{\mathbb{A}}(t, a_t) \land mem_{\mathbb{A}}(u, a_u) \land \\
v = conc_{or}(t, u) \land a_v = abs_{or}(a_t, a_u) \land \neg(mem_{\mathbb{A}}(v, a_v))
\end{aligned} \tag{5}$$

If the formula is satisfiable, the model provides the concrete operands t, u, with the result that bpf\_or(t, u) is an executable eBPF program manifesting the soundness violation. However, the formula for an unsound operator may be unsatisfiable, since the abstract operands necessary to manifest the violation may lie outside the initial abstract values.
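The k = 1 query can be mimicked by exhaustive search over a toy 4-bit interval domain. The sketch below (hypothetical; the deliberately broken abstract OR is ours, not the kernel's) enumerates initial abstract values and member concrete values, exactly as Eq. (5) constrains a solver to do:

```python
from itertools import product

W = 4                      # toy 4-bit word size
TOP = (0, (1 << W) - 1)    # "any value" interval

def is_init(a):
    # initial abstract values: top, or a singleton interval
    return a == TOP or a[0] == a[1]

def member(x, a):
    return a[0] <= x <= a[1]

def buggy_abs_or(a, b):
    # deliberately unsound abstract OR: this is the interval *join*,
    # not a sound transfer function for bitwise OR
    return (min(a[0], b[0]), max(a[1], b[1]))

def find_violation_k1():
    # mirror of the k = 1 query: search initial abstract values and
    # member concrete values for a soundness violation
    vals = range(1 << W)
    inits = [(x, x) for x in vals] + [TOP]
    for a_t, a_u in product(inits, repeat=2):
        for t in vals:
            if not member(t, a_t):
                continue
            for u in vals:
                if not member(u, a_u):
                    continue
                v = t | u
                if not member(v, buggy_abs_or(a_t, a_u)):
                    return t, u, a_t, a_u
    return None
```

Here the search succeeds at k = 1 (e.g. t = 1, u = 2 gives v = 3, outside the join [1, 2]); for the kernel's real bugs, a violation may only become reachable at larger k.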

Straight-line Programs, Length k > 1. The larger the program length k, the larger the set of reachable input abstract values available to manifest a soundness violation at the kth instruction. We exhaustively enumerate all possible (k − 1)-long instruction sequences. To enable well-formed data flow between the k instructions, the inputs for each instruction are sourced either from the outputs of prior instructions or from initial abstract values.

For example, consider a two-instruction program (k = 2) generated by the enumerator: r = bpf\_and(p,q); v = bpf\_or(t,u). We are looking for a soundness violation in bpf\_or. The variables p, q, r, t, u, v are concrete values, with corresponding abstract values $a_p, a_q, \cdots, a_v$. The abstract inputs of the first instruction bpf\_and are initial abstract values. The abstract inputs of the last instruction may be drawn from either $a_p, a_q, a_r$ or the initial abstract values. We use the formula $assign(x, \{y_1, y_2, \cdots\})$ to denote that x is mapped to one of the variables $y_1, y_2, \cdots$ in both the concrete and abstract domains: $assign(x, \{y_1, y_2, \cdots\}) \triangleq (x = y_1 \land a_x = a_{y_1}) \lor (x = y_2 \land a_x = a_{y_2}) \lor \cdots$. We discharge the following SMT formula to a solver:

$$\begin{aligned}
p, q, r, t, u, v \in \mathbb{C},\quad a_p, a_q, a_r, a_t, a_u, a_v \in \mathbb{A}: \\
init(a_p) \land init(a_q) \land mem_{\mathbb{A}}(p, a_p) \land mem_{\mathbb{A}}(q, a_q) \land \\
r = conc_{and}(p, q) \land a_r = abs_{and}(a_p, a_q) \land mem_{\mathbb{A}}(r, a_r) \land \\
(init(a_t) \lor assign(t, \{p, q, r\})) \land (init(a_u) \lor assign(u, \{p, q, r\})) \land \\
mem_{\mathbb{A}}(t, a_t) \land mem_{\mathbb{A}}(u, a_u) \land \\
v = conc_{or}(t, u) \land a_v = abs_{or}(a_t, a_u) \land \neg(mem_{\mathbb{A}}(v, a_v))
\end{aligned} \tag{6}$$

A model for the formula produces the concrete and abstract operands for the two instructions, leading to an executable bug-manifesting program. This approach is extensible to more instructions and more abstract domains.

Loop-free Programs. Incorporating branch instructions significantly broadens the set of input abstract values available to the kth instruction, improving the likelihood of finding a bug-manifesting program at a given length. We turn each branch into a single-instruction ite whose outputs are available for subsequent instructions. More concretely, (i) any of the first k − 1 instructions may be jump instructions; (ii) the jump target of a branch instruction in the ith slot for both outcomes (i.e. true or false) points to the (i + 1)th slot; and (iii) the abstract outputs of the branch (e.g. from Eq. (3)) may be used as abstract inputs for subsequent instructions, similar to arithmetic and logic instructions.

As an example, suppose our enumerator produces r = bpf\_jump\_gt64(p,q,0); v = bpf\_or(t,u). Here r is a concrete value which is either true or false. We use 0 as the jump target, always pointing branches to the next instruction. There are four abstract outputs from the jump: $a_{pt}, a_{qt}$ for the true branch and $a_{pf}, a_{qf}$ for the false branch (see Sect. 4.1). For convenience, we set the abstract value $a_p^o$ (resp. $a_q^o$) to either $a_{pt}$ or $a_{pf}$ (resp. $a_{qt}$ or $a_{qf}$) based on the branch outcome; and also assert that the corresponding final concrete values $p^o = p$ and $q^o = q$. Building on Eq. (3), we ask the SMT solver for a model of the formula:

$$\begin{aligned}
p, q, t, u, v \in \mathbb{C},\quad r \in \{true, false\},\quad a_p, a_q, a_t, a_u, a_v \in \mathbb{A}: \\
init(a_p) \land init(a_q) \land mem_{\mathbb{A}}(p, a_p) \land mem_{\mathbb{A}}(q, a_q) \land \\
r = conc_{jump\_gt64}(p, q) \land \{a_{pt}, a_{pf}, a_{qt}, a_{qf}\} = abs_{jump\_gt64}(a_p, a_q) \land \\
(r \Rightarrow (mem_{\mathbb{A}}(p, a_{pt}) \land mem_{\mathbb{A}}(q, a_{qt}) \land a_p^o = a_{pt} \land a_q^o = a_{qt})) \land \\
(\neg r \Rightarrow (mem_{\mathbb{A}}(p, a_{pf}) \land mem_{\mathbb{A}}(q, a_{qf}) \land a_p^o = a_{pf} \land a_q^o = a_{qf})) \land \\
(init(a_t) \lor assign(t, \{p^o, q^o\})) \land (init(a_u) \lor assign(u, \{p^o, q^o\})) \land \\
mem_{\mathbb{A}}(t, a_t) \land mem_{\mathbb{A}}(u, a_u) \land \\
v = conc_{or}(t, u) \land a_v = abs_{or}(a_t, a_u) \land \neg(mem_{\mathbb{A}}(v, a_v))
\end{aligned} \tag{7}$$
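For intuition, the four abstract outputs of a conditional jump can be sketched for a toy unsigned-interval domain (hypothetical refinement rules of our own, illustrating the idea rather than the kernel's code):

```python
def abs_jump_gt(a_p, a_q):
    # refine unsigned intervals for the branch "p > q"; returns the
    # refined (p, q) pairs for the true and the false outcome
    (pl, ph), (ql, qh) = a_p, a_q
    true_p = (max(pl, ql + 1), ph)   # p > q implies p >= q_min + 1
    true_q = (ql, min(qh, ph - 1))   # and q <= p_max - 1
    false_p = (pl, min(ph, qh))      # p <= q implies p <= q_max
    false_q = (max(ql, pl), qh)      # and q >= p_min
    return true_p, true_q, false_p, false_q
```

For example, with p ∈ [0, 7] and q ∈ [3, 5], the true branch refines p to [4, 7] while the false branch refines p to [0, 5]; downstream instructions can consume these tighter intervals, exactly as Eq. (7) allows.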

Validation of Manifested Soundness Violations. The programs generated by our approach for bugs with known CVEs were similar to the proof-of-concept implementations found in these CVEs. For previously unknown bugs, we logged the kernel verifier's state as it analyzed our eBPF programs, and also executed the eBPF programs with the concrete operands produced by the SMT solver. We compared the values in the SMT solver's model with those from the kernel verifier and the run-time result. This process entailed manually compiling and booting into each kernel version that we check, and running the generated programs. For the manifested bugs, we found exact agreement between the SMT model and the observed behaviors in all cases we checked.

## 5 C to Logic for Kernel's Abstract Operators

To prove the soundness of the kernel's abstract operators, we first have to extract the input-output semantics of the operators from the kernel's implementation in C into first-order logic. It is tedious and error-prone to manually write down the formulas for each version of the kernel. Further, the verifier's abstract semantics can change across versions. Hence, we automatically generate the first-order logic formula (in SMT-LIB format) directly from the verifier's C source code. Modeling C code in general is hard [42,46,64]. However, we observe that it is sufficient to handle a subset of C for the verifier's value-tracking routines.

Verifier's C Code for Value-tracking. The kernel uses two integers to represent abstract values for each of the five domains (Sect. 3). These 10 integers are encapsulated in a structure named bpf\_reg\_state (reg\_st for short). The tnum domain is further encapsulated within reg\_st in a struct called tnum. This static "register state" is maintained for each register in the eBPF program being analyzed. The kernel has a single top-level function called adjust\_scalar\_min\_max\_vals (adjust\_scalar for short) that is called for each abstract operator corresponding to ALU instructions [16]. This function takes three arguments: opcode and two register states named dst and src that track the abstract value in the destination and source register of the eBPF instruction, respectively. Depending on the opcode, one of several switch-cases is executed, which leads to instruction-specific function calls that modify the abstract values in dst and src. None of the functions updating register state in the call-chain have recursion or loops. The kernel has a structured way of accessing the members of reg\_st. We use these specific features to translate C code to logic. The structures of the corresponding functions for jumps (reg\_set\_min\_max and descendants) are similar.
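The shape of this dispatch can be pictured with a toy unsigned-interval model (hypothetical opcode names and transfer functions of our own; the kernel's adjust\_scalar operates on the full reg\_st structure across five domains):

```python
W = 64
U_MAX = (1 << W) - 1

def abs_add(dst, src):
    # interval addition; on possible wrap-around, fall back to full range
    lo, hi = dst[0] + src[0], dst[1] + src[1]
    return (lo, hi) if hi <= U_MAX else (0, U_MAX)

def abs_or(dst, src):
    # coarse but sound: any OR result fits below the next power of two
    # above the OR of the two upper bounds
    bound = (1 << max(dst[1] | src[1], 1).bit_length()) - 1
    return (0, bound)

def adjust_scalar(opcode, dst, src):
    # toy dispatch mirroring the kernel's switch over ALU opcodes
    return {"BPF_ADD": abs_add, "BPF_OR": abs_or}[opcode](dst, src)
```

Because each opcode reaches a distinct transfer function through one top-level switch, fixing the opcode to a constant (as our slicing pass does) lets standard compiler optimizations strip away every other case.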

Preprocessing the Verifier's C Code. We use the LLVM compiler's [47] intermediate representation (IR) because it allows us to handle complex C code and provides a collection of tools to modify, optimize, and analyze the IR. Figure 4(a) shows an overview of our tool's pipeline. Consider the case where we want to generate the SMT-LIB file for the abstract operator corresponding to the 32-bit bitwise OR instruction (bpf\_or32). After obtaining the verifier's code in IR (stage 1), we proceed to apply our custom IR-transforming passes (stage 2). First, we remove functions that are not relevant to our purpose because they do not modify register state. Next, we inline all the function calls that adjust\_scalar makes. Inlining is possible because there are no recursive functions or loops in the call-graph. Next, we need to create a slice of the verifier that is only concerned with bpf\_or32. We inject an LLVM instruction in the entry basic block of adjust\_scalar which sets the opcode to bpf\_or32. LLVM's optimizer removes all irrelevant code from this IR with constant propagation and dead-code elimination. Next, we adapt a transformation pass from SeaHorn's [42] codebase, which allows us to lower memcpy instructions to a sequence of stores. The result is a single function in LLVM IR, which captures the action of the abstract operator given input abstract states (i.e., dst and src) for one instruction (bpf\_or32).

The LLVMToSMT Pass. In step 3, we use the theory of bitvectors to generate the first-order logic formula for the function obtained from step 2. Since we encode everything with bitvectors, we need a memory model to capture memory accesses. We model memory as a set of two disjoint regions pointed to by dst and src. Given that the memory is only accessed via the structure reg\_st's fields, we can further view memory as a set of named registers. This allows us to model the entire memory as a tree of bitvectors: the leaf nodes store bitvectors corresponding to the first-class members of reg\_st (e.g. u64\_min), the non-leaf nodes store trees for members of aggregate types (e.g. tnum). C struct member accesses in IR begin with a getelementptr (GEP) instruction, which calculates the pointer (address) of the struct's member. We use an indexing similar to that used by GEP to identify the bitvector that corresponds to the accessed member.

Handling Straight Line Code and Branches. LLVM's IR is already in SSA form. Every IR instruction that produces a value defines a new temporary virtual register.

Fig. 4. (a) The pipeline for automatically generating an SMT-LIB file from the Linux kernel's verifier.c. Shown here is an instance of the pipeline for the bpf\_or32 instruction. (b) The LLVM IR presented as a CFG, overlaid with MemorySSA analysis in red, for a function adjust\_scalar\_bpf\_or32 that is representative of verifier code for bpf\_or32. It takes as input two structs dst and src and modifies them.

We create a fresh bitvector variable when we encounter a temporary in the IR. Consider a simple addition instruction: %y = add i64 %x, 3. To encode the instruction, we create a formula that asserts an equality between a fresh bitvector BVy and the existing one BVx, based on the semantics of the instruction: BVy == BVx + BVconst3.
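A minimal sketch of this encoding step, emitting SMT-LIB text for the add instruction above (hypothetical helper of our own; the real pass builds solver terms rather than strings):

```python
def encode_add(dst, src, imm, width=64):
    # emit an SMT-LIB assertion equating the fresh temporary with the
    # source plus the constant, in the theory of bitvectors
    return f"(assert (= {dst} (bvadd {src} (_ bv{imm} {width}))))"

smt = encode_add("%y", "%x", 3)
```

The emitted assertion uses only QF_BV operations, which is what keeps the resulting queries tractable across many kernel versions.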

To handle branches, we precondition the SMT formula for each basic block with its path condition. As the IR we analyze does not contain loops, the control flow graph (CFG) is a directed acyclic graph. Hence, the path condition of each basic block is a disjunction of path conditions flowing through each incoming edge into the node corresponding to that block in the CFG. Phi nodes (φ's) in SSA merge the values flowing in from various paths. We use the phi instructions in IR to merge incoming values. We calculate an "edge condition" formula for each incoming edge to the phi. Then, we encode the phi instruction by appropriately setting the bitvector to the incoming values based on the edge condition.
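The phi encoding can be sketched as a fold of edge conditions into a nested ite (hypothetical helper; as a simplification, the last incoming value serves as the default):

```python
def encode_phi(result, incoming):
    # incoming: list of (edge_condition, value) pairs in SMT-LIB syntax;
    # fold them into a nested ite, defaulting to the last incoming value
    expr = incoming[-1][1]
    for cond, val in reversed(incoming[:-1]):
        expr = f"(ite {cond} {val} {expr})"
    return f"(assert (= {result} {expr}))"
```

With two incoming edges this yields a single ite on the first edge condition, matching the two-way merges produced by the verifier's branch-heavy but loop-free CFGs.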

Handling Memory Access Instructions. Our tool leverages LLVM's MemorySSA analysis [17] to handle loads and stores. The MemorySSA pass creates new versions of memory upon stores and branch merges, associates load instructions with specific versions, and provides a memory dependence graph between the memory versions. Figure 4 (b) shows an example CFG in IR overlaid with MemorySSA analysis (red). We maintain a one-to-one mapping between the different versions of memory presented by MemorySSA, and versions of our memory model consisting of bitvector-trees. liveOnEntry (line 3) is the memory version at the start of the function. The bitvectors in the corresponding bitvector-tree are the input operands for the kernel's abstract operators.

Every load instruction is annotated with a MemoryUse (e.g., the load instruction on line 6 reads from the liveOnEntry memory version), and preceded by a GEP. Thus, we choose the appropriate bitvector-tree and index into it to obtain the appropriate bitvector (say BVsrc0). We encode the load instruction as: (BVx1 == BVsrc0). A store instruction (e.g. line 12, annotated using a MemoryDef) modifies an existing memory version (liveOnEntry) to create a new version (1). We create a new bitvector-tree and map it to version 1. The bitvectors in this bitvector-tree are exactly the same as liveOnEntry's, except for the bitvector in the location that the store modifies. The latter bitvector is replaced with the bitvector mapped to the temporary used for the store. For a MemoryPhi node (e.g. line 18, creating version 3), we create a new bitvector-tree for the latest memory version (e.g. 3). Similar to regular phi nodes, we use the edge condition of the incoming edges to conditionally set each bitvector in the new bitvector-tree to the corresponding bitvector in the memory version propagated through that edge.
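The versioned bitvector-tree can be sketched as a copy-on-write update over nested dicts (a hypothetical model of our own; the real pass maps MemorySSA versions to trees of solver variables):

```python
def store(mem, path, value):
    # copy-on-write update of a bitvector-tree (modeled as nested dicts):
    # returns a new memory version that shares all untouched subtrees
    new = dict(mem)
    if len(path) == 1:
        new[path[0]] = value
    else:
        new[path[0]] = store(mem[path[0]], path[1:], value)
    return new
```

Each store thus yields a fresh version whose contents agree with the previous one everywhere except the written leaf, mirroring how a MemoryDef relates to the version it modifies.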

The bitvector-tree corresponding to the active memory version at the point of the (unique) ret instruction (e.g. 3 in the lend block) contains the output operands for the kernel's abstract operators.

### 6 Experimental Evaluation

Our prototype, Agni [18,72], automatically checks the soundness of the value tracking algorithms in various versions of the kernel eBPF verifier. It uses LLVM 12 [47] for the C to logic translation and the Z3 SMT solver [36] for checking formulas. The source code for our prototype is publicly available [18,72]. We evaluate Agni to determine its effectiveness in checking the soundness of the kernel verifier and its ability to generate eBPF programs that manifest soundness violations (which we call proofs of concept, or POCs).

Checking Soundness Across Kernel Versions. We have automatically checked the soundness of all combinations of abstract operators and abstract domains for kernels between versions 4.14 and 5.19. Figure 5(a) provides a summary of our results. To keep the size of the table short, we only report kernel versions starting from 4.14 that are known to have a documented CVE or a bug that is distinct from one in a prior kernel version (4.14, 5.5, 5.7-rc1, 5.8, ...). We evaluated intermediate kernel versions that are not reported; our tool can support all kernel versions from 4.14 to 5.19 (the latest as of this writing).

We compare our generic soundness specification (Sect. 4.1, labeled gen in columns 2,4,6) and the refined one (Sect. 4.2, labeled sro in columns 3,5,7). A kernel with at least one potentially unsound domain or operator is considered unsound (columns 2 and 3). Operator+domain pairs that violated the soundness specification are reported in columns 4 and 5. Those operators that violated soundness in at least one domain are reported in columns 6 and 7.

All kernel versions including the latest ones are unsound with respect to the generic soundness specification (column 2).


Fig. 5. (a) Soundness violations detected with the generic soundness specification (Sect. 4.1, labeled *gen*) in comparison to the refined specification (Sect. 4.2, labeled *sro*). We show the number of violating operator+domain pairs (columns 4-5) and number of unsound operators (columns 6-7) (b) Number of generated POCs and their lengths for unsound operator+domains after *sro* checks.

Even in one of the latest versions of the kernel (v5.19), 6 operators, corresponding to bpf\_xor64, bpf\_xor32, bpf\_and64, bpf\_or64, bpf\_or32, and bpf\_and32, are unsound according to the generic soundness specification (column 6, row of kernel version 5.19). Refining the soundness specification enables us to prove the soundness of all operators in kernels newer than 5.13 (column 3). However, the refined specification still reports violations for older kernels. Among those violations, 27 were previously unknown. A single wrong abstract operator can violate the soundness of many abstract domains (up to 5). The refined (sro) specification reduces the reported soundness violations by ≈ 6.8% in potentially unsound kernel versions and by 100% in sound ones.

We observed that the 64-bit jump instructions and 64-bit/32-bit bitwise instructions exhibited the largest number of soundness violations. The unsoundness persisted across multiple kernel versions (until eventually patched).

Generating POCs for Unsound Kernels. We evaluate the ability of differential synthesis (Sect. 4.3) to generate eBPF programs that manifest soundness bugs. Figure 5(b) summarizes our results. Starting with operator+domain pairs from soundness violations uncovered by sro (column 2), we report whether all operator+domain violations were successfully manifested using POCs (column 3) and the lengths of the POCs successfully generated (columns 4, 5, 6). We produced a POC for ≈ 97% of soundness violations across kernel versions (validated as described in Sect. 4.3). The smallest POCs for many violations require multi-instruction programs. For example, none of the soundness violations in version 5.5 may be manifested with a single eBPF instruction. We generated a POC for all soundness violations for all but 2 versions of the kernel (for versions 4.14 and 5.5, we generated a POC for all but 3 and 8 violations, respectively). The ability to manifest almost all of the reported sro violations speaks to the significance and precision of the refinement in the soundness specification. Our differential synthesis technique may enable developers to experiment with concrete eBPF programs to validate and debug unsound behaviors in the kernel verifier.

Some bugs in the eBPF verifier are well known security vulnerabilities and have known POCs [51,62]. We have generated a POC, of equal or lesser size, for all known CVEs in the kernel versions analyzed. For example, we have generated a POC for a well known bug with two instructions instead of four [62].

Time Taken to Verify Kernels and Generate POCs. We conducted our experiments on the Cloudlab [37] testbed, using a machine with two 10-core Intel Skylake CPUs running at 2.20 GHz with 192 GB of memory. When using the generic soundness specifications, 90% of the abstract operators (eBPF instructions) were checked for soundness within ≈ 100 minutes. If deemed unsound, the refined specification was checked in ≈ 30 minutes for ≈ 90% of the unsound operators. At the extreme, verifying some operators, as well as finding a POC for some soundness violations, may take a long time (2000 minutes or more). We attribute this to the significant size of the generated SMT-LIB formulas. We were able to find POCs for 90% of the soundness violations in kernel versions 5.7-rc1 through 5.12 within a few hours.

## 7 Limitations and Caveats

The results in this paper must be interpreted with the following caveats.

Only Range Analysis is Considered. There are other static analyses in the kernel verifier beyond range analysis (Sect. 1). These include tracking register liveness for reading and writing, and detecting speculative execution vulnerabilities.

Coverage of eBPF Abstract Operators. We exclude verifying the soundness of the abstract operators corresponding to multiplication as they cause our SMT verifications to time out. This is primarily due to the presence of 64-bit bitvector multiplication in the SMT encoding of these operators. We have verified their soundness using 8-bit bitvectors. Our results on (un)soundness cover all other abstract arithmetic, logic, and branching operators (Sect. 4.1).

Trusted Computing Base. Our C to SMT translation (Sect. 5) and soundness proofs have software dependencies including the LLVM compiler infrastructure, the Z3 solver, and our translation passes, which together form our trusted computing base. We have unit tested our C-to-SMT translations extensively. We validated our synthesized POCs by manually executing them in Linux kernels running inside the QEMU emulator, replicating the soundness bugs. Despite our best efforts, it is possible that there are bugs in our software infrastructure.

Incompleteness of Differential Synthesis. The differential synthesis approach is incomplete (Sect. 4.3). If our refined verification condition (Eq. (4)) finds an operator unsound, and the synthesis is unable to produce a POC, there are two possibilities. First, there may be long programs which could manifest the unsound behavior. Our enumerative algorithm currently times out for programs of length ≥ 4. Second, it is possible that the bug cannot be manifested with any concrete eBPF program, and is reported due to overapproximation in the soundness specification.

## 8 Related Work

Closest Related Work. The two closely related prior works are: (1) a paper on tnum verification [71], and (2) a recent manuscript on verifying range analysis [21]. The tnum paper explores formal verification for a single abstract domain: tnums. The recent manuscript [21] also aims to prove the soundness of the eBPF verifier's value-tracking. In contrast, our work differs by (1) exposing the non-modular nature of the abstract operators in the kernel, (2) proposing a method to reason about abstract operators for both arithmetic and branches, (3) automatically generating VCs from kernel source code, and (4) synthesizing eBPF programs that exercise the divergence of abstract and concrete semantics.

Safety of eBPF Programs And Static Analyzers. eBPF compilation and interpreter safety has been the focus of recent efforts [59,60,69,73,74]. PREVAIL [39] performs abstract interpretation with the zone abstract domain for checking safety outside the kernel. In contrast, we focus on proving the soundness of the in-kernel verifier.

Abstract Interpretation And Domain Refinement. Prior work on abstract interpretation [30,31,33] and value-tracking abstract domains [55,56,68] have indirectly influenced the eBPF verifier's design [61,71]. The idea of combining abstract domains to enhance the precision of abstract representations was first introduced by Cousot with the reduced product and disjunctive completion domain refinements [29,34] and further improved by others [70]. A systematic survey on product abstract operators is also available [28]. Specifically, we tailor our work to verify the abstract operators in the Linux kernel.

C to First-order Logic. Similar to our approach that generates first-order logic formulas from C code, prior tools also generate verification conditions from C code [42,46,54,64]. A few of them, SMACK [64] and SeaHorn [42], use LLVM IR for this purpose. These tools support a rich subset of C. They typically model memory as a linear array of bytes, which is not ideal for modeling kernel source code. We explore a subset of C that is sufficient to handle kernel code and still generates queries using only the bitvector theory, which enables us to efficiently verify soundness for multiple versions of the kernel.

## 9 Conclusion

We present a fully automated method to verify the soundness of range analysis in the Linux kernel's eBPF verifier. We are able to check the soundness of multiple kernel versions automatically because we generate the verification conditions for the abstract operators directly from the kernel C code. We develop specifications for reasoning about soundness when multiple abstract domains are combined in a non-modular fashion in the kernel. Our refinement to this specification, capturing preconditioning in the kernel, proves the soundness of recent Linux kernels. We also successfully generate concrete eBPF programs that demonstrate the divergence between abstract and concrete semantics when soundness checks fail. Our next step is to push for incorporating this approach in the kernel development process, to help eliminate verifier bugs during code review.

Acknowledgement. This paper is based upon work supported in part by the National Science Foundation under FMITF-Track I Grant No. 2019302. We thank the CAV reviewers, and He Zhu for their valuable feedback. We also thank CloudLab for providing the research testbed for our experiments.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Software Verification**

## Automated Verification of Correctness for Masked Arithmetic Programs

Mingyang Liu<sup>1</sup>, Fu Song<sup>1,2,3(B)</sup>, and Taolue Chen<sup>4</sup>

<sup>1</sup> ShanghaiTech University, Shanghai 201210, China
songfu@shanghaitech.edu.cn
<sup>2</sup> Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing 100190, China
<sup>3</sup> Automotive Software Innovation Center, Chongqing 400000, China

<sup>4</sup> Birkbeck, University of London, London WC1E 7HX, UK

Abstract. Masking is a widely used and effective countermeasure against power side-channel attacks for implementing cryptographic algorithms. Surprisingly, few formal verification techniques have addressed a fundamental question, i.e., whether the masked program and the original (unmasked) cryptographic algorithm are functionally equivalent. In this paper, we study this problem for masked arithmetic programs over Galois fields of characteristic 2. We propose an automated approach based on term rewriting, aided by random testing and SMT solving. The overall approach is sound, and complete under certain conditions which are met in practice. We implement the approach as a new tool FISCHER and carry out extensive experiments on various benchmarks. The results confirm the effectiveness, efficiency and scalability of our approach. Almost all the benchmarks can be proved, for the first time, by the term rewriting system alone. In particular, FISCHER detects a new flaw in a masked implementation published in EUROCRYPT 2017.

## 1 Introduction

Power side-channel attacks [42] can infer secrets by statistically analyzing the power consumption during the execution of cryptographic programs. The victims include implementations of almost all major cryptographic algorithms, e.g., DES [41], AES [54], RSA [33], elliptic-curve cryptography [46,52] and post-quantum cryptography [56,59]. To mitigate the threat, cryptographic algorithms are often implemented via *masking* [37], which divides each secret value into (d + 1) shares by randomization, where d is a given masking order. However, it is error-prone to produce secure and correct masked implementations of non-linear functions (e.g., finite-field multiplication, modular addition and S-Box),

This work is supported by the National Natural Science Foundation of China (62072309), CAS Project for Young Scientists in Basic Research (YSBR-040), ISCAS New Cultivation Project (ISCAS-PYFX-202201), an oversea grant from the State Key Laboratory of Novel Software Technology, Nanjing University (KFKT2022A03), and Birkbeck BEI School Project (EFFECT).

which are prevalent in cryptography. Indeed, published implementations of AES S-Box that have been proved secure via paper-and-pencil [19,40,58] were later shown to be vulnerable to power side-channels when d is no less than 4 [24].

While numerous formal verification techniques have been proposed to prove the resistance of masked cryptographic programs against power side-channel attacks (e.g., [7,13,26,29–32,64]), one fundamental question is largely left open: the (functional) correctness of the masked cryptographic programs, i.e., whether a masked program and the original (unmasked) cryptographic algorithm are actually functionally equivalent. It is conceivable to apply general-purpose program verifiers to masked cryptographic programs. Constraint-solving based approaches are available: for instance, Boogie [6] generates constraints via weakest-precondition reasoning and then invokes SMT solvers; SeaHorn [36] and CPAChecker [12] adopt model checking utilizing SMT or CHC solvers. More recent work (e.g., CryptoLine [28,45,53,62]) resorts to computer algebra, e.g., reducing the problem to the ideal membership problem. The main challenge of applying these techniques to masked cryptographic programs lies in the presence of finite-field multiplication, affine transformations and bitwise exclusive-OR (XOR). For instance, finite-field multiplication is not natively supported by current SMT or CHC solvers, and the increasing number of bitwise XOR operations causes the infamous state-explosion problem. Moreover, to the best of our knowledge, current computer algebra systems do not provide the full support required for the verification of masked cryptographic programs.

Contributions. We propose a novel term-rewriting based approach to efficiently check whether a masked program and the original (unmasked) cryptographic algorithm (over Galois fields of characteristic 2) are functionally equivalent. Namely, we provide a term rewriting system (TRS) which can handle affine transformations, bitwise XOR, and finite-field multiplication. The verification problem is reduced to checking whether a term can be rewritten to the normal form 0. This approach is sound, i.e., once we obtain 0, we can claim functional equivalence. In case the TRS reduces the term to a normal form different from 0, the two are most likely *not* functionally equivalent, but a false positive is possible. We further resort to random testing and SMT solving by directly analyzing the obtained normal form. As a result, the overall approach turns out to be complete if no uninterpreted functions are involved in the normal form.

We implement our approach as a new tool FISCHER (FunctionalIty of maSked CryptograpHic program verifiER), based on the LLVM framework [43]. We conduct extensive experiments on various masked cryptographic program benchmarks. The results show that our term rewriting system alone is able to prove almost all the benchmarks. FISCHER is also considerably more efficient than the general-purpose verifiers SMACK [55], SeaHorn, CPAChecker, and Symbiotic [22], the cryptography-specific verifier CryptoLine, as well as a straightforward approach that directly reduces the verification task to SMT solving. For instance, our approach is able to handle masked implementations of finite-field multiplication with masking orders up to 100 in less than 153 s, while none of the compared approaches can handle a masking order of 3 within 20 min.

In particular, we detect, for the first time, a flaw in a masked implementation of finite-field multiplication published in EUROCRYPT 2017 [8]. The flaw is subtle, as it only occurs when the masking order d ≡ 1 mod 4.<sup>1</sup> This finding highlights the importance of correctness verification of masked programs, a problem that has been largely overlooked and for which our work provides an effective solution.

Our main contributions can be summarized as follows.


Related Work. Program verification has been extensively studied for decades. Here we mainly focus on its application to cryptographic programs, for which some general-purpose program verifiers have been adopted. Early work [3] uses Boogie [6]. HACL\* [65] uses F\* [2], which verifies programs by a combination of SMT solving and interactive proof assistants. Vale [15] uses F\* and Dafny [44], where Dafny harnesses Boogie for verification. Cryptol [61] checks equivalence between machine-readable cryptographic specifications and real-world implementations via SMT solving. As mentioned before, computer algebra systems (CAS) have also been used for verifying cryptographic programs and arithmetic circuits, by reduction to the ideal membership problem together with SAT/SMT solving. Typical work includes CryptoLine and AMulet [38,39]. However, as shown in Sect. 7.2, neither the general-purpose verifiers (SMACK with Boogie and Corral, SeaHorn, CPAChecker and Symbiotic) nor the CAS-based verifier CryptoLine is sufficiently powerful to verify masked cryptographic programs. Interactive proof assistants (possibly coupled with SMT solvers) have also been used to verify unmasked cryptographic programs (e.g., [1,4,9,23,27,48,49]). Compared to them, our approach is fully automated, making it more accessible and easier to use for general software developers.

Outline. Section 2 recaps preliminaries. Section 3 presents a language on which the cryptographic program is formalized. Section 4 gives an example and an overview of our approach. Section 5 and Sect. 6 introduce the term rewriting system and verification algorithms. Section 7 reports experimental results. We conclude in Sect. 8. The source code of our tool and benchmarks are available at https://github.com/S3L-official/FISCHER.

### 2 Preliminaries

For two integers l, u with l ≤ u, [l, u] denotes the set of integers {l, l + 1, ··· , u}.

Galois Field. A *Galois field* GF(p<sup>n</sup>) comprises the polynomials a<sub>n−1</sub>X<sup>n−1</sup> + ··· + a<sub>1</sub>X + a<sub>0</sub> over Z<sub>p</sub> = [0, p − 1], where p is a prime number, n is a positive integer, and a<sub>i</sub> ∈ Z<sub>p</sub>. (Here p is the *characteristic* of the field, and p<sup>n</sup> is the *order* of the field.) Symmetric cryptography (e.g., DES [50], AES [25], SKINNY [10], PRESENT [14]) and bitsliced implementations of asymmetric cryptography (e.g., [17]) intensively use GF(2<sup>n</sup>). Throughout the paper, F denotes the Galois field GF(2<sup>n</sup>) for a fixed n, and ⊕ and ⊗ denote the addition and multiplication on F, respectively. Recall that GF(2<sup>n</sup>) can be constructed as the quotient ring of the polynomial ring GF(2)[X] with respect to the ideal generated by an irreducible polynomial P of degree n. Hence, multiplication is the product of two polynomials modulo P in GF(2)[X], and addition is bitwise exclusive-OR (XOR) over the binary representations of the polynomials. For example, AES uses GF(256) = GF(2)[X]/(X<sup>8</sup> + X<sup>4</sup> + X<sup>3</sup> + X + 1); here n = 8 and P = X<sup>8</sup> + X<sup>4</sup> + X<sup>3</sup> + X + 1.

<sup>1</sup> This flaw has been confirmed by an author of [8].
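Concretely, addition and multiplication in GF(2<sup>8</sup>) with the AES polynomial can be sketched in a few lines of Python (an illustrative implementation, not part of the paper's tool):

```python
AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1, the AES reduction polynomial

def gf_add(a, b):
    # Addition in GF(2^n) is bitwise XOR of the coefficient vectors.
    return a ^ b

def gf_mul(a, b, poly=AES_POLY, n=8):
    # Multiply the two polynomials in GF(2)[X] (shift-and-XOR),
    # reducing modulo poly whenever the degree reaches n.
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << n):  # degree reached n: reduce modulo P
            a ^= poly
    return result
```

For example, `gf_add(0x57, 0x83)` yields `0xD4` and `gf_mul(0x57, 0x83)` yields `0xC1`, matching the classic worked example in the AES specification.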

Higher-Order Masking. To achieve order-d security against power side-channel attacks under certain leakage models, masking is usually used [37,60]. Essentially, masking partitions each secret value into (usually d + 1) shares so that knowing at most d shares reveals no information about the secret value; this is called *order-d masking*. In Boolean masking, a value a ∈ F is divided into shares a<sub>0</sub>, a<sub>1</sub>, ..., a<sub>d</sub> ∈ F such that a<sub>0</sub> ⊕ a<sub>1</sub> ⊕ ··· ⊕ a<sub>d</sub> = a. Typically, a<sub>1</sub>, ..., a<sub>d</sub> are random values and a<sub>0</sub> = a ⊕ a<sub>1</sub> ⊕ ··· ⊕ a<sub>d</sub>. The tuple (a<sub>0</sub>, a<sub>1</sub>, ..., a<sub>d</sub>), denoted by **a**, is called an *encoding* of a. We write ⊕<sub>i∈[0,d]</sub> **a**<sub>i</sub> (or simply ⊕**a**) for a<sub>0</sub> ⊕ a<sub>1</sub> ⊕ ··· ⊕ a<sub>d</sub>. Additive masking can be defined similarly to Boolean masking, with ⊕ replaced by the modular arithmetic addition operator. In this work, we focus on Boolean masking, as the XOR operation is more efficient to implement.
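The encoding and decoding just described can be sketched as follows (illustrative Python; the share width n = 8 is an assumption):

```python
import secrets

def mask(a, d, n=8):
    # Order-d Boolean masking: a_1, ..., a_d are uniformly random and
    # a_0 = a XOR a_1 XOR ... XOR a_d, so all d+1 shares XOR back to a.
    rest = [secrets.randbelow(1 << n) for _ in range(d)]
    a0 = a
    for s in rest:
        a0 ^= s
    return (a0, *rest)

def unmask(shares):
    # The XOR-sum of an encoding recovers the masked value.
    acc = 0
    for s in shares:
        acc ^= s
    return acc
```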

To implement a masked program, for each operation in the cryptographic algorithm, a corresponding operation on shares is required. As we will see later, when the operation is affine (i.e., the operation f satisfies f(x⊕y) = f(x)⊕f(y)⊕c for some constant c), the corresponding operation simply applies the original operation to each share a<sub>i</sub> of the encoding (a<sub>0</sub>, a<sub>1</sub>, ..., a<sub>d</sub>). However, for non-affine operations (e.g., multiplication and addition), this is a difficult and error-prone task [24]. Ishai et al. [37] proposed the first masked implementation of multiplication, but limited to the domain GF(2) *only*. It does not use an optimal number of random values and operations, and is known to be vulnerable in the presence of glitches, because electric signals propagate at different speeds along the combinatorial paths of hardware circuits. Thus, various follow-up papers proposed ways to implement higher-order masking for the domain GF(2<sup>n</sup>) and/or to optimize the computational complexity, e.g., [8,11,21,34,58], all of which are referred to as the ISW scheme in this paper. In another research direction, new glitch-resistant Boolean masking schemes have been proposed, e.g., Hardware Private Circuits (HPC1 & HPC2) [20], Domain-oriented Masking (DOM) [35] and Consolidating Masking Schemes (CMS) [57]. In this work, we are interested in automatically proving the correctness of the masked programs.

#### 3 The Core Language

In this section, we first present the core language MSL, given in Fig. 1, based on which the verification problem is formalized.

Fig. 1. Syntax of MSL in Backus-Naur form

A program P in MSL is given by a sequence of procedure definitions and affine transformation definitions/declarations. A procedure definition starts with the keyword proc, followed by a procedure name, a list of input parameters, an output and its body. The procedure body has two blocks of statements, separated by the special statement shares d + 1, where d is the masking order. The first block stmts<sub>origin</sub>, called the *original block*, implements the original functionality on the input parameters without masking. The second block stmts<sub>masked</sub>, called the *masked block*, is a masked implementation of the original block over the input encodings **x** of the input parameters x. The input parameters and the output x, declared using the keywords input and output respectively, are scalar variables in the original block, but are treated as the corresponding encodings (i.e., tuples) **x** in the masked block. For example, input x declares the scalar variable x as the input of the original block, while it implicitly declares an encoding **x** = (x<sub>0</sub>, x<sub>1</sub>, ..., x<sub>d</sub>) as the input of the masked block with shares d + 1.

We distinguish affine transformation definitions from declarations. The former starts with the keyword affine, followed by a name f, an input, an output and its body. It is expected that the affine property ∀x, y ∈ F. f(x⊕y) = f(x)⊕f(y)⊕c holds for some affine constant c ∈ F. (Note that the constant c is not explicitly provided in the program, but can be derived, cf. Sect. 6.2.) The transformation f is *linear* if its affine constant c is 0. In contrast, an affine transformation declaration f simply declares a transformation. As a result, it can only be used to declare a linear one (i.e., c must be 0), which is treated as an uninterpreted function. Note that the effect of a non-linear affine transformation declaration can be achieved by combining a linear affine transformation declaration with an affine transformation definition. Affine transformations serve as an abstraction to capture complicated operations (e.g., shift, rotation and bitwise Boolean operations) and can accelerate verification by expressing such operations as uninterpreted functions. In practice, a majority of cryptographic algorithms (in symmetric cryptography) can be represented by a composition of S-box, XOR and linear transformations only.

Masking an affine transformation can simply mask an input encoding in a share-wise way, namely, the masked version of the affine transformation f(a) is

$$f(a\_0 \oplus a\_1 \oplus \dots \oplus a\_d) = \begin{cases} f(a\_0) \oplus f(a\_1) \oplus \dots \oplus f(a\_d), & \text{if } d \text{ is even;}\\ f(a\_0) \oplus f(a\_1) \oplus \dots \oplus f(a\_d) \oplus c, & \text{if } d \text{ is odd.} \end{cases}$$

This is the default behavior, so an affine transformation definition contains only the original block and no masked block.
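A sketch of this share-wise construction (illustrative Python; the example transformation f(x) = x ⊕ 0x63, whose affine constant is c = 0x63, is our own choice, not taken from the paper):

```python
def masked_affine(f, c, shares):
    # Share-wise masking of an affine f (f(x XOR y) = f(x) XOR f(y) XOR c):
    # apply f to each share; expanding f over the d+1 shares produces d
    # extra copies of c, which cancel pairwise, so one surviving c must be
    # folded back into a share exactly when d is odd.
    out = [f(s) for s in shares]
    d = len(shares) - 1
    if d % 2 == 1:
        out[0] ^= c
    return out

f = lambda x: x ^ 0x63                        # affine with constant c = 0x63
masked = masked_affine(f, 0x63, [0x12, 0x48])  # d = 1, encodes 0x12 ^ 0x48
assert masked[0] ^ masked[1] == f(0x12 ^ 0x48)
```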

A statement is either an assignment or a function call. MSL features two types of assignments: x ← e, defined as usual, and r ← rand, which assigns a uniformly sampled value from the domain F to the variable r. As a result, r should be read as a random variable. We assume that each random variable is defined only once. We note that the actual parameters and the output are scalars if the procedure is invoked in an original block, while they are the corresponding encodings if it is invoked in a masked block.

MSL is the core language of our tool. In practice, to be more user-friendly, our tool also accepts C programs with conditional branches and loops, both of which must be statically determinized (e.g., loops are bounded and can be unrolled; the branching of conditionals can also be fixed after loop unrolling). Furthermore, we assume there is no recursion and no dynamic memory allocation. These restrictions are compatible with most symmetric cryptography and bitsliced implementations of public-key cryptography, which mostly have simple control-flow graphs and little memory aliasing.

Problem Formalization. Fix a program P with all the procedures using order-d masking. We denote by P<sub>o</sub> (resp. P<sub>m</sub>) the program P where all the masked (resp. original) blocks are omitted. For each procedure f, the procedures f<sub>o</sub> and f<sub>m</sub> are defined accordingly.

Definition 1. *Given a procedure* f *of* P *with* m *input parameters,* f<sub>m</sub> *and* f<sub>o</sub> *are* functionally equivalent*, denoted by* f<sub>m</sub> ∼= f<sub>o</sub>*, if the following statement holds:*

$$\forall a^1, \dots, a^m, r\_1, \dots, r\_h \in \mathbb{F}, \forall \mathbf{a}^1, \dots, \mathbf{a}^m \in \mathbb{F}^{d+1}.$$

$$(\bigwedge\_{i \in [1, m]} a^i = \bigoplus\_{j \in [0, d]} \mathbf{a}\_j^i) \to (f\_o(a^1, \dots, a^m) = \bigoplus\_{i \in [0, d]} f\_m(\mathbf{a}^1, \dots, \mathbf{a}^m)\_i).$$

*where* r<sub>1</sub>, ··· , r<sub>h</sub> *are all the random variables used in* f<sub>m</sub>*.*

Note that although the procedure f<sub>m</sub> is randomized (i.e., the output encoding f<sub>m</sub>(**a**<sup>1</sup>, ··· , **a**<sup>m</sup>)<sub>i</sub> is technically a random variable), for functional equivalence we consider a stronger notion, viz., we require that f<sub>m</sub> and f<sub>o</sub> be equivalent under any values in the support of the random variables r<sub>1</sub>, ··· , r<sub>h</sub>. Thus, r<sub>1</sub>, ··· , r<sub>h</sub> are universally quantified in Definition 1.

The verification problem is to check whether f<sub>m</sub> ∼= f<sub>o</sub> for a given procedure f, where ⋀<sub>i∈[1,m]</sub> a<sup>i</sup> = ⊕<sub>j∈[0,d]</sub> **a**<sup>i</sup><sub>j</sub> and f<sub>o</sub>(a<sup>1</sup>, ··· , a<sup>m</sup>) = ⊕<sub>i∈[0,d]</sub> f<sub>m</sub>(**a**<sup>1</sup>, ··· , **a**<sup>m</sup>)<sub>i</sub> are regarded as the pre- and post-condition, respectively. Thus, we assume the unmasked procedures themselves are correct (which can be verified by, e.g., CryptoLine). Our focus is on whether the masked counterparts are functionally equivalent to them.

## 4 Overview of the Approach

In this section, we first present the motivating example given in Fig. 2, which computes the multiplicative inverse in GF(2<sup>8</sup>) for the AES S-Box [58] using first-order Boolean masking. It consists of three affine transformation definitions and two procedure definitions. For a given input x, exp2(x) outputs x<sup>2</sup>, exp4(x) outputs x<sup>4</sup> and exp16(x) outputs x<sup>16</sup>. Obviously, these three affine transformations are indeed linear.

Fig. 2. Motivating example, where **x** denotes (x<sub>0</sub>, x<sub>1</sub>).

Procedure sec\_mult<sub>o</sub>(a, b) outputs a ⊗ b. Its masked version sec\_mult<sub>m</sub>(**a**, **b**) computes the encoding **c** = (c<sub>0</sub>, c<sub>1</sub>) over the encodings **a** = (a<sub>0</sub>, a<sub>1</sub>) and **b** = (b<sub>0</sub>, b<sub>1</sub>). Clearly, it is desired that c<sub>0</sub> ⊕ c<sub>1</sub> = a ⊗ b if a<sub>0</sub> ⊕ a<sub>1</sub> = a and b<sub>0</sub> ⊕ b<sub>1</sub> = b. Procedure refresh\_masks<sub>o</sub>(x) is the identity function, while its masked version refresh\_masks<sub>m</sub>(**x**) re-masks the encoding **x** using a random variable r<sub>0</sub>. Thus, it is desired that y<sub>0</sub> ⊕ y<sub>1</sub> = x if x = x<sub>0</sub> ⊕ x<sub>1</sub>. Procedure sec\_exp254<sub>o</sub>(x) computes the multiplicative inverse x<sup>254</sup> of x in GF(2<sup>8</sup>). Its masked version sec\_exp254<sub>m</sub>(**x**) computes the encoding **y** = (y<sub>0</sub>, y<sub>1</sub>), where refresh\_masks<sub>m</sub> is invoked to avoid power side-channel leakage. Thus, it is desired that y<sub>0</sub> ⊕ y<sub>1</sub> = x<sup>254</sup> if x<sub>0</sub> ⊕ x<sub>1</sub> = x. In summary, it is required to prove sec\_mult<sub>m</sub> ∼= sec\_mult<sub>o</sub>, refresh\_masks<sub>m</sub> ∼= refresh\_masks<sub>o</sub> and sec\_exp254<sub>m</sub> ∼= sec\_exp254<sub>o</sub>.
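The desired equivalence for sec\_mult can be spot-checked by random testing (an illustrative Python harness over GF(2<sup>8</sup>); note that such testing can refute but never prove equivalence):

```python
import random

def gf_mul(a, b, poly=0x11B, n=8):
    # GF(2^8) multiplication with the AES reduction polynomial.
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << n):
            a ^= poly
    return r

def sec_mult_m(a0, a1, b0, b1, r0):
    # The masked multiplication of the motivating example:
    # c0 = (a0 ⊗ b0) ⊕ r0, c1 = (a1 ⊗ b1) ⊕ (r0 ⊕ (a0 ⊗ b1) ⊕ (a1 ⊗ b0)).
    c0 = gf_mul(a0, b0) ^ r0
    c1 = gf_mul(a1, b1) ^ (r0 ^ gf_mul(a0, b1) ^ gf_mul(a1, b0))
    return c0, c1

# c0 ⊕ c1 must equal (a0 ⊕ a1) ⊗ (b0 ⊕ b1) for every choice of r0.
for _ in range(10000):
    a0, a1, b0, b1, r0 = (random.randrange(256) for _ in range(5))
    c0, c1 = sec_mult_m(a0, a1, b0, b1, r0)
    assert c0 ^ c1 == gf_mul(a0 ^ a1, b0 ^ b1)
```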

#### 4.1 Our Approach

An overview of FISCHER is shown in Fig. 3. The input program is expected to follow the syntax of MSL, but written in C. Moreover, the pre-conditions and post-conditions of the verification problem are expressed by assume and assert statements in the masked procedure, respectively. Recall that the input program can contain conditional branches and loops, provided they are statically determinized. Furthermore, affine transformations can use other common operations (e.g., shift, rotation and bitwise Boolean operations) besides the addition ⊕ and multiplication ⊗ on the underlying field F. FISCHER leverages the LLVM framework to obtain the LLVM intermediate representation (IR) and the call graph, where all the procedure calls are inlined. It then invokes *Affine Constant Computing* to iteratively compute the affine constants for affine transformations according to the call graph, and *Functional Equivalence Checking* to check functional equivalence, both of which rely on the underpinning engines, viz., *Symbolic Execution* (which in this work refers to symbolic computation without path-constraint solving), *Term Rewriting* and *SMT-based Solving*.

Fig. 3. Overview of FISCHER.

We apply intra-procedural symbolic execution to compute the symbolic outputs of the procedures and transformations, i.e., expressions in terms of inputs, random variables and affine transformations. The symbolic outputs are treated as terms, on which both functional equivalence checking and affine constant computing are solved by rewriting to normal forms (i.e., sums of monomials w.r.t. a total order). The analysis result is often conclusive from the normal forms. In case it is inconclusive, we iteratively inline affine transformations whose definitions are available, until either the analysis result is conclusive or no more affine transformations can be inlined. If the analysis result is still inconclusive, to reduce false positives, we apply random testing and accurate (but computationally expensive) SMT solving to the normal forms instead of the original terms. We remark that the term rewriting system alone can prove almost all the benchmarks in our experiments.

Consider the motivating example. To find the constant c ∈ F of exp2 such that the property ∀x, y ∈ F. exp2(x ⊕ y) = exp2(x) ⊕ exp2(y) ⊕ c holds, by applying symbolic execution, exp2(x) is expressed as the term x ⊗ x. Thus, the property is reformulated as (x ⊕ y) ⊗ (x ⊕ y) = (x ⊗ x) ⊕ (y ⊗ y) ⊕ c, from which we can deduce that the desired affine constant c is equivalent to the term ((x ⊕ y) ⊗ (x ⊕ y)) ⊕ (x ⊗ x) ⊕ (y ⊗ y). Our TRS will reduce the term as follows:


For the transformation exp4(x), by applying symbolic execution, it can be expressed as the term exp2(exp2(x)). To find the constant c ∈ F satisfying ∀x, y ∈ F. exp4(x⊕y) = exp4(x) ⊕ exp4(y) ⊕ c, we compute the term exp2(exp2(x⊕y)) ⊕ exp2(exp2(x)) ⊕ exp2(exp2(y)). By applying our TRS, we have:

exp2(exp2(x ⊕ y)) ⊕ exp2(exp2(x)) ⊕ exp2(exp2(y))
= exp2(exp2(x) ⊕ exp2(y)) ⊕ exp2(exp2(x)) ⊕ exp2(exp2(y))
= exp2(exp2(x)) ⊕ exp2(exp2(y)) ⊕ exp2(exp2(x)) ⊕ exp2(exp2(y))
= exp2(exp2(x)) ⊕ exp2(exp2(x)) ⊕ exp2(exp2(y)) ⊕ exp2(exp2(y))
= 0

Clearly, the affine constant of exp4 is 0. Similarly, we can deduce that the affine constant of the transformation exp16 is 0 as well.
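The derived affine constants can also be confirmed exhaustively for n = 8: exp2 is squaring, and squaring is linear in characteristic 2 (the Frobenius endomorphism). An illustrative Python check:

```python
def gf_mul(a, b, poly=0x11B, n=8):
    # GF(2^8) multiplication with the AES reduction polynomial.
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << n):
            a ^= poly
    return r

def exp2(x):
    return gf_mul(x, x)  # x^2

# ((x ⊕ y) ⊗ (x ⊕ y)) ⊕ (x ⊗ x) ⊕ (y ⊗ y) = 0 for all x, y, i.e. the
# affine constant of exp2 is 0; linearity of exp4 = exp2 ∘ exp2 and of
# exp16 then follows by composition.
assert all(exp2(x ^ y) == exp2(x) ^ exp2(y)
           for x in range(256) for y in range(256))
```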

To prove sec\_mult<sub>o</sub> ∼= sec\_mult<sub>m</sub>, by applying symbolic execution, we have that sec\_mult<sub>o</sub>(a, b) = a ⊗ b and sec\_mult<sub>m</sub>(**a**, **b**) = **c** = (c<sub>0</sub>, c<sub>1</sub>), where c<sub>0</sub> = (a<sub>0</sub> ⊗ b<sub>0</sub>) ⊕ r<sub>0</sub> and c<sub>1</sub> = (a<sub>1</sub> ⊗ b<sub>1</sub>) ⊕ (r<sub>0</sub> ⊕ (a<sub>0</sub> ⊗ b<sub>1</sub>) ⊕ (a<sub>1</sub> ⊗ b<sub>0</sub>)). Then, by Definition 1, it suffices to check

$$\begin{aligned} \forall a, b, a\_0, a\_1, b\_0, b\_1, r\_0 &\in \mathbb{F}. \left( a = a\_0 \oplus a\_1 \land b = b\_0 \oplus b\_1 \right) \to \\ \left( a \otimes b = \left( \left( a\_0 \otimes b\_0 \right) \oplus r\_0 \right) \oplus \left( \left( a\_1 \otimes b\_1 \right) \oplus \left( r\_0 \oplus \left( a\_0 \otimes b\_1 \right) \oplus \left( a\_1 \otimes b\_0 \right) \right) \right) \right). \end{aligned}$$

Thus, we check the term (a<sub>0</sub> ⊕ a<sub>1</sub>) ⊗ (b<sub>0</sub> ⊕ b<sub>1</sub>) ⊕ ((a<sub>0</sub> ⊗ b<sub>0</sub>) ⊕ r<sub>0</sub>) ⊕ ((a<sub>1</sub> ⊗ b<sub>1</sub>) ⊕ (r<sub>0</sub> ⊕ (a<sub>0</sub> ⊗ b<sub>1</sub>) ⊕ (a<sub>1</sub> ⊗ b<sub>0</sub>))), which is equivalent to 0 iff sec\_mult<sub>o</sub> ∼= sec\_mult<sub>m</sub>. Our TRS is able to reduce this term to 0. Similarly, we represent the outputs of sec\_exp254<sub>o</sub> and sec\_exp254<sub>m</sub> as terms via symbolic execution, from which the statement sec\_exp254<sub>o</sub> ∼= sec\_exp254<sub>m</sub> is also encoded as a term, which can be reduced to 0 by our TRS without inlining any transformations.

## 5 Term Rewriting System

In this section, we first introduce some basic notations and then present our term rewriting system.

Definition 2. *Given a program* P *over* F*, a* signature Σ<sub>P</sub> *of* P *is a set of symbols* F ∪ {⊕, ⊗, f<sub>1</sub>, ..., f<sub>t</sub>}*, where each* s ∈ F *with arity 0 is a constant of* F*,* ⊕ *and* ⊗ *with arity 2 are the addition and multiplication operators on* F*, and* f<sub>1</sub>, ··· , f<sub>t</sub> *with arity 1 are the affine transformations defined/declared in* P*.*

For example, the signature of the motivating example is F ∪ {⊕, ⊗, exp2, exp4, exp16}. When it is clear from the context, the subscript P is dropped from Σ<sub>P</sub>.

Definition 3. *Let* V *be a set of variables (assuming* Σ∩V = ∅*), the set* T[Σ,V ] *of* Σ-terms *over* V *is inductively defined as follows:*


*We denote by* T\⊕(Σ,V ) *the set of* Σ*-terms that do not use the operator* ⊕*.*

A Σ-term τ ∈ T[Σ,V ] is called a *factor* if τ ∈ F ∪ V or τ = f<sub>i</sub>(τ′) for some i ∈ [1, t] such that τ′ ∈ T\⊕(Σ,V ). A *monomial* is a product α<sub>1</sub> ⊗ ··· ⊗ α<sub>k</sub> of non-zero factors for k ≥ 1. We denote by M[Σ,V ] the set of monomials. For instance, consider variables x, y ∈ V and affine transformations f<sub>1</sub>, f<sub>2</sub> ∈ Σ. All of f<sub>1</sub>(f<sub>2</sub>(x)) ⊗ f<sub>1</sub>(y), f<sub>1</sub>(2 ⊗ f<sub>2</sub>(4 ⊗ x)), f<sub>1</sub>(x ⊕ y) and f<sub>1</sub>(f<sub>2</sub>(x)) ⊕ f<sub>1</sub>(x) are Σ-terms; both f<sub>1</sub>(f<sub>2</sub>(x)) ⊗ f<sub>1</sub>(y) and f<sub>1</sub>(2 ⊗ f<sub>2</sub>(4 ⊗ x)) are monomials, while neither f<sub>1</sub>(x ⊕ y) nor f<sub>1</sub>(f<sub>2</sub>(x)) ⊕ f<sub>1</sub>(x) is a monomial. For the sake of presentation, Σ-terms will simply be called terms, and the operator ⊗ may be omitted, e.g., τ<sub>1</sub>τ<sub>2</sub> denotes τ<sub>1</sub> ⊗ τ<sub>2</sub>, and τ<sup>2</sup> denotes τ ⊗ τ.

Definition 4. *A polynomial is a sum* ⊕<sub>i∈[1,t]</sub> m<sub>i</sub> *of monomials* m<sub>1</sub>, ..., m<sub>t</sub> ∈ M[Σ,V ]*. We use* P[Σ,V ] *to denote the set of polynomials.*

To simplify and normalize polynomials, we impose a total order on monomials and their factors.

Definition 5. *Fix an arbitrary total order* ≥<sub>s</sub> *on* V ∪ Σ*.*

*For two factors* α *and* α′*, the* factor order ≥<sub>l</sub> *is defined such that* α ≥<sub>l</sub> α′ *if one of the following conditions holds:*

*–* α, α′ ∈ F ∪ V *and* α ≥<sub>s</sub> α′*;*
*–* α = f(τ) *and* α′ = f′(τ′) *such that* f ≥<sub>s</sub> f′ *or (*f = f′ *and* τ ≥<sub>p</sub> τ′*);*
*–* α = f(τ) *such that* f ≥<sub>s</sub> α′*, or* α′ = f(τ′) *such that* α ≥<sub>s</sub> f*.*

*Given a monomial* m = α<sub>1</sub> ··· α<sub>k</sub>*, we write* sort<sub>≥l</sub>(α<sub>1</sub>, ··· , α<sub>k</sub>) *for the monomial which includes* α<sub>1</sub>, ··· , α<sub>k</sub> *as factors, but sorted in descending order.*

*Given two monomials* m = α<sub>1</sub> ··· α<sub>k</sub> *and* m′ = α′<sub>1</sub> ··· α′<sub>k′</sub>*, the* monomial order ≥<sub>p</sub> *is defined as the lexicographic order between* sort<sub>≥l</sub>(α<sub>1</sub>, ··· , α<sub>k</sub>) *and* sort<sub>≥l</sub>(α′<sub>1</sub>, ··· , α′<sub>k′</sub>)*.*

Intuitively, the factor order ≥<sub>l</sub> follows the given order ≥<sub>s</sub> on V ∪ Σ, where the order between two factors with the same affine transformation f is determined by their parameters. We note that if sort<sub>≥l</sub>(α′<sub>1</sub>, ··· , α′<sub>k′</sub>) is a prefix of sort<sub>≥l</sub>(α<sub>1</sub>, ··· , α<sub>k</sub>), then α<sub>1</sub> ··· α<sub>k</sub> ≥<sub>p</sub> α′<sub>1</sub> ··· α′<sub>k′</sub>. Furthermore, if α<sub>1</sub> ··· α<sub>k</sub> ≥<sub>p</sub> α′<sub>1</sub> ··· α′<sub>k′</sub> and α′<sub>1</sub> ··· α′<sub>k′</sub> ≥<sub>p</sub> α<sub>1</sub> ··· α<sub>k</sub>, then sort<sub>≥l</sub>(α′<sub>1</sub>, ··· , α′<sub>k′</sub>) = sort<sub>≥l</sub>(α<sub>1</sub>, ··· , α<sub>k</sub>). We write α<sub>1</sub> ··· α<sub>k</sub> ><sub>p</sub> α′<sub>1</sub> ··· α′<sub>k′</sub> if α<sub>1</sub> ··· α<sub>k</sub> ≥<sub>p</sub> α′<sub>1</sub> ··· α′<sub>k′</sub> but sort<sub>≥l</sub>(α′<sub>1</sub>, ··· , α′<sub>k′</sub>) ≠ sort<sub>≥l</sub>(α<sub>1</sub>, ··· , α<sub>k</sub>).

Proposition 1. *The monomial order* ≥*<sup>p</sup> is a total order on monomials.*
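As an illustration, the comparison underlying ≥<sub>p</sub> can be sketched in a few lines of Python, with plain strings standing in for factors and an explicit list giving the total order ≥<sub>s</sub> (all names here are illustrative, not part of FISCHER):

```python
def sort_factors(factors, rank):
    # sort_{>=l}: arrange the factors of a monomial in descending order
    # w.r.t. the given total order (rank maps each symbol to its position).
    return sorted(factors, key=lambda a: rank[a], reverse=True)

def monomial_geq(m1, m2, rank):
    # >=_p: lexicographic comparison of the sorted factor sequences.
    # Python's list comparison also realizes the prefix rule: a monomial
    # is >=_p any monomial whose sorted factors form a prefix of its own.
    key1 = [rank[a] for a in sort_factors(m1, rank)]
    key2 = [rank[a] for a in sort_factors(m2, rank)]
    return key1 >= key2

rank = {"x": 0, "y": 1, "z": 2}                 # a chosen total order on V
assert monomial_geq(["z", "x"], ["y", "x"], rank)
assert monomial_geq(["x", "y"], ["y"], rank)    # prefix case
```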

Definition 6. *Given a program* P*, we define the corresponding term rewriting system (TRS)* R *as a tuple* (Σ, V, ≥<sub>s</sub>, Δ)*, where* Σ *is a signature of* P*,* V *is a set of variables of* P *(assuming* Σ ∩ V = ∅*),* ≥<sub>s</sub> *is a total order on* V ∪ Σ*, and* Δ *is the set of term rewriting rules given below:*

$$\begin{aligned} &R1\ \frac{(m'\_1, \ldots, m'\_k) = \mathtt{sort}\_{\geq\_p}(m\_1, \ldots, m\_k) \neq (m\_1, \ldots, m\_k)}{m\_1 \oplus \cdots \oplus m\_k \mapsto m'\_1 \oplus \cdots \oplus m'\_k} \qquad R2\ \frac{(\alpha'\_1, \ldots, \alpha'\_k) = \mathtt{sort}\_{\geq\_l}(\alpha\_1, \ldots, \alpha\_k) \neq (\alpha\_1, \ldots, \alpha\_k)}{\alpha\_1 \otimes \cdots \otimes \alpha\_k \mapsto \alpha'\_1 \otimes \cdots \otimes \alpha'\_k}\\ &R3\ \tau \oplus \tau \mapsto 0 \qquad R4\ \tau \otimes 0 \mapsto 0 \qquad R5\ 0 \otimes \tau \mapsto 0 \qquad R6\ \tau \oplus 0 \mapsto \tau \qquad R7\ 0 \oplus \tau \mapsto \tau \qquad R8\ 1 \otimes \tau \mapsto \tau \qquad R9\ \tau \otimes 1 \mapsto \tau\\ &R10\ (\tau\_1 \oplus \tau\_2) \otimes \tau \mapsto (\tau\_1 \otimes \tau) \oplus (\tau\_2 \otimes \tau) \qquad R11\ \tau \otimes (\tau\_1 \oplus \tau\_2) \mapsto (\tau \otimes \tau\_1) \oplus (\tau \otimes \tau\_2)\\ &R12\ f(\tau\_1 \oplus \tau\_2) \mapsto f(\tau\_1) \oplus f(\tau\_2) \oplus c \qquad R13\ f(0) \mapsto c \end{aligned}$$

*where* m<sub>1</sub>, m′<sub>1</sub>, ··· , m<sub>k</sub>, m′<sub>k</sub> ∈ M[Σ,V ]*,* α<sub>1</sub>, ··· , α<sub>k</sub>, α′<sub>1</sub>, ··· , α′<sub>k</sub> *are factors,* τ, τ<sub>1</sub>, τ<sub>2</sub> ∈ T[Σ,V ] *are terms, and* f ∈ Σ *is an affine transformation with affine constant* c*.*

Intuitively, rules R1 and R2 specify the commutativity of ⊕ and ⊗, respectively, by which monomials and factors are sorted according to the orders ≥<sub>p</sub> and ≥<sub>l</sub>, respectively. Rule R3 specifies that ⊕ is essentially bitwise XOR. Rules R4 and R5 specify that 0 is the multiplicative zero. Rules R6 and R7 (resp. R8 and R9) specify that 0 (resp. 1) is the additive (resp. multiplicative) identity. Rules R10 and R11 express the distributivity of ⊗ over ⊕. Rule R12 expresses the affine property of an affine transformation, while rule R13 is an instance of rule R12 via rules R3 and R7.
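To make the rules concrete, here is a minimal Python sketch (not the actual FISCHER implementation) of the normal form they compute on the sec\_mult term from Sect. 4, with sorted tuples of variable names standing in for monomials:

```python
from collections import Counter

def normalize(monomials):
    # A polynomial is a XOR-sum of monomials. R2 sorts the factors of each
    # monomial so equal monomials become syntactically identical; R3 with
    # R6/R7 cancels monomials occurring an even number of times; R1 finally
    # sorts the surviving monomials (plain ascending sort stands in for the
    # paper's descending >=_p order).
    counts = Counter(tuple(sorted(m)) for m in monomials)
    return sorted(m for m, k in counts.items() if k % 2 == 1)

# (a0 ⊕ a1) ⊗ (b0 ⊕ b1) distributes via R10/R11 into four monomials ...
lhs = [("a0", "b0"), ("a0", "b1"), ("a1", "b0"), ("a1", "b1")]
# ... and the masked output c0 ⊕ c1 contributes these six monomials.
rhs = [("a0", "b0"), ("r0",), ("a1", "b1"), ("r0",), ("a0", "b1"), ("a1", "b0")]

# XOR of both sides rewrites to the normal form 0: functional equivalence.
assert normalize(lhs + rhs) == []
```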

Given a TRS R = (Σ, V, ≥<sub>s</sub>, Δ) for a given program P, a term τ ∈ T[Σ,V ] can be rewritten to a term τ′, denoted by τ ⇒ τ′, if there is a rewriting rule τ<sub>1</sub> ↦ τ<sub>2</sub> such that τ′ is obtained from τ by replacing an occurrence of the sub-term τ<sub>1</sub> with the sub-term τ<sub>2</sub>. A term is in *normal form* if no rewriting rule can be applied. A TRS is *terminating* if every term can be rewritten to a normal form after finitely many rewriting steps. We write τ ⇝ τ′ when τ′ is the normal form of τ.

We show that any TRS R associated with a program P is terminating, and that any term will be rewritten to a normal form that is a polynomial, independent of the way of applying rules.

Lemma 1. *For every normal form* τ ∈ T[Σ,V] *of the TRS* R*, the term* τ *must be a polynomial* m1 ⊕ ··· ⊕ mk *such that (1)* ∀i ∈ [1, k−1], mi >*p* mi+1*, and (2) for every monomial* mi = α1 ⊗ ··· ⊗ αh*,* ∀j ∈ [1, h−1], αj ≥*l* αj+1*.*

*Proof.* Consider a normal form τ ∈ T[Σ,V]. If τ is not a polynomial, then there must exist some monomial mi in which the addition operator ⊕ occurs. This means that either rule R10 or R11 is applicable to the term τ, which contradicts the fact that τ is a normal form.

Suppose τ is the polynomial m1 ⊕ ··· ⊕ mk. If condition (1) were violated, then either rule R1 (for unsorted monomials) or rules R3 and R6–R7 (for equal adjacent monomials) would be applicable; if condition (2) were violated, rule R2 would be applicable. Either case contradicts that τ is a normal form.


Lemma 2. *The TRS* R = (Σ,V, ≥*s*, Δ) *of a given program* P *is terminating.*

*Proof.* Consider a term τ ∈ T[Σ,V]. Let π = τ1 ⇒ τ2 ⇒ τ3 ⇒ ··· ⇒ τi ⇒ ··· be a reduction of the term τ, i.e., τ = τ1. We prove that the reduction π is finite by showing that every rewriting rule can be applied only finitely often.

First, since rules R1 and R2 only sort the monomials and factors, respectively, and sorting always terminates with any classic sorting algorithm (e.g., quicksort), rules R1 and R2 can only be applied consecutively finitely often for each term τi, due to the premises sort≥*p*(m1, ···, mk) ≠ (m1, ···, mk) and sort≥*l*(α1, ···, αk) ≠ (α1, ···, αk) in rules R1 and R2, respectively.

Second, rules R10, R11 and R12 can only be applied finitely often in the reduction π, as each application of one of these rules to a term τi pushes the addition operator ⊕ toward the root of the syntax tree of τi, while the other rules either eliminate or reorder the addition operator ⊕.


Lastly, rules R3–R9 and R13 can only be applied finitely often in the reduction π, as each of these rules reduces the size of the term when applied to a term τi, while the rules R10–R12 that increase the size of the term can only be applied finitely often.

Hence, the reduction π is finite, indicating that the TRS R is terminating.

By Lemmas 1 and 2, any term τ ∈ T[Σ,V ] can be rewritten to a normal form that must be a polynomial.

Theorem 1. *Let* R = (Σ,V, ≥*s*, Δ) *be the TRS of a program* P*. For any term* τ ∈ T[Σ,V]*, a polynomial* τ' ∈ T[Σ,V] *can be computed such that* τ ⇒* τ'*.*

*Remark 1.* Besides termination, confluence is another important property of a TRS: a TRS is confluent if, whenever a term τ ∈ T[Σ,V] can be rewritten to two distinct terms τ1 and τ2, the terms τ1 and τ2 can be further rewritten to a common term. While we conjecture that the TRS R associated with a given program is indeed confluent, which could be shown via its local confluence [51], we do not strive to prove confluence, as it is irrelevant to the problem considered in the current work.

## 6 Algorithmic Verification

In this section, we first present an algorithm for computing normal forms, then show how to compute the affine constant for an affine transformation, and finally propose an algorithm for solving the verification problem.

#### 6.1 Term Normalization Algorithm

We provide the function TermNorm (cf. Algorithm 1), which applies the rewriting rules in a particular order aiming for better efficiency. Fix a TRS R = (Σ,V, ≥*s*, Δ), a term τ ∈ T[Σ,V] and a mapping λ that provides the required affine constants λ(f). TermNorm(R, τ, λ) returns a normal form τ' of τ, i.e., τ ⇒* τ'.


TermNorm first applies rules R3–R13 to rewrite the term τ (line 2), resulting in a polynomial which has neither 0 as a factor or monomial (due to rules R4–R7), nor 1 as a factor in a monomial unless the monomial itself is 1 (due to rules R8 and R9). Next, it recursively sorts all the factors and monomials involved in the polynomial, starting from the innermost sub-terms (lines 3 and 4). Sorting factors and monomials places syntactically equal monomials at adjacent positions. Finally, rules R3 and R6–R7 are further applied to simplify the polynomial (line 5), where consecutive syntactically equal monomials are rewritten to 0 by rule R3, which may further enable rules R6–R7. The final term τ' is a normal form of the input τ, although its size may be exponential in that of τ.
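As an illustration of the distribute-sort-cancel pipeline (a toy model of TermNorm, not Algorithm 1 itself), the following sketch expands a product of sums into monomials, sorts the factors of each monomial, then sorts the monomials and cancels equal pairs:

```python
# Toy rendering of the TermNorm pipeline: expand products of sums into
# monomials (R10/R11), sort the factors of each monomial (R2), then sort
# monomials and cancel equal pairs (R1, R3). Representation is ours.
from collections import Counter
from itertools import product

def expand(sums):
    """sums: list of polynomials, each a list of monomials (tuples of
    factors). Returns their product as a normalized polynomial."""
    monomials = [tuple(sorted(f for m in choice for f in m))
                 for choice in product(*sums)]
    counts = Counter(monomials)
    return sorted(m for m, c in counts.items() if c % 2 == 1)

# (x XOR y) * (x XOR y) = x^2 XOR xy XOR xy XOR y^2 -> x^2 XOR y^2,
# since the two cross terms cancel under XOR
print(expand([[("x",), ("y",)], [("x",), ("y",)]]))
```

The cancellation of the cross terms is exactly the Frobenius identity (x ⊕ y)² = x² ⊕ y² over GF(2^n).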

Lemma 3. *TermNorm*(R, τ, λ) *returns a normal form* τ' *of* τ*.*

#### 6.2 Computing Affine Constants

The function AffConst in Algorithm 2 computes the associated affine constant for an affine transformation f. It first sorts all affine transformations in a topological order based on the call graph G (lines 2–21). If f is *only* declared in P, as mentioned previously, we assume it is linear, and thus 0 is assigned to λ(f) (line 4). Otherwise, it extracts the input x of f and computes its output ξ(x) via symbolic execution (line 7), where ξ(x) is treated as f(x). We remark that during symbolic execution, we adopt a lazy strategy for inlining invoked affine transformations in f to reduce the size of ξ(x). Thus, ξ(x) may contain affine transformations.
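The topological ordering over the call graph ensures that the affine constants of callees are available before their callers are processed. This can be sketched with Python's standard library; the graph shape and names below are hypothetical:

```python
# Hypothetical sketch: order affine transformations bottom-up in the call
# graph, so callees' affine constants are computed before their callers.
from graphlib import TopologicalSorter

# call_graph[f] = set of affine transformations that f invokes (assumed)
call_graph = {"sbox_affine": {"exp2", "rotl1"}, "exp2": set(), "rotl1": set()}

order = list(TopologicalSorter(call_graph).static_order())
# callees come first, so 'sbox_affine' is processed last
print(order[-1])  # sbox_affine
```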

Recall that <sup>c</sup> is the affine constant of <sup>f</sup> iff <sup>∀</sup>x, y <sup>∈</sup> <sup>F</sup>.f(x⊕y) = <sup>f</sup>(x)⊕f(y)⊕<sup>c</sup> holds. Thus, we create the term τ = ξ(x)[x → x⊕y]⊕ξ(x)⊕ξ(x)[x → y] (line 7), where e[a → b] denotes the substitution of a with b in e. Obviously, the term τ is equivalent to some constant c iff c is the affine constant of f.

The while-loop (lines 9–21) evaluates τ. First, it rewrites τ to a normal form (line 10) by invoking TermNorm in Algorithm 1. If the normal form is some constant c, then c is the affine constant of f. Otherwise, AffConst repeatedly inlines each affine transformation g that is defined in P but has not yet been inlined in τ (lines 13 and 14) and rewrites the term τ to a normal form, until either the normal form is some constant c or no affine transformation can be inlined. If the normal form is still not a constant, τ is evaluated using random input values. Clearly, if τ evaluates to two distinct values (line 18), f is not affine. Otherwise, we check the satisfiability of the constraint ∀x, y.τ = c via an SMT solver in bitvector theory (line 19), where declared but undefined affine transformations are treated as uninterpreted functions provided with their affine properties. If ∀x, y.τ = c is satisfiable, we extract the affine constant c from its model (line 20). Otherwise, we emit an error and abort (line 21), indicating that the affine constant of f cannot be computed. Since the satisfiability problem modulo bitvector theory is decidable, we can conclude that f is *not* affine if ∀x, y.τ = c is unsatisfiable and no uninterpreted function is involved in τ.
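The random-testing phase can be mimicked as follows (a sketch over GF(2^8) elements encoded as ints 0..255; the function affine_constant and the sampled functions are our illustrations, and the SMT confirmation step is not modelled). Bit rotation distributes over XOR, so rotl1 yields constant 0, while XOR-ing in the constant 99 yields 99:

```python
# If f is affine, f(x XOR y) XOR f(x) XOR f(y) is one constant c for all
# x, y; random sampling either finds two distinct values (f is not affine)
# or yields a candidate c that the paper would then confirm via SMT.
import random

def affine_constant(f, trials=1000, seed=0):
    rng = random.Random(seed)
    c = f(0)  # f(0 XOR 0) XOR f(0) XOR f(0) = f(0)
    for _ in range(trials):
        x, y = rng.randrange(256), rng.randrange(256)
        if f(x ^ y) ^ f(x) ^ f(y) != c:
            return None  # two distinct values observed: f is not affine
    return c  # candidate affine constant

rotl1 = lambda x: ((x << 1) | (x >> 7)) & 0xFF  # left-rotate by 1 bit
print(affine_constant(rotl1))                    # 0: rotation is linear
print(affine_constant(lambda x: rotl1(x) ^ 99))  # 99: the constant shift
```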

Lemma 4. *Assume an affine transformation* f *in* P*. If AffConst*(P, R, G) *in Algorithm 2 returns a mapping* λ*, then* λ(f) *is the affine constant of* f*.*

#### 6.3 Verification Algorithm

The verification problem is solved by the function Verifier(P) in Algorithm 3, which checks if f<sup>m</sup> ∼= fo, for each procedure f defined in P. It first preprocesses the given program P by inlining all the procedures, unrolling all the loops and eliminating all the branches (line 2). Then, it computes the corresponding TRS R, call graph G and affine constants as the mapping λ, respectively (line 3). Next, it iteratively checks if f<sup>m</sup> ∼= fo, for each procedure f defined in P (lines 4–23).

For each procedure <sup>f</sup>, it first extracts the inputs <sup>a</sup>1, ··· , a*<sup>m</sup>* of <sup>f</sup><sup>o</sup> that are scalar variables (line 5) and input encodings **<sup>a</sup>**1, ··· , **<sup>a</sup>***<sup>m</sup>* of <sup>f</sup><sup>m</sup> that are vectors of variables (line 6). Then, it computes the output <sup>ξ</sup>(a1, ··· , a*<sup>m</sup>*) of <sup>f</sup><sup>o</sup> via symbolic execution, which yields an expression in terms of <sup>a</sup>1, ··· , a*<sup>m</sup>* and affine transformations (line 7). Similarly, it computes the output ξ (**a**1, ··· , **<sup>a</sup>***<sup>m</sup>*) of <sup>f</sup><sup>m</sup> via symbolic execution, i.e., a tuple of expressions in terms of the entries of the input encodings **<sup>a</sup>**<sup>1</sup>, ··· , **<sup>a</sup>***<sup>m</sup>*, random variables and affine transformations (line 8).

Recall that <sup>f</sup><sup>m</sup> <sup>∼</sup><sup>=</sup> <sup>f</sup><sup>o</sup> iff for all <sup>a</sup><sup>1</sup>, ··· , a*<sup>m</sup>*, r1, ··· , r*<sup>h</sup>* <sup>∈</sup> <sup>F</sup> and for all **<sup>a</sup>**<sup>1</sup>, ··· , **<sup>a</sup>***<sup>m</sup>* <sup>∈</sup> <sup>F</sup>*d*+1, the following constraint holds (cf. Definition 1):

$$\left(\bigwedge\_{i\in[1,m]} a^i = \bigoplus\_{j\in[0,d]}\mathbf{a}^i\_j\right) \rightarrow \left(f\_{\mathsf{o}}(a^1,\cdots,a^m) = \bigoplus\_{i\in[0,d]} f\_{\mathsf{m}}(\mathbf{a}^1,\cdots,\mathbf{a}^m)\_i\right)$$

where r1, ···, rh are all the random variables used in fm. Thus, it creates the term τ = ξ(a1, ···, am)[a1 → **a**1, ···, am → **a**m] ⊕ ξ'(**a**1, ···, **a**m) (line 9), where ai → **a**i denotes the substitution of ai with the term **a**i in the expression ξ(a1, ···, am). Obviously, τ is equivalent to 0 iff fm ∼= fo.

#### Algorithm 3: Verification Algorithm


To check whether τ is equivalent to 0, similarly to the computation of affine constants in Algorithm 2, the algorithm repeatedly rewrites the term τ to a normal form by invoking TermNorm in Algorithm 1, until either a conclusion is drawn or no affine transformation can be inlined (lines 10–23). We declare that f is correct if the normal form is 0 (line 13) and incorrect if it is a non-zero constant (line 14). If the normal form is *not* a constant, we repeatedly inline each affine transformation g defined in P which has not yet been inlined in τ and re-check the term τ.

If there is no definite answer after inlining all the affine transformations, τ is evaluated using random input values. f is *incorrect* if τ evaluates to a non-zero value (line 20). Otherwise, we check the satisfiability of the constraint τ ≠ 0 via an SMT solver in bitvector theory (line 21). If τ ≠ 0 is unsatisfiable, then f is *correct*. Otherwise, we can conclude that f is *incorrect* if no uninterpreted function is involved in τ; in other cases, the result is inconclusive.
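For intuition, the final random-testing step can be sketched on a toy first-order gadget (the functions below are illustrative, not taken from the benchmark): the XOR of the masked outputs must match the unmasked function applied to the recombined input.

```python
# Toy instance of the random-testing step: compare an unmasked f_o with a
# first-order masked f_m on random shares. The gadget (key folded into
# share 0) is our illustration, not a gadget from the paper.
import random

K = 0x2A
f_o = lambda a: a ^ K              # original computation
f_m = lambda a0, a1: (a0 ^ K, a1)  # masked: key folded into share 0

rng = random.Random(1)
ok = all(
    f_o(a0 ^ a1) == f_m(a0, a1)[0] ^ f_m(a0, a1)[1]
    for a0, a1 in ((rng.randrange(256), rng.randrange(256))
                   for _ in range(500))
)
print(ok)  # True: XOR of output shares matches f_o on the recombined input
```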

Theorem 2. *Assume a procedure* f *in* P*. If Verifier*(P) *emits "*f is correct*", then* fm ∼= fo*; if Verifier*(P) *emits "*f is incorrect*" or "*f may be incorrect*" with no uninterpreted function involved in its final term* τ*, then* fm ∼= fo *does not hold.*

#### 6.4 Implementation Remarks

To implement the algorithms, we use a total order ≥*s* on V ∪ Σ in which all constants are smaller than variables, which are in turn smaller than affine transformations. The order on constants is the standard order on integers, while variables (resp. affine transformations) are ordered lexicographically.

In terms of data structures, each term is primarily stored as a directed acyclic graph, allowing us to represent and rewrite common sub-terms in an optimised way. Once a (sub-)term becomes a polynomial during term rewriting, it is stored as a sorted nested list w.r.t. the monomial order ≥*p*, where each monomial is also stored as a sorted list w.r.t. the factor order ≥*l*. Moreover, a factor of the form α*k* in a monomial is stored as a pair (α, k).

We also adopted two strategies: (i) By Fermat's little theorem [63], x^(2*n*−1) = 1 for any nonzero x ∈ GF(2*n*). Hence each k in (α, k) can be simplified to k mod (2*n* − 1). (ii) By rule R12, a term f(τ1 ⊕ ··· ⊕ τk) can be directly rewritten to f(τ1) ⊕ ··· ⊕ f(τk) if k is odd, and to f(τ1) ⊕ ··· ⊕ f(τk) ⊕ c if k is even, where c is the affine constant associated with the affine transformation f.
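Both strategies are easy to state in code (a sketch; the handling of the x = 0 corner case in strategy (i), mapping a reduced exponent of 0 back to 2^n − 1 so that 0^k is preserved, is our addition). For strategy (ii), note that folding R12 over k arguments produces (k − 1) copies of c, which cancel under XOR exactly when k is odd:

```python
# Strategy (i): reduce the exponent k of a factor (alpha, k) modulo 2^n - 1,
# keeping the reduced exponent nonzero so the rewrite is also sound at x = 0.
def reduce_exponent(k, n):
    m = k % (2**n - 1)
    return m if m != 0 else 2**n - 1

# Strategy (ii): distribute an affine f over a k-fold XOR-sum; the affine
# constant c survives iff k is even (symbolic output, ours for illustration).
def distribute_affine(args, c):
    out = [f"f({a})" for a in args]
    if len(args) % 2 == 0:
        out.append(str(c))
    return " ⊕ ".join(out)

print(reduce_exponent(254 + 255, 8))       # 254: x^509 = x^254 in GF(2^8)
print(distribute_affine(["t1", "t2"], 99))  # f(t1) ⊕ f(t2) ⊕ 99
```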

#### 7 Evaluation

We implement our approach as a tool FISCHER for verifying masked programs in LLVM IR, based on the LLVM framework. We first evaluate FISCHER for computing affine constants (i.e., Algorithm 2), correctness verification, and scalability w.r.t. the masking order (i.e., Algorithm 3) on benchmarks using the ISW scheme. To show the generality of our approach, FISCHER is then used to verify benchmarks using glitch-resistant Boolean masking schemes and lattice-based public-key cryptography. All experiments are conducted on a machine with Linux kernel 5.10, an Intel i7-10700 CPU (4.8 GHz, 8 cores, 16 threads) and 40 GB of memory. Milliseconds (ms) and seconds (s) are used as the time units in our experiments.

#### 7.1 Evaluation for Computing Affine Constants

To evaluate Algorithm 2, we compare with a pure SMT-based approach which directly checks ∃c.∀x, y ∈ F. f(x ⊕ y) = f(x) ⊕ f(y) ⊕ c using Z3 [47], CVC5 [5] and Boolector [18], by implementing ⊕ and ⊗ in bit-vector theory, where ⊗ is achieved via the Russian peasant method [16]. Technically, SMT solvers only deal with satisfiability, but they usually can eliminate the universal quantifiers in this case, as x, y range over a finite field. In particular, in our experiment, Z3 is configured with default (i.e., (check-sat)), simplify (i.e., (check-sat-using (then simplify smt))) and bit-blast (i.e., (check-sat-using (then bit-blast smt))), denoted by Z3-d, Z3-s and Z3-b, respectively. We focus on the following functions: expi(x) = x^i for i ∈ {2, 4, 8, 16}; rotli(x) for i ∈ {1, 2, 3, 4}, which left-rotates x by i bits; af(x) = rotl1(x) ⊕ rotl2(x) ⊕ rotl3(x) ⊕ rotl4(x) ⊕ 99, used in the AES S-Box; L1(x) = 7x^2 ⊕ 14x^4 ⊕ 7x^8, L3(x) = 7x ⊕ 12x^2 ⊕ 12x^4 ⊕ 9x^8, L5(x) = 10x ⊕ 9x^2 and L7(x) = 4x ⊕ 13x^2 ⊕ 13x^4 ⊕ 14x^8, used in the PRESENT S-Box over GF(16) = GF(2)[X]/(X^4 + X + 1) [14,19]; f1(x) = x^3, f2(x) = x^2 ⊕ x ⊕ 1, f3(x) = x ⊕ x^5 and f4(x) = af(exp2(x)) over GF(2^8).
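For reference, the Russian peasant method over GF(2^8) can be sketched as follows; we fix the AES modulus X^8 + X^4 + X^3 + X + 1 as an assumption (the comparison above leaves the modulus to the bit-vector encoding), and use it to confirm that squaring, i.e., exp2, distributes over ⊕:

```python
# Russian peasant multiplication in GF(2^8) = GF(2)[X]/(X^8+X^4+X^3+X+1);
# the AES modulus 0x11B is our assumed choice for this illustration.
def gf_mul(a, b, mod=0x11B):
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) the current shift of a
        a <<= 1
        if a & 0x100:
            a ^= mod        # reduce modulo the field polynomial
        b >>= 1
    return r

# exp2(x) = x^2 is linear over GF(2): squaring is the Frobenius map,
# so it distributes over XOR on every pair of field elements.
assert all(gf_mul(x ^ y, x ^ y) == gf_mul(x, x) ^ gf_mul(y, y)
           for x in range(256) for y in range(256))
print("exp2 is linear")
```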

Table 1. Results of computing affine constants, where † means Algorithm 2 needs SMT solving, ‡ means affineness is disproved via testing, ✗ means non-affineness, and Algorithm 2+B means Algorithm 2+Boolector.


The results are reported in Table 1, where the 2nd–8th rows show the execution time and the last row shows the affine constants if they exist, and ✗ otherwise. We observe that Algorithm 2 significantly outperforms the SMT-based approach in most cases for all the SMT solvers, except for rotli and af (this is not surprising, as they use operations other than ⊕ and ⊗, so SMT solving is required). The term rewriting system is often able to compute affine constants *solely* (e.g., for expi and Li), and SMT solving is required *only* for computing the affine constants of rotli and af. By comparing the results of Algorithm 2+Z3-b vs. Z3-b and Algorithm 2+B vs. Boolector on af, we observe that term rewriting is still essential, as checking the normal form instead of the original constraint reduces the cost of SMT solving.

#### 7.2 Evaluation for Correctness Verification

To evaluate Algorithm 3, we compare it with a pure SMT-based approach using the SMT solvers Z3, CVC5 and Boolector. We also consider several promising general-purpose software verifiers, namely SMACK (with the Boogie and Corral engines), SeaHorn, CPAChecker and Symbiotic, and one cryptography-specific verifier, CryptoLine (with SMT and CAS solvers), where the verification problem is expressed using assume and assert statements. These verifiers are configured in two ways: (1) as recommended in the manual/paper or as used in competitions, and (2) by trying different configurations and selecting the optimal one. Specifically:


The benchmark comprises five different masked programs sec\_mult for finite-field multiplication over GF(2^8), varying the masking order d = 0, 1, 2, 3, where d = 0 means the program is unmasked. We note that sec\_mult in [8] is only available for masking order d ≥ 2.



The results are shown in Table 2. We observe that FISCHER is significantly more efficient than the others, and is able to prove all the cases using our term rewriting system *solely* (i.e., without random testing or SMT solving). With the increase of the masking order d, almost all the other tools fail. Both CryptoLine (with the CAS solver) and CPAChecker fail to verify any of the cases due to the non-linear operations involved in sec\_mult. SMACK with the Corral engine produces two false positives (marked in Table 2). These results suggest that dedicated verification approaches are required for proving the correctness of masked programs.

#### 7.3 Scalability of FISCHER

To evaluate the scalability of FISCHER, we verify different versions of sec\_mult and masked procedures sec\_aes\_sbox (resp. sec\_present\_sbox) of S-Boxes used in AES [58] (resp. PRESENT [19]) with varying masking order d. Since it is known that refresh\_masks in [58] is vulnerable when d ≥ 4 [24], a fixed version RefreshM [7] is used in all the S-Boxes (except that when sec\_mult is taken from [8] its own version is used). We note that sec\_present\_sbox uses the affine transformations L1, L3, L5, L7, exp2 and exp4, while sec\_aes\_sbox uses the affine transformations af, exp2, exp4 and exp16.

The results are reported in Table 3. All these benchmarks are proved using our term rewriting system solely, except for the three incorrect ones marked in the table. FISCHER scales up to masking order 100 or even 200 for sec\_mult, which is remarkable. It also scales up to masking order 30 or even 40 for sec\_present\_sbox. However, it is less scalable on sec\_aes\_sbox, as this procedure computes the multiplicative inverse x^254 on shares, and the size of the term encoding the equivalence problem explodes with the increase of the masking order. Furthermore, to better demonstrate the effectiveness of our term rewriting system in dealing with complicated procedures, we first use Algorithm 2 to derive affine constants on sec\_aes\_sbox with ISW [58] and then directly apply SMT solvers to solve the correctness constraints obtained at line 9 of Algorithm 3. This takes about 1 s for first-order masking, but fails to produce a result within 20 min for second-order masking.

Table 3. Results on sec\_mult and S-Boxes, where T.O. means time out (20 min), and means that the program is *incorrect*.



Table 4. Results on sec\_mult and S-Boxes for HPC, DOM and CMS.

A highlight of our findings is that FISCHER reports that sec\_mult from [8] and the S-Boxes based on this version are incorrect when d = 5. After a careful analysis, we found that it is indeed incorrect for any d ≡ 1 mod 4 (i.e., 5, 9, 13, etc.). This is because [8] parallelizes the multiplication over the entire encodings (i.e., tuples of shares), and the parallelized computation depends on the value of d mod 4. When the remainder is 1, the error occurs.

#### 7.4 Evaluation for More Boolean Masking Schemes

To demonstrate the applicability of FISCHER on a wider range of Boolean masking schemes, we further consider glitch-resistant Boolean masking schemes: HPC1, HPC2 [20], DOM [35] and CMS [57]. We implement the finite-field multiplication sec\_mult using those masking schemes, as well as masked versions of AES S-box and PRESENT S-box. We note that our implementation of DOM sec\_mult is derived from [20], and we only implement the 2nd-order CMS sec\_mult due to the difficulty of implementation. All other experimental settings are the same as in Sect. 7.3.

The results are shown in Table 4. Our term rewriting system *solely* is able to efficiently prove the correctness of finite-field multiplication sec\_mult, masked versions of AES S-box and PRESENT S-box using the glitch-resistant Boolean masking schemes HPC1, HPC2, DOM and CMS. The verification cost of those benchmarks is similar to that of benchmarks using the ISW scheme, demonstrating the applicability of FISCHER for various Boolean masking schemes.



Table 5. Results on sec\_add, sec\_add\_modp and sec\_a2b [17], where T.O. means time out (20 min).

#### 7.5 Evaluation for Arithmetic/Boolean Masking Conversions

To demonstrate the applicability of FISCHER beyond masked implementations of symmetric cryptography, we further evaluate FISCHER on three key non-linear building blocks for bitsliced, masked implementations of lattice-based post-quantum key encapsulation mechanisms (KEMs) [17]. Note that KEMs are a class of encryption techniques designed to secure symmetric cryptographic key material for transmission using asymmetric (public-key) cryptography. We implement the Boolean masked addition modulo 2^k (sec\_add), the Boolean masked addition modulo p (sec\_add\_modp) and the arithmetic-to-Boolean masking conversion modulo 2^k (sec\_a2b) for various bit-widths k and masking orders d, where p is the largest prime number less than 2^k. Note that some bitwise operations (e.g., circular shift) are expressed by affine transformations, and the modulo addition is implemented by the simulation algorithm of [17] in our implementations.
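The need for an explicit conversion can be seen from a tiny example (ours, not the algorithm of [17]): the same secret has different share decompositions under arithmetic (+ mod 2^k) and Boolean (⊕) masking, because of carries.

```python
# Illustration of why arithmetic-to-Boolean conversion is non-trivial:
# arithmetic shares recombine with +, Boolean shares with XOR, and the two
# decompositions of the same secret generally differ share-by-share.
k = 8
secret, r = 0xB7, 0x5C
arith = (r, (secret - r) % 2**k)   # secret == (A1 + A2) mod 2^k
boolean = (r, secret ^ r)          # secret == B1 XOR B2
assert (arith[0] + arith[1]) % 2**k == secret
assert boolean[0] ^ boolean[1] == secret
print(arith[1] == boolean[1])  # False: carries make + and XOR disagree
```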

The results are reported in Table 5. FISCHER is able to efficiently prove the correctness of these functions for various masking orders d and bit-widths k, using the term rewriting system *solely*. With the increase of the bit-width k (resp. masking order d), the verification cost grows more quickly for sec\_add\_modp (resp. sec\_a2b) than for sec\_add. This is because sec\_add\_modp with bit-width k invokes sec\_add three times, two of which use bit-width k + 1, while the number of calls to sec\_add in sec\_a2b increases with the masking order d even though the bit-width stays the same. These results demonstrate the applicability of FISCHER to asymmetric cryptography.

#### 8 Conclusion

We have proposed a term rewriting based approach to proving functional equivalence between masked cryptographic programs and their original unmasked algorithms over GF(2*<sup>n</sup>*). Based on this approach, we have developed a tool FISCHER and carried out extensive experiments on various benchmarks. Our evaluation confirms the effectiveness, efficiency and applicability of our approach.

For future work, it would be interesting to further investigate the theoretical properties of the term rewriting system. Moreover, we believe the term rewriting approach extended with more operations may have a greater potential in verifying more general cryptographic programs, e.g., those from the standard software library such as OpenSSL.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Automatic Program Instrumentation for Automatic Verification

Jesper Amilon1(B) , Zafer Esen2(B) , Dilian Gurov1(B) , Christian Lidström1(B) , and Philipp Rümmer2,3(B)

<sup>1</sup> KTH Royal Institute of Technology, Stockholm, Sweden {jamilon,dilian,clid}@kth.se <sup>2</sup> Uppsala University, Uppsala, Sweden {zafer.esen,philipp.ruemmer}@it.uu.se <sup>3</sup> University of Regensburg, Regensburg, Germany

Abstract. In deductive verification and software model checking, dealing with certain specification language constructs can be problematic when the back-end solver is not sufficiently powerful or lacks the required theories. One way to deal with this is to transform, for verification purposes, the program to an equivalent one not using the problematic constructs, and to reason about its correctness instead. In this paper, we propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches, has a clear formal correctness criterion, can be applied automatically, and can transfer back witnesses and counterexamples. We illustrate our approach on the automated verification of programs that involve quantification and aggregation operations over arrays, such as the maximum value or sum of the elements in a given segment of the array, which are known to be difficult to reason about automatically. We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation, and evaluate it on example programs, including SV-COMP programs.

## 1 Introduction

Overview. Program specifications are often written in expressive, high-level languages: for instance, in temporal logic [14], in first-order logic with quantifiers [28], in separation logic [40], or in specification languages that provide extended quantifiers for computing the sum or maximum value of array elements [7,33]. Specifications commonly also use a rich set of theories; for instance, specifications could be written using full Peano arithmetic, as opposed to bitvectors or linear arithmetic used in the program. Rich specification languages make it possible to express intended program behaviour in a succinct form, and as a result reduce the likelihood of mistakes being introduced in specifications.

There is a gap, however, between the languages used in specifications and the input languages of automatic verification tools. Software model checkers, in particular, usually require specifications to be expressed using program assertions and Boolean program expressions, and do not directly support any of the more sophisticated language features mentioned. In fact, rich specification languages are challenging to handle in automatic verification, since satisfiability checks can become undecidable (i.e., it is no longer decidable whether assertion failures can occur on a program path), and techniques for inferring program invariants usually focus on simple specifications only.

To bridge this gap, it is common practice to *encode* high-level specifications in the low-level assertion languages understood by the tools. For instance, temporal properties can be translated to Büchi automata, and added to programs using ghost variables and assertions [14]; quantified properties can be replaced with non-determinism, ghost variables, or loops [13,37]; sets used to specify the absence of data-races can be represented using non-deterministically initialized variables [18]. By adding ghost variables and bespoke ghost code to programs [22], many specifications can be made effectively checkable.

The translation of specifications to assertions or ghost code is today largely designed, or even carried out, by hand. This is an error-prone process, and for complex specifications and programs it is very hard to ensure that the low-level encoding of a specification faithfully models the original high-level properties to be checked. Mistakes have been found even in industrial, very carefully developed specifications [39], and can result in assertions that are vacuously satisfied by any program. Naturally, the manual translation of specifications also tends to be an ad-hoc process that does not easily generalise to other specifications.

This paper proposes the first general framework to automate the translation of rich program specifications to simpler program assertions, using a process called *instrumentation.* Our approach models the semantics of specific complex operations using program-independent *instrumentation operators,* consisting of (manually designed) rewriting rules that define how the evaluation of the operator can be achieved using simpler program statements and ghost variables. The instrumentation approach is flexible enough to cover a wide range of different operators, including operators that are best handled by weaving their evaluation into the program to be analysed. While instrumentation operators are manually written, their application to programs can be performed in a fully automatic way by means of a search procedure. The soundness of an instrumentation operator is shown formally, once and for all, by providing an *instrumentation invariant* that ensures that the operator can never be used to show correctness of an incorrect program.

Additional instrumentation operator definitions, correctness proofs, and detailed evaluation results can be found in the accompanying extended report [4].

Motivating Example. We illustrate our approach on the computation of *triangular numbers* s<sup>N</sup> = (N<sup>2</sup> + N)/2, see left-hand side of Fig. 1. For reasons of presentation, the program has been normalised by representing the square N\*N using an auxiliary variable NN. While mathematically simple, verifying the postcondition s == (NN+N)/2 in the program turns out to be challenging even for state-of-the-art model checkers, as such tools are usually thrown off course by

Fig. 1. Program computing triangular numbers, and its instrumented counterpart

the non-linear term N\*N. Computing the value of NN by adding a loop in line 16 is not sufficient for most tools either, since the program in any case requires a non-linear invariant 0 <= i <= N && 2\*s == i\*i + i to be derived for the loop in lines 4–12.

The insight needed to elegantly verify the program is that the value i\*i can be tracked during the program execution using a ghost variable x\_sq. For this, the program is instrumented to maintain the relationship x\_sq == i\*i: initially, i == x\_sq == 0, and each time the value of i is modified, also the variable x\_sq is updated accordingly. With the value x\_sq == i\*i available, both the loop invariant and the post-condition turn into formulas over linear arithmetic, and program verification becomes largely straightforward. The challenge, of course, is to discover this program transformation automatically, and to guarantee the soundness of the process. For the example, the transformed program is shown on the right-hand side of Fig. 1, and discussed in the next paragraphs.
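For readers who prefer an executable form, the instrumented program can be rendered in Python as follows (a paraphrase of Fig. 1; the rule labels in the comments refer to the operator of Fig. 2, and the exact rewrite bodies are our reading of it):

```python
# Python rendering of the instrumented triangular-number program of Fig. 1.
# Ghost variables x_sq and x_shad follow the paper; the loop/assert shape
# paraphrases the original C-like program.
def triangular(N):
    s, i = 0, 0
    x_sq, x_shad = 0, 0  # ghost state: x_sq == i*i, x_shad == i
    while i < N:
        i = i + 1
        x_sq = x_sq + 2 * x_shad + 1  # maintain (i+1)^2 = i^2 + 2i + 1
        x_shad = x_shad + 1           # shadow the tracked variable i
        s = s + i
    assert x_shad == i        # guard: the tracked variable is indeed i
    NN = x_sq                 # replaces N*N, justified since i == N here
    assert s == (NN + N) // 2 # post-condition, now linear arithmetic
    return s

print(triangular(10))  # 55
```

With x_sq available, both the loop invariant and the post-condition are linear, which is exactly what makes the instrumented program tractable for the back-end verifier.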

Our method splits the process of program instrumentation into two parts: (i) choosing an *instrumentation operator,* which is defined manually, designed to be program-independent, and induces a space of possible program transformations; and (ii) carrying out an automatic *application strategy* to find, among the possible program transformations, one that enables verification of a program.

An instrumentation operator for tracking squares is shown in Fig. 2, and consists of the declaration of two ghost variables (x\_sq, x\_shad), both with initial value 0; four rules for rewriting program statements; and the instrumentation invariant witnessing correctness of the operator. The rewrite rules use formal variables x, y, which can represent arbitrary variables in the program (i, N, NN). An application of the operator to a program will declare the ghost variables in the form of global variables, and then rewrite some chosen set of program statements using the provided rules. Since the statements to be rewritten can


Fig. 2. Definition of an instrumentation operator <sup>Ω</sup>*square* for tracking squares

be chosen arbitrarily, and since moreover multiple rewrite rules might apply to some statements, rewriting can result in many different variants of a program. In the example, we rewrite the assignments C, D of the left-hand side program using rewrite rules (R2) and (R4), respectively, resulting in the instrumented and correct program on the right-hand side.

Instrumentation operators are designed to be *sound,* which means that rewriting a wrong selection of program statements might lead to an instrumented program that cannot be verified, i.e., in which assertions might fail, but instrumentation can never turn an incorrect source program into a correct instrumented program. This opens up the possibility to systematically search for the right program instrumentation. We propose a counterexample-guided algorithm for this purpose, which starts from some arbitrarily chosen instrumentation, checks whether the instrumented program can be verified, and otherwise attempts to fix the instrumentation using a refinement loop. As soon as a verifiable instrumented program has been found, the search can stop and the correctness of the original program has been shown.

The concept of instrumentation invariants is essential for guaranteeing soundness of an operator. Instrumentation invariants are formulas that can (only) refer to the ghost variables introduced by an instrumentation operator, and are formulated in such a way that they hold *in every reachable state of every instrumented program.* To maintain their invariants, instrumentation operators use shadow variables that duplicate the values of program variables. In the operator in Fig. 2, the purpose of the shadow variable x\_shad is to reproduce the value of the program variable whose square is tracked (i). The rewriting rules introduce guards to detect incorrect instrumentation (the assertions in (R2), (R3), (R4)), i.e., cases in which some update of a relevant variable was missed and not correctly instrumented. The use of shadow variables and guards makes instrumentation operators very flexible; in our example, note that instrumentation tracks the square of the value of i during the loop, but is also used later to simplify the expression N\*N. This is possible because of the instrumentation invariant and because i == N holds after termination of the loop, which is verified through the assertion introduced in line 14.
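The interplay of guard and shadow variable can be sketched in executable form. The statement shape below is hypothetical (a rule in the style of (R2), applied to an increment x = x + 1); the actual rules are given in Fig. 2:

```python
def instrumented_update(x, x_sq, x_shad):
    """Hypothetical shape of a rewrite in the style of (R2) for the
    statement x = x + 1: the guard detects a missed or wrongly
    instrumented update, after which ghost and shadow state stay in
    sync, preserving the invariant x_sq == x_shad * x_shad."""
    assert x == x_shad            # guard: x must be the tracked variable
    x = x + 1
    x_shad = x_shad + 1
    x_sq = x_sq + 2 * x_shad - 1  # maintain x_sq == x_shad * x_shad
    return x, x_sq, x_shad
```

If the guard fails at run time, the instrumentation (not the program) was chosen badly; this is precisely the signal exploited by the search procedure of Sect. 3.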

Contributions and Outline. The operator shown in Fig. 2 is simple, and does not apply to all programs, but it can easily be generalised to other arithmetic operators and program statements. The framework presented in this paper provides the foundation for developing an (extendable) library of formally verified instrumentation operators. In the scope of this paper, we focus on two specification constructs that have been identified as particularly challenging in the literature: existential and universal *quantifiers* over arrays, and *aggregation* (or *extended quantifiers*), which includes computing the sum or maximum value of elements in an array. Our experiments on benchmarks taken from SV-COMP [8] show that even relatively simple instrumentation operators can significantly extend the capabilities of a software model checker, and often make the automatic verification of otherwise hard specifications easy.

The contributions of the paper are: (i) a general *framework for program instrumentation*, which defines a space of program transformations that work by rewriting individual statements (Sect. 2); (ii) a *search algorithm* for application strategies in this space, for a given program (Sect. 3); (iii) two *instantiations* of the framework—one for instrumentation operators to handle specifications with *quantifiers* (Sect. 4.1), and one for *extended quantifiers* (Sect. 4.2); (iv) machine-checked proofs of the correctness of the instrumentation operators for the quantifier ∀ and the extended quantifier \max; (v) a new *verification tool*, MonoCera, that is tailored to the verification of programs with aggregation; and (vi) an *evaluation* of our method and tool on a set of examples, including examples from SV-COMP [8] (Sect. 5).

### 2 Instrumentation Framework

The next two sections formally introduce the instrumentation framework. Later, we instantiate the framework for quantification and aggregation over arrays. We split the instrumentation process into two parts:


Table 1. Syntax of the core language.

```
Type ::= Int | Bool | Array Type

Expr ::= DecimalNumber | true | false | Variable
       | Expr == Expr | Expr <= Expr | !Expr | Expr && Expr
       | Expr || Expr | Expr + Expr | Expr * Expr
       | select(Expr, Expr) | store(Expr, Expr, Expr)

Prog ::= skip | Variable = Expr | Prog; Prog | while (Expr) Prog
       | assert(Expr) | assume(Expr) | if (Expr) Prog else Prog
```
Even though instrumentation operators are non-deterministic, we shall guarantee their *soundness:* if the original program has a failing assertion, so will any instrumented program, regardless of the chosen application strategy; that is, instrumentation of an incorrect program will never yield a correct program.

We shall also guarantee a weak form of *completeness*, to the effect that if an assertion that has not been added to the program by the instrumentation fails in the instrumented program, then it will also fail in the original program. As a result, any counterexample (for such an assertion) produced when verifying the instrumented program can be transformed into a counterexample for the original program.

#### 2.1 The Core Language

While our implementation works on programs represented as constrained Horn clauses [12], i.e., is language-agnostic, for readability purposes we present our approach in the setting of an imperative core programming language with datatypes for unbounded integers, Booleans, and arrays, and assert and assume statements. The language is deliberately kept simple, but is still close to standard C. The main exception is the semantics of arrays: they are defined here to be *functional* and therefore represent a value type. Arrays have integers as index type and are unbounded, and their signature and semantics are otherwise borrowed from the SMT-LIB theory of extensional arrays [6]:

– *Reading* the value of an array a at index i: select(a, i);

– *Updating* an array a at index i with a new value x: store(a, i, x).

The complete syntax of the core language is given in Table 1. Programs are written using a vocabulary X of typed program variables; the typing rules of the language are given in [4]. As syntactic sugar, we sometimes write a[i] instead of select(a, i), and a[i] = x instead of a = store(a, i, x).
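The functional (value-type) reading of arrays can be modelled directly; the dictionary representation and the default value 0 for unconstrained entries are choices of this sketch, not part of the language definition:

```python
def select(a, i):
    """Read array a (modelled as an immutable mapping) at index i."""
    return a.get(i, 0)           # unconstrained entries default to 0 here

def store(a, i, x):
    """Return a NEW array equal to a except that index i maps to x;
    the original array value a is left unchanged."""
    b = dict(a)
    b[i] = x
    return b

a0 = store({}, 3, 7)
a1 = store(a0, 3, 9)
assert select(a1, 3) == 9 and select(a0, 3) == 7   # a0 is unaffected
```

This value semantics is what distinguishes the core language from C arrays, and matches the SMT-LIB theory the back-end reasons over.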

We denote by D<sub>σ</sub> the domain of a program type σ. The domain of an array type Array σ is the set of functions f : Z → D<sub>σ</sub>.

*Semantics.* We assume the Flanagan-Saxe *extended execution model* of programs with assume and assert statements (see, e.g., [23]), in which executing an assert statement with an argument that evaluates to false *fails*, i.e., terminates abnormally. An assume statement with an argument that evaluates to false has the same semantics as a non-terminating loop. Partial correctness properties of programs are expressed using *Hoare triples* {*Pre*} P {*Post*}, which state that an execution of P, starting in a state satisfying *Pre*, never fails, and may only terminate in states that satisfy *Post*. As usual, a program P is considered *(partially) correct* if the Hoare triple {*true*} P {*true*} holds.

The evaluation of program expressions is modelled using a function ⟦·⟧<sub>s</sub> that maps program expressions t of type σ to their value ⟦t⟧<sub>s</sub> ∈ D<sub>σ</sub> in the state s.

#### 2.2 Instrumentation Operators

An instrumentation operator defines schemes to rewrite programs while preserving the meaning of the existing program assertions. Without loss of generality, we restrict program rewriting to assignment statements. Instrumentation can introduce *ghost state* by adding arbitrary fresh variables to the program. The main part of an instrumentation consists of *rewrite rules*, which are schematic rules r = t ⇝ s, where the meta-variable r ranges over program variables, t is an expression that can contain further meta-variables, and s is a schematic program in which the meta-variables from r = t might occur. Any assignment that matches r = t can be rewritten to s.

Definition 1 (Instrumentation Operator). *An* instrumentation operator *is a tuple* Ω = (G,R, I)*, where:*


*The rewrite rules* R *and the invariant* I *must adhere to the following constraints:*

	- *(a)* s *terminates (normally or abnormally) for pre-states satisfying* I*, assuming that all meta-variables are ordinary program variables.*
	- *(b)* s *does not assign to variables other than* r *or the ghost variables* x1,..., xk*.*
	- *(c)* s *preserves the instrumentation invariant:* {I} s′ {I}*, where* s′ *is* s *with every assert(*e*) statement replaced by an assume(*e*) statement.*
	- *(d)* s *preserves the semantics of the assignment* r *=* t*: the Hoare triple* {I} *z =* t*;* s {*z* = r} *holds, where* z *is a fresh variable.*

The conditions imposed in the definition ensure that all instrumentations are *correct*, in the sense that they are sound and weakly complete, as we show below. In particular, the instrumentation invariant guarantees that the rewrites of program statements are *semantics-preserving* w.r.t. the original program, and thus, the execution of any assert statement of the original program has the same effect before and after instrumentation. Observe that the conditions can themselves be deductively verified to hold for each concrete instrumentation operator, and that this check is *independent* of the programs to be instrumented, so that an instrumentation operator can be proven correct once and for all.

An instrumentation operator Ω does not itself define which occurrences of program statements are to be rewritten, but only how they are rewritten. Given a program P and the operator Ω, an instrumented program P′ is derived by carrying out the following two steps: (i) the variables x<sub>1</sub>, ..., x<sub>k</sub> and the assignments x<sub>1</sub> = *init*<sub>1</sub>; ... ; x<sub>k</sub> = *init*<sub>k</sub> are added at the beginning of the program, and (ii) some of the assignments in P, to which a rewriting rule r = t ⇝ s in Ω is applicable, are replaced by s, substituting meta-variables with the actual terms occurring in the assignment. We denote by Ω(P) the set of all instrumented programs P′ that can be derived in this way. An example of an instrumentation operator and its application was shown in Fig. 1 and Fig. 2.

#### 2.3 Instrumentation Correctness

Verification of an instrumented program produces one of two possible results: a *witness* if verification is successful, or a *counterexample* otherwise. A witness consists of the inductive invariants needed to verify the program, and is presented in the context of the programming language: it is translated back from the back-end theory used by the verification tool, and is a formula over the program variables and the ghost variables added during instrumentation. A counterexample is an execution trace leading to a failing assertion.

Definition 2 (Soundness). *An instrumentation operator* Ω *is called* sound *if for every program* P *and instrumented program* P′ ∈ Ω(P)*, whenever there is an execution of* P *where some assert statement fails, then there is also an execution of* P′ *where some assert statement fails.*

Equivalently, existence of a witness for an instrumented program entails existence of a witness for the original program, in the form of a set of inductive invariants solely over the program variables. Notably, because of the semantics-preserving nature of the rewrites under the instrumentation invariant, a witness for the original program can be derived from one for the instrumented program. One such back-translation is to add the instrumentation invariant as a conjunct to the original witness, and to existentially quantify over the ghost variables.

*Example.* To illustrate the back-translation, we return to the instrumentation operator from Fig. 2 and the example program from Fig. 1. The witness produced by our verification tool in this case is the formula:

$$\mathtt{i} = \mathtt{x\_shad} \;\land\; \mathtt{x\_sq} + \mathtt{x\_shad} = 2\mathtt{s} \;\land\; \mathtt{N} \ge \mathtt{i} \;\land\; \mathtt{N} \ge 1 \;\land\; 2\mathtt{s} \ge \mathtt{i} \;\land\; \mathtt{i} \ge 0$$

After conjoining the instrumentation invariant x\_sq = x\_shad<sup>2</sup> and existentially quantifying over the involved ghost variables, we obtain an inductive invariant that is sufficient to verify the original program:

$$\exists\, \mathtt{x\_sq}, \mathtt{x\_shad}.\; \big(\mathtt{i} = \mathtt{x\_shad} \;\land\; \mathtt{x\_sq} + \mathtt{x\_shad} = 2\mathtt{s} \;\land\; \mathtt{N} \ge \mathtt{i} \;\land\; \mathtt{N} \ge 1 \;\land\; 2\mathtt{s} \ge \mathtt{i} \;\land\; \mathtt{i} \ge 0 \;\land\; \mathtt{x\_sq} = \mathtt{x\_shad}^2\big)$$

Definition 3 (Weak Completeness). *The operator* Ω *is called* weakly complete *if for every program* P *and instrumented program* P′ ∈ Ω(P)*, whenever an assert statement that has not been added to the program by the instrumentation fails in the instrumented program* P′*, then it also fails in the original program* P*.*

Similarly to the back-translation of invariants, when verification fails, counterexamples for assertions of the original program, found during verification of the instrumented program, can be translated back to counterexamples for the original program. We thus obtain the following result.

Theorem 1 (Soundness and weak completeness). *Every instrumentation operator* Ω *is sound and weakly complete.*

*Proof.* Let Ω = (G,R, I) be an instrumentation operator. Since I is a formula over ghost variables only, which holds initially and is preserved by all rewrites, I is an invariant of the fully instrumented program. This entails that rewrites of assignments are semantics-preserving. Furthermore, since instrumentation code only assigns to ghost variables or to r (i.e., the left-hand side of the original statement), program variables have the same valuation in the instrumented program as in the original one. Furthermore, since all rewrites are terminating under I, the instrumented program will terminate if and only if the original program does.

In the case when verification succeeds, and a witness is produced, weak completeness follows vacuously. A witness consists of the inductive invariants sufficient to verify the instrumented program. Thus, they are also sufficient to verify the assertions existing in the original program, since assertions are not rewritten and all program variables have the same valuation in the original and the instrumented programs. Since a witness for the instrumented program can be back-translated to a witness for the original program, any failing assertion in the original program must also fail after instrumentation, and Ω is therefore sound.

In the case when verification fails, soundness follows vacuously, and if the failing assertion was added during instrumentation, also weak completeness follows. If the assertion existed in the original program, since such assertions are not rewritten, and since program variables have the same valuation in the instrumented program as in the original program, then any counterexample for the instrumented program is also a counterexample for the original program, when projected onto the program variables. 

```
Input: Program P; control points C; instrumentation space R;
       oracle IsCorrect.
Result: Instrumentation r ∈ R with IsCorrect(P_r); Incorrect; or
        Inconclusive.
 1 begin
 2   Cand ← R;
 3   while Cand ≠ ∅ do
 4     pick r ∈ Cand;
 5     if IsCorrect(P_r) then
 6       return r;
 7     else
 8       cex ← counterexample path for P_r;
 9       if failing assertion in cex also exists in P then
           /* cex is also a counterexample for P */
10         return Incorrect;
11       else
           /* instrumentation on cex may have been incorrect */
12         C′ ← {p ∈ C | ins_r(p) occurs on cex};
13         Cand ← Cand \ {r′ ∈ Cand | r′(p) = r(p) for all p ∈ C′};
14       end
15     end
16   end
17   return Inconclusive;
18 end
```
Algorithm 1: Counterexample-guided instrumentation search

### 3 Instrumentation Application Strategies

We will now define a counterexample-guided search procedure to discover applications of instrumentation operators that make it possible to verify a program.

For our algorithm, we assume that we are given an oracle *IsCorrect* that is able to check the correctness of programs after instrumentation. Such an oracle could be approximated, for instance, using a software model checker. The oracle is free to ignore the complex functions we are trying to eliminate by instrumentation; for instance, in Fig. 1, the oracle can over-approximate the term N\*N by assuming that it can have any value. We further assume that C is the set of control points of a program P corresponding to the statements to which a given set of instrumentation operators can be applied. For each control point p ∈ C, let Q(p) be the set of rewrite rules applicable to the statement at p, including also a distinguished value ⊥ that expresses that p is not modified. For the program in Fig. 1, for instance, the choices could be defined by Q(A) = Q(B) = {(R1), ⊥}, Q(C) = {(R2), ⊥}, and Q(D) = {(R4), ⊥}, referring to the rules in Fig. 2. Any function r : C → ⋃<sub>p∈C</sub> Q(p) with r(p) ∈ Q(p)

Table 2. Extension of the core language with quantified expressions.

```
Expr ::= λ(Variable, Variable). Expr
       | forall(Expr, Expr, Expr, λ(Variable, Variable). Expr)
       | exists(Expr, Expr, Expr, λ(Variable, Variable). Expr)
```

will then define one possible program instrumentation. We will denote the set of well-typed functions C → ⋃<sub>p∈C</sub> Q(p) by R, and the program obtained by rewriting P according to r ∈ R by P<sub>r</sub>. We further denote the control point in P<sub>r</sub> corresponding to some p ∈ C in P by *ins*<sub>r</sub>(p).

Algorithm 1 presents our algorithm to search for instrumentations that are sufficient to verify a program P. The algorithm maintains a set *Cand* ⊆ R of remaining ways to instrument P, and in each iteration considers one of the remaining elements r ∈ *Cand* (line 4). If the oracle manages to verify P<sub>r</sub> in line 5, due to soundness of instrumentation the correctness of P has been shown (line 6); if P<sub>r</sub> is incorrect, there has to be a counterexample ending with a failing assertion (line 8). There are two possible causes of assertion failures: if the failing assertion in P<sub>r</sub> already existed in P, then due to the weak completeness of instrumentation also P has to be incorrect (line 10). Otherwise, the program instrumentation has to be refined, and for this we remove from *Cand* all instrumentations r′ that agree with r regarding the instrumentation of the statements occurring in the counterexample (line 13).

Since R is finite, and at least one element of *Cand* is eliminated in each iteration, the refinement loop terminates. The set *Cand* can be exponentially big, however, and therefore should be represented symbolically (using BDDs, or using an SMT solver managing the set of blocking constraints from line 13).
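The refinement loop can be sketched compactly over toy data structures. Here an instrumentation is a map from control points to rule choices (with `None` playing the role of ⊥), candidates are enumerated explicitly rather than symbolically, and `is_correct` and `counterexample` stand in for the oracle and the model checker's counterexample extraction:

```python
from itertools import product

def instrumentation_search(points, Q, is_correct, counterexample):
    """Sketch of Algorithm 1: enumerate candidate instrumentations
    r : control point -> rule choice; on failure, prune every candidate
    that agrees with r on the instrumented statements of the
    counterexample path. counterexample(r) is assumed to return a pair
    (points on the cex path, whether the failing assertion exists in P)."""
    cand = [dict(zip(points, c)) for c in product(*(Q[p] for p in points))]
    while cand:
        r = cand[0]                       # pick some candidate
        if is_correct(r):
            return r                      # P_r verified, hence P correct
        cex_points, assertion_in_P = counterexample(r)
        if assertion_in_P:
            return "Incorrect"            # cex carries over to P
        # refine: drop all candidates agreeing with r on the cex statements
        cand = [r2 for r2 in cand
                if any(r2[p] != r[p] for p in cex_points)]
    return "Inconclusive"
```

For the running example one would take points C and D with Q(C) = {(R2), ⊥} and Q(D) = {(R4), ⊥}; the explicit candidate list is exactly what the symbolic (BDD or SMT-based) representation mentioned above replaces.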

We can observe soundness and completeness of the algorithm w.r.t. the considered instrumentation operators (proof in [4]):

Lemma 1 (Correctness of *Algorithm 1*). *If Algorithm 1 returns an instrumentation* r ∈ R*, then* P<sub>r</sub> *and* P *are correct. If Algorithm 1 returns Incorrect, then* P *is incorrect. If there is* r ∈ R *such that* P<sub>r</sub> *is correct, then Algorithm 1 will return some* r *such that* P<sub>r</sub> *is correct.*

### 4 Instrumentation Operators for Arrays

#### 4.1 Instrumentation Operators for Quantification over Arrays

To handle quantifiers in a programming setting, we extend the language defined in Table 1 by adding quantified expressions over arrays, as shown in Table 2. As seen, we also extend the language with a lambda expression over two variables. The rationale for this is that many quantified properties can be expressed as a binary predicate with the first argument corresponding to the value of an element and the second to the index. This allows us to express properties over both the value of an element and its index. For example, we can express that each element

should be equal to its index, as is done in the example program in Fig. 3. In the program, each element in the array is assigned the value corresponding to its index, after which it is asserted that this property indeed holds.

Using P(x0,i0) as shorthand for (λ(x,i).P)(x0,i0), the new expressions can be defined formally as:

$$\begin{aligned} [\![\texttt{forall(a, l, u, }\lambda\texttt{(x,i).P)}]\!]_s &= \forall i \in [\,[\![\texttt{l}]\!]_s, [\![\texttt{u}]\!]_s\,).\ [\![\texttt{P(a[}i\texttt{], }i\texttt{)}]\!]_s \\ [\![\texttt{exists(a, l, u, }\lambda\texttt{(x,i).P)}]\!]_s &= \exists i \in [\,[\![\texttt{l}]\!]_s, [\![\texttt{u}]\!]_s\,).\ [\![\texttt{P(a[}i\texttt{], }i\texttt{)}]\!]_s \end{aligned}$$

Note that the types of x and a must be compatible and P be a Boolean-valued expression.

To handle programs such as the one in Fig. 3, we turn to the instrumentation framework outlined in Sect. 2.2, which we use here to define an instrumentation operator for universal quantification. The general idea is to instrument programs with a ghost variable, tracking if some predicate holds for all elements in an interval of the array, with shadow variables representing the tracked array, and the bounds of the interval. Naturally, an instrumentation operator for existential quantification can be defined in a similar fashion. For simplicity, we shall assume a *normal form* of programs, into which every program can be rewritten by introducing additional variables. In the normal form, store, select and forall can only occur in simple assignment statements. For example, stores are restricted to occur in statements of the form: a' = store(a, i, x).

Over such normalised programs, and for a universally quantified expression forall(a, l, u, λ(x,i).P), we define the instrumentation operator Ω<sub>∀,P</sub> = (G<sub>∀,P</sub>, R<sub>∀,P</sub>, I<sub>∀,P</sub>) as shown in Fig. 4, over four ghost variables. The array over which quantification occurs is tracked by qu\_ar, and the variables qu\_lo, qu\_hi represent the bounds of the currently tracked interval. The result of the quantified expression is tracked by qu\_P, whose value is *true* iff P holds for all elements of a in the interval [qu\_lo, qu\_hi). The rewrite rules for stores, selects and assignments of universally quantified expressions are then defined as follows. For stores, the first if-branch resets the tracking to the one-element interval [i, i + 1) when accessing elements far outside of the currently tracked interval, or if we are tracking the empty interval (as is the case at initialisation). If an access occurs immediately adjacent to the currently tracked interval

Fig. 4. Definition of an instrumentation operator for universal quantification

(e.g., if i = qu\_lo − 1), then that element is added to the tracked interval, and the value of qu\_P is updated to also account for the value of P at index i. If instead the access is within the tracked interval, then we either reset the interval (if qu\_P is false) or keep the interval unchanged (if qu\_P is true). Rewrites of selects are similar to stores, except that tracking does not need to be reset when reading inside the tracked interval. For rewrites of quantified expressions, if the quantified interval is empty, b is assigned true. Otherwise, assertions check that the tracked interval matches the quantified interval before assigning qu\_P to b. If qu\_P is true, then it is sufficient that quantification occurs over a sub-interval of the tracked interval, and vice versa if qu\_P is false.
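The case analysis for stores can be mimicked at run time. The class below is an illustration of the ghost-state updates only (not the operator's rewrite rules, which work by static program transformation); names follow Fig. 4 loosely:

```python
class ForallTracker:
    """Run-time model of the ghost state of the operator for universal
    quantification: 'holds' plays the role of qu_P for the tracked
    interval [lo, hi) of the array."""
    def __init__(self, P):
        self.P = P                         # the predicate λ(x, i). P
        self.lo = self.hi = 0              # empty tracked interval
        self.holds = True

    def store(self, arr, i, x):
        arr[i] = x
        if self.lo == self.hi or i < self.lo - 1 or i > self.hi:
            self.lo, self.hi = i, i + 1    # reset to the interval [i, i+1)
            self.holds = self.P(x, i)
        elif i == self.lo - 1:             # extend the interval downwards
            self.lo = i
            self.holds = self.holds and self.P(x, i)
        elif i == self.hi:                 # extend the interval upwards
            self.hi = i + 1
            self.holds = self.holds and self.P(x, i)
        elif self.holds:                   # store inside, qu_P true:
            self.holds = self.P(x, i)      # other elements still satisfy P
        else:                              # store inside, qu_P false: reset
            self.lo, self.hi = i, i + 1
            self.holds = self.P(x, i)
```

Initialising the array of Fig. 3 element by element extends the interval step by step, so that on loop exit the tracked interval covers the whole array and qu\_P answers the quantified assertion directly.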

The result of applying Ω∀,P to the program in Fig. 3 is shown in [4]. As exhibited by the experiments in Sect. 5, the resulting program is in many cases easier to verify by state-of-the-art verification tools. Note that the instrumentation operator defined is only one possibility among many. For example, one could track several ranges simultaneously over the array in question, or also track the index of some element in the array over which P holds, or make different choices on stores outside of the tracked interval.

The following lemma establishes correctness of the instrumentation operator. The proof can be found in [4].

Lemma 2 (Correctness of Ω<sub>∀,P</sub>). *Ω<sub>∀,P</sub> is an instrumentation operator, i.e., it adheres to the constraints imposed in Definition 1.*

#### 4.2 Instrumentation Operators for Aggregation over Arrays

We now turn to the verification of safety properties with *aggregation.* As examples of aggregation, we consider in particular the operators \sum and \max, calculating the sum and maximum value of an array, respectively. Aggregation is supported in the form of *extended quantifiers* in the specification languages JML [33] and ACSL [7], and is frequently needed for the specification of functional correctness properties. Although commonly used, most verification tools do not support aggregation, so that properties involving aggregation have to be manually rewritten using standard quantifiers, pure recursive functions, or ghost code involving loops. This reduction step is error-prone, and represents an additional complication for automatic verification approaches, but can be handled elegantly using the instrumentation framework. For generality, we formalise aggregation over arrays with the help of monoid homomorphisms.

Definition 4 (Monoid). *A* monoid *is a structure* (M, ◦, e) *consisting of a nonempty set* M*, a binary associative operation* ◦ *on* M*, and a neutral element* e ∈ M*. A* monoid *is* commutative *if* ◦ *is commutative. A monoid is* cancellative *if* x ◦ y = x ◦ z *implies* y = z*, and* y ◦ x = z ◦ x *implies* y = z*, for all* x, y, z ∈ M*.*

For aggregation, we model finite intervals of arrays using the cancellative monoid (D<sup>∗</sup>, ·, ε) of finite sequences over some data domain D, where ε denotes the empty sequence. The concatenation operator · is non-commutative.

Definition 5 (Monoid Homomorphism). *A* monoid homomorphism *is a function* h : M<sub>1</sub> → M<sub>2</sub> *between monoids* (M<sub>1</sub>, ◦<sub>1</sub>, e<sub>1</sub>) *and* (M<sub>2</sub>, ◦<sub>2</sub>, e<sub>2</sub>) *with the properties* h(x ◦<sub>1</sub> y) = h(x) ◦<sub>2</sub> h(y) *and* h(e<sub>1</sub>) = e<sub>2</sub>*.*

Ordinary quantifiers can be modelled as homomorphisms D<sup>∗</sup> → B, so that the instrumentation in this section strictly generalises Sect. 4.1. A second classical example is the computation of the *maximum* (similarly, *minimum*) value in a sequence. For the domain of integers, the natural monoid to use is the algebra (Z<sub>−∞</sub>, max, −∞) of integers extended with −∞,<sup>1</sup> and the homomorphism h<sub>max</sub> is generated by mapping singleton sequences ⟨n⟩ to the value n. A

<sup>1</sup> For machine integers, −∞ could be replaced with INT\_MIN.

third example is the computation of the element *sum* of an integer sequence, corresponding to the monoid (Z, +, 0) and the homomorphism h<sub>sum</sub>. Similarly, the *number of occurrences* of some element can be computed. In the last two cases, the considered monoid is even cancellative.
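The two homomorphisms can be written out directly over Python tuples, which here model finite sequences; the defining property h(x · y) = h(x) ◦ h(y) can then be checked on a small example (the use of `float('-inf')` for the adjoined element −∞ is a choice of this sketch):

```python
def h_sum(seq):
    """Homomorphism (D*, ., ε) -> (Z, +, 0): the element sum."""
    return sum(seq)

def h_max(seq):
    """Homomorphism (D*, ., ε) -> (Z ∪ {−∞}, max, −∞); the neutral
    element −∞ is modelled by float('-inf')."""
    return max(seq, default=float("-inf"))

x, y = (1, 5, 2), (4, 3)
assert h_sum(x + y) == h_sum(x) + h_sum(y)        # h(x·y) = h(x) + h(y)
assert h_max(x + y) == max(h_max(x), h_max(y))    # h(x·y) = max(h(x), h(y))
assert h_max(()) == float("-inf")                 # h(ε) = neutral element
```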

Programming Language with Aggregation. We extend our core programming language with expressions aggregate<sub>M,h</sub>(*Expr*, *Expr*, *Expr*), and use monoid homomorphisms to formalise them. Recall that we denote by D<sub>σ</sub> the domain of a program type σ.

Definition 6. *Let Array* σ *be an array type,* σ<sub>M</sub> *a program type,* M *a commutative monoid that is a subset of* D<sub>σ<sub>M</sub></sub>*, and* h : D<sub>σ</sub><sup>∗</sup> → M *a monoid homomorphism. Let furthermore ar be an expression of type Array* σ*, and* l *and* u *integer expressions. Then aggregate*<sub>M,h</sub>*(ar,* l*,* u*) is an expression of type* σ<sub>M</sub>*, with semantics defined by:*

$$[\![\texttt{aggregate}_{M,h}(ar, l, u)]\!]_s = h\big(\langle\, [\![ar]\!]_s([\![l]\!]_s),\ [\![ar]\!]_s([\![l]\!]_s + 1),\ \dots,\ [\![ar]\!]_s([\![u]\!]_s - 1) \,\rangle\big)$$

Intuitively, the expression aggregate<sub>M,h</sub>(*ar*, l, u) denotes the result of applying the homomorphism h to the slice *ar*[l .. u − 1] of the array *ar*. As a convention, in case u < l we assume that the result of aggregate is h(ε). As with array accesses, we assume also that aggregate only occurs in normalised statements of the form t = aggregate<sub>M,h</sub>(*ar*, l, u).
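The definition can be paraphrased as a directly executable reference semantics, with the homomorphism h passed in as a Python function:

```python
def aggregate(h, ar, l, u):
    """Reference semantics of aggregate_{M,h}(ar, l, u): apply the
    homomorphism h to the slice ar[l .. u-1]. For u <= l the argument
    is the empty sequence, so the result is h's image of ε."""
    return h(tuple(ar[i] for i in range(l, u)))

ar = [3, 1, 4, 1, 5]
assert aggregate(sum, ar, 1, 4) == 6       # 1 + 4 + 1
assert aggregate(sum, ar, 4, 2) == 0       # empty slice: neutral element
```

This is of course only the specification; the instrumentation operators below compute the same values incrementally, without materialising the slice.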

In our examples, we use derived operations as found in ACSL: \max as shorthand notation for aggregate with the monoid (Z<sub>−∞</sub>, max, −∞) and homomorphism h<sub>max</sub>,<sup>2</sup> and \sum as shorthand notation for aggregate with the monoid (Z, +, 0) and homomorphism h<sub>sum</sub>.

An Instrumentation Operator for Maximum. For \max, an operator Ωmax = (G*max*, R*max*, I*max*) can be defined similarly to the operator Ω∀,P from Sect. 4.1, in that the maximum value in a particular interval of the array is tracked. One key difference is that an extra ghost variable ag\_max\_idx is added, tracking an array index at which the maximum value of the array interval is stored, so that tracking need not be reset on every store inside the tracked interval. A complete definition is given in [4].

An Instrumentation Operator for Sum. Cancellative aggregation is aggregation based on a cancellative monoid. Cancellative aggregation makes it possible to track aggregate values faithfully even when storing *inside* of the tracked interval, unlike \max and universal quantification. An example of a cancellative operator is the aggregate \sum .

The instrumentation operator Ωsum = (G*sum*, R*sum*, I*sum*) is defined in Fig. 5. The instrumentation code tracks the sum of values in the interval, and

<sup>2</sup> With a slight abuse of the framework, we assume that Z−∞ is represented by the program type Int, mapping −∞ to some fixed integer number. More elegant solutions are not difficult to devise, but add unnecessary complexity.

Fig. 5. Definition of an instrumentation operator <sup>Ω</sup>*sum* for Sum

when increasing the bounds of the tracked interval, the new values are simply added to the tracked sum. Since \sum is cancellative, when storing inside of the tracked interval, the previous value at the index being written to is first subtracted from the sum, before adding the new value, ensuring that the correct aggregate value is computed. The following correctness result is proved in [4].
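The cancellation step can be sketched as ghost state in plain Python (variable names are ours; the actual operator Ω*sum* is the one defined in Fig. 5 and [4]):

```python
class SumTracker:
    # ghost state: tracked interval [lo, hi) of the array and its running sum
    def __init__(self, lo=0):
        self.lo, self.hi, self.ag_sum = lo, lo, 0

    def extend(self, ar):
        # increase the upper bound: simply add the new element to the sum
        self.ag_sum += ar[self.hi]
        self.hi += 1

    def store(self, ar, i, v):
        # store into the array; if inside the tracked interval, first subtract
        # the overwritten value (cancellativity), then add the new one
        if self.lo <= i < self.hi:
            self.ag_sum += v - ar[i]
        ar[i] = v
```

After tracking `[1, 2, 3]` fully and storing 10 at index 1, `ag_sum` correctly becomes 1 + 10 + 3 = 14 without resetting the interval.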

Lemma 3. (Correctness of Ω*sum*). Ω*sum is an instrumentation operator, i.e., it adheres to the constraints imposed in Definition 1.*

Deductive Verification of Instrumentation Operators. As stated in Sect. 2.2, instrumentation operators may be verified independently of the programs to be instrumented. The operators described in this paper, i.e. square, universal quantification, maximum, and sum, have been verified in the verification tool Frama-C [15]. The verified instrumentations are adaptations for the C language semantics and execution model. More specifically, the adapted operators assume C native arrays, rather than functional ones.

## 5 Evaluation

#### 5.1 Implementation

To evaluate our instrumentation framework, we have implemented the instrumentation operators for quantifiers and aggregation over arrays. The implementation is done over constrained Horn clauses (CHCs), by adding the rewrite rules defined in Sect. 4 to Eldarica [30], an open-source solver for CHCs. We also implemented the automatic application of the instrumentation operators, largely following Algorithm 1 but with a few minor changes due to the CHC setting. The CHC setting makes our implementation available to various CHC-based verification tools, for instance JayHorn (Java) [32], Korn (C) [19], RustHorn (Rust) [36], SeaHorn (C/LLVM) [26] and TriCera (C) [20].

In order to evaluate our approach at the level of C programs, we extended TriCera, an open-source assertion-based model checker that translates C programs into a set of CHCs and relies on Eldarica as back-end solver. TriCera is extended to parse quantifiers and aggregation operators in its input C programs and to encode them as part of the translation into CHCs. We call the resulting toolchain MonoCera. An artefact that includes MonoCera and the benchmarks is available online [5].

To handle complicated access patterns, for instance a program processing an array from the beginning and end at the same time, the implementation can apply multiple instrumentation operators simultaneously; the number of operators is incremented when Algorithm 1 returns *Inconclusive*.

#### 5.2 Experiments and Comparisons

To assess our implementation, we assembled a test suite and carried out experiments comparing MonoCera with the state-of-the-art C model checkers CPAchecker 2.1.1 [11], SeaHorn 10.0.0 [26] and TriCera 0.2. It should be noted that deductive verification frameworks, such as Dafny and Frama-C, can handle, for example, the program in Fig. 3 if they are provided with a manually written loop invariant; however, since MonoCera relies on automatic techniques for invariant inference, we only benchmark against tools using similar automatic techniques. We also excluded VeriAbs [1], since its licence does not permit its use for scientific evaluation.

The tools were set up, as far as possible, with equivalent configurations; for instance, to use the SMT-LIB theory of arrays [6] in order to model C arrays, and a mathematical (as opposed to machine) semantics of integers. CPAchecker was configured to use k-induction [10], which was the only configuration that worked in our tests using mathematical integers. SeaHorn was run using the default settings. All tests were run on a Linux machine with AMD Opteron 2220 SE @ 2.8 GHz and 6 GB RAM with a timeout of 300 s.


Table 3. Results for MonoCera (Mono), TriCera (Tri), SeaHorn (Sea), and CPAchecker (CPA). For MonoCera, also statistics are given for verification time (s), size of the instrumentation search space, and search iterations.

*Test Suite.* The comparison includes a set of programs computing properties that involve quantification and aggregation over arrays. The benchmarks and verification results are summarised in Table 3. The benchmark suite contains programs ranging from 16 to 117 LOC and comprises two parts: (i) 117 programs taken from the SV-COMP repository [9], and (ii) 26 programs crafted by the authors (min: 6, max: 8, sum: 9, forall: 3).

To construct the SV-COMP benchmark set for MonoCera we gathered all test files from the directories prefixed with array or loop, and singled out programs containing some assert statement that could be rewritten using a quantifier or an aggregation operator over a single array. For example, loops

`for (int i = 0; i < N; i++) assert(a[i] <= 0);`

can be rewritten using forall or max operators. We created a benchmark for each possible rewriting; for instance, in the case of max, by rewriting the loop into assert(\max(a, 0, N) <= 0). The original benchmarks were used for the evaluation of the other tools, none of which supported (extended) quantifiers.
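The rewriting preserves meaning because the forall-style and \max-style formulations agree on every array; a quick plain-Python check (our illustration, not MonoCera syntax):

```python
def forall_version(a):
    # loop/forall formulation: every element is <= 0
    return all(v <= 0 for v in a)

def max_version(a):
    # \max formulation: the maximum is <= 0 (empty array gives -inf)
    return max(a, default=float('-inf')) <= 0

for a in ([], [0, -1, -5], [0, 1, -2]):
    assert forall_version(a) == max_version(a)
```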

In (ii), we crafted 9 programs that make use of aggregation or quantifiers, and derived further benchmarks by considering different array sizes (10, 100 and unbounded size); one combination (unbounded array inside a struct) had to be excluded, as it is not valid C. In order to evaluate other tools on our crafted benchmarks, we reversed the process described for the SV-COMP benchmarks and translated the operators into corresponding loop constructs.

*Results.* In Table 3, we present the number of verified programs per instrumentation operator for each tool, as well as further statistics for MonoCera regarding verification times and instrumentation search space. The "Inst. space" column indicates the size of the instrumentation search space (i.e., the number of instrumentations producible by applying the non-deterministic instrumentation operator). The "Inst. steps" column indicates the number of attempted instrumentations, i.e., the number of iterations of the while-loop in Algorithm 1. In our implementation, the check in Algorithm 1 line 5 can time out and be repeated at a later time with a greater timeout, which can lead to more iterations than the size of the search space. In [4], we list results per benchmark for each tool.

For the SV-COMP benchmarks, CPAchecker managed to verify 1 program, while SeaHorn and TriCera could not verify any programs. MonoCera verified in total 42 programs from SV-COMP. Regarding the crafted benchmarks, several tools could verify the examples with array size 10. However, when the array size was 100 or unbounded, only MonoCera succeeded.

#### 6 Related Work

It is common practice, in both model checking and deductive verification, to translate high-level specifications to low-level specifications prior to verification (e.g., [13,14,18,37]). Such translations often make use of ghost variables and ghost code, although relatively little systematic research has been done on the required properties of ghost code [22]. The addition of ghost variables to a program for tracking the value of complex expressions also has similarities with the concept of term abstraction in Horn solving [3]. To the best of our knowledge, we are presenting the first general framework for automatic program instrumentation.

A lot of research in *software model checking* has considered the handling of standard quantifiers ∀, ∃ over arrays. In the setting of constrained Horn clauses, properties with universal quantifiers can sometimes be reduced to quantifier-free reasoning over non-linear Horn clauses [13,37]. Our approach follows the same philosophy of applying an up-front program transformation, but in a more general setting. Various direct approaches to infer quantified array invariants have been proposed as well: e.g., by extending the IC3 algorithm [27], by syntax-guided synthesis [21], by learning [24], by solving recurrence equations [29], by backward reachability [3], or by superposition [25]. To the best of our knowledge, such methods have not been extended to aggregation.

*Deductive verification* tools usually have rich support for quantified specifications, but rely on auxiliary assertions like loop invariants provided by the user, and on SMT solvers or automated theorem provers for quantifier reasoning. Although several deductive verification tools can parse extended quantifiers, few offer support for reasoning about them. Our work is closest to the method for handling comprehension operators in Spec# [35], which relies on code annotations provided by the user, but provides heuristics to automatically verify such annotations. The code instrumentation presented in this paper has similarity with the proof rules in Spec#; the main differences are that our method is based on an upfront program transformation, and that we aim at automatically finding required program invariants, as opposed to only verifying their correctness. The KeY tool provides proof rules similar to the ones in Spec# for some of the JML extended quantifiers [2]; those proof rules can be applied manually to verify human-written invariants. The Frama-C system [15] can parse ACSL extended quantifiers [7], but, to the best of our knowledge, none of the Frama-C plugins can automatically process such quantifiers. Other systems, e.g., Dafny [34], require users to manually define aggregation operators as recursive functions.

In the theory of *algebraic data-types*, several transformation-based approaches have been proposed to verify properties that involve recursive functions or catamorphisms [17,31]. Aggregation over arrays resembles the evaluation of recursive functions over data-types; a major difference is that data-types are more restricted with respect to accessing and updating data than arrays.

Array folds logic (AFL) [16] is a decidable logic in which properties on arrays beyond standard quantification can be expressed: for instance, counting the number of elements with some property. Similar properties can be expressed using automata on data words [41], or in variants of monadic second-order logic [38]. Such languages can be seen as alternative formalisms to aggregation or extended quantifiers; they do not cover, however, all kinds of aggregation we are interested in. Array sums cannot be expressed in AFL or data automata, for instance.

## 7 Conclusion

We have presented a framework for automatic and provably correct program instrumentation, allowing the automatic verification of programs containing certain expressive language constructs, which are not directly supported by the existing automatic verification tools. Our experiments with a prototypical implementation, in the tool MonoCera, show that our method is able to automatically verify a significant number of benchmark programs involving quantification and aggregation over arrays that are beyond the scope of other tools.

There are still various other benchmarks that MonoCera (as well as other tools) cannot verify. We believe that many of those benchmarks are in reach of our method, because of the generality of our approach. Ghost code is known to be a powerful specification mechanism; similarly, in our setting, more powerful instrumentation operators can be easily formulated for specific kinds of programs. In future work, we therefore plan to develop a library of instrumentation operators for different language constructs (including arithmetic operators), non-linear arithmetic, other types of structures with regular access patterns such as binary heaps, and general linked-data structures.

We also plan to refine our method for showing incorrectness of programs more efficiently, as the approach is currently applicable mainly for verifying correctness (experiments in [4]). Another line of work is the establishment of stronger completeness results than the weak completeness result presented here, for specific programming language fragments.

Acknowledgements. This work has been partially funded by the Swedish Vinnova FFI Programme under grant 2021-02519, the Swedish Research Council (VR) under grant 2018-04727, the Swedish Foundation for Strategic Research (SSF) under the project WebSec (Ref. RIT17-0011), and the Wallenberg project UPDATE. We are also grateful for the opportunity to discuss the research at the Dagstuhl Seminar 22451 on "Principles of Contract Languages."

## References


combe, M. (eds.) SEFM 2012. LNCS, vol. 7504, pp. 233–247. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33826-7\_16



## **Boolean Abstractions for Realizability Modulo Theories**

Andoni Rodríguez<sup>1,2</sup>(B) and César Sánchez<sup>1</sup>

<sup>1</sup> IMDEA Software Institute, Madrid, Spain {andoni.rodriguez,cesar.sanchez}@imdea.org <sup>2</sup> Universidad Politécnica de Madrid, Madrid, Spain

**Abstract.** In this paper, we address the problem of the (reactive) realizability of specifications of theories richer than Booleans, including arithmetic theories. Our approach transforms theory specifications into purely Boolean specifications by (1) substituting theory literals by Boolean variables, and (2) computing an additional Boolean requirement that captures the dependencies between the new variables imposed by the literals. The resulting specification can be passed to existing Boolean off-the-shelf realizability tools, and is realizable if and only if the original specification is realizable. The first contribution is a brute-force version of our method, which requires a number of SMT queries that is doubly exponential in the number of input literals. Then, we present a faster method that exploits a nested encoding of the search for the extra requirement, using SAT solving to traverse the search space faster and SMT queries internally. Another contribution is a prototype in Z3-Python. Finally, we report an empirical evaluation using specifications inspired by real industrial cases. To the best of our knowledge, this is the first method that succeeds in non-Boolean LTL realizability.

## **1 Introduction**

Reactive synthesis [30,31] is the problem of automatically producing a system that is guaranteed to model a given temporal specification, where the Boolean variables (i.e., atomic propositions) are split into variables controlled by the environment and variables controlled by the system. Realizability is the related decision problem of deciding whether such a system exists. These problems have been widely studied [17,21], especially in the domain of Linear Temporal Logic (LTL) [29]. Realizability corresponds to infinite games where players alternately choose the valuations of the Boolean variables they control. The winning condition is extracted from the temporal specification and determines which player wins a given play. A system is realizable if and only if the system player

© The Author(s) 2023

This work was funded in part by the Madrid Regional Gov. Project "S2018/TCS-4339 (BLOQUES-CM)", by PRODIGY Project (TED2021-132464B-I00) funded by MCIN/AEI/10.13039/501100011033/ and the European Union Next Generation EU/PRTR, and by a research grant from Nomadic Labs and the Tezos Foundation.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 305–328, 2023. https://doi.org/10.1007/978-3-031-37709-9\_15

has a winning strategy, i.e., if there is a way to play such that the specification is satisfied in all plays played according to the strategy.

However, in practice, many real and industrial specifications use complex data beyond Boolean atomic propositions, which precludes the direct use of realizability tools. These specifications cannot be written in (propositional) LTL, but instead use literals from a richer domain. We use LTL<sup>T</sup> for the extension of LTL where Boolean atomic propositions can be literals from a (multi-sorted) first-order theory T. The T variables (i.e., non-Boolean) in the specification are again split into those controlled by the system and those controlled by the environment. The resulting realizability problem also corresponds to infinite games but, in this case, players choose valuations from the domains of T, which may be infinite. Therefore, arenas may be infinite and positions may have infinitely many successors. In this paper, we present a method that transforms a specification that uses data from a theory T into an equi-realizable Boolean specification. The resulting specification can then be processed by an off-the-shelf realizability tool.

The main element of our method is a novel *Boolean abstraction* method, which allows us to transform LTL<sup>T</sup> specifications into pure (Boolean) LTL specifications. The method first substitutes all T literals by fresh Boolean variables controlled by the system, and then extends the specification with an additional subformula that constrains the combinations of values of these variables. This method is described in Sect. 3. The main idea is that, after the environment selects values for its (data) variables, the system responds with values for the variables it controls, which induces a Boolean value for all the literals. The additional formula we compute captures the set of possible valuations of the literals and the precise power of each player to produce each valuation.

*Example 1.* Consider the following specification ϕ = ◻(R<sub>0</sub> ∧ R<sub>1</sub>), where:

$$R\_0: (x<2) \to \bigcirc(y>1) \qquad\qquad R\_1: (x \ge 2) \to (y < x)$$

where x is a numeric variable that belongs to the environment and y to the system. In the game corresponding to this specification, each player has an infinite number of choices at each time step. For example, in T<sup>Z</sup> (the theory of integers), the environment player chooses an integer for x and the system responds with an integer for y. This induces a valuation of all literals in the formula, which in turn induces (also considering the valuations of the literals at other time instants, according to the temporal operators) a valuation of the full specification.

In this paper, we exploit that, from the point of view of the valuations of the literals, there are only *finitely many* cases and provide a systematic manner to compute these cases. This allows us to reduce a specification into a purely Boolean specification that is equi-realizable. This specification encodes the (finite) set of decisions of the environment, and the (finite) set of reactions of the system.

Example 1 suggests a naive algorithm to capture the powers of the environment and system to determine a combination of the valuations of the literals, by enumerating all these combinations and checking the validity of each potential reaction. Checking that a given combination is a possible reaction requires an ∃<sup>∗</sup>∀<sup>∗</sup> query (which can be delegated to an SMT solver for appropriate theories).

In this paper, we describe and prove correct a Boolean abstraction method based on this idea. Then, we propose a more efficient search method for the set of possible reactions, using SAT solving to speed up the exploration. The main idea of this faster method is to learn, from an invalid reaction, which other reactions are guaranteed to be invalid and, from a valid reaction, which other reactions are not worth exploring. We encode these learnt sets as an incremental SAT formula that allows us to prune the search space. The resulting method is much more efficient than brute-force enumeration because, in each iteration, the learning can prune an exponential number of cases. An important technical detail is that computing the set of cases to be pruned from the outcome of a given query can be done efficiently using a SAT solver.

In summary, our contributions are: (1) a proof that realizability is decidable for all LTL<sup>T</sup> specifications for those theories T with a decidable ∃<sup>∗</sup>∀<sup>∗</sup> fragment; (2) a simple implementation of the resulting Boolean abstraction method; (3) a much faster method based on a nested-SAT implementation of the Boolean abstraction method that efficiently explores the search space of potential reactions; and (4) an empirical evaluation of these algorithms, where our early findings suggest that Boolean abstractions can be used with specifications containing different arithmetic theories, and also with industrial specifications. We used Z3 [10] both as an SMT solver and a SAT solver, and Strix [27] as the realizability checker. To the best of our knowledge, this is the first method that succeeds (and efficiently) in non-Boolean LTL realizability.

#### **2 Preliminaries**

We study realizability of LTL [26,29] specifications. The syntax of LTL is:

$$\varphi ::= \top \mid a \mid \varphi \vee \varphi \mid \neg \varphi \mid \bigcirc \varphi \mid \varphi \,\mathcal{U}\, \varphi$$

where a ranges over a set AP of atomic propositions, ∨ and ¬ are the usual Boolean disjunction and negation, and ◯ and U are the next and until temporal operators. The semantics of LTL associates traces σ ∈ Σ<sup>ω</sup> with formulae as follows:

$$\begin{array}{lll} \sigma \models \top & & \text{always} \\ \sigma \models a & \text{iff} & a \in \sigma(0) \\ \sigma \models \varphi\_1 \vee \varphi\_2 & \text{iff} & \sigma \models \varphi\_1 \text{ or } \sigma \models \varphi\_2 \\ \sigma \models \neg \varphi & \text{iff} & \sigma \not\models \varphi \\ \sigma \models \bigcirc \varphi & \text{iff} & \sigma^1 \models \varphi \\ \sigma \models \varphi\_1 \,\mathcal{U}\, \varphi\_2 & \text{iff} & \text{for some } i \ge 0,\ \sigma^i \models \varphi\_2 \text{, and for all } 0 \le j < i,\ \sigma^j \models \varphi\_1 \end{array}$$

We use common derived operators like ∧, R, ◇ and ◻.
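These derived operators admit the standard LTL abbreviations (a textbook fact, not specific to this paper):

$$\varphi\_1 \wedge \varphi\_2 \equiv \neg(\neg\varphi\_1 \vee \neg\varphi\_2) \qquad \Diamond\varphi \equiv \top \,\mathcal{U}\, \varphi \qquad \Box\varphi \equiv \neg\Diamond\neg\varphi \qquad \varphi\_1 \,\mathcal{R}\, \varphi\_2 \equiv \neg(\neg\varphi\_1 \,\mathcal{U}\, \neg\varphi\_2)$$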

Reactive synthesis [4,5,14,28,33] is the problem of producing a system from an LTL specification, where the atomic propositions are split into propositions that are controlled by the environment and those that are controlled by the system. Synthesis corresponds to a turn-based game where, in each turn, the environment produces values of its variables (inputs) and the system responds with values of its variables (outputs). A play is an infinite sequence of turns. The system player wins a play according to an LTL formula ϕ if the trace of the play satisfies ϕ. A (memory-less) strategy of a player is a map from positions to moves of that player. A play is played according to a strategy if all the moves of the corresponding player are played according to the strategy. A strategy is winning for a player if all the possible plays played according to the strategy are winning.

Depending on the fragment of LTL used, the synthesis problem has different complexities. The method that we present in this paper generates a formula in the same temporal fragment as the original formula (e.g., starting from a safety formula, another safety formula is generated). The generated formula is discharged to a solver capable of solving formulas in that fragment. For simplicity of presentation, we illustrate our method with safety formulae.

We use LTL<sup>T</sup> as the extension of LTL where propositions are replaced by literals from a first-order theory T. In realizability for LTL<sup>T</sup>, the variables that occur in the literals of a specification ϕ are split into those controlled by the environment (denoted by v<sub>e</sub>) and those controlled by the system (v<sub>s</sub>), where v<sub>e</sub> ∩ v<sub>s</sub> = ∅. We use ϕ(v<sub>e</sub>, v<sub>s</sub>) to emphasize that v<sub>e</sub> ∪ v<sub>s</sub> are the variables occurring in ϕ. The alphabet Σ<sub>T</sub> is now a valuation of the variables in v<sub>e</sub> ∪ v<sub>s</sub>. A trace is an infinite sequence of valuations, which induces an infinite sequence of Boolean values of the literals occurring in ϕ and, in turn, a valuation of the temporal formula.

Realizability for LTL<sup>T</sup> corresponds to an infinite game with an infinite arena where positions may have infinitely many successors if the ranges of the variables controlled by the system and the environment are infinite. For instance, in Ex. 1 with T = T<sub>Z</sub>, valuations range over infinitely many values, and the literal (x ≥ 2) can be satisfied with x = 2, x = 3, etc.

Arithmetic theories are a particular class of first-order theories. Even though our Boolean abstraction technique is applicable to any theory with a decidable ∃<sup>∗</sup>∀<sup>∗</sup> fragment, we illustrate our technique with arithmetic specifications. Concretely, we will consider T<sup>Z</sup> (i.e., linear integer arithmetic) and T<sup>R</sup> (i.e., non-linear real arithmetic). Both theories have a decidable ∃<sup>∗</sup>∀<sup>∗</sup> fragment. Note that the choice of the theory influences the realizability of a given formula.

*Example 2.* Consider Ex. 1. The formula ϕ = ◻(R<sub>0</sub> ∧ R<sub>1</sub>) is not realizable for T<sub>Z</sub> since, if at a given instant t the environment plays x = 0 (and hence x < 2 is true), then y must be greater than 1 at time t+1. Then, if at t+1 the environment plays x = 2, then (x ≥ 2) is true but there is no y such that both (y > 1) and (y < 2). However, for T<sub>R</sub>, ϕ is realizable (consider the system strategy to always play y = 1.5).

The following slight modification of Ex. 1 alters its realizability (R′<sub>1</sub> replaces R<sub>1</sub>, using the T-predicate y ≤ x instead of y < x):

$$R\_0: (x<2) \to \bigcirc(y>1) \qquad\qquad R'\_1: (x \ge 2) \to (y \le x)$$

Now, ϕ′ = ◻(R<sub>0</sub> ∧ R′<sub>1</sub>) is realizable for both T<sub>Z</sub> and T<sub>R</sub>, as the system strategy of always picking y = 2 is winning in both theories.

### **3 Boolean Abstraction**

We solve the realizability problem modulo theories by transforming the specification into an equi-realizable Boolean specification. Given a specification ϕ with literals l<sub>i</sub>, we get a new specification ϕ[l<sub>i</sub> ← s<sub>i</sub>] ∧ ◻ϕ*extra*, where s<sub>i</sub> are fresh Boolean variables and ϕ*extra* ∈ LTL<sub>B</sub> is a Boolean formula (without temporal operators). The additional sub-formula ϕ*extra* uses the freshly introduced variables s<sub>i</sub> controlled by the system, as well as additional Boolean variables controlled by the environment e, and captures the precise combined power of the players to decide the valuations of the literals in the original formula. We call our approach *Booleanization* or *Boolean abstraction*. The approach is summarized in Fig. 1: given an LTL specification ϕ<sup>T</sup>, it is translated into a Boolean ϕ<sup>B</sup> which can be analyzed with off-the-shelf realizability checkers. Note that G<sup>B</sup> and G<sup>T</sup> are the games constructed from specifications ϕ<sup>B</sup> and ϕ<sup>T</sup>, respectively. Also, note that [20] shows that we can construct a game G from a specification ϕ and that ϕ is realizable if and only if G is winning for the system.

**Fig. 1.** The tool chain with the correctness argument.

The Booleanization procedure constructs an extra requirement ϕ*extra* and conjoins <sup>◻</sup>ϕ*extra* with the formula <sup>ϕ</sup>[l<sup>i</sup> <sup>←</sup> <sup>s</sup>i]. In a nutshell, after the environment chooses a valuation of the variables it controls (including e), the system responds with valuations of its variables (including si), which induces a Boolean value for all literals. Therefore, for each possible choice of the environment, the system has the power to choose a Boolean response among a specific collection of responses (a subset of all the possible combinations of Boolean valuations of the literals). Since the set of all possible responses is finite, so are the different cases. The extra requirement captures precisely the finite collection of choices of the environment and the resulting finite collection of responses of the system for each case.

#### **3.1 Notation**

In order to explain the construction of the extra requirement, we introduce some preliminary definitions. We will use Ex. 1 as the running example.


A literal is an atom or its negation, regardless of whether the atom is a Boolean variable or a predicate of a theory. Let *Lit*(ϕ) be the collection of literals that appear in ϕ (or *Lit*, if the formula is clear from the context). For simplicity, we assume that all literals belong to the same theory; each theory can be Booleanized in turn, since each literal belongs to exactly one theory and we assume in this paper that literals from different theories do not share variables. We use x for the environment-controlled variables occurring in *Lit*(ϕ) and y for the variables controlled by the system.

In Ex. 1, we first translate the literals in ϕ. Since (x < 2) is equivalent to ¬(x ≥ 2), we use a single Boolean variable for both. The substitution is:

$$(x<2) \leftarrow s\_0 \qquad\qquad (y>1) \leftarrow s\_1 \qquad\qquad (y<x) \leftarrow s\_2$$

After the substitution we obtain ϕ<sup>B</sup> = ◻(R<sup>B</sup><sub>0</sub> ∧ R<sup>B</sup><sub>1</sub>), where

$$R\_0^\mathbf{B} : s\_0 \to \bigcirc s\_1 \qquad \qquad \qquad R\_1^\mathbf{B} : \neg s\_0 \to s\_2$$

Note that ϕ′′ may not be equi-realizable to ϕ, as we may be giving too much power to the system if s₀, s₁ and s₂ are chosen independently without restriction. Note that ϕ′′ is realizable, for example by always choosing s₁ and s₂ to be true, but ϕ is not realizable in *LTL*Tℤ. This justifies the need for an extra sub-formula.

**Definition 1 (Choice).** *A choice* c ⊆ *Lit*(ϕ) *is a subset of the literals of* ϕ*.*

The intended meaning of a choice is to capture what literals are true in the choice, while the rest (i.e., *Lit* \ <sup>c</sup>) are false. Once the environment picks values for x, the system can realize some choice c by selecting y and making the literals in c true (and the rest false). However, for some values of x, some choices may not be possible for the system for any y. Given a choice c, we use f(c(x, y)) to denote the formula:

$$\bigwedge\_{l \in c} l \wedge \bigwedge\_{l \notin c} \neg l$$

which is a formula with variables x and y that captures logically the set of values of x and y that realize precisely choice c. We use C for the set of choices. Note that there are |C| = 2^|*Lit*| different choices. We call the elements of C choices because they may be at the disposal of the system to choose by picking the right values of its variables.
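Since choices are just subsets of *Lit*, the set C can be enumerated directly. The following Python sketch (an illustration, not the paper's implementation; literals are treated as opaque labels) makes the count |C| = 2^|*Lit*| concrete:

```python
from itertools import chain, combinations

def choices(literals):
    """All choices over a set of literals: every subset c of Lit,
    read as "the literals in c are true and the rest are false"."""
    lits = list(literals)
    subsets = chain.from_iterable(
        combinations(lits, r) for r in range(len(lits) + 1))
    return [frozenset(s) for s in subsets]

# The three literals of the running example, as opaque labels.
C = choices(["x<2", "y>1", "y<x"])
assert len(C) == 2 ** 3                  # |C| = 2^|Lit| = 8
assert frozenset({"x<2", "y>1"}) in C    # the choice c1 of Ex. 3
```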

A given choice c can act as *potential* (meaning that the response is possible) or as *antipotential* (meaning that the response is not possible). A potential is a formula (that depends only on x) that captures those values of x for which the system can respond and make precisely the literals in c true (and the rest of the literals false). The negation of the potential (i.e., an antipotential) captures precisely those values of x for which there are no values of y that lead to c.

**Definition 2 (Potential and Antipotential).** *Given a choice* c*, a potential is the following formula* cᵖ *and an antipotential is the following formula* cᵃ*:*

$$c^p(\overline{x}) = \exists \overline{y}. f(c(\overline{x}, \overline{y})) \qquad\qquad\qquad c^a(\overline{x}) = \forall \overline{y}. \neg f(c(\overline{x}, \overline{y})) $$

*Example 3.* We illustrate two choices for Ex. 1. Consider choices c₀ = {(x < 2), (y > 1), (y < x)} and c₁ = {(x < 2), (y > 1)}. Choice c₀ corresponds to f(c₀) = (x < 2) ∧ (y > 1) ∧ (y < x), that is, literals (x < 2), (y > 1) and (y < x) are all true. Choice c₁ corresponds to f(c₁) = (x < 2) ∧ (y > 1) ∧ (y ≥ x), that is, literals (x < 2) and (y > 1) are true and (y < x) is false (i.e., (y ≥ x) is true). The meaning of c₂, c₃, etc. is analogous. The potential and antipotential formulae of, e.g., choices c₀ and c₁ from Ex. 1 are as follows:

$$\begin{array}{llll} c\_0^p = \exists y. (x < 2) \land (y > 1) \land (y < x) & c\_0^a = \forall y. \neg \left( (x < 2) \land (y > 1) \land (y < x) \right) \\ c\_1^p = \exists y. (x < 2) \land (y > 1) \land (y \ge x) & c\_1^a = \forall y. \neg \left( (x < 2) \land (y > 1) \land (y \ge x) \right) \end{array}$$

Note that potentials and antipotentials have x as their only free variables.

Depending on the theory, the validity of potentials and antipotentials may differ. For instance, consider c₀ᵖ under the theories Tℤ and Tℝ:

– In Tℤ: ∃y. (x < 2) ∧ (y > 1) ∧ (y < x) is equivalent to *false*.
– In Tℝ: ∃y. (x < 2) ∧ (y > 1) ∧ (y < x) is equivalent to (1 < x) ∧ (x < 2).

These equivalences can be obtained using classic quantifier elimination procedures, e.g., with Cooper's algorithm [9] for Tℤ and Tarski's method [32] for Tℝ.
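The theory-dependence of c₀ᵖ can also be illustrated by finite sampling, i.e., testing candidate witnesses for y (a sanity check only, not a quantifier elimination procedure; the sample sets and the helper `potential_c0_holds` are our own constructions):

```python
from fractions import Fraction

def potential_c0_holds(x, ys):
    """Check whether some candidate y in `ys` witnesses
    (x < 2) ∧ (y > 1) ∧ (y < x), i.e. the body of c0^p."""
    return x < 2 and any(y > 1 and y < x for y in ys)

# T_Z flavor: integer candidates.  No integer y fits strictly
# between 1 and an x with x < 2, matching c0^p ≡ false in T_Z.
int_ys = range(-10, 10)
assert not any(potential_c0_holds(x, int_ys) for x in range(-10, 2))

# T_R flavor: rational candidates.  For 1 < x < 2 a witness exists,
# e.g. the midpoint between 1 and x.
x = Fraction(3, 2)
assert potential_c0_holds(x, [1 + (x - 1) / 2])   # y = 5/4
```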

A reaction is a description of the specific choices that the system has the power to choose.

**Definition 3 (Reaction).** *Let* <sup>P</sup> *and* <sup>A</sup> *be a partition of* <sup>C</sup> *that is:* <sup>P</sup> ⊆ C*,* <sup>A</sup> ⊆ C*,* <sup>P</sup> <sup>∩</sup> <sup>A</sup> <sup>=</sup> <sup>∅</sup> *and* <sup>P</sup> <sup>∪</sup> <sup>A</sup> <sup>=</sup> <sup>C</sup>*. The reaction react*(P,A) *is as follows:*

$$\operatorname{react}\_{(P,A)}(\overline{x}) \stackrel{def}{=} \bigwedge\_{c \in P} c^p \wedge \bigwedge\_{c \in A} c^a$$

The reaction *react*(P,A) is equivalent to:

$$\operatorname{react}\_{(P,A)}(\overline{x}) = \bigwedge\_{c \in P} \left( \exists \overline{y}. f(c(\overline{x}, \overline{y})) \right) \wedge \bigwedge\_{c \in A} \left( \forall \overline{y}. \neg f(c(\overline{x}, \overline{y})) \right)$$

There are 2^(2^|*Lit*|) different reactions.

A reaction r is called valid whenever there is a move of the environment for which r captures precisely the power of the system, that is, exactly which choices the system can choose. Formally, a reaction is valid whenever ∃x. r(x) is a valid formula. We use R for the set of reactions and *VR* for the set of valid reactions. It is easy to see that, for every possible valuation of x the environment can pick, the system has a specific power to respond (among the finitely many cases). Therefore, the following formula is valid:

$$
\varphi\_{VR} = \forall \overline{x}. \bigvee\_{r \in VR} r(\overline{x}).
$$

*Example 4.* In Ex. 1, for the theory Tℤ, there are two valid reactions (using the choices from Ex. 3):

$$\begin{array}{c} r\_1 : \exists x. c\_0^a \land c\_1^p \land c\_2^p \land c\_3^p \land c\_4^a \land c\_5^a \land c\_6^a \land c\_7^a\\ r\_2 : \exists x. c\_0^a \land c\_1^a \land c\_2^a \land c\_3^a \land c\_4^a \land c\_5^p \land c\_6^p \land c\_7^a, \end{array}$$

where reaction r₁ models the possible responses of the system after the environment picks a value for x with (x < 2), whereas r₂ models the responses to (x ≥ 2). On the other hand, for Tℝ, there are three valid reactions:

$$\begin{array}{l} r\_1 : \exists x. c\_0^a \land c\_1^p \land c\_2^p \land c\_3^p \land c\_4^a \land c\_5^a \land c\_6^a \land c\_7^a\\ r\_2 : \exists x. c\_0^p \land c\_1^p \land c\_2^p \land c\_3^a \land c\_4^a \land c\_5^a \land c\_6^a \land c\_7^a\\ r\_3 : \exists x. c\_0^a \land c\_1^a \land c\_2^a \land c\_3^a \land c\_4^p \land c\_5^p \land c\_6^p \land c\_7^a \end{array}$$

Note that there is one more valid reaction, since in Tℝ there is one more case: x ∈ (1, 2]. Also, note that c₄ cannot be a potential in Tℤ (not even with a collaboration between environment and system), whereas it can in Tℝ.

#### **3.2 The Boolean Abstraction Algorithm**

Boolean abstraction is a method to compute ϕᴮ from ϕᵀ. In this section we describe, and prove correct, a basic brute-force version of this method; later, in Sect. 4, we present faster algorithms. All Boolean abstraction algorithms that we present in this paper first compute the extra requirement by visiting the set of reactions and computing a subset of the valid reactions that is sufficient to preserve realizability. The three main building blocks of our algorithms are (1) the stopping criterion of the search for reactions; (2) how to obtain the next reaction to consider; and (3) how to modify the current set of valid reactions (by adding new valid reactions to it) and the set of remaining reactions (by pruning the search space). Finally, after the loop, the algorithm produces as ϕ*extra* a conjunction of cases, one per valid reaction (P, A) in *VR*.

**Algorithm 1:** Brute-force

```
1 Input: ϕT
2 ϕ ← ϕT [li ← si] ; VR ← {}
3 C ← choices(literals(ϕT ))
4 R ← 2^C
5 for (P, A) ∈ R do
6   if ∃x. react(P,A)(x) then
7     VR ← VR ∪ {(P, A)}
8 ϕextra ← getExtra(VR)
9 return ϕ ∧ ◻(A → ϕextra)
```

We introduce a fresh variable e(P,A), controlled by the environment, for each valid reaction (P, A), to capture that the environment plays values for x that correspond to the case where the system is left with the power to choose captured precisely by (P, A). Therefore, there is one additional environment Boolean variable per valid reaction (in practice we can enumerate the valid reactions and introduce only a logarithmic number of environment variables). Finally, the extra requirement uses P, for each valid reaction (P, A), to encode the potential moves of the system as a disjunction of the literals described by each choice in P. Each of these disjunctions contains precisely the combinations of literals that are possible in the concrete case that (P, A) captures.

A brute-force algorithm that implements the Boolean abstraction method by exhaustively searching all reactions is shown in Algorithm 1. The building blocks of this algorithm are:


Finally, the extra sub-formula ϕ*extra* is generated by *getExtra* (line 8) defined as follows:

$$\operatorname{getExtra}(VR) = \bigwedge\_{(P,A)\in VR} \Big( e\_{(P,A)} \to \bigvee\_{c\in P} \big( \bigwedge\_{l\_i \in c} s\_i \wedge \bigwedge\_{l\_i \notin c} \neg s\_i \big) \Big)$$
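The shape of *getExtra* can be sketched in Python as a plain string rendering (for illustration only; the variable names `e{k}`/`s{i}` and the list-of-choices representation of P are our assumptions, not the paper's data structures):

```python
def get_extra(VR, lits):
    """Render the extra requirement: one implication e_k -> (disjunction
    of cubes) per valid reaction, where each cube fixes every s_i
    according to one choice c in P (s_i if l_i in c, ~s_i otherwise)."""
    conjuncts = []
    for k, (P, _A) in enumerate(VR):       # P given as a list of frozensets
        cubes = []
        for c in P:
            cube = " & ".join(
                (f"s{i}" if l in c else f"~s{i}") for i, l in enumerate(lits))
            cubes.append(f"({cube})")
        conjuncts.append(f"(e{k} -> ({' | '.join(cubes)}))")
    return " & ".join(conjuncts)

lits = ["x<2", "y>1", "y<x"]
# One valid reaction whose only potential is the choice {(x<2)} (c2 in Ex. 5):
VR = [([frozenset({"x<2"})], frozenset())]
assert get_extra(VR, lits) == "(e0 -> ((s0 & ~s1 & ~s2)))"
```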

Note that there is an ∃∗∀∗ validity query in the body of the loop (line 6) to check whether the candidate reaction is valid. This is why decidability of the ∃∗∀∗ fragment is crucial: it captures the finite partitioning of the environment moves (which are existentially quantified) for which the system can react in certain ways (i.e., potentials, which are existentially quantified) by picking appropriate valuations, but not in others (i.e., antipotentials, which are universally quantified). In essence, the brute-force algorithm iterates over all the reactions, one at a time, checking whether each reaction is valid. If the reaction (characterized by its set of potential choices<sup>1</sup>) is valid, it is added to *VR*.
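The loop of Algorithm 1 can be sketched as follows, with the SMT validity query abstracted as a callback (`is_valid_reaction` is a hypothetical oracle standing in for the ∃∗∀∗ check of line 6, not the paper's implementation):

```python
from itertools import product

def brute_force(C, is_valid_reaction):
    """Skeleton of Algorithm 1: enumerate every bipartition (P, A) of the
    choice set C and keep those the oracle declares valid."""
    C = list(C)
    VR = []
    for bits in product([True, False], repeat=len(C)):
        P = frozenset(c for c, b in zip(C, bits) if b)
        A = frozenset(C) - P
        if is_valid_reaction(P, A):    # stands in for: ∃x. react_(P,A)(x)
            VR.append((P, A))
    return VR

# Toy oracle: pretend exactly the singleton-potential reactions are valid.
VR = brute_force(["c0", "c1"], lambda P, A: len(P) == 1)
assert len(VR) == 2
```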

*Example 5.* Consider again the specification in Ex. 1, with Tℤ as the theory. The valid reactions are r₁ and r₂, as shown in Ex. 4, where the potentials of r₁ are {c₁, c₂, c₃} and the potentials of r₂ are {c₅, c₆}. Now, the creation of ϕ*extra* requires two fresh variables d₀ and d₁ for the environment (they correspond to the environment decisions (x < 2) and (x ≥ 2), respectively), resulting in:

$$\varphi\_{T\_{\mathbb{Z}}}^{\text{extra}} : \begin{pmatrix} d\_0 \to \left( \left( s\_0 \wedge s\_1 \wedge \neg s\_2 \right) \vee \left( s\_0 \wedge \neg s\_1 \wedge s\_2 \right) \vee \left( s\_0 \wedge \neg s\_1 \wedge \neg s\_2 \right) \right) \\ \wedge \\ d\_1 \to \left( \left( \neg s\_0 \wedge s\_1 \wedge \neg s\_2 \right) \vee \left( \neg s\_0 \wedge \neg s\_1 \wedge s\_2 \right) \right) \end{pmatrix}$$

For example, c₂ = {s₀} is a choice that appears as a potential in valid reaction r₁, so it appears as a disjunct of d₀ as (s₀ ∧ ¬s₁ ∧ ¬s₂). The resulting *Booleanized* specification ϕᴮ is as follows:

$$
\varphi\_{T\_{\mathbb{Z}}}^{\mathbb{B}} = (\varphi^{\prime\prime} \land \Box(A\_{\mathbb{B}} \to \varphi\_{T\_{\mathbb{Z}}}^{\text{extra}}))
$$

<sup>1</sup> The potentials in a reaction characterize the precise power of the system player, because they correspond to what the system can respond.

Note that the Boolean encoding is extended with an assumption formula Aᴮ = (d₀ ↔ ¬d₁) ∧ (d₀ ∨ d₁) that restricts environment moves to guarantee that exactly one environment decision variable is picked. Also, note that a Boolean abstraction algorithm will output three (instead of two) decisions for the environment, but we acknowledge that one of them will never be played by it, since it gives strictly more power to the system. The complexity of this brute-force Booleanization algorithm is doubly exponential in the number of literals.

#### **3.3 From Local Simulation to Equi-Realizability**

The intuition about the correctness of the algorithm is that the extra requirement encodes precisely all reactions (i.e., collections of choices), for which there is a move of the environment that leaves the system with precisely that power to respond. As an observation, in the extra requirement, the set of potentials in valid reactions cannot be empty. This is stated in Lemma 1.

**Lemma 1.** *Let* C *be a set of choices such that* *react*C ∈ *VR*. *Then* C ≠ ∅*.*

*Proof.* Since *react*C ∈ *VR*, the formula ∃x. *react*C(x) is valid. Let v be a valuation such that *react*C[x ← v] holds, let w be an arbitrary valuation of y, and let c be the choice containing precisely the literals that are true under [x ← v, y ← w]. Then the following formula holds:

$$\bigwedge\_{l[\overline{x}\leftarrow\overline{v},\;\overline{y}\leftarrow\overline{w}] \text{ is true}} l \quad\wedge \bigwedge\_{l[\overline{x}\leftarrow\overline{v},\;\overline{y}\leftarrow\overline{w}] \text{ is false}} \neg l$$

It follows that [x ← v] satisfies ∃y. f(c), so c ∈ C.

Lemma 1 is crucial because it ensures that, once a Boolean abstraction algorithm has been executed, for each fresh e variable in the extra requirement there is at least one potential choice with which the system can respond, since every valid reaction has one or more potentials.

Therefore, in each position of the realizability game, the system can respond to moves of the environment, leading to precisely corresponding positions in the Boolean game. In turn, this leads to equi-realizability because each move can be simulated in the corresponding game. Concretely, it is easy to see that we can define a simulation between the positions of the games for ϕᵀ and ϕᴮ such that (1) each literal lᵢ and the corresponding variable sᵢ have the same truth value in related positions, (2) the extra requirement is always satisfied, and (3) moves of the system from related positions in each game can be mimicked in the other game. This is captured by the following theorem:

**Theorem 1.** *System wins* Gᵀ *if and only if System wins the game* Gᴮ*. Therefore,* ϕᵀ *is realizable if and only if* ϕᴮ *is realizable.*

*Proof.* (Sketch). Since realizability games are memoryless determined, it is sufficient to consider only local strategies. Given a strategy ρᴮ that is winning in Gᴮ, we define a strategy ρᵀ in Gᵀ as follows. Assuming related positions, ρᵀ moves in Gᵀ to the successor that is related to the position where ρᴮ moves in Gᴮ. By (3) above, it follows that for every play played in Gᴮ according to ρᴮ there is a play in Gᵀ played according to ρᵀ that results in the same trace, and vice versa: for every play played in Gᵀ according to ρᵀ there is a play in Gᴮ played according to ρᴮ that results in the same trace. Since ρᴮ is winning, so is ρᵀ. The other direction follows similarly, because again ρᴮ can be constructed from ρᵀ, not only guaranteeing the same valuation of literals and corresponding variables, but also that the extra requirement holds in the resulting position.

The following corollary of Thm. 1 follows immediately.

**Theorem 2.** *Let* T *be a theory with a decidable* ∃∗∀∗ *fragment. Then* *LTL*T *realizability is decidable.*

## **4 Efficient Algorithms for Boolean Abstraction**

#### **4.1 Quasi-reactions**

The basic algorithm presented in Sect. 3 exhaustively traverses the set of reactions, one at a time, checking whether each reaction is valid. Therefore, the body of the loop is visited 2^|C| times. In practice, the running time of this basic algorithm quickly becomes infeasible.

We now improve Alg. 1 by exploiting the observation that every SMT query for the validity of a reaction reveals information about the validity of other reactions. We exploit this idea by learning sets of reactions that are no longer interesting and pruning the search space accordingly. The faster algorithms that we present below encode the remaining search space as a SAT formula, whose models are the reactions that remain to be explored.

To implement the learning-and-pruning idea we first introduce the notion of quasi-reaction.

**Definition 4 (Quasi-reaction).** *A quasi-reaction is a pair* (P, A) *where* <sup>P</sup> <sup>⊆</sup> <sup>C</sup>*,* <sup>A</sup> ⊆ C *and* <sup>P</sup> <sup>∩</sup> <sup>A</sup> <sup>=</sup> <sup>∅</sup>*.*

Quasi-reactions remove from reactions the constraint that <sup>P</sup>∪<sup>A</sup> <sup>=</sup> <sup>C</sup>. A quasireaction represents the set of reactions that would be obtained from choosing the remaining choices that are neither in P nor in A as either potential or antipotential. The set of quasi-reactions is:

<sup>Q</sup> <sup>=</sup> {(P, A)|P, A ⊆ C *and* <sup>P</sup> <sup>∩</sup> <sup>A</sup> <sup>=</sup> ∅}

Note that <sup>R</sup> <sup>=</sup> {(P, A) ∈ Q|<sup>P</sup> <sup>∪</sup> <sup>A</sup> <sup>=</sup> C}.

*Example 6.* Consider a case with four choices c0, c1, c<sup>2</sup> and c3. The quasi-reaction ({c0, c2}, {c1}) corresponds to the following formula:

$$\exists \overline{x}.\; \Big( \exists \overline{y}.\, f(c\_0(\overline{x}, \overline{y})) \wedge \forall \overline{y}.\, \neg f(c\_1(\overline{x}, \overline{y})) \wedge \exists \overline{y}.\, f(c\_2(\overline{x}, \overline{y})) \Big)$$

Note that nothing is stated in this quasi-reaction about c<sup>3</sup> (it neither acts as a potential nor as an antipotential).

Consider the following order between quasi-reactions: (P, A) ⊑ (P′, A′) holds if and only if P ⊆ P′ and A ⊆ A′. It is easy to see that ⊑ is a partial order, that (∅, ∅) is the lowest element, and that every two elements (P, A) and (P′, A′) have a greatest lower bound (namely (P ∩ P′, A ∩ A′)). Therefore (P, A) ⊓ (P′, A′) ≝ (P ∩ P′, A ∩ A′) is a meet operation (it is associative, commutative and idempotent). Note that q ⊑ q′ if and only if q ⊓ q′ = q. Formally:

**Proposition 1.** (Q, ⊓) *is a lower semi-lattice.*

The quasi-reaction semi-lattice represents how *informative* a quasi-reaction is. Given a quasi-reaction (P, A), removing an element from either P or A results in a strictly less informative quasi-reaction. The lowest element (∅, <sup>∅</sup>) contains the least information.

Given a quasi-reaction q, the set Q_q = {q′ ∈ Q | q′ ⊑ q} of the quasi-reactions below q forms a full lattice with join (P, A) ⊔ (P′, A′) ≝ (P ∪ P′, A ∪ A′). This is well defined because, below q, the unions P ∪ P′ and A ∪ A′ are guaranteed to be disjoint.

**Proposition 2.** *For every* q*,* (Q_q, ⊓, ⊔) *is a lattice.*
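Representing quasi-reactions as pairs of frozen sets, the order, meet and join are direct set operations. A minimal sketch of Props. 1 and 2 (our own illustration, with choices as opaque labels):

```python
def leq(q1, q2):
    """q ⊑ q'  iff  P ⊆ P' and A ⊆ A'."""
    return q1[0] <= q2[0] and q1[1] <= q2[1]

def meet(q1, q2):
    """Greatest lower bound: componentwise intersection."""
    return (q1[0] & q2[0], q1[1] & q2[1])

def join(q1, q2):
    """Least upper bound inside Q_q: componentwise union
    (well defined below a fixed q, where the unions stay disjoint)."""
    return (q1[0] | q2[0], q1[1] | q2[1])

q1 = (frozenset({"c0", "c2"}), frozenset({"c1"}))
q2 = (frozenset({"c0"}), frozenset({"c1", "c3"}))
m = meet(q1, q2)
assert m == (frozenset({"c0"}), frozenset({"c1"}))
assert leq(m, q1) and leq(m, q2)
assert leq(q2, q1) == (meet(q1, q2) == q2)   # q ⊑ q' iff q ⊓ q' = q
```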

As for reactions, quasi-reactions correspond to a formula in the theory as follows:

$$\operatorname{qreact}\_{(P,A)}(\overline{x}) = \bigwedge\_{c \in P} \left( \exists \overline{y}. c(\overline{x}, \overline{y}) \right) \wedge \bigwedge\_{c \in A} \left( \forall \overline{y}. \neg c(\overline{x}, \overline{y}) \right)$$

Again, given a quasi-reaction <sup>q</sup>, if <sup>∃</sup>x.*qreact*q(x) is valid we say that <sup>q</sup> is valid, otherwise we say that q is invalid. The following holds directly from the definition (and the fact that adding conjuncts makes a first-order formula "less satisfiable").

**Proposition 3.** *Let* q, q *be two quasi-reactions with* <sup>q</sup> <sup>q</sup> *. If* q *is invalid then* q *is invalid. If* q *is valid then* q *is valid.*

These results enable the following optimizations.

#### **4.2 Quasi-reaction-based Optimizations**

**A Logic-Based Optimization.** Consider that, during the search for valid reactions in the main loop, a reaction (P, A) is found to be invalid, that is, *react*(P,A) is unsatisfiable. If the algorithm explores the quasi-reactions below (P, A) and finds (P′, A′) ⊑ (P, A) such that *qreact*(P′,A′) is also invalid, then by Prop. 3 every reaction above (P′, A′) is guaranteed to be invalid. This allows the search in the main loop to be pruned by computing a more informative invalid quasi-reaction q after an invalid reaction r is found, and skipping all reactions above q (and not only r). For example, if the reaction corresponding to ({c₀, c₂, c₃}, {c₁}) is found to be invalid, and by exploring quasi-reactions below it we find that ({c₀}, {c₁}) is also invalid, then we can skip all reactions above ({c₀}, {c₁}). This includes, for example, ({c₀, c₂}, {c₁, c₃}) and ({c₀, c₃}, {c₁, c₂}). In general, the lower the invalid quasi-reaction in ⊑, the more reactions are pruned. This optimization resembles the standard choice of maximal/minimal elements in an antichain.
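The pruning test itself is just a pair of subset checks. A small sketch with the example reactions above (choices represented as opaque labels; an illustration, not the paper's code):

```python
def prunable(r, invalid_q):
    """A reaction r = (P, A) can be skipped if it lies above a
    known-invalid quasi-reaction (P', A'), i.e. P' ⊆ P and A' ⊆ A."""
    (P, A), (Pq, Aq) = r, invalid_q
    return Pq <= P and Aq <= A

invalid = (frozenset({"c0"}), frozenset({"c1"}))
assert prunable((frozenset({"c0", "c2"}), frozenset({"c1", "c3"})), invalid)
assert prunable((frozenset({"c0", "c3"}), frozenset({"c1", "c2"})), invalid)
assert not prunable((frozenset({"c2"}), frozenset({"c1"})), invalid)
```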

**A Game-Based Optimization.** Consider now two reactions r = (P, A) and r′ = (P′, A′) such that P ⊆ P′, and assume that both are valid. Since r′ allows more choices to the system (because the potentials determine these choices), the environment player will always prefer to play r over r′. Formally, if there is a winning strategy for the environment that chooses values for x corresponding to a model of *react*r′, then choosing values for x corresponding to a model of *react*r instead will also be winning.

Therefore, if a reaction r is found to be valid, we can prune the search for reactions r′ that contain strictly more potentials, because even if r′ is also valid, it is less interesting for the environment player. For instance, if ({c₀, c₃}, {c₁, c₂}) is valid, then ({c₀, c₁, c₃}, {c₂}) and ({c₀, c₁, c₂, c₃}, {}) become uninteresting and can be pruned from the search.

#### **4.3 A Single Model-Loop Algorithm (Algorithm 2)**

We now present a faster algorithm that replaces the exhaustive exploration of the main loop of Algorithm 1 with a SAT-based search procedure that prunes uninteresting reactions. To do so, we use a SAT formula ψ with one variable zᵢ per choice cᵢ, in a DPLL(T) fashion. An assignment v : *Vars*(ψ) → 𝔹 of these variables represents a reaction (P, A) where

$$P = \{c\_i | v(z\_i) = true\} \qquad \qquad A = \{c\_j | v(z\_j) = false\}$$

Similarly, a partial assignment v : *Vars*(ψ) ⇀ 𝔹 represents a quasi-reaction. The intended meaning of ψ is that its models encode the set of interesting reactions that remain to be explored. This formula is initialized to ψ = *true* (note that ¬(⋀ᵢ ¬zᵢ) is also a correct starting point, because the reaction where all choices are antipotentials is invalid). Then, a SAT query is used to find a satisfying assignment of ψ, which corresponds to a (quasi-)reaction r whose validity is interesting to explore. Algorithm 2 shows

#### **Algorithm 2:** Model-loop

```
10 Input: ϕT
11 ϕ ← ϕT [li ← si] ; VR ← {}
12 C ← choices(literals(ϕT ))
13 R ← 2^C ; ψ ← true
14 while SAT(ψ) do
15   m = model(ψ)
16   if ∃x. toTheory(m, C) then
17     P ← posVars(m)
18     ψ ← ψ ∧ ¬(⋀_{p∈P} p)
19     VR ← VR ∪ (et, P)
20   else
21     N ← negVars(m)
22     fh ← ⋀_{n∈N} ¬n
23     if ∃x. toTheory(fh, C) then
24       ψ ← ψ ∧ ¬m
25     else
26       ψ ← ψ ∧ ¬fh
27 ϕextra ← getExtra(VR)
28 return ϕ ∧ ◻(A → ϕextra)
```
the Model-loop algorithm. The three main building blocks of the model-loop algorithm are:

- If the reaction is invalid (as a result of the SMT query in line 16), then the algorithm checks the validity of the quasi-reaction q = (∅, A) in line 23. If q is invalid, the negation of q is added as a new conjunct of ψ (line 26). If q is valid, the negation of the reaction itself is added (line 24). This prevents all SAT models that agree with q, which correspond to reactions r′ with q ⊑ r′, including r.
- If the reaction is valid, then it is added to the set of valid reactions *VR*, and the corresponding quasi-reaction that results from removing the antipotentials is added (negated) to ψ (line 18), preventing the exploration of uninteresting cases, according to the game-based optimization.

As for the notation in Algorithm 2 (also in Algorithm 3 and Algorithm 4), *model(*ψ*)* in line 15 is a function that returns a satisfying assignment of the SAT formula ψ, *posVars(m)* returns the variables that are positive in m and *negVars(m)* returns those that are negative. Finally, *toTheory*(m, C) = ⋀_{zᵢ true in m} cᵢᵖ ∧ ⋀_{zᵢ false in m} cᵢᵃ (in lines 16 and 23) translates a Boolean formula into its corresponding formula in the given theory T. Note that an unsatisfiable m can be minimized by finding cores.

If r is invalid and (∅, A) is also found to be invalid, then exponentially many cases can be pruned. Similarly, if r is valid, exponentially many cases can also be pruned. The following result shows the correctness of Algorithm 2:

#### **Theorem 3.** *Algorithm 2 terminates and outputs a correct Boolean abstraction.*

*Proof.* (Sketch). Algorithm 2 terminates because, at each step in the loop, ψ loses at least one satisfying assignment and the total number of assignments is bounded by 2^|C|. The correctness of the generated formula is guaranteed because, for every valid reaction found by Algorithm 1, Algorithm 2 either finds the same valid reaction or a more promising one.
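The structure of the model loop can be sketched with a naive enumerator in place of the SAT solver, keeping the blocking behavior of lines 18 and 24 (a sketch under simplifying assumptions: the quasi-reaction refinement of lines 21–26 is omitted, and the ∃∗∀∗ validity check is a hypothetical callback):

```python
from itertools import product

def model_loop(C, is_valid_reaction):
    """Sketch of Algorithm 2.  Blocking clauses become predicates over
    assignments: after a valid (P, A) we block every model that makes
    all of P true (game-based pruning, line 18); after an invalid one
    we block only that exact model (line 24)."""
    C = list(C)
    blocked = []      # each entry rejects assignments, like a clause of ψ
    VR = []
    for bits in product([True, False], repeat=len(C)):
        v = dict(zip(C, bits))
        if any(b(v) for b in blocked):
            continue                      # model already excluded by ψ
        P = frozenset(c for c in C if v[c])
        A = frozenset(C) - P
        if is_valid_reaction(P, A):
            VR.append((P, A))
            blocked.append(lambda w, P=P: all(w[c] for c in P))
        else:
            blocked.append(lambda w, v=v: w == v)
    return VR

VR = model_loop(["c0", "c1"], lambda P, A: len(P) == 1)
assert len(VR) == 2
```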

#### **4.4 A Nested-SAT Algorithm (Algorithm 3)**

We now present an improvement of Algorithm 2 that performs a more detailed search for a promising collection of invalid quasi-reactions under an invalid reaction r.


Note that it is not necessary to find the precise collection of all the smallest quasi-reactions that lie under an invalid reaction r, as long as at least one quasi-reaction under r is calculated (perhaps r itself). Finding lower quasi-reactions allows more pruning, but their calculation is more costly, because more SMT queries need to be performed. The Nested-SAT algorithm (Algorithm 3) explores (using an inner SAT encoding) this trade-off between a more exhaustive computation of better invalid quasi-reactions and the cost of the search. The three main building blocks of the nested-SAT algorithm (see Algorithm 3) are:


The inner loop, shown in Algorithm 4 (where *VQ* stands for *valid quasi-reactions*), explores a full lattice.



Also, note that ¬(⋀ᵢ ¬zᵢ) is, again, a correct starting point. Consider, for example, that the outer loop finds ({c₁, c₃}, {c₀, c₂}) to be invalid and that the inner loop produces the assignment w₀ ∧ w₁ ∧ w₂ ∧ ¬w₃. This corresponds to c₃ being masked, producing the quasi-reaction ({c₁}, {c₀, c₂}). The pruning system is the following:


Note that *toTheory_inn*(u, m, C) = ⋀_{mᵢ ∧ uᵢ} cᵢᵖ ∧ ⋀_{¬mᵢ ∧ uᵢ} cᵢᵃ is not the same function as the *toTheory()* used in Algorithm 2 and Algorithm 3, since the inner loop needs both the model m and the mask u (which it makes no sense to negate) to translate a Boolean formula into a T-formula. Also, note that there is again a trade-off in the inner loop, because an exhaustive search is not necessary. Thus, in practice, we also used some basic heuristics: (1) entering the inner loop only when (∅, A) is invalid; (2) fixing a maximum number of inner model queries per outer model, with the possibility to decrement this amount dynamically with a decay; and (3) reducing the number of times the inner loop is exercised (e.g., *enter the inner loop only if the number of invalid outer models so far is even*).

*Example 7.* We explore the results of Algorithm 3. A possible execution for 2 literals can be as follows:


4. A fourth reaction ({c1, c2}, {c0, c3}) is obtained in line 33, which is now invalid (line 35). The inner loop called in line 42 generates the following cores: ({c1}, {c0}) and ({c2}, {c3}). The addition of the negation of these cores leads to an unsatisfiable outer SAT formula, and the algorithm terminates.

The execution in this example has performed 4 SAT+SMT queries in the outer loop and 3+2 SAT+SMT queries in the inner loops. The brute-force Algorithm 1 would have performed 16 queries. Note that the gap between the exhaustive version and the optimized ones grows exponentially when we consider specifications with more literals.

#### **5 Empirical Evaluation**

We perform an empirical evaluation on six specifications inspired by real industrial cases: *Lift* (*Li.*), *Train* (*Tr.*), *Connect* (*Con.*), *Cooker* (*Coo.*), *Usb* (*Usb*) and *Stage* (*St.*), and a synthetic example (*Syn.*) with versions from 2 to 7 literals. For the implementation, we used Python 3.8.8 with Z3 4.11.

It is easy to see that "clusters" of literals that do not share variables can be Booleanized independently, so we split each of the examples into clusters. We report our results in Fig. 2. Each row contains the result for a cluster of an experiment (each one for the fastest heuristic). Each benchmark is split into clusters, and we show the number of variables (*vr*.) and literals (*lt.*) per cluster. We also show the running times of each algorithm on each cluster; concretely, we test Algorithm 1 (*BF*), Algorithm 2 (*SAT*) and Algorithm 3 (*Doub*.). For Algorithm 2 and Algorithm 3, we show the number of queries performed; in the case of Algorithm 3, we show both outer and inner queries. Algorithm 1 and Algorithm 2 require no heuristics. For Algorithm 3, we report, left to right: the maximum number of inner loops (*MxI.*), the modulo division criterion (*Md.*)<sup>2</sup>, the number of queries after which we perform a decay of 1 in the maximum number of inner loops (*Dc.*), and whether we apply the invalidity of (∅, A) as a criterion to enter the inner loop (*A.*), where ✓ means that we do and × means the contrary. Also, ⊥ means timeout (or *no data*).
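Splitting a specification into independent clusters is a connected-components computation over the literal/variable sharing relation. A sketch with union-find (the `literal_vars` input format, mapping each literal to the variables it mentions, is our own assumption):

```python
def clusters(literal_vars):
    """Group literals that transitively share variables; literals in
    different clusters can be Booleanized independently."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for lit, vs in literal_vars.items():
        for v in vs:
            union(("lit", lit), ("var", v))

    groups = {}
    for lit in literal_vars:
        groups.setdefault(find(("lit", lit)), set()).add(lit)
    return sorted(groups.values(), key=len)

# Running-example literals plus an unrelated one: two clusters.
ex = {"x<2": {"x"}, "y>1": {"y"}, "y<x": {"x", "y"}, "z>0": {"z"}}
assert [sorted(c) for c in clusters(ex)] == [["z>0"], ["x<2", "y<x", "y>1"]]
```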

The brute-force (BF) Algorithm 1 performs well with 3 or fewer literals, but its performance decreases dramatically with 4 literals. Algorithm 2 (single SAT) performs well up to 4 literals, and it can hardly handle cases with 6 or more literals. An exception is *Lift (1,7)*, which is simpler since it has only one variable (which implies that there is only one player). The performance improvement of SAT with respect to BF is due to the decrease in the number of queries. For example, *Train (3,6)* performs 13706 queries, whereas BF would need 2^(2^6) = 2^64 ≈ 1.844 · 10^19 queries.

All examples are Booleanizable when using Algorithm 3 (two SAT loops), particularly when using a combination of concrete heuristics. For instance, in

<sup>2</sup> This means that the inner loop is entered if and only if the number of invalid models so far is divisible by *Md*, and we found *Md* values of 2, 3 and 20 to be interesting.


**Fig. 2.** Empirical evaluation results of the different Boolean abstraction algorithms, where the best results are in **bold** and ϕ<sup>B</sup> refers only to best times.

small cases (2 to 5 literals) it seems that heuristic setups like 3/3/0/✓<sup>3</sup> are fast, whereas in bigger cases setups like 40/2/0/✓ or 100/40/20/× are faster. We conjecture that a non-zero decay is required to handle large inputs, since inner-loop exploration becomes less useful after some time. However, adding a decay is not always faster than fixing a number of inner loops (see *Syn (2,7)*), but it always yields better results in balancing the number of queries between the two nested SAT layers. Thus, since balancing the number of queries typically leads to faster execution times, we recommend using decays. Note that we ran all the experiments reported in this section several times and computed averages, because Z3 exhibited considerable volatility in the models it produces, which in turn influenced the running time of our algorithms. This significantly affects the precise reproducibility of the running times. For instance,

<sup>3</sup> This means: we only perform 3 inner loop queries per outer loop query (and there is no decay, i.e., decay = 0), we enter the inner loop once per 3 outer loops and we only enter the inner loop if (∅, A) is invalid.


**Fig. 3.** Best numbers of queries for Algorithm 2 and 3 relative to brute-force (Alg.1).


**Fig. 4.** Comparison of T<sup>Z</sup> and T<sup>R</sup> for *Syn (2,3)* to *Syn (2,6).*

for *Syn (2,5)* the worst-case execution was almost three times worse than the average execution reported in Fig. 2. Studying this phenomenon more closely is work in progress. Note that there are cases in which the numbers of queries of *SAT* and *Doub.* are the same (e.g., *Usb (3,5)*), which happens when the *A.* heuristic has the effect of making the search never enter the inner loop.
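The four heuristic knobs can be read as a simple admission policy for the inner loop. The following sketch is ours and only illustrates the reported parameters (*MxI.*, *Md.*, *Dc.*, *A.*), not the actual implementation:

```python
def inner_budget(max_inner: int, decay_every: int, outer_done: int) -> int:
    """Maximum inner-loop queries per outer query (MxI.), reduced by 1
    every `decay_every` outer queries (Dc.); 0 means no decay."""
    if decay_every == 0:
        return max_inner
    return max(0, max_inner - outer_done // decay_every)

def enter_inner(md: int, invalid_models: int,
                use_A: bool, empty_A_invalid: bool) -> bool:
    """Modulo criterion (Md.): enter iff the number of invalid models so
    far is divisible by md; with the A. heuristic enabled, additionally
    require that the reaction (emptyset, A) is invalid."""
    if use_A and not empty_A_invalid:
        return False
    return invalid_models % md == 0

# Setup 100/40/20/x: after 800 outer queries the budget has decayed to 60.
print(inner_budget(100, 20, 800))          # 60
print(enter_inner(40, 80, False, False))   # True
```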

In Fig. 2 we also analyzed the constructed ϕ<sup>B</sup>, measuring the number of valid reactions from which it is built (*Val.*) and the time (*Tme.*) that a realizability checker takes to verify whether ϕ<sup>B</sup> (hence, ϕ<sup>T</sup>) is realizable or not (expressed with dark and light gray colours, respectively). We used Strix [27] as the realizability checker. As we can see, there is a correspondence between the expected realizability of ϕ<sup>T</sup> and the realizability result that Strix returns for ϕ<sup>B</sup>. Indeed, all instances can be solved in less than 7 seconds, and the length of the Boolean formula (characterized by the number of valid reactions) hardly affects performance. This suggests that future work should focus on reducing the time necessary to produce the Boolean abstraction in order to scale even further.

Also, note that Fig. 2 shows remarkable ratios of required queries with respect to the (doubly exponential) brute-force algorithm: e.g., 4792 + 9941 (outer + inner loops) out of the 1.844 · 10<sup>19</sup> queries that the brute-force algorithm would need, which is less than 1 · 10<sup>−13</sup>% of that number (see Fig. 3 for more details). We also compared the performance and number of queries for two different theories T<sup>Z</sup> and T<sup>R</sup> on *Syn (2,3)* to *Syn (2,6)*. Note, again, that the realizability result may vary if a specification is interpreted in different theories, but this is not relevant for the experiment in Fig. 4, which suggests that running times are dominated not by the SMT solver but, again, by the enclosing abstraction algorithms.

#### **6 Related Work and Conclusions**

**Related Work.** Constraint LTL [11] extends LTL with the possibility of expressing constraints between variables at a bounded distance (of time). The theories considered are a restricted form of T<sup>Z</sup> with only comparisons, with additional restrictions to overcome undecidability. In comparison, we do not allow predicates to compare variables at different timesteps, but we prove decidability for all theories with a decidable ∃<sup>∗</sup>∀<sup>∗</sup> fragment. LTL modulo theories is studied in [12,19] for finite traces; these works allow temporal operators within predicates, which makes the logic undecidable.

As for the works closest to ours, [7] proposes numerical LTL synthesis using an interplay between an LTL synthesizer and a non-linear real arithmetic checker. However, [7] overapproximates the power of the system and hence is not precise for realizability. Linear arithmetic games are studied in [13], introducing algorithms for synthesizing winning strategies for non-reactive specifications. Also, [22] considers infinite theories (like us), but it guarantees neither success nor termination, whereas our Boolean abstraction is complete. They only consider safety, while our approach considers all of LTL. The follow-up [23] still has similar limitations: only liveness properties that can be reduced to safety are accepted, and termination is guaranteed only in the unrealizability case. Similarly, [18] is incomplete and requires a powerful solver for many quantifier alternations; these can be reduced to 1-alternation, but at the expense of the algorithm no longer being sound in the unrealizable case (e.g., it depends on Z3 not answering "unknown"). As for [34], it (1) only considers safety/liveness GR(1) specifications, (2) is limited to the theory of fixed-size bit-vectors, (3) requires quantifier elimination and (4) guidance. We only require ∃<sup>∗</sup>∀<sup>∗</sup>-satisfiability (for Boolean abstraction) and we consider multiple infinite theories. The main difference is that Boolean abstraction generates a (Boolean) LTL specification, so that existing tools can be used with any of their internal techniques and algorithms (bounded synthesis, for example) and will automatically benefit from further optimizations. Moreover, it preserves fragments like safety and GR(1), so specialized solvers can be used. On the contrary, all the approaches above adapt one specific technique and implement it in a monolithic way.

Temporal Stream Logic (TSL) [16] extends LTL with complex data that can be related across time, making use of a new *update* operator y ← f x, indicating that y receives the result of applying function f to variable x. TSL is later extended to theories in [15,25]. In all these works, realizability is undecidable. Also, in [8] reactive synthesis and syntax-guided synthesis (SyGuS) [1] collaborate in the synthesis process, generating executable code that guarantees reactive and data-level properties. It also suffers from undecidability, due to the undecidability of both TSL [16] and SyGuS [6]. In comparison, we cannot relate values across time, but we provide a decidable realizability procedure.

Comparing TSL with LTL<sup>T</sup>, TSL is undecidable already for safety, the theory of equality and Presburger arithmetic. More precisely, TSL is only known to be decidable for three fragments (see Thm. 7 in [15]). TSL is (1) semi-decidable for the reachability fragment of TSL (i.e., the fragment that only permits the next and eventually temporal operators); (2) decidable for formulae consisting of only logical operators, predicates, updates, next operators, and at most one top-level eventually operator; and (3) semi-decidable for formulae with one cell (i.e., controllable outputs). None of the specifications considered for the empirical evaluation in Sect. 5 lie within these decidable or semi-decidable fragments. Also, TSL allows (finite) uninterpreted predicates, whereas we need predicates to be well defined within the semantics of the theories of the specifications for which we perform Boolean abstraction.

**Conclusion.** The main contribution of this paper is to show that LTL<sup>T</sup> is decidable via a Boolean abstraction technique for all theories of data with a decidable ∃<sup>∗</sup>∀<sup>∗</sup> fragment. Our algorithms create, from a given LTL<sup>T</sup> specification whose atomic propositions are literals in such a theory, an equi-realizable specification with Boolean atomic propositions. We have also introduced algorithms that use SAT solvers to traverse the search space efficiently. A SAT formula encodes the space of reactions to be explored, and our algorithms reduce this space by learning uninteresting areas from each explored reaction. The fastest algorithm uses a nested two-layer SAT encoding, in a DPLL(T) fashion. This search yields dramatically faster running times and makes Boolean abstraction applicable to larger cases. We have performed an empirical evaluation of implementations of our algorithms, and found that the best performance is obtained when the number of queries made by each layer of the SAT search is balanced. To the best of our knowledge, this is the first method to propose an (efficient) solution to realizability for all ∃<sup>∗</sup>∀<sup>∗</sup>-decidable theories, which include, for instance, the theories of integers and reals.

Future work includes, first, improving scalability further. We plan to leverage quantifier elimination procedures [9] to produce candidates for the sets of valid reactions and then check (and correct) them with faster algorithms. Also, optimizations based on quasi-reactions can be enhanced if state-of-the-art tools for satisfiability core search (e.g., [2,3,24]) are used. Another direction is to extend our realizability method into a synthesis procedure, by synthesizing functions in T that produce witness values for the variables controlled by the system, given (1) the environment and system moves in the Boolean game, and (2) the environment values (consistent with the environment move). Finally, we plan to study how to extend LTL<sup>T</sup> with controlled transfer of data across time while preserving decidability.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Certified Verification for Algebraic Abstraction**

Ming-Hsien Tsai<sup>4</sup>, Yu-Fu Fu<sup>2</sup>, Jiaxiang Liu<sup>5(B)</sup>, Xiaomu Shi<sup>3</sup>, Bow-Yaw Wang<sup>1</sup>, and Bo-Yin Yang<sup>1</sup>

<sup>1</sup> Academia Sinica, Taipei, Taiwan *{*bywang,byyang*}*@iis.sinica.edu.tw
<sup>2</sup> Georgia Institute of Technology, Atlanta, USA yufu@gatech.edu
<sup>3</sup> Institute of Software, Chinese Academy of Sciences, Beijing, China xshi0811@gmail.com
<sup>4</sup> National Institute of Cyber Security, Taipei, Taiwan mhtsai208@gmail.com
<sup>5</sup> Shenzhen University, Shenzhen, China jiaxiang0924@gmail.com

**Abstract.** We present a certified algebraic abstraction technique for verifying bit-accurate non-linear integer computations. In algebraic abstraction, programs are lifted to polynomial equations in the abstract domain. Algebraic techniques are employed to analyze abstract polynomial programs; SMT QF BV solvers are adopted for bit-accurate analysis of soundness conditions. We explain how to verify our abstraction algorithm and certify verification results. Our hybrid technique has verified non-linear computations in various security libraries such as BITCOIN and OPENSSL. We also report the certified verification of Number-Theoretic Transform programs from the post-quantum cryptosystem KYBER.

## **1 Introduction**

Bit-accurate non-linear integer computations are infamously hard to verify. Conventional bit-accurate techniques such as bit blasting do not work well for non-linear computations. Approximation techniques based on floating-point computation, on the other hand, are inaccurate. Non-linear integer computation is nonetheless essential to computer cryptography. Analyzing complex non-linear computation in cryptographic libraries remains one of the most challenging problems of the utmost importance today.

In this paper, we address the verification problem through algebraic abstraction. In algebraic abstraction, abstract programs are represented by polynomial equations. Non-linear computation about abstract polynomial programs is analyzed algebraically and hence more efficiently through techniques from commutative algebra. Algebraic abstraction however is unsound due to overflow in bounded integer computation. We characterize soundness conditions with queries using the Quantifier-Free Bit-Vector (QF BV) logic from Satisfiability Modulo Theories (SMT) [2]. SMT solvers are then used to check soundness conditions before applying algebraic abstraction.

Our hybrid technique takes advantage of both algebraic and bit-accurate analyses. Non-linear algebraic properties are verified algebraically: polynomials are computed and analyzed by algorithms from commutative algebra, in which coefficients, variables and arithmetic functions are atomic, so our algebraic analysis is very efficient for non-linear computation. Soundness conditions, on the other hand, require bit-accurate analysis, so our technique applies SMT QF BV solvers to check them. By combining algebraic with bit-accurate analyses, algebraic abstraction successfully verifies non-linear computation in real-world cryptographic programs.

Cryptographic programs undoubtedly are widely deployed critical software. Errors in their verification need to be minimized. To this end, we use the proof assistant COQ [4] to verify the soundness theorem for algebraic abstraction. To ensure the correctness of external algebraic and bit-accurate analysis tools, results from external tools are certified in our technique as well. With verified abstraction and certified external results, verification of bit-accurate non-linear integer computation through algebraic abstraction is certified. We explain how to certify our hybrid verification technique.

We evaluate our certified technique with cryptographic programs from security libraries in BITCOIN [27], BORINGSSL [8,12], NSS [20], OPENSSL [23] and PQCRYPTO-SIDH [18]. These programs compute field and group operations in elliptic curve cryptography. We also verify Number-Theoretic Transform (NTT) programs from the post-quantum cryptosystem KYBER [6]. In lattice-based post-quantum cryptography, computation in polynomial rings is needed. NTT is a discrete variant of the Fast Fourier Transform used for polynomial multiplication in KYBER. Our certified algebraic abstraction technique verifies cryptographic programs from elliptic curve and post-quantum cryptography successfully. Our contributions are summarized as follows.


*Related Work.* GFVERIF employs an ad hoc technique to verify non-linear computation in cryptographic programs with a computer algebra system [3]. CRYPTOLINE [9,24,29] is a tool designed for the specification and verification of cryptographic assembly code. Its verification algorithm utilizes computer algebra systems in addition to SMT solvers. CRYPTOLINE is also leveraged to verify cryptographic C programs [9,17]. The optimized KYBER NTT program for avx2 is verified in [15], but the underlying verification algorithm is left unexplained. None of these works certify their verification results; users have to trust the verification tools. BVCRYPTOLINE certifies algebraic abstraction but not soundness conditions [29]. It does not allow multiple moduli in modular equations either. In particular, it cannot concisely specify NTT via the Chinese remainder theorem over polynomial rings. Compared with these works, our technique admits modular equations with multiple moduli in assumptions and assertions, and is fully certified. To explicate our advantages, consider the specification of multiplication in the field Z<sub>p434</sub>[x]/⟨x<sup>2</sup> + 1⟩, where p434 is a prime number. An element of the field is of the form u<sub>0</sub> + u<sub>1</sub>x where x<sup>2</sup> + 1 = 0. To specify that r<sub>0</sub> + r<sub>1</sub>x is the product of u<sub>0</sub> + u<sub>1</sub>x and v<sub>0</sub> + v<sub>1</sub>x, one can write two modular equations with a single modulus each: r<sub>0</sub> ≡ u<sub>0</sub>v<sub>0</sub> − u<sub>1</sub>v<sub>1</sub> mod [p434] and r<sub>1</sub> ≡ u<sub>0</sub>v<sub>1</sub> + u<sub>1</sub>v<sub>0</sub> mod [p434]. With multiple moduli, we write r<sub>0</sub> + r<sub>1</sub>x ≡ (u<sub>0</sub> + u<sub>1</sub>x)(v<sub>0</sub> + v<sub>1</sub>x) mod [p434, x<sup>2</sup> + 1] succinctly.
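The equivalence between the two styles of specification can be checked concretely. The following sketch (using a small illustrative prime rather than p434) multiplies in Z<sub>p</sub>[x]/⟨x<sup>2</sup> + 1⟩ and compares against the two single-modulus equations:

```python
def mul_in_quotient(u, v, p):
    """Multiply u0 + u1*x and v0 + v1*x in Z_p[x]/<x^2 + 1>,
    reducing x^2 to -1."""
    (u0, u1), (v0, v1) = u, v
    return ((u0 * v0 - u1 * v1) % p, (u0 * v1 + u1 * v0) % p)

p = 13                      # small stand-in for the prime p434
u, v = (3, 4), (5, 6)
r0, r1 = mul_in_quotient(u, v, p)
# The two single-modulus equations yield exactly the same residues:
assert r0 == (u[0] * v[0] - u[1] * v[1]) % p
assert r1 == (u[0] * v[1] + u[1] * v[0]) % p
print(r0, r1)
```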
Our simple specifications are most useful for complicated fields such as Z<sub>p381</sub>[x, y, z]/⟨x<sup>2</sup> + 1, y<sup>3</sup> − x − 1, z<sup>2</sup> − y⟩. Each element of this field is of the form Σ u<sub>i,j,k</sub>x<sup>i</sup>y<sup>j</sup>z<sup>k</sup> with 0 ≤ i, k < 2 and 0 ≤ j < 3. Twelve modular equations were needed previously; one modular equation with multiple moduli suffices to specify its field multiplication in this work. Furthermore, our technique is verified in COQ: the correctness of our abstraction algorithm and the soundness theorem are formally proven in COQ. We also show how to certify results from external tools. In summary, the correctness of the algebraic abstraction algorithm is verified and the answers from external tools are certified; verification results are therefore fully certified. We believe this is the best guarantee a model checker can offer. Our verified model checker is sufficiently practical to verify industrial cryptographic programs too!

Analysis of linear polynomial programs was discussed, for instance, in [21,22]. The reduction from the root entailment problem to the ideal membership problem is discussed in [14]. In this work, the computer algebra system SINGULAR [13] is employed to compute standard bases of ideals and certificates. The certified SMT QF BV solver COQQFBV [26] is adopted to certify soundness conditions.

The paper is organized as follows. Section 2 gives the needed background. It is followed by the syntax and semantics of the language TOYLANG; an implementation of unsigned Montgomery reduction is given as a running example (Sect. 3). Section 4 presents algebraic abstraction and its verification algorithms. We briefly describe certified verification of algebraic abstraction in Sect. 5. Section 6 shows experimental results on real-world cryptographic programs. We conclude in Sect. 7.

## **2 Preliminaries**

Let N and Z denote the sets of non-negative and all integers respectively. Fix a set of variables **x**. We write Z[**x**] for the set of polynomials in variables **x** with coefficients in Z. A polynomial *equation* is of the form e = e′ with e, e′ ∈ Z[**x**]; a polynomial *modular equation* is of the form e ≡ e′ mod [f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>] with e, e′, f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub> ∈ Z[**x**]. A *valuation* ρ of **x** is a mapping from **x** to Z. Given a valuation ρ, a polynomial e *evaluates* to the integer e[ρ] by replacing every variable x with ρ(x). A valuation ρ is a *root* of the equation e = e′ if (e − e′)[ρ] = 0. A valuation ρ is a *root* of the modular equation e ≡ e′ mod [f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>] if (e − e′)[ρ] = z<sub>0</sub>f<sub>0</sub>[ρ] + z<sub>1</sub>f<sub>1</sub>[ρ] + ··· + z<sub>m</sub>f<sub>m</sub>[ρ] for some z<sub>0</sub>, z<sub>1</sub>, ..., z<sub>m</sub> ∈ Z. A *(modular) equation* is an equation or a modular equation. A *system* of (modular) equations is a set of (modular) equations. A *root* of a system of (modular) equations is a common root of every (modular) equation in the system. Let Φ be a system of (modular) equations and φ a (modular) equation; roots of Φ *entail* roots of φ (written ∀**x**.Φ ⇒ φ) if all roots of Φ are also roots of φ. Given Φ and φ, the *root entailment* problem is to decide whether ∀**x**.Φ ⇒ φ.
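For a fixed valuation the moduli evaluate to concrete integers, so the existential condition on z<sub>0</sub>, ..., z<sub>m</sub> reduces, by Bézout's identity, to divisibility by their gcd. A small sketch of this root check (the helper is ours, not from the paper):

```python
from functools import reduce
from math import gcd

def is_root_of_modular_eq(diff: int, moduli_vals) -> bool:
    """diff is (e - e')[rho]; moduli_vals are f0[rho], ..., fm[rho].
    diff = z0*f0[rho] + ... + zm*fm[rho] is solvable over Z iff
    gcd(f0[rho], ..., fm[rho]) divides diff (Bezout's identity)."""
    g = reduce(gcd, moduli_vals, 0)
    return diff == 0 if g == 0 else diff % g == 0

# A valuation with (e - e')[rho] = 8 and moduli evaluating to 4 and 6:
print(is_root_of_modular_eq(8, [4, 6]))   # True: 8 = 2*4 + 0*6
print(is_root_of_modular_eq(3, [4, 6]))   # False: gcd(4, 6) = 2 does not divide 3
```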

An *ideal* in Z[**x**] generated by f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub> ∈ Z[**x**] is defined by ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>⟩ = {f<sub>0</sub>h<sub>0</sub> + f<sub>1</sub>h<sub>1</sub> + ··· + f<sub>m</sub>h<sub>m</sub> | h<sub>0</sub>, h<sub>1</sub>, ..., h<sub>m</sub> ∈ Z[**x**]}. If ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>⟩ and ⟨g<sub>0</sub>, g<sub>1</sub>, ..., g<sub>n</sub>⟩ are ideals, define their *sum* ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>⟩ + ⟨g<sub>0</sub>, g<sub>1</sub>, ..., g<sub>n</sub>⟩ = ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>, g<sub>0</sub>, g<sub>1</sub>, ..., g<sub>n</sub>⟩. For instance, ⟨x⟩ = {xf | f ∈ Z[**x**]} and ⟨6⟩ + ⟨10⟩ = ⟨2⟩. Given f ∈ Z[**x**] and an ideal I, the *ideal membership problem* is to decide whether f ∈ I.
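Over the constant polynomials, i.e. in Z itself, every ideal is principal, so the sum example above can be reproduced with a gcd; a one-line sketch (helper names are ours):

```python
from math import gcd

# <a> + <b> = <gcd(a, b)> in Z, since every ideal of Z is principal.
def ideal_sum_generator(a: int, b: int) -> int:
    return gcd(a, b)

# Membership in a principal ideal <g> is just divisibility by g.
def in_ideal(f: int, g: int) -> bool:
    return f % g == 0

print(ideal_sum_generator(6, 10))   # 2, i.e. <6> + <10> = <2>
print(in_ideal(14, 2))              # True
```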

A *bit-vector* is a bit sequence of a *width* w. A bit-vector denotes an integer between 0 and 2<sup>w</sup> − 1 inclusively, using the most-significant-bit-first representation. The SMT QF BV logic defines bit-vector functions. Assume bv<sub>0</sub> and bv<sub>1</sub> are bit-vectors of width w. The addition (*bvadd* bv<sub>0</sub> bv<sub>1</sub>) and subtraction (*bvsub* bv<sub>0</sub> bv<sub>1</sub>) functions return bit-vectors of width w representing the sum and difference respectively. The multiplication function (*bvmul* bv<sub>0</sub> bv<sub>1</sub>) returns the least significant w bits of the product. The left shift function (*bvshl* bv<sub>0</sub> n) shifts bv<sub>0</sub> to the left by n bits; the logical right shift function (*bvlshr* bv<sub>0</sub> n) shifts bv<sub>0</sub> to the right by n bits. The zero extension function (*zero extend* bv<sub>0</sub> n) appends n most significant 0's to bv<sub>0</sub>. The extraction function (*bvextract* h l bv<sub>0</sub>) extracts bits indexed h to l from bv<sub>0</sub> (w > h ≥ l ≥ 0). An SMT QF BV *expression* is constructed from bit-vector values, variables, and functions. An SMT QF BV *assertion* is of the form (*assert* ⊥), (*assert* (= be be′)), or (*assert* (*not* (= be be′))) with SMT QF BV expressions be and be′. An SMT QF BV *query* is a set of SMT QF BV assertions. A *store* is a mapping from bit-vector variables to bit-vector values. An SMT QF BV expression *evaluates* to a bit-vector value on a store. An SMT QF BV assertion (*assert* (= be be′)) is *satisfied* by a store if be and be′ evaluate to the same bit-vector value on the store; otherwise (*assert* (*not* (= be be′))) is satisfied. The SMT QF BV assertion (*assert* ⊥) is never satisfied. An SMT QF BV query is *satisfiable* if all its assertions are satisfied by some store.
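A reference model of these bit-vector functions, encoding a width-w bit-vector as an int in [0, 2<sup>w</sup>); this is our own sketch of the semantics just described:

```python
def bvadd(x, y, w):  return (x + y) % (1 << w)
def bvsub(x, y, w):  return (x - y) % (1 << w)
def bvmul(x, y, w):  return (x * y) % (1 << w)   # least significant w bits
def bvshl(x, n, w):  return (x << n) % (1 << w)
def bvlshr(x, n, w): return x >> n
def zero_extend(x, n): return x                  # same value, width w + n
def bvextract(h, l, x): return (x >> l) & ((1 << (h - l + 1)) - 1)

w = 8
print(bvadd(200, 100, w))      # 44: the sum wraps modulo 2^8
print(bvmul(16, 17, w))        # 16: only the low 8 bits of 272
print(bvextract(7, 4, 0xAB))   # 10: bits 7..4 of 0xAB, i.e. 0xA
```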

## **3** TOYLANG

We consider a register transfer language called TOYLANG to illustrate algebraic abstraction. For clarity, many programming constructs are removed from TOYLANG. The language nevertheless is sufficiently expressive to implement Montgomery reduction [19], an indispensable algorithm found in real-world cryptographic programs.

### **3.1 Syntax and Semantics**

The syntax of TOYLANG is shown in Fig. 1. For simplicity, we assume all numbers are unsigned and all variables are of widths 1 or w. Variables of width 1 are also called *bit* variables. An *atom* is a number or a variable.

TOYLANG supports several arithmetic instructions: addition (ADD), carrying addition (ADDS), addition-with-carry (ADC), carrying addition-with-carry (ADCS), subtraction (SUB), borrowing subtraction (SUBS), half- (MUL) and full-multiplication (MULL). Moreover, logical left shift (SHL) and logical right shift (SHR) instructions are allowed. In addition to assignments, (modular) equations can be specified in assumption (ASSUME) or assertion (ASSERT) instructions. A program is a sequence of instructions. We assume ASSERT instructions can only appear at the end of programs. They specify a (modular) equation to be verified and thus are emphasized with a framed box.

**Fig. 2.** TOYLANG – Semantics

Let σ be a store. We write σ[v → bv] for the store obtained by mapping v to the bit-vector bv and every other variable u to σ(u). [[v]]<sub>σ</sub> denotes the bit-vector σ(v) for any variable v; for a number n, [[n]]<sub>σ</sub> is the bit-vector of width w representing n.

The semantics of TOYLANG is defined with SMT QF BV bit-vector functions (Fig. 2). In the figure, (|σ, s, σ′|) denotes that the store σ′ is obtained after executing the instruction s on the store σ. The addition instruction ADD corresponds to the bit-vector addition function. For the addition-with-carry instruction, the carry bit is extended with w − 1 zeros and added to the sum of the first two operands. The two carrying addition instructions compute the bit-vector sums of width w + 1. The most significant bit is stored in the output carry bit. Subtraction instructions are similar; their semantics are defined with the bit-vector subtraction function *bvsub* instead. The semantics of the SHL and SHR instructions are defined by the corresponding bit-vector functions *bvshl* and *bvlshr* respectively. The semantics of the half-multiplication instruction MUL uses the bit-vector multiplication function *bvmul*. For full-multiplication, both operands are extended to width 2w before computing their product.
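These rules can be mirrored by a small interpreter over stores; a minimal sketch (the width W, the store encoding, and the function names are ours):

```python
W = 64
MASK = (1 << W) - 1

def ADDS(st, c, v, a0, a1):
    """c : v <- ADDS a0 a1: width-(w+1) sum; the MSB goes to the carry."""
    s = st[a0] + st[a1]
    st[c], st[v] = s >> W, s & MASK

def ADCS(st, c, v, a0, a1, d):
    """c : v <- ADCS a0 a1 d: as ADDS, plus the zero-extended carry d."""
    s = st[a0] + st[a1] + st[d]
    st[c], st[v] = s >> W, s & MASK

def MULL(st, vh, vl, a0, a1):
    """vh : vl <- MULL a0 a1: operands extended to 2w before multiplying."""
    p = st[a0] * st[a1]
    st[vh], st[vl] = p >> W, p & MASK

st = {"a": MASK, "b": 1}
ADDS(st, "carry", "v", "a", "b")
print(st["carry"], st["v"])   # 1 0: 2^64 - 1 + 1 overflows into the carry
```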

$$
\begin{aligned}
\{\!|n|\!\}_{\sigma} &= n &
\{\!|v|\!\}_{\sigma} &= \mathit{toZ}([\![v]\!]_{\sigma}) \\
\{\!|e_0 \pm e_1|\!\}_{\sigma} &= \{\!|e_0|\!\}_{\sigma} \pm \{\!|e_1|\!\}_{\sigma} &
\{\!|e_0 \times e_1|\!\}_{\sigma} &= \{\!|e_0|\!\}_{\sigma} \cdot \{\!|e_1|\!\}_{\sigma} \\
\sigma \models e_0 = e_1 &\iff \{\!|e_0|\!\}_{\sigma} = \{\!|e_1|\!\}_{\sigma} &
\sigma \models e_0 \equiv e_1 \bmod [f_0, \ldots, f_m] &\iff \{\!|e_0|\!\}_{\sigma} - \{\!|e_1|\!\}_{\sigma} \in \langle \{\!|f_0|\!\}_{\sigma}, \ldots, \{\!|f_m|\!\}_{\sigma} \rangle
\end{aligned}
$$

**Fig. 3.** Semantics of (Modular) Equations

The ASSUME instruction filters computations by (modular) equations. Figure 3 defines when a store satisfies a (modular) equation. A number n denotes a non-negative integer. A variable v denotes the integer toZ([[v]]<sub>σ</sub>) represented by the corresponding bit-vector [[v]]<sub>σ</sub> in the store. Arithmetic operations denote the corresponding integer operations. In particular, the integer {|e|}<sub>σ</sub> is exact and not necessarily less than 2<sup>w</sup>. Equality denotes integer equality. σ satisfies e<sub>0</sub> ≡ e<sub>1</sub> mod [f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>] if {|e<sub>0</sub>|}<sub>σ</sub> − {|e<sub>1</sub>|}<sub>σ</sub> is in the ideal generated by {|f<sub>0</sub>|}<sub>σ</sub>, {|f<sub>1</sub>|}<sub>σ</sub>, ..., {|f<sub>m</sub>|}<sub>σ</sub>. The ASSERT instruction checks whether the current store satisfies the given (modular) equation. The computation resumes if the check *succeeds*; it is an error if the ASSERT instruction *fails*.

```
(* R = 2^64, 0 ≤ T < R^2 *)        ASSUME N × N′ + 1 ≡ 0 mod [2^64]
(* N·N′ + 1 ≡ 0 mod R *)           m ← MUL T_L N′
m ← ((T mod R)·N′) mod R           mN_H : mN_L ← MULL m N
t ← (T + m·N)/R                    carry : t_L ← ADDS mN_L T_L
(* t·R ≡ T mod N *)                c : t_H ← ADCS mN_H T_H carry
                                   ASSERT t_L ≡ 0 mod [2^64]
                                   ASSUME t_L = 0
                                   ASSERT (2^64·c + t_H) × 2^64 ≡
                                          T_H × 2^64 + T_L mod [N]

(a) Algorithm                      (b) TOYLANG implementation
```

**Fig. 4.** Simplified Montgomery Reduction

The Montgomery reduction algorithm is widely used to compute remainders without division [19]. Figure 4a shows a simplified unsigned Montgomery reduction algorithm.<sup>1</sup> Suppose we want to compute the remainder of a number 0 ≤ T < R<sup>2</sup> modulo N on a 64-bit architecture with R = 2<sup>64</sup>. The Montgomery reduction algorithm needs another number N′ with N·N′ + 1 ≡ 0 mod R as an input. It first computes m = ((T mod R)·N′) mod R and then t = (T + m·N)/R. Observe that the remainder and quotient divided by

<sup>1</sup> The complete algorithm requires range analysis not discussed in this work.

R = 2<sup>64</sup> amount to bit masking and shifting respectively. Arithmetic division is never used. To prove t·R ≡ T mod N, we first show T + m·N ≡ 0 mod R. Observe T + m·N = T + (((T mod R)·N′) mod R)·N ≡ T + T·N′·N ≡ T·(1 + N′·N) ≡ 0 mod R. Therefore, T + m·N is a multiple of R and t = (T + m·N)/R is an integer. Hence t·R = T + m·N ≡ T mod N.

In the TOYLANG implementation (Fig. 4b), we represent T by two 64-bit variables T<sub>H</sub> and T<sub>L</sub> with T = 2<sup>64</sup>T<sub>H</sub> + T<sub>L</sub>. Hence T<sub>L</sub> = T mod 2<sup>64</sup>. m is computed by the half-multiplication instruction MUL. The full-multiplication computes the product m·N of m and N. The following two addition instructions compute the sum of T and the product m·N. After adding T, the least significant 64 bits (t<sub>L</sub>) should be zeros. We hence assert t<sub>L</sub> ≡ 0 mod [2<sup>64</sup>]. If the assertion succeeds, t<sub>L</sub> is in fact 0 since it is a 64-bit variable. We thus assume t<sub>L</sub> = 0. The last assertion checks that the result 2<sup>64</sup>(2<sup>64</sup>c + t<sub>H</sub>) is indeed congruent to T modulo N.
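The whole derivation can be replayed numerically. The sketch below picks an arbitrary odd modulus N (our choice, not from the paper) and checks both asserted congruences:

```python
R = 1 << 64
N = (1 << 61) - 1                  # any odd N < R works; 2^61 - 1 is prime
N_prime = (-pow(N, -1, R)) % R     # N*N' + 1 ≡ 0 (mod R); needs Python 3.8+
assert (N * N_prime + 1) % R == 0

T = 123456789123456789123456789    # some 0 ≤ T < R^2
m = ((T % R) * N_prime) % R
assert (T + m * N) % R == 0        # matches ASSERT t_L ≡ 0 mod [2^64]
t = (T + m * N) // R               # exact division, no remainder
print((t * R - T) % N)             # 0, i.e. t*R ≡ T (mod N)
```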

### **4 Algebraic Abstraction**

Algebraic abstraction is a technique to lift computation to an algebraic domain. In the abstract algebraic domain, program instructions are transformed to polynomial equations. Computation in turn is characterized by the roots of systems of polynomial equations. Algebraic abstraction hence allows us to apply algebraic tools from commutative algebra. The abstraction technique requires programs in the static single assignment form. We hence assume input programs are in the static single assignment form.

$$
\begin{aligned}
[\![v \leftarrow \text{ADD } a_0\, a_1]\!] &= \{v = a_0 + a_1\} &
[\![v \leftarrow \text{ADC } a_0\, a_1\, d]\!] &= \{v = a_0 + a_1 + d\} \\
[\![c : v \leftarrow \text{ADDS } a_0\, a_1]\!] &= \{c \cdot (c - 1) = 0,\ c \cdot 2^w + v = a_0 + a_1\} \\
[\![c : v \leftarrow \text{ADCS } a_0\, a_1\, d]\!] &= \{c \cdot (c - 1) = 0,\ c \cdot 2^w + v = a_0 + a_1 + d\} \\
[\![v \leftarrow \text{SUB } a_0\, a_1]\!] &= \{v = a_0 - a_1\} &
[\![v \leftarrow \text{MUL } a_0\, a_1]\!] &= \{v = a_0 \cdot a_1\} \\
[\![c : v \leftarrow \text{SUBS } a_0\, a_1]\!] &= \{c \cdot (c - 1) = 0,\ v = a_0 - a_1 + c \cdot 2^w\} \\
[\![v_H : v_L \leftarrow \text{MULL } a_0\, a_1]\!] &= \{v_H \cdot 2^w + v_L = a_0 \cdot a_1\} \\
[\![v \leftarrow \text{SHL } a\, n]\!] &= \{v = a \cdot 2^n\} &
[\![v \leftarrow \text{SHR } a\, n]\!] &= \{v \cdot 2^n = a\} \\
[\![\text{ASSUME } q]\!] &= \{q\} &
[\![s; P]\!] &= [\![s]\!] \cup [\![P]\!]
\end{aligned}
$$

#### **Fig. 5.** Algebraic Abstraction

Figure 5 lifts TOYLANG instructions to polynomial equations. Intuitively, we would like the semantics of each instruction to be characterized by the roots of the corresponding polynomial equations. For instance, v ← ADD a<sub>0</sub> a<sub>1</sub> is lifted to v = a<sub>0</sub> + a<sub>1</sub>. The ADC instruction is similar. The carrying addition instruction c : v ← ADDS a<sub>0</sub> a<sub>1</sub> is lifted to two equations: c · (c − 1) = 0 and c · 2<sup>w</sup> + v = a<sub>0</sub> + a<sub>1</sub>. Since c is a carry, it must be 0 or 1, and hence a root of c · (c − 1) = 0. The carrying addition-with-carry instruction ADCS is similar, as are the subtraction instructions SUB and SUBS.

The half-multiplication instruction v ← MUL a<sub>0</sub> a<sub>1</sub> is lifted to v = a<sub>0</sub> · a<sub>1</sub>; the full-multiplication instruction v<sub>H</sub> : v<sub>L</sub> ← MULL a<sub>0</sub> a<sub>1</sub> corresponds to v<sub>H</sub> · 2<sup>w</sup> + v<sub>L</sub> = a<sub>0</sub> · a<sub>1</sub>.

**Fig. 6.** Abstract Montgomery Reduction

The logical left shift instruction v ← SHL a n corresponds to v = a · 2<sup>n</sup>; the logical right shift instruction v ← SHR a n is lifted to v · 2<sup>n</sup> = a. The ASSUME q instruction is lifted to the (modular) equation q; all computations thus must satisfy q. A TOYLANG program is lifted to the system of (modular) equations collected from its instructions. This system is called the *abstract polynomial program*. Figure 6 shows the abstract polynomial program for the Montgomery reduction program.
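The lifting is a simple syntax-directed translation. As a sketch (the tuple encoding of instructions and the textual form of equations are our own, covering only a few opcodes), each instruction contributes a set of polynomial equations, and a program contributes the union:

```python
# Lifting a few TOYLANG-style instructions to polynomial equations,
# rendered as strings of the form "p" meaning the equation p = 0.

W = 64  # bit width w

def lift(instr):
    op = instr[0]
    if op == "ADD":
        _, v, a0, a1 = instr
        return {f"{v} - ({a0} + {a1})"}
    if op == "ADDS":
        _, c, v, a0, a1 = instr
        return {f"{c}*({c} - 1)",
                f"{c}*2^{W} + {v} - ({a0} + {a1})"}
    if op == "MULL":
        _, vh, vl, a0, a1 = instr
        return {f"{vh}*2^{W} + {vl} - {a0}*{a1}"}
    if op == "ASSUME":
        return {instr[1]}
    raise ValueError(op)

def lift_program(prog):
    eqs = set()
    for instr in prog:
        eqs |= lift(instr)  # [P; s] = [P] ∪ [s]
    return eqs

prog = [("MULL", "mN_H", "mN_L", "m", "N"),
        ("ADDS", "carry", "t_L", "T_L", "mN_L")]
for eq in sorted(lift_program(prog)):
    print(eq)
```

Running the sketch on the two-instruction program prints the three equations of the corresponding abstract polynomial program.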

#### **4.1 Soundness Conditions**

Algebraic abstraction in Fig. 5, however, is unsound in general. The TOYLANG semantics is defined over bounded integers of bit width w, whereas the polynomial equations in algebraic abstraction are interpreted over the integers. When overflow occurs in a TOYLANG instruction, its computation is not captured by the corresponding polynomial equations. Consider the instruction v ← ADD 2<sup>w−1</sup> 2<sup>w−1</sup>. By the TOYLANG semantics, v has the bit-vector value *bvadd* [[2<sup>w−1</sup>]]<sub>σ</sub> [[2<sup>w−1</sup>]]<sub>σ</sub> = 0 after execution. Clearly, 0 is not a root of the equation v = 2<sup>w−1</sup> + 2<sup>w−1</sup>. The abstraction is unsound.
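The unsoundness example is easy to reproduce concretely (wrap-around is emulated here with a modulus on Python integers):

```python
# Over w-bit bit-vectors, ADD wraps around, so the concrete result is
# not a root of the integer equation v = a0 + a1 once addition overflows.

w = 64
a0 = a1 = 1 << (w - 1)            # 2^(w-1)
v = (a0 + a1) % (1 << w)          # bvadd semantics: wrap modulo 2^w
print(v)                          # 0
assert v == 0
assert v != a0 + a1               # 0 is not a root of v = 2^(w-1) + 2^(w-1)
```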

In order to check soundness of algebraic abstraction, we define soundness conditions for TOYLANG instructions to ensure that all computations are captured by the corresponding polynomial equations. Intuitively, we give an SMT QF_BV query for each instruction in a TOYLANG program such that the query is satisfiable if and only if the computation at the instruction can overflow.

To this end, we first use the SMT QF_BV logic to characterize computations of TOYLANG programs. Recall TOYLANG programs are in static single assignment form. Figure 7 defines an SMT QF_BV query for any TOYLANG program P. Except for the ASSUME instruction, the figure follows the semantics of TOYLANG. For instance, v ← ADC a<sub>0</sub> a<sub>1</sub> d asserts that v equals the bit-vector sum of a<sub>0</sub> and a<sub>1</sub> with d zero-extended by w − 1 bits. The other instructions are similar. It is not hard to see that all computations of a TOYLANG program satisfy the corresponding SMT QF_BV query.

**Lemma 1.** *Let* P *be a* TOYLANG *program without* ASSERT *instructions and* σ, σ′ *stores with* (|σ, P, σ′|)*. Then the SMT QF_BV query of Fig. 7 for* P *is satisfied by the store* σ′*.*

**Fig. 7.** Soundness Conditions I

Our next task is to define SMT QF_BV queries for instructions such that their algebraic abstraction is unsound if and only if the corresponding query is satisfiable (Fig. 8). The instruction v ← ADD a<sub>0</sub> a<sub>1</sub> is lifted to v = a<sub>0</sub> + a<sub>1</sub>. The abstraction is unsound exactly when there is a carry, that is, when (*bvextract* w w (*bvadd* (*zero_extend* a<sub>0</sub> 1) (*zero_extend* a<sub>1</sub> 1))) is 1. The instructions ADC and SUB are similar. Algebraic abstraction for the instructions ADDS, ADCS and SUBS is always sound; their corresponding SMT QF_BV queries are unsatisfiable (*assert* ⊥). For the half-multiplication v ← MUL a<sub>0</sub> a<sub>1</sub>, its abstraction v = a<sub>0</sub> · a<sub>1</sub> is unsound when the most significant w bits of the product of a<sub>0</sub> and a<sub>1</sub> are not all zeros. The corresponding SMT QF_BV query is hence (*assert* (*not* (= 0 (*bvextract* (2w − 1) w bvx)))) where bvx is the 2w-bit product of a<sub>0</sub> and a<sub>1</sub>. The abstraction for the full-multiplication instruction is never unsound. For the v ← SHL a<sub>0</sub> n instruction, its algebraic abstraction is unsound if the most significant n bits of a<sub>0</sub> are not zeros. The algebraic abstraction of the v ← SHR a<sub>0</sub> n instruction is unsound when the least significant n bits of a<sub>0</sub> are not zeros. The relevant bits are obtained by *bvextract* in each case. The abstraction for ASSUME is always sound.
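The overflow conditions behind Fig. 8 can be sketched directly on integers (a hedged emulation of the bit-vector queries; an actual checker would discharge them with an SMT QF_BV solver rather than on concrete operands):

```python
# Per-instruction overflow checks: the abstraction of an instruction is
# sound on given w-bit operands iff the relevant extracted bits are zero.

w = 64

def add_overflows(a0, a1):
    # bvextract w w of the (w+1)-bit sum: the carry-out bit
    return ((a0 + a1) >> w) & 1 == 1

def mul_overflows(a0, a1):
    # most significant w bits of the 2w-bit product must be all zeros
    return (a0 * a1) >> w != 0

def shl_overflows(a, n):
    # most significant n bits of a must be zeros
    return a >> (w - n) != 0

def shr_loses_bits(a, n):
    # least significant n bits of a must be zeros
    return a & ((1 << n) - 1) != 0

assert add_overflows(1 << 63, 1 << 63)        # the unsound ADD example
assert not add_overflows(1, 2)
assert mul_overflows((1 << 32) + 1, 1 << 32)  # high half is nonzero
assert not shl_overflows(1, 3)
assert shr_loses_bits(5, 1)                   # 5 has its low bit set
```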

To check soundness of the algebraic abstraction of the instruction s in the TOYLANG program P; s, we apply Lemma 1 to characterize the computations of P through the query of Fig. 7, and check whether its conjunction with the query of Fig. 8 for s is unsatisfiable. We say the soundness condition for the instruction s in the TOYLANG program P; s *holds* if this conjunction is unsatisfiable. In order to ensure the soundness of the abstract polynomial program for the TOYLANG program P, soundness conditions for all instructions in P must hold. That is, soundness conditions

**Fig. 8.** Soundness Conditions II

for s in all prefixes P; s of P must hold. Define the valuation ρ<sub>σ</sub> of the store σ by ρ<sub>σ</sub>(v) = toZ([[v]]<sub>σ</sub>) for every v ∈ **x**. The next proposition gives the soundness condition.

**Proposition 1 (Soundness).** *Let* P *be a* TOYLANG *program without* ASSERT *instructions and* σ, σ′ *stores with* (|σ, P, σ′|)*. Then* ρ<sub>σ′</sub> *is a root of the system of (modular) equations of* P *if soundness conditions for* s *in every prefix* P; s *of* P *hold.*

We say that the soundness condition for P *holds* if soundness conditions for s in all prefixes P; s of P hold. Let us take a closer look at the abstract Montgomery reduction program (Fig. 6). The half-multiplication instruction m ← MUL T<sub>L</sub> N′ is lifted to m = T<sub>L</sub> · N′. However, the soundness condition for the instruction requires the most significant 64 bits of the product to be zeros (Fig. 8). Since T<sub>L</sub> is arbitrary, the soundness condition does not hold in general. To obtain a sound algebraic abstraction for Montgomery reduction, we modify the TOYLANG program slightly (Fig. 9).

In the revised program, the first full-multiplication instruction is used to compute the least significant <sup>64</sup> bits of the product of <sup>T</sup><sup>L</sup> and <sup>N</sup> (marked by <sup>√</sup>). The most significant 64 bits of the product are stored in the variable dc (for *don't care*). Note that the soundness condition of the revised program holds trivially. The algebraic abstraction for the revised Montgomery reduction program is sound by Proposition 1.

#### **4.2 Polynomial Program Verification**

Let P be a TOYLANG program without ASSERT instructions. Our goal is to verify P; ASSERT φ with algebraic abstraction. Consider the system of (modular) equations Φ obtained from P. For any stores σ and σ′ with (|σ, P, σ′|), ρ<sub>σ′</sub> is a root of Φ if the soundness

$$
\begin{array}{ll}
\text{ASSUME } N \times N' + 1 \equiv 0 \bmod \left[ 2^{64} \right] & N \cdot N' + 1 \equiv 0 \bmod \left[ 2^{64} \right] \\
\checkmark\;\; dc : m \leftarrow \text{MULL } T\_L \; N' & dc \cdot 2^{64} + m = T\_L \cdot N' \\
mN\_H : mN\_L \leftarrow \text{MULL } m \; N & mN\_H \cdot 2^{64} + mN\_L = m \cdot N \\
carry : t\_L \leftarrow \text{ADDS } T\_L \; mN\_L & carry \cdot (carry - 1) = 0,\; carry \cdot 2^{64} + t\_L = T\_L + mN\_L \\
c : t\_H \leftarrow \text{ADCS } T\_H \; mN\_H \; carry & c \cdot (c - 1) = 0,\; c \cdot 2^{64} + t\_H = T\_H + mN\_H + carry \\
\text{ASSERT } t\_L \equiv 0 \bmod \left[ 2^{64} \right] & \\
\text{ASSUME } t\_L = 0 & t\_L = 0 \\
\text{ASSERT } (c \times 2^{64} + t\_H) \times 2^{64} \equiv T\_H \times 2^{64} + T\_L \bmod \left[ N \right] &
\end{array}
$$

**Fig. 9.** Abstract Montgomery Reduction (Revised)

condition for P holds by Proposition 1. To verify ASSERT φ on σ′, we need to check whether ρ<sub>σ′</sub> is also a root of the (modular) equation φ. That is, we want to show ∀**x**. Φ =⇒ φ.

**Proposition 2.** *Let* P *be a* TOYLANG *program without* ASSERT *instructions and* φ *a (modular) equation. Suppose the soundness condition for* P *holds. The assertion in* P; ASSERT φ *succeeds if* ∀**x**. Φ =⇒ φ*, where* Φ *is the abstract polynomial program of* P*.*

We extend [14] to check the root entailment problem. Recall that Φ is a system of (modular) equations. We first simplify it to a system of equations; this is best seen by an example. Consider ∀x y u v. x ≡ y mod [3u<sup>2</sup>, u + v] =⇒ 0 = 0. We have

$$\begin{aligned} \forall x \, y \, u \, v. x &\equiv y \text{ mod } [3u^2, u+v] \implies 0 = 0\\ \text{iff } \forall x \, y \, u \, v. [\exists k\_0 \, k\_1 (x-y = 3u^2 \cdot k\_0 + (u+v) \cdot k\_1)] &\implies 0 = 0\\ \text{iff } \forall x \, y \, u \, v \, k\_0 \, k\_1. x - y &= 3u^2 \cdot k\_0 + (u+v) \cdot k\_1 \implies 0 = 0. \end{aligned}$$

Therefore, it suffices to consider the problem of checking ∀**x**. Ψ =⇒ φ where Ψ is a system of equations and φ is a (modular) equation. We solve the simplified problem by constructing instances of the ideal membership problem.

Let Ψ = {e<sub>0</sub> = e′<sub>0</sub>, e<sub>1</sub> = e′<sub>1</sub>, ..., e<sub>n</sub> = e′<sub>n</sub>}. Consider the ideal I = ⟨e<sub>0</sub> − e′<sub>0</sub>, e<sub>1</sub> − e′<sub>1</sub>, ..., e<sub>n</sub> − e′<sub>n</sub>⟩ generated by the polynomials in Ψ. Suppose the polynomial e − e′ ∈ I. We claim ∀**x**. Ψ =⇒ e = e′. Indeed, e − e′ = (e<sub>0</sub> − e′<sub>0</sub>) · h<sub>0</sub> + (e<sub>1</sub> − e′<sub>1</sub>) · h<sub>1</sub> + ··· + (e<sub>n</sub> − e′<sub>n</sub>) · h<sub>n</sub> for some h<sub>0</sub>, h<sub>1</sub>, ..., h<sub>n</sub> ∈ Z[**x**] since e − e′ ∈ I. For any root ρ of Ψ, (e<sub>0</sub> − e′<sub>0</sub>)[ρ] = (e<sub>1</sub> − e′<sub>1</sub>)[ρ] = ··· = (e<sub>n</sub> − e′<sub>n</sub>)[ρ] = 0. Hence (e − e′)[ρ] = ((e<sub>0</sub> − e′<sub>0</sub>) · h<sub>0</sub>)[ρ] + ((e<sub>1</sub> − e′<sub>1</sub>) · h<sub>1</sub>)[ρ] + ··· + ((e<sub>n</sub> − e′<sub>n</sub>) · h<sub>n</sub>)[ρ] = 0. Thus ρ is also a root of e − e′ = 0 and ∀**x**. Ψ =⇒ e = e′.

Now suppose the polynomial e − e′ ∈ I + ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>⟩. We claim ∀**x**. Ψ =⇒ e ≡ e′ mod [f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>]. Since e − e′ ∈ I + ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>⟩, e − e′ = (e<sub>0</sub> − e′<sub>0</sub>) · h<sub>0</sub> + (e<sub>1</sub> − e′<sub>1</sub>) · h<sub>1</sub> + ··· + (e<sub>n</sub> − e′<sub>n</sub>) · h<sub>n</sub> + f<sub>0</sub> · k<sub>0</sub> + f<sub>1</sub> · k<sub>1</sub> + ··· + f<sub>m</sub> · k<sub>m</sub> for some h<sub>0</sub>, ..., h<sub>n</sub>, k<sub>0</sub>, ..., k<sub>m</sub> ∈ Z[**x**]. For any root ρ of Ψ, (e − e′)[ρ] = 0 + f<sub>0</sub>[ρ]k<sub>0</sub>[ρ] + f<sub>1</sub>[ρ]k<sub>1</sub>[ρ] + ··· + f<sub>m</sub>[ρ]k<sub>m</sub>[ρ], a Z-linear combination of f<sub>0</sub>[ρ], f<sub>1</sub>[ρ], ..., f<sub>m</sub>[ρ]. We again have ∀**x**. Ψ =⇒ e ≡ e′ mod [f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>] as required.
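Both claims can be checked on a tiny concrete instance (a sketch with polynomials of our own choosing): with Ψ = {x = y}, the polynomial x² − y² lies in ⟨x − y⟩ with cofactor h = x + y, and x² − (y² − 7) lies in ⟨x − y⟩ + ⟨7⟩ with cofactors h = x + y and k = 1.

```python
# Sampling roots of Ψ = {x = y} and verifying the two entailments via
# the explicit cofactor representations described in the text.

import random

for _ in range(100):
    x = y = random.randrange(-10**6, 10**6)   # any root of Ψ has x = y
    # e - e' = (x - y)*h makes membership in <x - y> explicit
    assert x**2 - y**2 == (x - y) * (x + y) == 0
    # modular claim: x^2 - (y^2 - 7) = (x - y)*(x + y) + 7*1
    assert x**2 - (y**2 - 7) == (x - y) * (x + y) + 7
    # hence x^2 ≡ y^2 - 7 mod [7] at every root of Ψ
    assert (x**2 - (y**2 - 7)) % 7 == 0
```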

Our discussion is summarized as follows.

$$e = e' \leadsto \langle e - e' \rangle$$

$$e \equiv e' \bmod \left[ f\_0, f\_1, \dots, f\_m \right] \leadsto \left\langle e - e' - f\_0 \cdot k\_0 - f\_1 \cdot k\_1 - \dots - f\_m \cdot k\_m \right\rangle$$

$$k\_0, k\_1, \dots, k\_m \text{ : fresh variables}$$

$$\frac{\vphantom{I}}{\emptyset \leadsto \langle 0 \rangle} \qquad \frac{\phi \leadsto I \quad \Phi \leadsto J}{\{\phi\} \cup \Phi \leadsto I + J}$$

#### **Fig. 10.** Polynomial Programs to Ideals

**Proposition 3.** *Let* P *be a* TOYLANG *program without* ASSERT *instructions and* I *the ideal with* P ⇝ I *(Fig. 10). Then*

*1.* ∀**x**. P =⇒ e = e′ *if* e − e′ ∈ I*;*
*2.* ∀**x**. P =⇒ e ≡ e′ mod [f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>] *if* e − e′ ∈ I + ⟨f<sub>0</sub>, f<sub>1</sub>, ..., f<sub>m</sub>⟩*.*

In order to verify (modular) equations with algebraic abstraction, Proposition 1 is applied to ensure the soundness of the abstraction. Proposition 3 then checks whether the (modular) equations are indeed entailed by the abstract polynomial program. The main theorem summarizes our theoretical developments.

**Theorem 1.** *Let* P *be a* TOYLANG *program without* ASSERT *instructions,* σ, σ′ *stores with* (|σ, P, σ′|)*, and* I *the ideal with* P ⇝ I*. If the soundness condition for* P *holds, the assertion in* P; ASSERT e = e′ *succeeds if* e − e′ ∈ I*, and the assertion in* P; ASSERT e ≡ e′ mod [f<sub>0</sub>, ..., f<sub>m</sub>] *succeeds if* e − e′ ∈ I + ⟨f<sub>0</sub>, ..., f<sub>m</sub>⟩*.*


The ideal membership problem can be solved by computing Gröbner bases for ideals [7]. Many computer algebra systems compute Gröbner bases for ideals with simple commands. For instance, the **groebner** command in SINGULAR [13] computes a Gröbner basis for any ideal under a user-specified monomial ordering. The **reduce** command then checks if a polynomial belongs to the ideal via its Gröbner basis.

Recall the abstract polynomial program for revised Montgomery reduction in Fig. 9. Figure 11a shows the ideal for the abstract polynomial program before ASSUME t<sub>L</sub> = 0. To verify the two ASSERT instructions, Figs. 11b and 11c show the corresponding instances of the ideal membership problem. Observe that the ideal ⟨t<sub>L</sub>⟩ corresponds to ASSUME t<sub>L</sub> = 0 in Fig. 11c. Since the soundness condition for the abstract polynomial program holds trivially (Sect. 4.1), it remains to check the ideal membership problem. Both instances are verified immediately.
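The membership instance of Fig. 11c can be made explicit. The cofactor combination below was worked out by hand for illustration (in the tool, such witnesses come from the computer algebra system); we validate the resulting polynomial identity by evaluating both sides at random points, with all variables unconstrained:

```python
# The asserted polynomial of Fig. 11c as an explicit combination of three
# generators of I plus multiples of t_L and N.

import random

R = 1 << 64

def check_once(rng):
    v = {n: rng.randrange(-10**6, 10**6)
         for n in ("N", "m", "T_L", "T_H", "mN_H", "mN_L",
                   "carry", "c", "t_L", "t_H")}
    # three generators of the ideal I in Fig. 11a
    g3 = v["mN_H"] * R + v["mN_L"] - v["m"] * v["N"]
    g5 = v["carry"] * R + v["t_L"] - (v["T_L"] + v["mN_L"])
    g7 = v["c"] * R + v["t_H"] - (v["T_H"] + v["mN_H"] + v["carry"])
    # the asserted polynomial of Fig. 11c
    target = (v["c"] * R + v["t_H"]) * R - (v["T_H"] * R + v["T_L"])
    # membership in I + <t_L> + <N>, with hand-derived cofactors
    return target == R * g7 + g5 + g3 + v["m"] * v["N"] - v["t_L"]

assert all(check_once(random.Random(s)) for s in range(100))
```

Since the identity holds for unconstrained variables, the target polynomial lies in I + ⟨t<sub>L</sub>⟩ + ⟨N⟩ with cofactors 2<sup>64</sup>, 1, 1, m, and −1.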

#### **5 Certified Verification**

In TOYLANG, we only highlight the instructions necessary to verify unsigned Montgomery reduction. Real-world programs performing non-linear computation need more instructions, and the signed representation of bit-vectors is also used.

$$I = \left\langle \begin{array}{c} N \cdot N' + 1 - k\_0 \cdot 2^{64},\; dc \cdot 2^{64} + m - T\_L \cdot N',\; mN\_H \cdot 2^{64} + mN\_L - m \cdot N, \\\\ carry \cdot (carry - 1),\; carry \cdot 2^{64} + t\_L - (T\_L + mN\_L),\; c \cdot (c - 1), \\\\ c \cdot 2^{64} + t\_H - (T\_H + mN\_H + carry) \end{array} \right\rangle$$
 
$$\text{(a) Ideal } I$$
 
$$t\_L \in I + \langle 2^{64} \rangle$$
 
$$\text{(b) } \boxed{\text{ASSERT } t\_L \equiv 0 \bmod \left[ 2^{64} \right]}$$
 
$$\left( c \cdot 2^{64} + t\_H \right) \cdot 2^{64} - (T\_H \cdot 2^{64} + T\_L) \in I + \langle t\_L \rangle + \langle N \rangle$$
 
$$\text{(c) } \boxed{\text{ASSERT } (c \times 2^{64} + t\_H) \times R \equiv T\_H \times 2^{64} + T\_L \bmod \left[ N \right]}$$

**Fig. 11.** Instances of Ideal Membership Problem

In order to verify real-world cryptographic programs, we extend algebraic abstraction with these features found in CRYPTOLINE [9,29]. For such complicated languages, algebraic abstraction can be tedious to implement. Its verification algorithm moreover relies on complex algorithms from computer algebra systems and SMT QF_BV solvers. It is unclear whether these external tools function correctly on given instances. In order to improve the quality of verification results, we have verified algebraic abstraction with the proof assistant COQ, and certified results from external tools with COQ and a verified certificate checker. We briefly describe how we verify our algorithms and certify results from external tools. Please see the technical report [28] for details.

#### **5.1 Verified Abstraction Algorithm**

The proof assistant COQ with the SSREFLECT library [4,11] is used to verify our algebraic abstraction technique. We define the TOYLANG syntax as a COQ data type (Fig. 1). The COQ-NBITS theory [26] is adopted to formalize the semantics of TOYLANG (Fig. 2). The COQ binary integer theory Z is used to formalize the semantics of (modular) equations (Fig. 3). We formalize polynomial expressions with integral coefficients by the COQ polynomial expression theory PExpr Z.

To see how our algebraic abstraction algorithm is verified, consider Proposition 2. Let program be the COQ data type for TOYLANG programs and meqn the data type for (modular) equations. We define the predicate algsnd : program → Prop for the soundness condition of a given program (Figs. 7 and 8). Similarly, we define the function algabs : program → seq meqn for our algebraic abstraction algorithm, where seq meqn is the COQ data type for sequences of meqn (Fig. 5). To write down the formal statement for Proposition 2, it remains to formalize root entailment. Let exp and valuation be the data types for expressions and valuations respectively. Define the function eval_exp : exp → valuation → Z, which evaluates an expression to an integer on a valuation; eval_exps : seq exp → valuation → seq Z evaluates expressions to integers on a valuation. Consider the predicate eval_bexp : meqn → valuation → Prop defined by

eval_bexp (e = e') rho := eval_exp e rho = eval_exp e' rho
eval_bexp (e = e' mod fs) rho := ∃ ks, (eval_exp e rho) - (eval_exp e' rho) = zadds (zmuls ks (eval_exps fs rho))

where zadds zs := foldl Z.add 0 zs and zmuls xs ys := map2 Z.mul xs ys. The predicate eval_bexp (e = e') rho checks if the expressions e and e' evaluate to the same integer on the valuation rho; eval_bexp (e = e' mod fs) rho checks if the difference of eval_exp e rho and eval_exp e' rho equals a linear combination of the integers eval_exps fs rho. The predicate eval_bexp meq rho thus checks if rho is a root of the (modular) equation meq.

We are ready to formalize the root entailment. Consider the predicate entails (Phi : seq meqn) (psi : meqn) : Prop defined by

∀ rho, (∀ phi, phi ∈ Phi → eval_bexp phi rho) → eval_bexp psi rho.

That is, every common root of the system Phi is also a root of psi. The following proposition formalizes Proposition 2 and is proved in COQ.

**Proposition 4.** *Let P : program be without assert instructions and psi : meqn. If algsnd P and entails (algabs P) psi, then the assertion in P assert psi succeeds.*

To apply this proposition to a given program P and a (modular) equation psi, one needs to show algsnd P and entails (algabs P) psi in COQ. In principle, both predicates algsnd P and entails (algabs P) psi could be proved manually in COQ. However, it would be impractical even for programs of moderate sizes. To address this problem, we establish these predicates through certificates computed by external tools.

#### **5.2 Verification through Certification**

To show algsnd P for an arbitrary program P, we follow the certified verification technique developed in the SMT QF_BV solver COQQFBV [26]. More concretely, we specify our bit-blasting algorithm for soundness conditions in COQ (Figs. 7 and 8). The algorithm converts soundness conditions to Boolean formulae in conjunctive normal form. We then formally verify in COQ that soundness conditions hold if and only if the corresponding Boolean formulae are unsatisfiable. The constructed Boolean formulae are sent to the SAT solver KISSAT [5]. For each Boolean formula, KISSAT checks its satisfiability and emits a certificate. We then use the verified certificate checker GRATCHK [16] to validate these certificates.

Our next goal is to show entails (algabs P) psi. More generally, we show entails Phi psi for arbitrary Phi : seq meqn and psi : meqn via the COQ polynomial ring theory and the computer algebra system SINGULAR [13]. To this end, we first formulate the root entailment of polynomial expressions in the COQ polynomial ring theory. Recall PExpr Z is the COQ data type for polynomial expressions with integral coefficients. Given a valuation as a sequence of integers, the function zpeval : PExpr Z → seq Z → Z evaluates a polynomial expression to an integer. We formalize the root entailment of polynomial expressions by the predicate zpentails (Pi : seq (PExpr Z)) (tau : PExpr Z):

<sup>∀</sup>zs,(∀pi, pi <sup>∈</sup> Pi <sup>→</sup> zpeval pi zs = 0) <sup>→</sup> zpeval tau zs = 0.

We proceed to connect the root entailment of (modular) equations to the root entailment of polynomial expressions. Let the functions zpexprs_of_exprs : seq exp → seq (PExpr Z) and zpexprs_of_meqns : seq meqn → seq (PExpr Z) convert expressions and (modular) equations to polynomial expressions respectively (Fig. 10). When the consequence of a root entailment is a modular equation, recall that the moduli in the consequence become ideal generators (Proposition 3). To extract moduli from consequences, define zpexpr_of_conseq : meqn → PExpr Z × seq (PExpr Z) by

zpexpr_of_conseq (e = e') := (e - e', [::])
zpexpr_of_conseq (e = e' mod fs) := (e - e', zpexprs_of_exprs fs)

The following COQ lemma shows how to check the root entailment of (modular) equations through the root entailment of polynomial expressions:

**Lemma 2.** ∀ *(Phi : seq meqn) (psi : meqn), zpentails (Pi ++ zpexprs_of_meqns Phi) tau implies entails Phi psi, where (tau, Pi) = zpexpr_of_conseq psi.*

Note that moduli in the consequence psi are added to the antecedents Phi.

Our last step is to show zpentails (Pi ++ zpexprs_of_meqns Phi) tau. Again, we establish the generalized form zpentails Pi tau for polynomial expressions Pi and a polynomial expression tau. We prove the predicate by showing that tau can be expressed as a combination of the expressions in Pi. Consider the predicate validate_zpentails (Xi : seq (PExpr Z)) (Pi : seq (PExpr Z)) (tau : PExpr Z) defined by

size Xi = size Pi ∧ ZPeq (ZPnorm tau) (ZPnorm (foldl ZPadd 0 (map2 ZPmul Xi Pi))).

The predicate validate_zpentails checks if Xi and Pi are of the same size. It then normalizes the polynomials tau and foldl ZPadd 0 (map2 ZPmul Xi Pi) using ZPnorm. If the normalized polynomials are equal (ZPeq), the predicate is true. In foldl ZPadd 0 (map2 ZPmul Xi Pi), ZPadd and ZPmul are the constructors for polynomial expression addition and multiplication respectively. The expression map2 ZPmul Xi Pi hence returns the products of elements in Xi with corresponding elements in Pi, and foldl ZPadd 0 (map2 ZPmul Xi Pi) computes the sum of these products. The predicate validate_zpentails Xi Pi tau therefore checks if tau is equal to a polynomial combination of the expressions in Pi; in other words, whether tau belongs to the ideal generated by Pi. Using Lemma 2, we prove the following variant of Proposition 3 in COQ:

**Proposition 5.** ∀ *Phi psi Xi, validate_zpentails Xi (Pi ++ zpexprs_of_meqns Phi) tau implies entails Phi psi, where (tau, Pi) = zpexpr_of_conseq psi.*

The main difference between Propositions 3 and 5 lies in certifiability. There are many ways to establish ideal membership; Proposition 5 asks for witnesses Xi that justify ideal membership explicitly. Most importantly, such Xi need not be constructed manually: they are computed by external tools. Precisely, these polynomial expressions are computed by the **lift** command of the computer algebra system SINGULAR [13]. The **lift** command computes polynomial expressions representing tau in the ideal generated by Pi ++ zpexprs_of_meqns Phi. After SINGULAR computes these polynomial expressions, we convert them to polynomial expressions Xi in COQ. The predicate validate_zpentails Xi (Pi ++ zpexprs_of_meqns Phi) tau then checks if tau is indeed represented by Xi using the COQ polynomial ring theory. If the check succeeds, we obtain entails Phi psi by Proposition 5; otherwise, the predicate entails Phi psi is not established. Note that SINGULAR need not be trusted: if Xi is computed incorrectly, the check validate_zpentails Xi (Pi ++ zpexprs_of_meqns Phi) tau will fail in COQ. Proposition 5 thus allows us to show entails Phi psi with certification.
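The witness-checking idea can be sketched outside COQ (a hedged analogue with names of our own; the COQ predicate compares normalized polynomial expressions, whereas this sketch merely compares evaluations at random points):

```python
# Given candidate cofactors Xi for generators Pi, check that tau equals
# sum(xi * pi) by evaluating both sides at sampled points.

import random

def validate(Xi, Pi, tau, trials=200, nvars=3, seed=1):
    assert len(Xi) == len(Pi)             # analogue of the size check
    rng = random.Random(seed)
    for _ in range(trials):
        pt = [rng.randrange(-1000, 1000) for _ in range(nvars)]
        if tau(*pt) != sum(xi(*pt) * pi(*pt) for xi, pi in zip(Xi, Pi)):
            return False
    return True

# tau = x^2 - y^2 is the combination (x + y)*(x - y) of Pi = [x - y]
assert validate([lambda x, y, z: x + y],
                [lambda x, y, z: x - y],
                lambda x, y, z: x**2 - y**2)
# a wrong witness is rejected
assert not validate([lambda x, y, z: 1],
                    [lambda x, y, z: x - y],
                    lambda x, y, z: x**2 - y**2)
```

Just as with validate_zpentails, an incorrect witness makes the check fail, so the witness producer need not be trusted.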

#### **5.3 Optimization**

A number of optimizations are needed, and verified, to make algebraic abstraction feasible for TOYLANG programs with thousands of instructions. For instance, the static single assignment transformation and program slicing algorithms are both specified and verified in COQ. Furthermore, the bit-blasting algorithm is extended significantly to check soundness conditions effectively. For example, the soundness condition for the half-multiplication instruction MUL requires *bvmul* (Fig. 8), which does not work well due to complicated non-linear bit-vector computation. To reduce the complexity of overflow checking in half-multiplication, we implement and verify the algorithm from [10]. Last but not least, algebraic abstraction almost surely induces ideals with hundreds of polynomial generators, if not thousands. Computing Gröbner bases for such ideals is infeasible. To address this problem, we develop heuristics that reduce the number of generators in ideals through rewriting. Our heuristics are also specified and verified in COQ. These optimizations are essential in our experiments.

## **6 Evaluation**

We have implemented certified algebraic abstraction in the tool COQCRYPTOLINE [1]. COQCRYPTOLINE is built upon OCAML code extracted from our COQ development. It calls the computer algebra system SINGULAR [13] and certifies answers from the algebraic tool. The certified SMT QF_BV solver COQQFBV [26] is used to verify soundness conditions. We choose two classes of real-world cryptographic programs in experiments. For elliptic curve cryptography, we verify various field and group operations from BITCOIN [27], BORINGSSL [8,12], NSS [20], OPENSSL [23], and PQCRYPTO-SIDH [18]. For post-quantum cryptography, we verify the C reference and optimized Intel avx2 implementations of the Number-Theoretic Transform in the cryptosystem KYBER [6]. Experiments are conducted on an Ubuntu 22.04.1 Linux server with a 3.20 GHz 32-core Xeon Gold 6134M and 1 TB RAM.

We compare COQCRYPTOLINE with the uncertified CRYPTOLINE [9,24]. Table 1 shows the experimental results. L<sub>CL</sub> shows the number of instructions. T<sub>CCL</sub> and T<sub>CL</sub> give the verification times of COQCRYPTOLINE and CRYPTOLINE in seconds respectively. %Int shows the percentage of time spent in extracted OCAML programs in COQCRYPTOLINE. %CAS and %SMT give the percentages of time spent on SINGULAR and COQQFBV respectively.


#### **Table 1.** Experimental Results on Industrial Cryptographic Programs

<sup>a</sup> One (out of three) modular polynomial equation in post-conditions fails to certify due to stack overflow.

#### **6.1 Field and Group Operation in Elliptic Curves**

In elliptic curve cryptography, a rational point on a curve is represented by field elements from a large finite field. Rational points on the curve form a group, and the group operation in turn is computed by operations in the underlying finite field. In BITCOIN, the finite field is Z<sub>p256k1</sub> with p256k1 = 2<sup>256</sup> − 2<sup>32</sup> − 2<sup>9</sup> − 2<sup>8</sup> − 2<sup>7</sup> − 2<sup>6</sup> − 2<sup>4</sup> − 1. The underlying field for Curve25519 is Z<sub>p25519</sub> with p25519 = 2<sup>255</sup> − 19. PQCRYPTO-SIDH however uses the slightly more complicated fields Z<sub>p434</sub>[x]/⟨x<sup>2</sup> + 1⟩ and Z<sub>p503</sub>[x]/⟨x<sup>2</sup> + 1⟩ with p434 = 2<sup>216</sup> · 3<sup>137</sup> − 1 and p503 = 2<sup>250</sup> · 3<sup>159</sup> − 1. Field elements in Z<sub>p256k1</sub> and Z<sub>p25519</sub> are represented by multiple *limbs* of 64-bit numbers. Field multiplication, for instance, is implemented by a number of 64-bit arithmetic instructions. Field elements in Z<sub>p434</sub>[x]/⟨x<sup>2</sup> + 1⟩ and Z<sub>p503</sub>[x]/⟨x<sup>2</sup> + 1⟩ are of the form u + vx where u, v ∈ Z<sub>p434</sub> or Z<sub>p503</sub> and x<sup>2</sup> = −1. Two moduli are used to specify multiplication for such fields: p434, x<sup>2</sup> + 1 for the former and p503, x<sup>2</sup> + 1 for the latter. Multiplication in PQCRYPTO-SIDH is thus easily specified by modular equations with multiple moduli.

COQCRYPTOLINE verifies every field operation with certification within 12.1 min. Group operations are implemented by field operations. Their certified verification thus takes more time. The most complicated case, x25519 scalar mult generic (3287 instructions) from BORINGSSL, takes about 1.3 h. In comparison, CRYPTOLINE verifies the same program in 4 min without certification. In almost all cases, a majority of the time is spent on COQQFBV. Running time for the extracted OCAML programs is negligible. Interestingly, COQCRYPTOLINE finds a bug in the arm64 multiplication code for $\mathbb{Z}_{p_{503}}[x]/\langle x^2 + 1\rangle$ from PQCRYPTO-SIDH. Towards the end of the multiplication, the programmer incorrectly stores the register x25 in memory *before* adding a carry. After fixing the bug, COQCRYPTOLINE finishes certified verification in about 5 min.

#### **6.2 Number-Theoretic Transform in Kyber**

The United States National Institute of Standards and Technology (NIST) is currently determining next-generation post-quantum cryptography (PQC) standards. In July 2022, Crystals-KYBER (or simply KYBER) was announced to be the winner for key establishment mechanisms.

One of the most critical steps in KYBER is modular polynomial multiplication over the polynomial ring $\mathcal{R}_q = \mathbb{Z}_q[x]/\langle x^{256} + 1\rangle$ with $q = 3329$. In $\mathcal{R}_q$, coefficients are elements of the field $\mathbb{Z}_q$. A polynomial in $\mathcal{R}_q$ is obtained by reduction modulo $x^{256} + 1$ and hence has degree less than $256$. Consider $x^{256} \in \mathbb{Z}_q[x]$. Since $x^{256} \equiv -1 \bmod (x^{256} + 1)$, $x^{256}$ is $-1$ in $\mathcal{R}_q$. Unsurprisingly, polynomial multiplication is one of the most expensive computations in KYBER. An efficient way to multiply polynomials is through a discretized Fast Fourier Transform called the Number-Theoretic Transform (NTT).
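As a concrete illustration (our sketch, not PQClean code), the reduction by $x^{256} + 1$ can be folded directly into schoolbook multiplication: any product term $x^k$ with $k \ge 256$ wraps around to $x^{k-256}$ with its sign flipped, because $x^{256} = -1$ in $\mathcal{R}_q$.

```python
Q, N = 3329, 256  # Kyber's modulus and ring degree

def polymul_rq(f, g):
    """Schoolbook multiplication in Z_q[x]/<x^N + 1>.
    Since x^N = -1 in the quotient ring, coefficients that overflow
    degree N wrap around with a sign flip."""
    h = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < N:
                h[k] = (h[k] + fi * gj) % Q
            else:
                h[k - N] = (h[k - N] - fi * gj) % Q
    return h

# x * x^255 = x^256, which equals -1 in R_q
f = [0] * N
f[1] = 1
g = [0] * N
g[255] = 1
h = polymul_rq(f, g)
assert h[0] == Q - 1 and all(c == 0 for c in h[1:])
```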

Recall that the Chinese remainder theorem for integers is but a ring isomorphism between residue systems. For instance, $\mathbb{Z}_{42} \cong \mathbb{Z}_6 \times \mathbb{Z}_7$. For polynomial rings, we have the following ring isomorphism

$$\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2 \rangle \cong \mathbb{Z}_q[x]/\langle x^n - \omega \rangle \times \mathbb{Z}_q[x]/\langle x^n + \omega \rangle \qquad (\omega \in \mathbb{Z}_q).$$
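The integer instance mentioned above can be checked directly. The snippet below (ours) verifies that $x \mapsto (x \bmod 6, x \bmod 7)$ is a bijection on $\mathbb{Z}_{42}$ that respects multiplication, which is the essence of the isomorphism.

```python
# Chinese remainder theorem as a concrete ring isomorphism: Z_42 ≅ Z_6 x Z_7.
images = {(x % 6, x % 7) for x in range(42)}
assert len(images) == 42  # injective on 42 elements, hence bijective

a, b = 40, 23
lhs = (a * b) % 42
# Multiplying on the left corresponds to componentwise multiplication
# on the right.
assert (lhs % 6, lhs % 7) == ((a % 6) * (b % 6) % 6, (a % 7) * (b % 7) % 7)
```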

Observe that $x^n$ equals $\omega$ in $\mathbb{Z}_q[x]/\langle x^n - \omega\rangle$ because $x^n \equiv \omega \bmod (x^n - \omega)$. Similarly, $x^n$ equals $-\omega$ in $\mathbb{Z}_q[x]/\langle x^n + \omega\rangle$. Recall that polynomials in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2\rangle$ have degrees less than $2n$. We can rewrite any polynomial in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2\rangle$ as $f(x) + g(x)x^n$ where the degrees of $f$ and $g$ are both less than $n$. The polynomial $f(x) + g(x)x^n$ is then equal to $f(x) + \omega g(x)$ in $\mathbb{Z}_q[x]/\langle x^n - \omega\rangle$; and it is equal to $f(x) - \omega g(x)$ in $\mathbb{Z}_q[x]/\langle x^n + \omega\rangle$. NTT computes the following ring isomorphism between $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2\rangle$ and $\mathbb{Z}_q[x]/\langle x^n - \omega\rangle \times \mathbb{Z}_q[x]/\langle x^n + \omega\rangle$ by substituting $\pm\omega$ for $x^n$ in $f(x) + g(x)x^n$:

$$f(x) + g(x)x^n \mapsto \left(f(x) + \omega g(x),\; f(x) - \omega g(x)\right). \tag{1}$$

Multiplication in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2\rangle$ can therefore be computed by respective multiplications in $\mathbb{Z}_q[x]/\langle x^n \pm \omega\rangle$ through the isomorphism. That is, a multiplication of polynomials of degrees less than $2n$ (in $\mathbb{Z}_q[x]/\langle x^{2n} - \omega^2\rangle$) is replaced by two multiplications of polynomials of degrees less than $n$ (in $\mathbb{Z}_q[x]/\langle x^n \pm \omega\rangle$).
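For $n = 1$, the map (1) and its multiplicativity can be checked in a few lines. The sketch below is ours; $\omega = 17$ is an arbitrary nonzero choice, and the check works for any such $\omega$.

```python
Q = 3329
W = 17  # an arbitrary omega; the isomorphism holds for any nonzero W

def ntt_split(f0, g0):
    """Map f0 + g0*x in Z_Q[x]/<x^2 - W^2> to its images
    modulo x - W and x + W, as in equation (1)."""
    return ((f0 + W * g0) % Q, (f0 - W * g0) % Q)

def mul_mod_x2(a, b):
    """Multiply (a0 + a1*x)(b0 + b1*x) mod x^2 - W^2:
    the x^2 term is replaced by W^2."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 + W * W * a1 * b1) % Q, (a0 * b1 + a1 * b0) % Q)

a, b = (5, 7), (11, 13)
pa, pb = ntt_split(*a), ntt_split(*b)
# Multiplication on the left corresponds to pointwise multiplication
# of the image pairs on the right.
prod = mul_mod_x2(a, b)
assert ntt_split(*prod) == ((pa[0] * pb[0]) % Q, (pa[1] * pb[1]) % Q)
```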

In KYBER, ring isomorphisms are applied repeatedly until linear polynomials are obtained. That is, KYBER NTT computes the isomorphism

$$\mathcal{R}_q = \mathbb{Z}_q[x] / \langle x^{256} + 1 \rangle \cong \mathbb{Z}_q[x] / \langle x^2 - \zeta_0 \rangle \times \dots \times \mathbb{Z}_q[x] / \langle x^2 - \zeta_{127} \rangle \tag{2}$$

where the $\zeta_j$'s are the principal 256-th roots of unity. A polynomial of degree less than 256 is hence mapped via KYBER NTT to 128 linear polynomials, each modulo a different $x^2 - \zeta_j$. In PQCLEAN [25], a reference C implementation and a hand-optimized Intel avx2 assembly implementation of KYBER NTT are provided. In addition to degree reduction, the two implementations make extensive use of signed Montgomery reduction for efficient multiplication over $\mathbb{Z}_q$. We verify whether the two NTT implementations compute the ring isomorphism correctly.
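Montgomery reduction replaces a division by $q$ with cheap multiplications and shifts. The sketch below (ours) mirrors the 16-bit signed variant used in Kyber implementations: for $|a| < q \cdot 2^{15}$ it returns a representative of $a \cdot 2^{-16} \bmod q$.

```python
Q = 3329
QINV = pow(Q, -1, 2**16)  # 62209 = q^{-1} mod 2^16

def montgomery_reduce(a):
    """Return r with r ≡ a * 2^(-16) (mod Q), for |a| < Q * 2^15.
    The low 16 bits of a*QINV are interpreted as a signed value,
    as in the signed reductions used by the Kyber code."""
    t = (a * QINV) % 2**16
    if t >= 2**15:       # reinterpret as signed 16-bit
        t -= 2**16
    # a - t*Q is divisible by 2^16 by construction of t
    return (a - t * Q) >> 16

x = 12345 * 678
r = montgomery_reduce(x)
# r represents x scaled by the inverse Montgomery factor 2^(-16)
assert (r * 2**16 - x) % Q == 0
```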

To specify the correctness requirements of KYBER NTT, one could write down modular equations (1) according to its computation. Each equation would require explicit substitution. Thanks to modular equations with multiple moduli, a more intuitive and mathematical specification based on (2) is also expressible. Let $F = \sum_{k=0}^{255} f_k x^k$ denote the input polynomial in $\mathcal{R}_q = \mathbb{Z}_q[x]/\langle x^{256} + 1\rangle$, where the coefficients $f_k$ are input variables with $-q < f_k < q$ ($0 \le k < 256$). Let $G_j = g_{j,0} + g_{j,1}x$ be the $j$-th final output linear polynomial from the implementations. The modular equations

$$F \equiv G_j \bmod [q, x^2 - \zeta_j], \text{ for all } 0 \le j < 128$$

specify the correctness of the KYBER NTT implementations. Observe that our specification is almost identical to (2). Modular equations with multiple moduli allow cryptographic programmers to express mathematical specifications naturally. They greatly improve usability and reduce specification effort in algebraic abstraction.
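The specification can be checked on concrete values by reducing $F$ modulo $x^2 - \zeta_j$, i.e. repeatedly replacing $x^2$ with $\zeta_j$, and comparing the linear remainder with the implementation's output $G_j$. A small sketch (ours, on a toy input):

```python
Q = 3329

def linear_remainder(coeffs, zeta):
    """Reduce a polynomial over Z_Q modulo x^2 - zeta by replacing
    x^2 with zeta until only a linear remainder g0 + g1*x is left."""
    g = list(coeffs)
    for k in range(len(g) - 1, 1, -1):
        g[k - 2] = (g[k - 2] + zeta * g[k]) % Q
        g[k] = 0
    return (g[0] % Q, g[1] % Q)

# 1 + 2x + 3x^2 + 4x^3 with x^2 -> 5 gives (1 + 15) + (2 + 20)x
assert linear_remainder([1, 2, 3, 4], 5) == (16, 22)
```

An NTT implementation satisfies $F \equiv G_j \bmod [q, x^2 - \zeta_j]$ exactly when `linear_remainder(F, zeta_j)` matches $G_j$'s coefficients modulo $q$.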

COQCRYPTOLINE verifies the C reference implementation in about 18.6 min. The highly optimized avx2 implementation is verified in about 7.2 min. Observe that each layer of the ring isomorphism requires 128 signed Montgomery reductions. KYBER NTT therefore has $7 \times 128 = 896$ Montgomery reductions similar to the running example in Fig. 4b. Algebraic abstraction successfully verifies the two KYBER NTT implementations within 20 min. In comparison, CRYPTOLINE verifies both NTT implementations in 1 min without certification.

#### **7 Conclusion**

Verification through algebraic abstraction combines algebraic and bit-accurate analyses: non-linear computation is analyzed algebraically, while soundness conditions are checked with bit-accurate SMT QF_BV solvers. We describe the technique and how to certify its results. In our experiments, the hybrid technique successfully verifies, with certification, non-linear integer computation found in cryptographic programs from elliptic curve and post-quantum cryptography. We plan to explore more applications of algebraic abstraction to programs from post-quantum cryptography in the near future.

**Acknowledgments.** The authors in Academia Sinica are partially funded by National Science and Technology Council grants NSTC110-2221-E-001-008-MY3, NSTC111-2221-E-001- 014-MY3, NSTC111-2634-F-002-019, the Sinica Investigator Award AS-IA-109-M01, the Data Safety and Talent Cultivation Project AS-KPQ-109-DSTCP, and the Intel Fast Verified Postquantum Software Project. The authors in Shenzhen University and ISCAS are partially funded by Shenzhen Science and Technology Innovation Commission (JCYJ20210324094202008), the National Natural Science Foundation of China (62002228, 61836005), and the Natural Science Foundation of Guangdong Province (2022A1515011458, 2022A1515010880).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Complete Multiparty Session Type Projection with Automata

Elaine Li<sup>1</sup>, Felix Stutz<sup>2(B)</sup>, Thomas Wies<sup>1</sup>, and Damien Zufferey<sup>3</sup>

<sup>1</sup> New York University, New York, USA. efl9013@nyu.edu, wies@cs.nyu.edu
<sup>2</sup> Max Planck Institute for Software Systems, Kaiserslautern, Germany. fstutz@mpi-sws.org
<sup>3</sup> SonarSource, Geneva, Switzerland. damien.zufferey@sonarsource.com

Abstract. Multiparty session types (MSTs) are a type-based approach to verifying communication protocols. Central to MSTs is a *projection operator*: a partial function that maps protocols represented as global types to correct-by-construction implementations for each participant, represented as a communicating state machine. Existing projection operators are syntactic in nature, and trade efficiency for completeness. We present the first projection operator that is sound, complete, and efficient. Our projection separates synthesis from checking implementability. For synthesis, we use a simple automata-theoretic construction; for checking implementability, we present succinct conditions that summarize insights into the property of implementability. We use these conditions to show that MST implementability is PSPACE-complete. This improves upon a previous decision procedure that is in EXPSPACE and applies to a smaller class of MSTs. We demonstrate the effectiveness of our approach using a prototype implementation, which handles global types not supported by previous work without sacrificing performance.

Keywords: Protocol verification · Multiparty session types · Communicating state machines · Protocol fidelity · Deadlock freedom

## 1 Introduction

Communication protocols are key components in many safety and operation critical systems, making them prime targets for formal verification. Unfortunately, most verification problems for such protocols (e.g. deadlock freedom) are undecidable [11]. To make verification computationally tractable, several restrictions have been proposed [2,3,10,14,33,42]. In particular, multiparty session types (MSTs) [24] have garnered a lot of attention in recent years (see, e.g., the survey by Ancona et al. [6]). In the MST setting, a protocol is specified as a global

E. Li and F. Stutz—equal contribution.

c The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 350–373, 2023. https://doi.org/10.1007/978-3-031-37709-9_17

type, which describes the desired interactions of all roles involved in the protocol. Local implementations describe behaviors for each individual role. The implementability problem for a global type asks whether there exists a collection of local implementations whose composite behavior when viewed as a communicating state machine (CSM) matches that of the global type and is deadlock-free. The synthesis problem is to compute such an implementation from an implementable global type.

MST-based approaches typically solve synthesis and implementability simultaneously via an efficient syntactic *projection operator* [18,24,34,41]. Abstractly, a projection operator is a partial map from global types to collections of implementations. A projection operator proj is sound when every global type **G** in its domain is implemented by proj(**G**), and complete when every implementable global type is in its domain. Existing practical projection operators for MSTs are all incomplete (or unsound). Recently, the implementability problem was shown to be decidable for a class of MSTs via a reduction to safe realizability of globally cooperative high-level message sequence charts (HMSCs) [38]. In principle, this result yields a complete and sound projection operator for the considered class. However, this operator would not be practical. In particular, the proposed implementability check is in EXPSPACE.

Contributions. In this paper, we present the first practical sound and complete projection operator for general MSTs. The synthesis problem for implementable global types is conceptually easy [38] – the challenge lies in determining whether a global type *is* implementable. We thus separate synthesis from checking implementability. We first use a standard automata-theoretic construction to obtain a candidate implementation for a potentially non-implementable global type. However, unlike [38], we then verify the correctness of this implementation directly using efficiently checkable conditions derived from the global type. When a global type is not implementable, our constructive completeness proof provides a counterexample trace.

The resulting projection operator yields a PSPACE decision procedure for implementability. In fact, we show that the implementability problem is PSPACE-complete. These results both generalize and tighten the decidability and complexity results obtained in [38].

We evaluate a prototype of our projection algorithm on benchmarks taken from the literature. Our prototype benefits from both the efficiency of existing lightweight but incomplete syntactic projection operators [18,24,34,41], and the generality of heavyweight automata-based model checking techniques [28,36]: it handles protocols rejected by previous practical approaches while preserving the efficiency that makes MST-based techniques so attractive.

#### 2 Motivation and Overview

Fig. 1. Odd-even: An implementable but not (yet) projectable protocol and its local implementations

Incompleteness of Existing Projection Operators. A key limitation of existing projection operators is that the implementation for each role is obtained via a linear traversal of the global type, and thus shares its structure. The following example, which is not projectable by any existing approach, demonstrates how enforcing structural similarity can lead to incompleteness.

*Example 2.1 (Odd-even).* Consider the following global type **G**oe:

$$\mathbf{G}_{oe} = + \begin{cases} \mathtt{p} \to \mathtt{q}:o.\ \mathtt{q} \to \mathtt{r}:o.\ \mu t_1.\ \big(\mathtt{p} \to \mathtt{q}:o.\ \mathtt{q} \to \mathtt{r}:o.\ \mathtt{q} \to \mathtt{r}:o.\ t_1 \,+\, \mathtt{p} \to \mathtt{q}:b.\ \mathtt{q} \to \mathtt{r}:b.\ \mathtt{r} \to \mathtt{p}:o.\ 0\big) \\ \mathtt{p} \to \mathtt{q}:m.\ \mu t_2.\ \big(\mathtt{p} \to \mathtt{q}:o.\ \mathtt{q} \to \mathtt{r}:o.\ \mathtt{q} \to \mathtt{r}:o.\ t_2 \,+\, \mathtt{p} \to \mathtt{q}:b.\ \mathtt{q} \to \mathtt{r}:b.\ \mathtt{r} \to \mathtt{p}:m.\ 0\big) \end{cases}$$

A term $\mathtt{p} \to \mathtt{q}:m$ specifies the exchange of message $m$ between sender $\mathtt{p}$ and receiver $\mathtt{q}$. The term represents two local events that are observed separately due to asynchrony: a send event $\mathtt{p} \rhd \mathtt{q}!m$ observed by role $\mathtt{p}$, and a receive event $\mathtt{q} \lhd \mathtt{p}?m$ observed by role $\mathtt{q}$. The $+$ operator denotes choice, $\mu t.\, G$ denotes recursion, and $0$ denotes protocol termination.

Figure 1a visualizes $\mathbf{G}_{oe}$ as an HMSC. The left and right sub-protocols respectively correspond to the top and bottom branches of the protocol. Role $\mathtt{p}$ chooses a branch by sending either $o$ or $m$ to $\mathtt{q}$. On the left, $\mathtt{q}$ echoes this message to $\mathtt{r}$. Both branches continue in the same way: $\mathtt{p}$ sends an arbitrary number of $o$ messages to $\mathtt{q}$, each of which is forwarded twice from $\mathtt{q}$ to $\mathtt{r}$. Role $\mathtt{p}$ signals the end of the loop by sending $b$ to $\mathtt{q}$, which $\mathtt{q}$ forwards to $\mathtt{r}$. Finally, depending on the branch, $\mathtt{r}$ must send $o$ or $m$ to $\mathtt{p}$.

Figures 1b and 1c depict the structural similarity between the global type $\mathbf{G}_{oe}$ and the implementations for $\mathtt{p}$ and $\mathtt{q}$. For the "choicemaker" role $\mathtt{p}$, the reason is evident. Role $\mathtt{q}$'s implementation collapses the continuations of both branches in the protocol into a single sub-component. For $\mathtt{r}$ (Fig. 1d), the situation is more complicated. Role $\mathtt{r}$ does not decide on or learn directly which branch is taken, but can deduce it from the parity of the number of $o$ messages received from $\mathtt{q}$: odd means left and even means right. The resulting local implementation features transitions going back and forth between the two branches that do not exist in the global type. Syntactic projection operators fail to create such transitions.
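Role $\mathtt{r}$'s deduced behavior can be sketched operationally. The snippet below is our illustration, not the automaton of Fig. 1d: $\mathtt{r}$ toggles a parity bit on every $o$ received from $\mathtt{q}$ and, upon receiving $b$, answers $\mathtt{p}$ according to that parity.

```python
def role_r(receive, send):
    """Role r in the odd-even protocol: it cannot observe p's choice
    directly, but the parity of o-messages from q reveals the branch
    (odd = left branch, even = right branch)."""
    odd = False
    while True:
        m = receive()          # r only receives from q in this phase
        if m == "o":
            odd = not odd
        elif m == "b":         # end-of-loop marker forwarded by q
            send(("p", "o" if odd else "m"))
            return

# Left branch: one initial o plus two o's per loop iteration -> odd count.
msgs = iter(["o", "o", "o", "b"])
sent = []
role_r(lambda: next(msgs), sent.append)
assert sent == [("p", "o")]
```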

Fig. 2. High-level message sequence charts for the global types of Example 2.2.

One response to the brittleness of existing projection operators has been to give up on global type specifications altogether and instead revert to model checking user-provided implementations [28,36]. We posit that what needs rethinking is not the concept of global types, but rather how projections are computed and how implementability is checked.

Our Automata-Theoretic Approach. The synthesis step in our projection operator uses textbook automata-theoretic constructions. From a given global type, we derive a finite state machine, and use it to define a homomorphism automaton for each role. We then determinize this homomorphism automaton via subset construction to obtain a local candidate implementation for each role. If the global type is implementable, this construction always yields an implementation. The implementations shown in Figs. 1b to 1d are the result of applying this construction to **G**oe from Example 2.1. Notice that the state labels in Fig. 1d correspond to sets of labels in the global protocol.

Unfortunately, not all global types are implementable.

*Example 2.2.* Consider the following four global types also depicted in Fig. 2:

$$\begin{aligned}
\mathbf{G}_r &= + \begin{cases} \mathtt{p} \to \mathtt{q}:o.\ \mathtt{q} \to \mathtt{r}:o.\ \mathtt{p} \to \mathtt{r}:o.\ 0 \\ \mathtt{p} \to \mathtt{q}:m.\ \mathtt{p} \to \mathtt{r}:o.\ \mathtt{q} \to \mathtt{r}:o.\ 0 \end{cases} &
\mathbf{G}_s &= + \begin{cases} \mathtt{p} \to \mathtt{q}:o.\ \mathtt{r} \to \mathtt{q}:o.\ 0 \\ \mathtt{p} \to \mathtt{q}:m.\ \mathtt{r} \to \mathtt{q}:m.\ 0 \end{cases} \\
\mathbf{G}'_r &= + \begin{cases} \mathtt{p} \to \mathtt{q}:o.\ \mathtt{q} \to \mathtt{r}:o.\ \mathtt{r} \to \mathtt{p}:o.\ \mathtt{p} \to \mathtt{r}:o.\ 0 \\ \mathtt{p} \to \mathtt{q}:m.\ \mathtt{p} \to \mathtt{r}:o.\ \mathtt{r} \to \mathtt{q}:o.\ \mathtt{q} \to \mathtt{r}:o.\ 0 \end{cases} &
\mathbf{G}'_s &= + \begin{cases} \mathtt{p} \to \mathtt{q}:o.\ \mathtt{r} \to \mathtt{q}:o.\ 0 \\ \mathtt{p} \to \mathtt{q}:m.\ \mathtt{r} \to \mathtt{q}:o.\ 0 \end{cases}
\end{aligned}$$

Similar to $\mathbf{G}_{oe}$, in all four examples, $\mathtt{p}$ chooses a branch by sending either $o$ or $m$ to $\mathtt{q}$. The global type $\mathbf{G}_r$ is not implementable because $\mathtt{r}$ cannot learn which branch was chosen by $\mathtt{p}$. For any local implementation of $\mathtt{r}$ to be able to execute both branches, it must be able to receive $o$ from $\mathtt{p}$ and $\mathtt{q}$ in any order. Because the two send events $\mathtt{p} \rhd \mathtt{r}!o$ and $\mathtt{q} \rhd \mathtt{r}!o$ are independent of each other, they may be reordered. Consequently, any implementation of $\mathbf{G}_r$ would have to permit executions that are consistent with global behaviors not described by $\mathbf{G}_r$, such as $\mathtt{p} \to \mathtt{q}:m \cdot \mathtt{q} \to \mathtt{r}:o \cdot \mathtt{p} \to \mathtt{r}:o$. Contrast this with $\mathbf{G}'_r$, which is implementable. In the top branch of $\mathbf{G}'_r$, role $\mathtt{p}$ can only send to $\mathtt{r}$ after it has received from $\mathtt{r}$, which prevents the reordering of the send events $\mathtt{p} \rhd \mathtt{r}!o$ and $\mathtt{q} \rhd \mathtt{r}!o$. The bottom branch is symmetric. Hence, $\mathtt{r}$ learns $\mathtt{p}$'s choice based on which message it receives first.

For the global type $\mathbf{G}_s$, role $\mathtt{r}$ again cannot learn the branch chosen by $\mathtt{p}$. That is, $\mathtt{r}$ cannot know whether to send $o$ or $m$ to $\mathtt{q}$, leading inevitably to deadlocking executions. In contrast, $\mathbf{G}'_s$ is again implementable because the expected behavior of $\mathtt{r}$ is independent of the choice by $\mathtt{p}$.

These examples show that the implementability question is non-trivial. To check implementability, we present conditions that precisely characterize when the subset construction for **G** yields an implementation.

Overview. The rest of the paper is organized as follows. Section 3 contains relevant definitions for our work. Section 4 describes the synthesis step of our projection. Section 5 presents the two conditions that characterize implementability of a given global type. In Sect. 6, we prove soundness of our projection via a stronger inductive invariant guaranteeing per-role agreement on a global run of the protocol. In Sect. 7, we prove completeness by showing that our two conditions hold if a global type is implementable. In Sect. 8, we discuss the complexity of our construction and condition checks. Section 9 presents our artifact and evaluation, and Sect. 10 as well as Sect. 11 discuss related work. Additional details including omitted proofs can be found in the extended version of the paper [29].

## 3 Preliminaries

*Words.* Let $\Sigma$ be a finite alphabet. $\Sigma^*$ denotes the set of finite words over $\Sigma$, $\Sigma^\omega$ the set of infinite words, and $\Sigma^\infty$ their union $\Sigma^* \cup \Sigma^\omega$. A word $u \in \Sigma^*$ is a *prefix* of a word $v \in \Sigma^\infty$, denoted $u \le v$, if there exists $w \in \Sigma^\infty$ with $u \cdot w = v$.

*Message Alphabet.* Let $\mathcal{P}$ be a set of roles and $\mathcal{V}$ be a set of messages. We define the set of *synchronous events* $\Sigma_{sync} := \{\mathtt{p} \to \mathtt{q}:m \mid \mathtt{p}, \mathtt{q} \in \mathcal{P} \text{ and } m \in \mathcal{V}\}$ where $\mathtt{p} \to \mathtt{q}:m$ denotes that message $m$ is sent by $\mathtt{p}$ to $\mathtt{q}$ atomically. This is split for *asynchronous events*. For a role $\mathtt{p} \in \mathcal{P}$, we define the alphabet $\Sigma_{\mathtt{p},!} = \{\mathtt{p} \rhd \mathtt{q}!m \mid \mathtt{q} \in \mathcal{P}, m \in \mathcal{V}\}$ of *send* events and the alphabet $\Sigma_{\mathtt{p},?} = \{\mathtt{p} \lhd \mathtt{q}?m \mid \mathtt{q} \in \mathcal{P}, m \in \mathcal{V}\}$ of *receive* events. The event $\mathtt{p} \rhd \mathtt{q}!m$ denotes role $\mathtt{p}$ sending a message $m$ to $\mathtt{q}$, and $\mathtt{p} \lhd \mathtt{q}?m$ denotes role $\mathtt{p}$ receiving a message $m$ from $\mathtt{q}$. We write $\Sigma_\mathtt{p} = \Sigma_{\mathtt{p},!} \cup \Sigma_{\mathtt{p},?}$, $\Sigma_! = \bigcup_{\mathtt{p} \in \mathcal{P}} \Sigma_{\mathtt{p},!}$, and $\Sigma_? = \bigcup_{\mathtt{p} \in \mathcal{P}} \Sigma_{\mathtt{p},?}$. Finally, $\Sigma_{async} = \Sigma_! \cup \Sigma_?$. We say that $\mathtt{p}$ is *active* in $x \in \Sigma_{async}$ if $x \in \Sigma_\mathtt{p}$. For each role $\mathtt{p} \in \mathcal{P}$, we define a homomorphism ${\Downarrow}_{\Sigma_\mathtt{p}}$, where $x{\Downarrow}_{\Sigma_\mathtt{p}} = x$ if $x \in \Sigma_\mathtt{p}$ and $\varepsilon$ otherwise. We write $\mathcal{V}(w)$ to project the send and receive events in $w$ onto their messages. We fix $\mathcal{P}$ and $\mathcal{V}$ in the rest of the paper.

*Global Types – Syntax.* Global types for MSTs [31] are defined by the grammar:

$$G ::= 0 \quad | \quad \sum_{i \in I} \mathtt{p} \to \mathtt{q}_i : m_i.\, G_i \quad | \quad \mu t.G \quad | \quad t$$

where $\mathtt{p}, \mathtt{q}_i$ range over $\mathcal{P}$, $m_i$ over $\mathcal{V}$, and $t$ over a set of recursion variables.

We require each branch of a choice to be distinct: $\forall i, j \in I.\ i \ne j \Rightarrow (\mathtt{q}_i, m_i) \ne (\mathtt{q}_j, m_j)$; the sender and receiver of an atomic action to be distinct: $\forall i \in I.\ \mathtt{p} \ne \mathtt{q}_i$; and recursion to be guarded: in $\mu t.\, G$, there is at least one message between $\mu t$ and each $t$ in $G$. When $|I| = 1$, we omit $\sum$. For readability, we sometimes use the infix operator $+$ for choice instead of $\sum$. When working with a protocol described by a global type, we write $\mathbf{G}$ to refer to the top-level type, and we use $G$ to refer to its subterms. For the size of a global type, we disregard multiple occurrences of the same subterm.

We use the extended definition of global types from [31] that allows a sender to send messages to different roles in a choice. We call this *sender-driven choice*, as in [38], while it was called generalized choice in [31]. This definition subsumes classical MSTs that only allow *directed choice* [24]. The types we use focus on communication primitives and omit features like delegation or parametrization. We defer a detailed discussion of different MST frameworks to Sect. 11.

*Global Types – Semantics.* As a basis for the semantics of a global type $\mathbf{G}$, we construct a finite state machine $\mathrm{GAut}(\mathbf{G}) = (Q_\mathbf{G}, \Sigma_{sync}, \delta_\mathbf{G}, q_{0,\mathbf{G}}, F_\mathbf{G})$ where

- $Q_\mathbf{G}$ is the set of syntactic subterms of $\mathbf{G}$, where each recursion variable $t$ is identified with its binding subterm $\mu t.\, G'$,
- $\delta_\mathbf{G}$ contains a transition $G' \xrightarrow{\mathtt{p} \to \mathtt{q}_i : m_i} G_i$ for every subterm $G' = \sum_{i \in I} \mathtt{p} \to \mathtt{q}_i : m_i.\, G_i$ of $\mathbf{G}$ and every $i \in I$, and
- $q_{0,\mathbf{G}} = \mathbf{G}$ and $F_\mathbf{G} = \{0\}$.

We define a homomorphism split onto the asynchronous alphabet:

$$\mathrm{split}(\mathtt{p} \to \mathtt{q}:m) := \mathtt{p} \rhd \mathtt{q}!m.\ \mathtt{q} \lhd \mathtt{p}?m.$$

The semantics $\mathcal{L}(\mathbf{G})$ of a global type $\mathbf{G}$ is given by $\mathcal{C}^{\sim}(\mathrm{split}(\mathcal{L}(\mathrm{GAut}(\mathbf{G}))))$ where $\mathcal{C}^{\sim}$ is the closure under the indistinguishability relation $\sim$ [31]. Two events are independent if they are not related by the *happened-before* relation [26]. For instance, any two send events from distinct senders are independent. Two words are indistinguishable if one can be reordered into the other by repeatedly swapping consecutive independent events. The full definition is in the extended version [29].

*Communicating State Machine* [11]*.* $\mathcal{A} = \{\!\{A_\mathtt{p}\}\!\}_{\mathtt{p} \in \mathcal{P}}$ is a CSM over $\mathcal{P}$ and $\mathcal{V}$ if $A_\mathtt{p}$ is a finite state machine over $\Sigma_\mathtt{p}$ for every $\mathtt{p} \in \mathcal{P}$, denoted by $(Q_\mathtt{p}, \Sigma_\mathtt{p}, \delta_\mathtt{p}, q_{0,\mathtt{p}}, F_\mathtt{p})$. Let $\prod_{\mathtt{p} \in \mathcal{P}} Q_\mathtt{p}$ denote the set of global states and $\mathsf{Chan} = \{(\mathtt{p}, \mathtt{q}) \mid \mathtt{p}, \mathtt{q} \in \mathcal{P}, \mathtt{p} \ne \mathtt{q}\}$ denote the set of channels. A *configuration* of $\mathcal{A}$ is a pair $(\vec{s}, \xi)$, where $\vec{s}$ is a global state and $\xi : \mathsf{Chan} \to \mathcal{V}^*$ is a mapping from each channel to a sequence of messages. We use $\vec{s}_\mathtt{p}$ to denote the state of $\mathtt{p}$ in $\vec{s}$. The CSM transition relation, denoted $\to$, is defined as follows.

$$\begin{array}{l}
(\vec{s}, \xi) \xrightarrow{\mathtt{p} \rhd \mathtt{q}!m} (\vec{s}', \xi') \text{ if } (\vec{s}_\mathtt{p}, \mathtt{p} \rhd \mathtt{q}!m, \vec{s}'_\mathtt{p}) \in \delta_\mathtt{p},\ \vec{s}_\mathtt{r} = \vec{s}'_\mathtt{r} \text{ for every role } \mathtt{r} \neq \mathtt{p}, \\
\quad \xi'(\mathtt{p}, \mathtt{q}) = \xi(\mathtt{p}, \mathtt{q}) \cdot m \text{ and } \xi'(c) = \xi(c) \text{ for every other channel } c \in \mathsf{Chan}. \\
(\vec{s}, \xi) \xrightarrow{\mathtt{q} \lhd \mathtt{p}?m} (\vec{s}', \xi') \text{ if } (\vec{s}_\mathtt{q}, \mathtt{q} \lhd \mathtt{p}?m, \vec{s}'_\mathtt{q}) \in \delta_\mathtt{q},\ \vec{s}_\mathtt{r} = \vec{s}'_\mathtt{r} \text{ for every role } \mathtt{r} \neq \mathtt{q}, \\
\quad \xi(\mathtt{p}, \mathtt{q}) = m \cdot \xi'(\mathtt{p}, \mathtt{q}) \text{ and } \xi'(c) = \xi(c) \text{ for every other channel } c \in \mathsf{Chan}.
\end{array}$$

In the initial configuration $(\vec{s}_0, \xi_0)$, each role's state in $\vec{s}_0$ is the initial state $q_{0,\mathtt{p}}$ of $A_\mathtt{p}$, and $\xi_0$ maps each channel to $\varepsilon$. A configuration $(\vec{s}, \xi)$ is said to be *final* iff $\vec{s}_\mathtt{p}$ is final for every $\mathtt{p}$ and $\xi$ maps each channel to $\varepsilon$. Runs and traces are defined in the expected way. A run is *maximal* if either it is finite and ends in a final configuration, or it is infinite. The language $\mathcal{L}(\mathcal{A})$ of the CSM $\mathcal{A}$ is defined as the set of maximal traces. A configuration $(\vec{s}, \xi)$ is a *deadlock* if it is not final and has no outgoing transitions. A CSM is *deadlock-free* if no reachable configuration is a deadlock.
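Operationally, the two transition rules say: a send appends to the FIFO buffer of channel $(\mathtt{p}, \mathtt{q})$ and steps $\mathtt{p}$'s local state, while a receive pops the head of that buffer and steps $\mathtt{q}$'s state. A minimal executable sketch (class and state names are ours):

```python
from collections import deque

class CSMConfig:
    """A CSM configuration: a vector of local states plus one FIFO
    buffer per ordered pair of distinct roles (a channel)."""
    def __init__(self, states, roles):
        self.s = dict(states)  # role -> local state
        self.chan = {(p, q): deque() for p in roles for q in roles if p != q}

    def send(self, p, q, m, next_state):
        """p |> q ! m : append m to channel (p, q), step p's state."""
        self.chan[(p, q)].append(m)
        self.s[p] = next_state

    def receive(self, q, p, m, next_state):
        """q <| p ? m : m must be at the head of channel (p, q)."""
        assert self.chan[(p, q)] and self.chan[(p, q)][0] == m
        self.chan[(p, q)].popleft()
        self.s[q] = next_state

c = CSMConfig({"p": 0, "q": 0}, ["p", "q"])
c.send("p", "q", "o", 1)        # asynchronous: message is buffered
c.receive("q", "p", "o", 1)     # later consumed in FIFO order
assert all(len(buf) == 0 for buf in c.chan.values())  # buffers drained
```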

Finally, implementability is formalized as follows.

Definition 3.1 (Implementability [31]). *A global type* $\mathbf{G}$ *is* implementable *if there exists a CSM* $\{\!\{A_\mathtt{p}\}\!\}_{\mathtt{p} \in \mathcal{P}}$ *such that the following two properties hold: (i)* protocol fidelity*:* $\mathcal{L}(\{\!\{A_\mathtt{p}\}\!\}_{\mathtt{p} \in \mathcal{P}}) = \mathcal{L}(\mathbf{G})$*, and (ii)* deadlock freedom*:* $\{\!\{A_\mathtt{p}\}\!\}_{\mathtt{p} \in \mathcal{P}}$ *is deadlock-free. We say that* $\{\!\{A_\mathtt{p}\}\!\}_{\mathtt{p} \in \mathcal{P}}$ *implements* $\mathbf{G}$*.*

## 4 Synthesizing Implementations

The construction is carried out in two steps. First, for each role $\mathtt{p} \in \mathcal{P}$, we define an intermediate state machine $\mathrm{GAut}(\mathbf{G}){\downarrow}_\mathtt{p}$ that is a homomorphism of $\mathrm{GAut}(\mathbf{G})$. We call $\mathrm{GAut}(\mathbf{G}){\downarrow}_\mathtt{p}$ the *projection by erasure* for $\mathtt{p}$, defined below.

Definition 4.1 (Projection by Erasure). *Let* $\mathbf{G}$ *be a global type with state machine* $\mathrm{GAut}(\mathbf{G}) = (Q_\mathbf{G}, \Sigma_{sync}, \delta_\mathbf{G}, q_{0,\mathbf{G}}, F_\mathbf{G})$*. For each role* $\mathtt{p} \in \mathcal{P}$*, we define the state machine* $\mathrm{GAut}(\mathbf{G}){\downarrow}_\mathtt{p} = (Q_\mathbf{G}, \Sigma_\mathtt{p} \uplus \{\varepsilon\}, \delta_{\downarrow}, q_{0,\mathbf{G}}, F_\mathbf{G})$ *where* $\delta_{\downarrow} := \{q \xrightarrow{\mathrm{split}(a){\Downarrow}_{\Sigma_\mathtt{p}}} q' \mid q \xrightarrow{a} q' \in \delta_\mathbf{G}\}$*. By definition of* $\mathrm{split}(\text{-})$*, it holds that* $\mathrm{split}(a){\Downarrow}_{\Sigma_\mathtt{p}} \in \Sigma_\mathtt{p} \uplus \{\varepsilon\}$*.*

Then, we determinize GAut(**G**)↓<sup>p</sup> via a standard subset construction to obtain a deterministic local state machine for p.

Definition 4.2 (Subset Construction). *Let* **G** *be a global type and* p *be a role. Then, the* subset construction *for* p *is defined as*

$$\mathcal{C}(\mathbf{G}, \mathtt{p}) = (Q_\mathtt{p}, \Sigma_\mathtt{p}, \delta_\mathtt{p}, s_{0,\mathtt{p}}, F_\mathtt{p}) \text{ where}$$

*–* $\delta(s, a) := \{q' \in Q_\mathbf{G} \mid \exists q \in s.\ q \xrightarrow{a} \xrightarrow{\varepsilon}{}^{*} q' \in \delta_{\downarrow}\}$ *for every* $s \subseteq Q_\mathbf{G}$ *and* $a \in \Sigma_\mathtt{p}$*,*
*–* $s_{0,\mathtt{p}} := \{q \in Q_\mathbf{G} \mid q_{0,\mathbf{G}} \xrightarrow{\varepsilon}{}^{*} q \in \delta_{\downarrow}\}$*,*
*–* $Q_\mathtt{p} := \mathrm{lfp}^{\subseteq}_{\{s_{0,\mathtt{p}}\}}\ \lambda Q.\ Q \cup \big(\{\delta(s, a) \mid s \in Q \wedge a \in \Sigma_\mathtt{p}\} \setminus \{\emptyset\}\big)$*,*
*–* $\delta_\mathtt{p} := \delta|_{Q_\mathtt{p} \times \Sigma_\mathtt{p}}$*, and*
*–* $F_\mathtt{p} := \{s \in Q_\mathtt{p} \mid s \cap F_\mathbf{G} \ne \emptyset\}$*.*

Note that the construction ensures that Q_p only contains subsets of Q_**G** whose states are reachable via the same traces, i.e., we typically have |Q_p| ≪ 2^{|Q_**G**|}.
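To make the two constructions concrete, the following is a minimal executable sketch in Python. The encoding is our own, not from the paper's artifact: global transitions are triples `(q, a, q2)`, and a role's view of a label is either a local event or `None`, which plays the role of ε.

```python
def project_by_erasure(delta_G, view):
    """Definition 4.1: relabel every global transition by the role's view.
    `view` maps a global label to the role's local event, or None (epsilon)."""
    return {(q, view(a), q2) for (q, a, q2) in delta_G}

def eps_closure(delta_down, states):
    """All states reachable from `states` via epsilon (None) transitions."""
    seen = set(states)
    work = list(states)
    while work:
        q = work.pop()
        for (q1, x, q2) in delta_down:
            if q1 == q and x is None and q2 not in seen:
                seen.add(q2)
                work.append(q2)
    return frozenset(seen)

def subset_construction(delta_down, q0, finals):
    """Definition 4.2: determinize the erased machine for one role."""
    s0 = eps_closure(delta_down, {q0})
    states, delta_p, work = {s0}, {}, [s0]
    while work:
        s = work.pop()
        for x in {x for (q, x, _) in delta_down if q in s and x is not None}:
            # one x-step out of s, then epsilon-closure (q -x-> -eps->* q')
            t = eps_closure(delta_down,
                            {q2 for (q, y, q2) in delta_down if q in s and y == x})
            delta_p[(s, x)] = t
            if t not in states:
                states.add(t)
                work.append(t)
    finals_p = {s for s in states if s & finals}
    return states, delta_p, s0, finals_p
```

On a two-branch global machine where role r sees only its own send, the initial subset collapses all three ε-reachable states, exactly as in the examples of Sect. 5.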

The following characterization is immediate from the subset construction; the proof can be found in the extended version [29].

Lemma 4.3. *Let* **G** *be a global type,* r *a role, and* C(**G**, r) *its* subset construction*. If* w *is a trace of* GAut(**G**)*, then* split(w)⇓Σ_r *is a trace of* C(**G**, r)*. If* u *is a trace of* C(**G**, r)*, then there is a trace* w *of* GAut(**G**) *such that* split(w)⇓Σ_r = u*. It holds that* L(**G**)⇓Σ_r = L(C(**G**, r))*.*

Using this lemma, we show that the CSM {{*<sup>C</sup>* (**G**, <sup>p</sup>)}}p∈P preserves all behaviors of **G**.

Lemma 4.4. *For all global types* **G**, L(**G**) ⊆ L({{C(**G**, p)}}p∈P)*.*

We briefly sketch the proof here. Given that {{*<sup>C</sup>* (**G**, <sup>p</sup>)}}p∈P is deterministic, to prove language inclusion it suffices to prove the inclusion of the respective prefix sets:

$$\mathrm{pref}(\mathcal{L}(\mathbf{G})) \subseteq \mathrm{pref}\big(\mathcal{L}(\{\!\{\mathcal{C}(\mathbf{G}, \mathbf{p})\}\!\}_{\mathbf{p} \in \mathcal{P}})\big)\ .$$

Let w be a word in L(**G**). If w is finite, membership in L({{C(**G**, p)}}p∈P) is immediate from the claim above. If w is infinite, we show that w has an infinite run in {{C(**G**, p)}}p∈P using König's Lemma. We construct an infinite graph G_w = (V, E) with V := {v_ρ | trace(ρ) ≤ w} and E := {(v_ρ1, v_ρ2) | ∃x ∈ Σ_async. trace(ρ2) = trace(ρ1) · x}. Because {{C(**G**, p)}}p∈P is deterministic, G_w is a tree rooted at v_ε, the vertex corresponding to the empty run. By König's Lemma, every infinite tree contains either a vertex of infinite degree or an infinite path. Because {{C(**G**, p)}}p∈P consists of a finite number of communicating state machines, the last configuration of any run has a finite number of next configurations, and G_w is finitely branching. Therefore, there must exist an infinite path in G_w, representing an infinite run for w, and thus w ∈ L({{C(**G**, p)}}p∈P).

The proof of the inclusion of prefix sets proceeds by structural induction and primarily relies on Lemma 4.3 and the fact that all prefixes in L(**G**) respect the order of send before receive events.

## 5 Checking Implementability

We now turn our attention to checking implementability of a CSM produced by the subset construction. We revisit the global types from Example 2.2 (also shown in Fig. 2), which demonstrate that the naive subset construction does not always yield a sound implementation. From these examples, we distill our conditions that precisely identify the implementable global types.

In general, a global type **G** is not implementable when the agreement on a global run of GAut(**G**) among all participating roles cannot be conveyed via sending and receiving messages alone. When this happens, roles can take locally permitted transitions that commit to incompatible global runs, resulting in a trace that is not specified by **G**. Consequently, our conditions need to ensure that when a role <sup>p</sup> takes a transition in *<sup>C</sup>* (**G**, <sup>p</sup>), it only commits to global runs that are consistent with the local views of all other roles. We discuss the relevant conditions imposed on send and receive transitions separately.

Send Validity. Consider **G**_s from Example 2.2. The CSM {{C(**G**_s, p)}}p∈P has an execution with the trace p▷q!o · q◁p?o · r▷q!m. This trace is possible because the initial state s_{0,r} of C(**G**_s, r) contains two states of GAut(**G**_s)↓r, each of which has a single outgoing send transition, labeled with r▷q!o and r▷q!m respectively. Both of these transitions are always enabled in s_{0,r}, meaning that r can send r▷q!m even when p has chosen the top branch and q expects to receive o instead of m from r. This results in a deadlock. In contrast, while the state s′_{0,r} in C(**G**′_s, r) likewise contains two states of GAut(**G**′_s)↓r, each with a single outgoing send transition, both transitions are now labeled with r▷q!b. These two transitions collapse to a single one in C(**G**′_s, r). This transition is consistent with both possible local views that p and q might hold on the global run.

Intuitively, to prevent the emergence of inconsistent local views from send transitions of C(**G**, p), we must enforce that for every state s ∈ Q_p with an outgoing send transition labeled x, a transition labeled x is enabled in all states of GAut(**G**)↓p represented by s. We use the following auxiliary definition to formalize this intuition.

Definition 5.1 (Transition Origin and Destination). *Let* s −x→ s′ ∈ δ_p *be a transition in* C(**G**, p) *and* δ↓ *be the transition relation of* GAut(**G**)↓p*. We define the set of* transition origins tr-orig(s −x→ s′) *and* transition destinations tr-dest(s −x→ s′) *as follows:*

$$\begin{aligned} \text{tr-orig}(s \xrightarrow{x} s') &:= \{ G \in s \mid \exists G' \in s'.\ G \xrightarrow{x} G' \in \delta_{\downarrow} \} \text{ and} \\ \text{tr-dest}(s \xrightarrow{x} s') &:= \{ G' \in s' \mid \exists G \in s.\ G \xrightarrow{x} G' \in \delta_{\downarrow} \}\ . \end{aligned}$$

Our condition on send transitions is then stated below.

Definition 5.2 (Send Validity). C(**G**, p) *satisfies* Send Validity *iff every send transition* s −x→ s′ ∈ δ_p *is enabled in all states contained in* s*:*

$$\forall s \xrightarrow{x} s' \in \delta_{\mathbf{p}}.\ x \in \Sigma_{\mathbf{p},!} \implies \text{tr-orig}(s \xrightarrow{x} s') = s\ .$$
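Definitions 5.1 and 5.2 translate directly into a small Python check. This is an illustrative sketch in our own encoding (transitions as triples, the determinized δ_p as a map from (state, label) to successor subset); for brevity it matches δ↓ directly, ignoring trailing ε-steps.

```python
def tr_orig(delta_down, s, x, s_prime):
    """Definition 5.1: states in s that have an x-transition into s'."""
    return {q for q in s
            if any(q1 == q and y == x and q2 in s_prime
                   for (q1, y, q2) in delta_down)}

def send_valid(delta_down, delta_p, is_send):
    """Definition 5.2: every send transition must be enabled in all
    states of the subset it leaves."""
    return all(tr_orig(delta_down, s, x, s2) == set(s)
               for ((s, x), s2) in delta_p.items() if is_send(x))
```

Run on the **G**_s scenario of this section, the two differently labeled sends from the initial subset fail the check, while the variant where both are labeled r▷q!b passes.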

Receive Validity. To motivate our condition on receive transitions, let us revisit **G**_r from Example 2.2. The CSM {{C(**G**_r, p)}}p∈P recognizes the following trace, which is not in the global type language L(**G**_r):

$$\mathbf{p} \rhd \mathbf{q}!o \cdot \mathbf{q} \lhd \mathbf{p}?o \cdot \mathbf{q} \rhd \mathbf{r}!o \cdot \mathbf{p} \rhd \mathbf{r}!o \cdot \mathbf{r} \lhd \mathbf{p}?o \cdot \mathbf{r} \lhd \mathbf{q}?o \cdots$$

The issue lies with r, which cannot distinguish between the two branches in **G**_r. The initial state s_{0,r} of C(**G**_r, r) contains two states of GAut(**G**_r) corresponding to the subterms G_t := q→r:o. p→r:o. 0 and G_b := p→r:o. q→r:o. 0. Here, G_t and G_b are the top and bottom branch of **G**_r respectively. This means that there are outgoing transitions in s_{0,r} labeled with r◁p?o and r◁q?o. If r takes the transition labeled r◁p?o, it commits to the bottom branch G_b. However, observe that the message o from p can also be available at this point if the other roles follow the top branch G_t. This is because p can send o to r without waiting for r to first receive from q. In this scenario, the roles disagree on which global run of GAut(**G**_r) to follow, resulting in the violating trace above.

Contrast this with **G**′_r. Here, s′_{0,r} again has outgoing transitions labeled with r◁p?o and r◁q?o. However, if r takes the transition labeled r◁p?o, committing to the bottom branch, no disagreement occurs. This is because if the other roles are following the top branch, then p is blocked from sending to r until after it has received confirmation that r has received its first message from q.

For a receive transition s −x→ s1 in C(**G**, p) to be safe, we must enforce that the receive event x cannot also be available due to reordered sent messages in a continuation G2 ∈ s2 of another outgoing receive transition s −y→ s2. To formalize this condition, we use the set M^B(G) of *available messages* for a syntactic subterm G of **G** and a set of *blocked* roles B. This notion was already defined in [31, Sec. 2.2]. Intuitively, M^B(G) consists of all send events q▷r!m that can occur on the traces of G such that m will be the first message added to channel (q, r) before any of the roles in B takes a step.

*Available Messages.* The set of available messages is defined recursively on the structure of the global type. To obtain all possible messages, we need to unfold each distinct recursion variable once. For this, we define a map getμ from variables to subterms and write getμ_**G** for getμ(**G**):

$$\mathrm{get}\mu(0) := [\,] \qquad \mathrm{get}\mu(t) := [\,] \qquad \mathrm{get}\mu(\mu t.\, G) := [t \mapsto G] \cup \mathrm{get}\mu(G)$$

$$\mathrm{get}\mu\Big(\sum_{i \in I} \mathbf{p} \to \mathbf{q}_i{:}m_i.\, G_i\Big) := \bigcup_{i \in I} \mathrm{get}\mu(G_i)$$

The function M^{B,T}(-) keeps a set T of already unfolded variables, which is empty initially.

$$M^{\mathcal{B},T}(0) := \emptyset \qquad M^{\mathcal{B},T}(\mu t.\, G) := M^{\mathcal{B},T \cup \{t\}}(G) \qquad M^{\mathcal{B},T}(t) := \begin{cases} \emptyset & \text{if } t \in T \\ M^{\mathcal{B},T \cup \{t\}}(\mathrm{get}\mu_{\mathbf{G}}(t)) & \text{if } t \notin T \end{cases}$$

$$M^{\mathcal{B},T}\Big(\sum_{i\in I}\mathbf{p}\to\mathbf{q}_i{:}m_i.\, G_i\Big) := \begin{cases} \bigcup_{i\in I}\big(\big(M^{\mathcal{B},T}(G_i) \setminus \{\mathbf{p}\rhd\mathbf{q}_i!m \mid m\in\mathcal{V}\}\big) \cup \{\mathbf{p}\rhd\mathbf{q}_i!m_i\}\big) & \text{if } \mathbf{p}\notin\mathcal{B} \\ \bigcup_{i\in I} M^{\mathcal{B}\cup\{\mathbf{q}_i\},T}(G_i) & \text{if } \mathbf{p}\in\mathcal{B} \end{cases}$$

We write M^B(G) for M^{B,∅}(G). If B is a singleton set {p}, we omit the set notation and write M^p(G) for M^{{p}}(G). The set of available messages captures the possible states of all channels before a given receive transition is taken.
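The recursive definition above can be sketched in Python. The tuple encoding of global types (`('end',)`, `('var', t)`, `('mu', t, G)`, and `('choice', p, [(q, m, G), …])`) and the event encoding `(p, q, m)` for p▷q!m are our own illustrative choices, not the paper's.

```python
def get_mu(G, acc=None):
    """Map each recursion variable t to its defining body (one unfolding)."""
    acc = {} if acc is None else acc
    if G[0] == 'mu':
        _, t, body = G
        acc[t] = body
        get_mu(body, acc)
    elif G[0] == 'choice':
        for (_, _, Gi) in G[2]:
            get_mu(Gi, acc)
    return acc

def avail(G, B, env, T=frozenset()):
    """M^B(G): send events (p, q, m) that can place the first message into
    channel (p, q) before any role in the blocked set B takes a step."""
    if G[0] == 'end':
        return set()
    if G[0] == 'mu':
        return avail(G[2], B, env, T | {G[1]})
    if G[0] == 'var':
        t = G[1]
        return set() if t in T else avail(env[t], B, env, T | {t})
    _, p, branches = G          # choice: sum of p -> q_i : m_i . G_i
    out = set()
    if p not in B:
        for (q, m, Gi) in branches:
            # m_i is now the first message on channel (p, q_i); later sends
            # on that channel from the continuation are no longer "first"
            rest = {e for e in avail(Gi, B, env, T)
                    if not (e[0] == p and e[1] == q)}
            out |= rest | {(p, q, m)}
    else:
        for (q, _, Gi) in branches:
            # p is blocked, so q_i never receives m_i and is blocked too
            out |= avail(Gi, B | {q}, env, T)
    return out
```

For the subterm G_t = q→r:o. p→r:o. 0 from the example, M^r(G_t) contains both q▷r!o and p▷r!o: p's send does not wait for r to move, which is exactly what Receive Validity must rule out.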

Definition 5.3 (Receive Validity). C(**G**, p) *satisfies* Receive Validity *iff no receive transition is enabled in an alternative continuation that originates from the same source state:*

$$\begin{aligned} &\forall s \xrightarrow{\mathbf{p} \lhd \mathbf{q}_1?m_1} s_1,\ s \xrightarrow{\mathbf{p} \lhd \mathbf{q}_2?m_2} s_2 \in \delta_{\mathbf{p}}. \\ &\qquad \mathbf{q}_1 \neq \mathbf{q}_2 \implies \forall\, G_2 \in \text{tr-dest}(s \xrightarrow{\mathbf{p} \lhd \mathbf{q}_2?m_2} s_2).\ \mathbf{q}_1 \rhd \mathbf{p}!m_1 \notin M^{\mathbf{p}}(G_2)\ . \end{aligned}$$
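As an executable reading of Definition 5.3, the sketch below checks Receive Validity given tr-dest and an oracle `avail_of` for the sets M^p(G2); the oracle is passed in as a function so the sketch stays self-contained. The label encoding `('recv', sender, msg)` for p◁q?m is our own.

```python
def tr_dest(delta_down, s, x, s2):
    """Definition 5.1: states in s2 reached from s via an x-transition."""
    return {q2 for (q, y, q2) in delta_down if q in s and y == x and q2 in s2}

def receive_valid(delta_p, delta_down, role, avail_of):
    """Definition 5.3: for two receive transitions of `role` from the same
    state with distinct senders q1 != q2, the event q1 |> role ! m1 must not
    be available in any continuation G2 of the q2-transition."""
    for ((s, x1), _) in delta_p.items():
        for ((s2, x2), t2) in delta_p.items():
            if s != s2 or x1[0] != 'recv' or x2[0] != 'recv':
                continue
            q1, m1, q2 = x1[1], x1[2], x2[1]
            if q1 == q2:
                continue
            for G2 in tr_dest(delta_down, s, x2, t2):
                if (q1, role, m1) in avail_of(G2):
                    return False
    return True
```

On the **G**_r example, the continuation reached via r◁q?o still has p▷r!o available, so the check fails, matching the violating trace discussed above.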

Subset Projection. We are now ready to define our projection operator.

Definition 5.4 (Subset Projection of **G**). *The* subset projection P(**G**, p) *of* **G** *onto* p *is* C(**G**, p) *if it satisfies Send Validity and Receive Validity. We lift this operation to a partial function from global types to CSMs in the expected way.*

We conclude our discussion with an observation about the syntactic structure of the subset projection: Send Validity implies that no state has both outgoing send and receive transitions (also known as mixed choice).

Corollary 5.5 (No Mixed Choice). *If* P(**G**, p) *satisfies Send Validity, then for all* s −x1→ s1, s −x2→ s2 ∈ δ_p*,* x1 ∈ Σ_! *iff* x2 ∈ Σ_!*.*
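The corollary admits a quick executable check in the same illustrative encoding of δ_p as a map from (state, label) to successor:

```python
def no_mixed_choice(delta_p, is_send):
    """Corollary 5.5: in a send-valid projection, each state's outgoing
    transitions are either all sends or all receives."""
    kinds = {}
    for (s, x) in delta_p:
        kinds.setdefault(s, set()).add(is_send(x))
    return all(len(ks) == 1 for ks in kinds.values())
```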

## 6 Soundness

In this section, we prove the soundness of our subset projection, stated as follows.

Theorem 6.1. *Let* **<sup>G</sup>** *be a global type and* {{*P*(**G**, <sup>p</sup>)}}p∈P *be the subset projection. Then,* {{*P*(**G**, <sup>p</sup>)}}p∈P *implements* **<sup>G</sup>***.*

Recall that implementability is defined as protocol fidelity and deadlock freedom. Protocol fidelity consists of two language inclusions. The first inclusion, <sup>L</sup>(**G**) ⊆ L({{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P ), enforces that the subset projection generates at least all behaviors of the global type. We showed in Lemma 4.4 that this holds for the subset construction alone (without Send and Receive Validity).

The second inclusion, <sup>L</sup>({{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P ) ⊆ L(**G**), enforces that no new behaviors are introduced. The proof of this direction relies on a stronger inductive invariant that we show for all traces of the subset projection. As discussed in Sect. 5, violations of implementability occur when roles commit to global runs that are inconsistent with the local views of other roles. Our inductive invariant states the exact opposite: that all local views are consistent with one another. First, we formalize the local view of a role.

Definition 6.2 (Possible run sets). *Let* **G** *be a global type and* GAut(**G**) *be the corresponding state machine. Let* p *be a role and* w ∈ Σ∗_async *be a word. We define the set of possible runs* R^**G**_p(w) *as all maximal runs of* GAut(**G**) *that are consistent with* p*'s local view of* w*:*

R^**G**_p(w) := {ρ *is a maximal run of* GAut(**G**) | w⇓Σ_p ≤ split(trace(ρ))⇓Σ_p} .

While Definition 6.2 captures the set of maximal runs that are consistent with the local view of a single role, we would like to refer to the set of runs that is consistent with the local view of all roles. We formalize this as the intersection of the possible run sets for all roles, which we denote as

$$I(w) := \bigcap\_{\mathbb{P} \in \mathcal{P}} \mathrm{R}\_{\mathbb{P}}^{\mathbf{G}}(w) \; .$$
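For intuition, the possible run sets and their intersection can be computed directly for small finite examples. The Python sketch below uses our own encoding: synchronous labels `(p, q, m)`, an acyclic GAut given as transition triples, and events `(role, '!'/'?', peer, msg)`.

```python
def runs(delta_G, q0):
    """All maximal runs (label sequences) of an acyclic state machine."""
    outs = [(a, q2) for (q, a, q2) in delta_G if q == q0]
    if not outs:
        return [[]]
    return [[a] + rest for (a, q2) in outs for rest in runs(delta_G, q2)]

def split(label):
    """A synchronous label p->q:m splits into a send and a receive event."""
    p, q, m = label
    return [(p, '!', q, m), (q, '?', p, m)]

def local_view(trace, role):
    """The role's projection of a run trace (split, then filter)."""
    return [e for l in trace for e in split(l) if e[0] == role]

def possible_runs(delta_G, q0, w, role):
    """Definition 6.2: runs whose role-projection extends role's view of w."""
    wp = [e for e in w if e[0] == role]
    return [r for r in runs(delta_G, q0)
            if local_view(r, role)[:len(wp)] == wp]

def intersection_set(delta_G, q0, w, roles):
    """I(w): runs consistent with every role's local view."""
    sets = [{tuple(r) for r in possible_runs(delta_G, q0, w, p)} for p in roles]
    return set.intersection(*sets)
```

On a two-branch machine, I(ε) contains both maximal runs; after p's send on the top branch, I(w) shrinks to the single consistent run.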

With these definitions in hand, we can now formulate our inductive invariant:

Lemma 6.3. *Let* **<sup>G</sup>** *be a global type and* {{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P *be the subset projection. Let* <sup>w</sup> *be a trace of* {{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P *. It holds that* <sup>I</sup>(w) *is non-empty.*

The reasoning for the sufficiency of Lemma 6.3 is included in the proof of Theorem 6.1, found in the extended version [29]. In the rest of this section, we focus our efforts on how to show this inductive invariant, namely that the intersection of all roles' possible run sets is non-empty.

We begin with the observation that the empty trace ε is consistent with all runs. As a result, I(ε) = ⋂_{p∈P} R^**G**_p(ε) contains all maximal runs of GAut(**G**). By definition, state machines for global types include at least one run, and the base case is trivially discharged. Intuitively, I(w) shrinks as more events are appended to w, but we show that at no point does it shrink to ∅. We consider the cases where a send or receive event is appended to the trace separately, and show that the intersection set shrinks in a principled way that preserves non-emptiness. In fact, when a trace is extended with a receive event, Receive Validity guarantees that the intersection set does not shrink at all.

Fig. 3. Evolution of R^**G**_-(-) sets when p sends a message m and q receives it.

Lemma 6.4. *Let* **<sup>G</sup>** *be a global type and* {{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P *be the subset projection. Let* wx *be a trace of* {{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P *such that* <sup>x</sup> <sup>∈</sup> <sup>Σ</sup>?*. Then,* <sup>I</sup>(w) = <sup>I</sup>(wx)*.*

To prove this equality, we further refine our characterization of intersection sets. In particular, we show that in the receive case, the intersection between the sender and receiver's possible run sets stays the same, i.e.

$$\mathbf{R\_{p}^{\mathbf{G}}}(w) \cap \mathbf{R\_{q}^{\mathbf{G}}}(w) = \mathbf{R\_{p}^{\mathbf{G}}}(wx) \cap \mathbf{R\_{q}^{\mathbf{G}}}(wx) \text{ .}$$

Note that it is not the case that the receiver only follows a subset of the sender's possible runs. In other words, R^**G**_q(w) ⊆ R^**G**_p(w) is not inductive. The equality above simply states that a receive action can only eliminate runs that have already been eliminated by its sender. Figure 3 depicts this relation.

Because receive events leave the intersection set unchanged, the burden of eliminating runs must fall upon send events. We show that send transitions shrink the possible run set of the sender in a way that is *prefix-preserving*. To make this precise, we introduce the following definition on runs.

Definition 6.5 (Unique splitting of a possible run). *Let* **G** *be a global type,* p *a role, and* w ∈ Σ∗_async *a word. Let* ρ *be a possible run in* R^**G**_p(w)*. We define the longest prefix of* ρ *matching* w*:*

$$\alpha' := \max \{ \rho' \mid \rho' \le \rho \wedge \mathit{split}(\mathit{trace}(\rho')) \Downarrow_{\Sigma_{\mathbf{p}}} \le w \Downarrow_{\Sigma_{\mathbf{p}}} \}\ .$$

*If* α′ ≠ ρ*, we can split* ρ *into* ρ = α · G −l→ G′ · β*, where* α′ = α · G*,* G′ *denotes the state following* G*, and* β *denotes the suffix of* ρ *following* α · G −l→ G′*. We call* α · G −l→ G′ · β *the* unique splitting *of* ρ *for* p *matching* w*. We omit the role* p *when it is obvious from context. This splitting is always unique because the maximal prefix of any* ρ ∈ R^**G**_p(w) *matching* w *is unique.*
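The splitting can be sketched on run traces in Python (our own encoding of synchronous labels `(p, q, m)`): we cut ρ after its longest prefix whose role-projection is a prefix of the role's view of w; the first label of the remainder then plays the role of l.

```python
def local_view(trace, role):
    """Events of `role` in a run trace of synchronous labels (p, q, m)."""
    return [e for (p, q, m) in trace
            for e in [(p, '!', q, m), (q, '?', p, m)] if e[0] == role]

def unique_splitting(rho, w, role):
    """Definition 6.5 sketch: return (alpha', remainder), where alpha' is the
    longest prefix of rho whose role-projection is a prefix of w's."""
    wp = [e for e in w if e[0] == role]
    best = 0
    for k in range(len(rho) + 1):
        v = local_view(rho[:k], role)
        if v == wp[:len(v)]:
            best = k
    return rho[:best], rho[best:]
```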

When role p fires a send transition p▷q!m, any run ρ = α · G −l→ G′ · β in p's possible run set with split(l)⇓Σ_p ≠ p▷q!m is eliminated. While the resulting possible run set may no longer contain runs that end with G′ · β, Send Validity guarantees that it must contain runs that begin with α · G. This is formalized by the following lemma.

Lemma 6.6. *Let* **G** *be a global type and* {{P(**G**, p)}}p∈P *be the subset projection. Let* wx *be a trace of* {{P(**G**, p)}}p∈P *such that* x ∈ Σ_! ∩ Σ_p *for some* p ∈ P*. Let* ρ *be a run in* I(w)*, and* α · G −l→ G′ · β *be the unique splitting of* ρ *for* p *with respect to* w*. Then, there exists a run* ρ′ *in* I(wx) *such that* α · G ≤ ρ′*.*

This concludes our discussion of the send and receive cases in the inductive step to show the non-emptiness of the intersection of all roles' possible run sets. The full proofs and additional definitions can be found in the extended version [29].

## 7 Completeness

In this section, we prove completeness of our approach. While soundness states that if a global type's subset projection is defined, it then implements the global type, completeness considers the reverse direction.

Theorem 7.1 (Completeness). *If* **<sup>G</sup>** *is implementable, then* {{*P*(**G**, <sup>p</sup>)}}<sup>p</sup>∈P *is defined.*

We sketch the proof and refer to the extended version [29] for the full proof. From the assumption that **G** is implementable, we know there exists a witness CSM that implements **G**. While the soundness proof picks our subset projection as the existential witness for showing implementability – thereby allowing us to reason directly about a particular implementation – completeness only guarantees the existence of some witness CSM. We cannot assume without loss of generality that this witness CSM is our subset construction; however, we must use the fact that it implements **G** to show that Send and Receive Validity hold on our subset construction.

We proceed via proof by contradiction: we assume the negation of Send and Receive Validity for the subset construction, and show a contradiction to the fact that this witness CSM implements **G**. In particular, we contradict protocol fidelity (Definition 3.1(i)), stating that the witness CSM generates precisely the language <sup>L</sup>(**G**). To do so, we exploit a simulation argument: we first show that the negation of Send and Receive Validity forces the subset construction to recognize a trace that is not a prefix of any word in <sup>L</sup>(**G**). Then, we show that this trace must also be recognized by the witness CSM, under the assumption that the witness CSM implements **G**.

To highlight the constructive nature of our proof, we convert our proof obligation to a witness construction obligation. To contradict protocol fidelity, it suffices to construct a witness trace <sup>v</sup><sup>0</sup> satisfying two properties, where {{B<sup>p</sup>}}<sup>p</sup>∈P is our witness CSM:

(a) v0 ∈ pref(L({{B_p}}p∈P)), i.e., the witness CSM recognizes v0, and
(b) I(v0) = ∅, i.e., no maximal run of GAut(**G**) is consistent with v0.
We first establish the sufficiency of conditions (a) and (b). Because {{B_p}}p∈P is deadlock-free by assumption, every prefix extends to a maximal trace. Thus, to prove the inequality of the two languages L({{B_p}}p∈P) and L(**G**), it suffices to prove the inequality of their respective prefix sets. In turn, it suffices to show the existence of a prefix of a word in one language that is not a prefix of any word in the other. We choose to construct a prefix in the CSM language that is not a prefix in L(**G**). We again leverage the definition of intersection sets (Definition 6.2) to weaken the property of language non-membership to the property of having an empty intersection set, as follows. By the semantics of L(**G**), for any w ∈ L(**G**), there exists w′ ∈ split(L(GAut(**G**))) with w ∼ w′. For any w′ ∈ split(L(GAut(**G**))), it trivially holds that w′ has a non-empty intersection set. Because intersection sets are invariant under the indistinguishability relation ∼, w must also have a non-empty intersection set. Since intersection sets are monotonically decreasing, if the intersection set of w is non-empty, then for any v ≤ w, the intersection set of v is also non-empty. Modus tollens of the chain of reasoning above tells us that, in order to show a word is not a prefix in L(**G**), it suffices to show that its intersection set is empty.

Having established the sufficiency of properties (a) and (b) for our witness construction, we present the steps to construct v0 from the negation of Send and Receive Validity respectively. We start by constructing a trace in {{C(**G**, p)}}p∈P that satisfies (b), and then show that {{B_p}}p∈P also recognizes the trace, thereby satisfying (a). In both cases, let p be the role and s be the state for which the respective validity condition is violated.

Send Validity (Definition 5.2). Let s −p▷q!m→ s′ ∈ δ_p be a transition such that

$$\text{tr-orig}(s \xrightarrow{\mathbf{p} \rhd \mathbf{q}!m} s') \neq s\ .$$

First, we find a trace u of {{C(**G**, p)}}p∈P that satisfies: (1) role p is in state s in the CSM configuration reached via u, and (2) the run of GAut(**G**) on u visits a state in s \ tr-orig(s −p▷q!m→ s′). We obtain such a witness u from the split(trace(−)) of a run prefix of GAut(**G**) that ends in some state in s \ tr-orig(s −p▷q!m→ s′). Any prefix thus obtained satisfies (1) by definition of C(**G**, p), and satisfies (2) by construction. Because send transitions are always enabled in a CSM, u · p▷q!m must also be a trace of {{C(**G**, p)}}p∈P, thus satisfying property (a) by a simulation argument. We then argue that u · p▷q!m satisfies property (b), stating that I(u · p▷q!m) is empty: the negation of Send Validity gives that there exist no run extensions from our candidate state in s \ tr-orig(s −p▷q!m→ s′) with the immediate next action p → q : m, and therefore there exists no maximal run of GAut(**G**) consistent with u · p▷q!m.

Receive Validity (Definition 5.3). Let s −p◁q1?m1→ s1 and s −p◁q2?m2→ s2 ∈ δ_p be two transitions, and let G2 ∈ tr-dest(s −p◁q2?m2→ s2) be such that

> q1 ≠ q2 and q1▷p!m1 ∈ M^p(G2) .

Constructing the witness v0 pivots on finding a trace u of {{C(**G**, p)}}p∈P such that both u · p◁q1?m1 and u · p◁q2?m2 are traces of {{C(**G**, p)}}p∈P. Equivalently, we show there exists a reachable configuration of {{C(**G**, p)}}p∈P in which p can receive either message from the distinct senders q1 and q2. Formally, the local state of p has two outgoing transitions labeled with p◁q1?m1 and p◁q2?m2, and the channels (q1, p) and (q2, p) have m1 and m2 at their respective heads. We construct such a u by considering a run of GAut(**G**) that contains two transitions labeled with q1 → p : m1 and q2 → p : m2. Such a run must exist due to the negation of Receive Validity. We start with the split trace of this run and argue that, from the definition of M(-) and the indistinguishability relation ∼, we can perform iterative reorderings using ∼ to bubble the send action q1▷p!m1 to the position before the receive action p◁q2?m2. Then, (a) for u · p◁q1?m1 holds by a simulation argument. We then separately show that (b) holds for u · p◁q1?m1, using similar reasoning as in the send case, to complete the proof that u · p◁q1?m1 suffices as a witness for v0.

It is worth noting that the construction of the witness prefix <sup>v</sup><sup>0</sup> in the proof immediately yields an algorithm for computing counterexample traces to implementability.

*Remark 7.2 (Mixed Choice is Not Needed to Implement Global Types).* Theorem 7.1 in particular shows the necessity of Send Validity for implementability. Corollary 5.5 shows that Send Validity precludes states with both send and receive outgoing transitions. Together, this implies that an implementable global type can always be implemented without mixed choice. Note that the syntactic restrictions on global types do not inherently prevent mixed choice states from arising in a role's subset construction, as evidenced by r in the following type: p→q:l. q→r:m. 0 + p→q:r. r→q:m. 0. Our completeness result thus implies that this type is not implementable. Most MST frameworks [18,24,31] implicitly force *no mixed choice* through syntactic restrictions on local types. We are the first to prove that mixed choice states are indeed not necessary for completeness. This is interesting because mixed choice is known to be crucial for the expressive power of the synchronous π-calculus compared to its asynchronous variant [32].

## 8 Complexity

In this section, we establish PSPACE-completeness of checking implementability for global types.

Theorem 8.1. *The MST implementability problem is PSPACE-complete.*

*Proof.* We first establish the upper bound. The decision procedure enumerates, for each role p, the subsets of GAut(**G**)↓p. This can be done in polynomial space and exponential time. For each p and s ⊆ Q_**G**, it then (i) checks membership of s in Q_p of C(**G**, p), and (ii) if s ∈ Q_p, checks whether all outgoing transitions of s in C(**G**, p) satisfy Send and Receive Validity. Check (i) can be reduced to the intersection non-emptiness problem for nondeterministic finite state machines, which is in PSPACE [44]. It is easy to see that check (ii) can be done in polynomial time. In particular, the computation of available messages for Receive Validity only requires a single unfolding of every loop in **G**.

Note that the synthesis problem has the same complexity. The subset construction to determinize GAut(**G**)↓<sup>p</sup> can be done using a PSPACE transducer. While the output can be of exponential size, it is written on an extra tape that is not counted towards memory usage. However, this means we need to perform the validity checks as described above instead of using the computed deterministic state machines.

Second, we prove the lower bound. The proof is inspired by the proof of Theorem 4 in [4], in which Alur et al. prove that checking safe realizability of bounded HMSCs is PSPACE-hard. We reduce the PSPACE-complete problem of checking universality of an NFA M = (Q, Δ, δ, q0, F) to checking implementability. Without loss of generality, we assume that every state can reach a final state. We construct a global type **G** for p, q and r that is implementable iff L(M) = Δ∗. For this, we define subterms G_l and G_r, as well as G_q for every q ∈ Q, and G∗. We use a fresh letter ⊥ to handle final states of M. We also define p↔q:m as an abbreviation for p→q:m. q→p:m.

$$\begin{aligned}
\mathbf{G} &:= G_l + G_r \\
G_l &:= \mathsf{p}{\leftrightarrow}\mathsf{q}{:}l.\; \mathsf{p}{\leftrightarrow}\mathsf{r}{:}go.\; G_{q_0} \\
G_q &:= \begin{cases} \sum_{(a,q') \in \delta(q)} (\mathsf{r}{\to}\mathsf{q}{:}a.\; G_{q'}) & \text{if } q \notin F \\ \mathsf{r}{\to}\mathsf{q}{:}{\bot}.\,\mathbf{0} \;+\; \sum_{(a,q') \in \delta(q)} (\mathsf{r}{\to}\mathsf{q}{:}a.\; G_{q'}) & \text{if } q \in F \end{cases} \\
G_r &:= \mathsf{p}{\leftrightarrow}\mathsf{q}{:}r.\; \mathsf{p}{\leftrightarrow}\mathsf{r}{:}go.\; G_* \\
G_* &:= \mathsf{r}{\to}\mathsf{q}{:}{\bot}.\,\mathbf{0} \;+\; \sum_{a \in \Delta} (\mathsf{r}{\to}\mathsf{q}{:}a.\; G_*)
\end{aligned}$$

The global type **G** is constructed such that p first decides whether words from L(M) or from Δ∗ are sent subsequently. This decision is known to p and q but not to r. The protocol then continues with r sending letters from Δ to q, and p is not involved. Intuitively, q is able to receive these letters if and only if L(M) = Δ∗. From Theorems 6.1 and 7.1, we know that {{*C*(**G**, p)}}p∈P implements **G** if **G** is implementable.


We claim that {{*C*(**G**, p)}}p∈P implements **G** if and only if L(M) = Δ∗.

First, assume that L(M) ≠ Δ∗. Then, there exists w ∉ L(M). We can construct the following run of {{*C*(**G**, p)}}p∈P that deadlocks. Role p chooses the left subterm G_l and, subsequently, r sends w to q. We do a case analysis on whether w contains a prefix w′ such that w′ ∉ pref(L(M)). If so, sending the last letter of a minimal such prefix leads to a deadlock in {{*C*(**G**, p)}}p∈P, contradicting deadlock freedom. If not, w is a prefix of a word in L(M). Still, role r can send ⊥, which cannot be received, also contradicting deadlock freedom.

Second, assume that L(M) = Δ∗. With this, it is fine that r does not know the branch. Role q will be able to receive all messages since *C*(**G**, q) can receive, letter by letter, w·⊥ for every w ∈ L(M) from r. Thus, protocol fidelity and deadlock freedom hold, concluding the proof.
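The source problem of this reduction, NFA universality, can itself be decided by exploring the subset automaton: M is universal iff every reachable subset of states contains a final state. The self-contained sketch below is our own encoding (it stores all subsets and is therefore exponential-space; the PSPACE algorithm would enumerate subsets instead):

```python
from collections import deque

def is_universal(init, finals, delta, alphabet):
    """Check L(M) = Sigma* for an NFA: explore the subset automaton and
    report False as soon as a reachable subset is disjoint from the final
    states (the word leading to that subset is then rejected by M)."""
    start = frozenset([init])
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if not (s & finals):
            return False  # some word reaching s is not accepted
        for a in alphabet:
            t = frozenset(q2 for q in s for q2 in delta(q, a))
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True

# m1 accepts every word over {a, b}; m2 rejects every word containing 'b'.
sigma = ['a', 'b']
m1 = lambda q, a: {0}
m2 = lambda q, a: {0} if a == 'a' and q == 0 else set()
```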

Note that PSPACE-hardness only holds if the size of **G** does not account for common subterms multiple times. Because every message is immediately acknowledged, the constructed global type specifies a universally 1-bounded [23] language, proving that PSPACE-hardness persists under such a restriction. For our construction, it does not hold that V(L(G_l)⇓Σq,?) = L(M). We chose so to obtain a more compact protocol. However, we can easily fix this by sending the decision of p first to r, allowing to omit the messages ⊥ to q.

This result and the fact that local languages are preserved by the subset projection (Lemma 4.3) leads to the following observation.

Corollary 8.2. *Let* **G** *be an implementable global type. Then, the subset projection* {{*P*(**G**, p)}}p∈P *is a local-language-preserving implementation for* **G***, i.e.,* L(*P*(**G**, p)) = L(**G**)⇓Σp *for every* p*, and can be computed in PSPACE.*

*Remark 8.3 (MST implementability with directed choice is PSPACE-hard).* Theorem 8.1 is stated for global types with sender-driven choice but the provided type is in fact directed. Thus, the PSPACE lower bound also holds for implementability of types with directed choice.

## 9 Evaluation

We consider the following three aspects in the evaluation of our approach: (E1) difficulty of implementation, (E2) completeness, and (E3) comparison to the state of the art.

For this, we implemented our subset projection in a prototype tool [1,37]. It takes a global type as input and computes the subset projection for each role. It was straightforward to implement the core functionality in approximately 700 lines of Python 3 code, closely following the formalization (E1).

We consider global types (and communication protocols) from seven different sources as well as all examples from this work (cf. 1st column of Table 1). Our experiments were run on a computer with an Intel Core i7-1165G7 CPU and used at most 100 MB of memory. The results are summarized in Table 1. The reported size is the number of states and transitions of the respective state machine, which allows not accounting for multiple occurrences of the same subterm. As expected, our tool can project every implementable protocol we have considered (E2).

**Table 1.** Projecting Global Types. For every protocol, we report whether it is implementable (✓) or not (×), the time to compute our subset projection and the generalized projection by Majumdar et al. [31], as well as the outcome: ✓ for "implementable", × for "not implementable", and (×) for "not known". We also give the size of the protocol (number of states and transitions), the number of roles, and the combined size of all subset projections (number of states and transitions).

Regarding the comparison against the state of the art (E3), we directly compared our subset projection to the incomplete approach by Majumdar et al. [31], and found that the run times are in the same order of magnitude in general (typically a few milliseconds). However, the projection of [31] fails to project four implementable protocols (including Example 2.1). We discuss some of the other examples in more detail in the next section. We further note that most of the run times reported by Scalas and Yoshida [36] on their model-checking-based tool are around 1 s and are thus two to three orders of magnitude slower.

### 10 Discussion

Success of Syntactic Projections Depends on Representation. Let us illustrate how unfolding recursion helps syntactic projection operators to succeed. Consider this implementable global type, which is not syntactically projectable:

$$\mathbf{G}_{\mathrm{fold}} := + \begin{cases} \mathsf{p}{\to}\mathsf{q}{:}o.\; \mu t_1.\,(\mathsf{p}{\to}\mathsf{q}{:}o.\; \mathsf{q}{\to}\mathsf{r}{:}o.\; t_1 \;+\; \mathsf{p}{\to}\mathsf{q}{:}b.\; \mathsf{q}{\to}\mathsf{r}{:}b.\,\mathbf{0}) \\ \mathsf{p}{\to}\mathsf{q}{:}m.\; \mathsf{q}{\to}\mathsf{r}{:}m.\; \mu t_2.\,(\mathsf{p}{\to}\mathsf{q}{:}o.\; \mathsf{q}{\to}\mathsf{r}{:}o.\; t_2 \;+\; \mathsf{p}{\to}\mathsf{q}{:}b.\; \mathsf{q}{\to}\mathsf{r}{:}b.\,\mathbf{0}) \end{cases}$$

Similar to projection by erasure, a syntactic projection erases events that a role is not involved in and immediately tries to *merge* different branches. The merge operator is a partial operator that checks sufficient conditions for implementability. Here, the merge operator fails for r because it cannot merge a recursion variable binder and a message reception. Unfolding the global type preserves the represented protocol and resolves this issue:

$$\mathbf{G}_{\mathrm{unf}} := + \begin{cases} \mathsf{p}{\to}\mathsf{q}{:}o.\; + \begin{cases} \mathsf{p}{\to}\mathsf{q}{:}b.\; \mathsf{q}{\to}\mathsf{r}{:}b.\,\mathbf{0} \\ \mathsf{p}{\to}\mathsf{q}{:}o.\; \mathsf{q}{\to}\mathsf{r}{:}o.\; \mu t_1.\,(\mathsf{p}{\to}\mathsf{q}{:}o.\; \mathsf{q}{\to}\mathsf{r}{:}o.\; t_1 + \mathsf{p}{\to}\mathsf{q}{:}b.\; \mathsf{q}{\to}\mathsf{r}{:}b.\,\mathbf{0}) \end{cases} \\ \mathsf{p}{\to}\mathsf{q}{:}m.\; \mathsf{q}{\to}\mathsf{r}{:}m.\; \mu t_2.\,(\mathsf{p}{\to}\mathsf{q}{:}o.\; \mathsf{q}{\to}\mathsf{r}{:}o.\; t_2 + \mathsf{p}{\to}\mathsf{q}{:}b.\; \mathsf{q}{\to}\mathsf{r}{:}b.\,\mathbf{0}) \end{cases}$$

(We refer to [29] for visual representations of both global types.) This global type can be projected with most syntactic projection operators and shows that the representation of the global type matters for syntactic projectability. However, such unfolding tricks do not always work, e.g. for the odd-even protocol (Example 2.1). We avoid this brittleness using automata and separating the synthesis from checking implementability.
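Unfolding replaces the free occurrences of a recursion variable t in the body of μt.G by μt.G itself. The following minimal sketch works over a hypothetical tuple encoding of global types (the encoding and all names are ours):

```python
def unfold(g):
    """One-step unfolding of a recursive global type mu t. G: substitute
    mu t. G for the free occurrences of t in G. Types are encoded as tuples:
    ('mu', t, body), ('var', t), ('msg', p, q, label, cont),
    ('choice', left, right), ('end',)."""
    assert g[0] == 'mu'
    _, t, body = g
    def subst(h):
        kind = h[0]
        if kind == 'var':
            return g if h[1] == t else h
        if kind == 'msg':
            return ('msg', h[1], h[2], h[3], subst(h[4]))
        if kind == 'choice':
            return ('choice', subst(h[1]), subst(h[2]))
        if kind == 'mu':
            # an inner binder for the same variable shadows t
            return h if h[1] == t else ('mu', h[1], subst(h[2]))
        return h  # ('end',)
    return subst(body)

# The loop from the first branch of G_fold: mu t1.(p->q:o. q->r:o. t1 + p->q:b. q->r:b. 0)
loop = ('mu', 't1', ('choice',
        ('msg', 'p', 'q', 'o', ('msg', 'q', 'r', 'o', ('var', 't1'))),
        ('msg', 'p', 'q', 'b', ('msg', 'q', 'r', 'b', ('end',)))))
once = unfold(loop)
```

Applying `unfold` once exposes the first exchange of each iteration before the recursion binder, which is exactly the transformation that makes the syntactic merge succeed.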

Entailed Properties from the Literature. We defined implementability for a global type as the question of whether there exists a deadlock-free CSM that generates the same language as the global type. Various other properties of implementations and protocols have been proposed in the literature. Here, we give a brief overview and defer to the extended version [29] for a detailed analysis. *Progress* [18], a common property, requires that every sent message is eventually received and every expected message will eventually be sent. With deadlock freedom, our subset projection trivially satisfies progress for finite traces. For infinite traces, as expected, fairness assumptions are required to enforce progress. Similarly, our subset projection prevents *unspecified receptions* [14] and *orphan messages* [9,21], respectively interpreted in our multiparty setting with sender-driven choice. We also ensure that every local transition of each role is *executable* [14], i.e., it is taken in some run of the CSM. Any implementation of a global type has the *stable property* [28], i.e., one can always reach a configuration with empty channels from every reachable configuration. While the properties above are naturally satisfied by our subset projection, the following ones can be checked directly on an implementable global type without explicitly constructing the implementation. A global type is *terminating* [36] iff it does not contain recursion and *never-terminating* [36] iff it does not contain the term 0.

#### 11 Related Work

MSTs were introduced by Honda et al. [24] with a process algebra semantics, and the connection to CSMs was established soon afterwards [20].

In this work, we present a complete projection procedure for global types with sender-driven choice. The work by Castagna et al. [13] is the only one to present a projection that aims for completeness. Their semantic conditions, however, are not effectively computable and their notion of completeness is "less demanding than the classical ones" [13]. They consider multiple implementations, generating different sets of traces, to be sound and complete with regard to a single global type [13, Sec. 5.3]. In addition, the algorithmic version of their conditions does not use global information as our message availability analysis does.

MST implementability relates to safe realizability of HMSCs, which is undecidable in general but decidable for certain classes [30]. Stutz [38] showed that implementability of global types that are always able to terminate is decidable.<sup>1</sup> The EXPSPACE decision procedure is obtained via a reduction to safe realizability of globally-cooperative HMSCs, by proving that the HMSC encoding [39] of any implementable global type is globally-cooperative and generalizing results for infinite executions. Thus, our PSPACE-completeness result both generalizes and tightens the earlier decidability result obtained in [38]. Stutz [38] also investigates how HMSC techniques for safe realizability can be applied to the MST setting – using the formal connection between MST implementability and safe realizability of HMSCs – and establishes an undecidability result for a variant of MST implementability with a relaxed indistinguishability relation.

Similar to the MST setting, there have been approaches in the HMSC literature that tie branching to a role making a choice. We refer the reader to the work by Majumdar et al. [31] for a survey.

Standard MST frameworks project a global type to a set of *local types* rather than a CSM. Local types are easily translated to FSMs [31, Def. 11]. Our projection operator, though, can yield FSMs that cannot be expressed with the limited syntax of local types. Consider this implementable global type: p→q:o. 0 + p→q:m. p→r:b. 0. The subset projection for r has two final states connected by a transition on which r receives b from p. In the syntax of local types, 0 is the only term indicating termination, which means that final states with outgoing transitions cannot be expressed. In contrast to the syntactic restrictions for global types, which are key to effective verification, we consider local types unnecessarily restrictive. Usually, local implementations are type-checked against their local types and subtyping gives some implementation freedom [12,16,17,27]. However, one can also view our subset projection as a local specification of the actual implementation. We conjecture that subtyping would then amount to a variation of alternating refinement [5].
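To make the shape concrete, the projection for r in the type above can be rendered as a toy automaton in which both states are final and the first final state still has an outgoing transition, which is precisely what local-type syntax cannot express (state and label names below are ours, for illustration only):

```python
# Toy rendering of the subset projection C(G, r) for
# G = p->q:o. 0 + p->q:m. p->r:b. 0:
# two final states s0, s1 and one receive transition labeled "p?b".
states = {'s0', 's1'}
finals = {'s0', 's1'}                 # both states are final...
transitions = {('s0', 'p?b'): 's1'}   # ...yet s0 has an outgoing transition

# This is the situation local types cannot capture: a terminating state
# (final) that still offers further behavior (outgoing transition).
has_final_with_outgoing = any(src in finals for (src, _) in transitions)
```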

CSMs are Turing-powerful [11] but decidable classes were obtained for different semantics: restricted communication topology [33,42], half-duplex communication (only for two roles) [14], input-bounded [10], and unreliable channels [2,3].

<sup>1</sup> This syntactic restriction is referred to as 0-reachability in [38].

Global types (as well as choreography automata [7]) can only express existentially 1-bounded, 1-synchronizable and half-duplex communication [39]. Key to this result is that sending and receiving a message is specified atomically in a global type — a feature Dagnino et al. [19] waived for their deconfined global types. However, Dagnino et al. [19] use deconfined types to capture the behavior of a given system rather than projecting to obtain a system that generates specified behaviors.

This work relies on reliable communication as is standard for MST frameworks. Work on fault-tolerant MST frameworks [8,43] attempts to relax this restriction. In the setting of reliable communication, both context-free [25,40] and parametric [15,22] versions of session types have been proposed to capture more expressive protocols and entire protocol families respectively. Extending our approach to these generalizations is an interesting direction for future work.

Acknowledgements. This work is funded in part by the National Science Foundation under grant 1815633. Felix Stutz was supported by the Deutsche Forschungsgemeinschaft project 389792660 TRR 248—CPEC.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Early Verification of Legal Compliance via Bounded Satisfiability Checking**

Nick Feng¹(B), Lina Marsso¹, Mehrdad Sabetzadeh², and Marsha Chechik¹

> <sup>1</sup> University of Toronto, Toronto, Canada {fengnick,lmarsso,chechik}@cs.toronto.edu <sup>2</sup> University of Ottawa, Ottawa, Canada m.sabetzadeh@uottawa.ca

**Abstract.** Legal properties involve reasoning about data values and time. Metric first-order temporal logic (MFOTL) provides a rich formalism for specifying legal properties. While MFOTL has been successfully used for verifying legal properties over operational systems via runtime monitoring, no solution exists for MFOTL-based verification in early-stage system development captured by requirements. Given a legal property and system requirements, both formalized in MFOTL, compliance with the property can be verified on the requirements via satisfiability checking. In this paper, we propose a practical, sound, and complete (within a given bound) satisfiability checking approach for MFOTL. The approach, based on satisfiability modulo theories (SMT), employs a counterexample-guided strategy to incrementally search for a satisfying solution. We implemented our approach using the Z3 SMT solver and evaluated it on five case studies spanning the healthcare, business administration, banking, and aviation domains. Our results indicate that our approach can efficiently determine whether legal properties of interest are met, or generate counterexamples that lead to compliance violations.

### **1 Introduction**

Software systems, such as medical systems, are increasingly required to comply with laws and regulations aimed at ensuring safety, security, and data privacy [1,36]. The properties stipulated by these laws and regulations – which we refer to as *legal properties* (LP) hereafter – typically involve reasoning about actions, ordering and time. As an example, consider the following LP, P1, derived from a health-data regulation (s.11, PHIPA [20]): "If personal health information is not accurate or not up-to-date, it should not be accessed". In this property, the accuracy and the freshness of the data depend on how and when the data was collected and updated before being accessed. Specifically, this property constrains the data action *access* to have accurate and up-to-date data values, which further constrains the order and time of *access* with respect to other data actions.

System compliance with LPs can be checked on the system design or on an operational model of a system implementation. In this paper, we focus on the early stage, where one can check whether a formalization of the system requirements satisfies an LP. The formalization can be done using a descriptive formalism like temporal logic [24,35]. For instance, the requirement (req0) of a data collection system, "no data can be accessed prior to 15 days after the data has been collected", needs to be formalized for verifying compliance with P1. It is important to formalize the data and time constraints of both the system requirements and LPs, such as those of P1 and req0.

*Metric first-order temporal logic (MFOTL)* enables the specification of data and time constraints [3] and offers an expressive formalism for capturing LPs and the related system requirements that constrain data and time [1]. Existing work on MFOTL verification focuses on detecting violations at run-time through monitoring [1,19], with MFOTL formulas being checked on execution logs. There is an unmet need for determining the *satisfiability* of MFOTL specifications, i.e., searching for possible LP violations in an MFOTL specification. This is important for designing systems that comply with their legal requirements.

MFOTL satisfiability checking is generally undecidable since MFOTL is an extension of first-order logic (FOL). Restrictions are thus necessary for making the problem decidable. In this paper, we restrict ourselves to safety properties. For safety properties, LP violations are finite sequences of data actions, captured via a finite-length counterexample. For example, a possible violation of P1 is a sequence consisting of storing a value v in a variable d, updating d's value to v′, then reading d again and not obtaining v′. Since we are interested in finite counterexamples, bounded verification is a natural strategy to pursue for achieving decidability. SAT solvers have been previously used for bounded satisfiability checking of metric temporal logic (MTL) [24,35]. However, MTL cannot effectively capture quantified data constraints in LPs, hence the solution is not applicable directly. As an extension of MTL, MFOTL can effectively capture the data constraints used in LPs. Yet, to the best of our knowledge, there has not been any prior work on bounded MFOTL satisfiability checking.

To establish a *bound* in bounded verification, researchers have predominantly relied on bounding the *size of the universe* [13]. Bounding the universe would be too restrictive because LPs routinely refer to variables with large ranges, e.g., timed actions spanning several years. Instead, we bound the *number of data actions in a run*, which bounds the number of actions in the counterexample.

Equipped with our proposed notion of a bound, we develop an incremental approach (IBS) for bounded satisfiability checking of MFOTL. We first translate the MFOTL property and requirements into first-order logic formulas with quantified relational objects (FOL∗). We then incrementally ground the FOL∗ constraints to eliminate the quantifiers by considering an increasing number of relational objects. Subsequently, we check the satisfiability of the resulting constraints using an SMT solver. Specifically, we make the following contributions: (1) we propose a translation of MFOTL formulas to FOL∗; (2) we provide a novel bounded satisfiability checking solution, IBS, for the translated FOL∗ formulas with incremental and counterexample-guided over-/under-approximation. Note that while our solution to MFOTL satisfiability checking can be applied to a broader set of applications, in this paper we focus on the legal domain. We empirically evaluate IBS on five case studies with a total of 24 properties, showing that it can effectively and efficiently find LP violations or prove satisfiability.

**Fig. 1.** Example requirements and legal property P1 of DCC, with signature S*data* = (∅, {*Collect*, *Update*, *Access*}, ι*data*), where ι*data*(*Collect*) = ι*data*(*Update*) = ι*data*(*Access*) = 2.

**Fig. 2.** Five traces from the DCC example.

The rest of this paper is organized as follows. Sect. 2 provides background and establishes our notation. Sect. 3 defines the bounded satisfiability checking (BSC) problem. Sect. 4 provides an overview of our solution and the translation of MFOTL to FOL∗. Sect. 5 presents our solution; proofs of soundness, termination and optimality are available in the extended version [11]. Sect. 6 reports on the experiments performed to validate our bounded satisfiability checking solution for MFOTL. Sect. 7 discusses related work. Sect. 8 concludes the paper.

## **2 Preliminaries**

In this section, we describe metric first-order temporal logic (MFOTL) [3].

**Syntax.** Let I be a set of non-empty intervals over ℕ. An *interval* I ∈ I can be expressed as [b, b′), where b ∈ ℕ and b′ ∈ ℕ ∪ {∞}. A *signature* S is a tuple (C, R, ι), where C is a set of constants and R is a finite set of predicate (relation) symbols. Without loss of generality, we assume all constants are from the integer domain ℤ, where the theory of linear integer arithmetic (LIA) holds. The function ι : R → ℕ associates each predicate symbol r ∈ R with an arity ι(r) ∈ ℕ. Let *Var* be a countably infinite set of variables over the domain ℤ; a term t is defined inductively as t := c | v | t + t | c × t. We write t̄ for a vector of terms and t̄^k_x for a vector that contains x at index k. The syntax of MFOTL formulas is defined as follows: *(1)* ⊤ and ⊥, representing the values "true" and "false"; *(2)* t = t′ and t > t′ for terms t and t′; *(3)* r(t1, ..., tι(r)) for r ∈ R and terms t1, ..., tι(r); *(4)* φ ∧ ψ and ¬φ for MFOTL formulas φ and ψ; *(5)* ∃x.(r(t̄^k_x) ∧ φ) for an MFOTL formula φ, a relation symbol r ∈ R, a variable x ∈ *Var*, and a vector of terms t̄^k_x s.t. t̄^k_x[k] = x; and *(6)* φ U_I ψ (until), φ S_I ψ (since), ◯_I φ (next), and ●_I φ (previous) for MFOTL formulas φ and ψ and an interval I ∈ I.

We consider a restricted form of quantification (syntax rule *(5)*, above) similar to guarded quantification [18]. Every existentially quantified variable x must be guarded by some relation r (i.e., for some t̄, r(t̄) holds and x appears in t̄). Similarly, universal quantification must be guarded as ∀x.(r(t̄) ⇒ φ) where x ∈ t̄. Thus, ¬∃x.¬r(x) (and ∀x.r(x)) are not allowed.

The temporal operators U_I, S_I, ●_I and ◯_I require the satisfaction of the formula within the time interval given by I. We write [b,) as a shorthand for [b,∞); if I is omitted, the interval is assumed to be [0,∞). Other classical unary temporal operators ♦_I (eventually), □_I (always), and ⧫_I (once) are defined as follows: ♦_I φ = ⊤ U_I φ, □_I φ = ¬♦_I ¬φ, and ⧫_I φ = ⊤ S_I φ. Other common logical operators such as ∨ (disjunction) and ∀ (universal quantification) are expressed through negation of ∧ and ∃, respectively.
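These desugarings can be written down mechanically. The following sketch uses a hypothetical tuple encoding of formulas (the encoding and all names are ours, for illustration only):

```python
TOP = ('top',)  # the formula "true"

def eventually(interval, phi):
    # eventually_I(phi) := TOP Until_I phi
    return ('until', interval, TOP, phi)

def always(interval, phi):
    # always_I(phi) := not eventually_I(not phi)
    return ('not', eventually(interval, ('not', phi)))

def once(interval, phi):
    # once_I(phi) := TOP Since_I phi
    return ('since', interval, TOP, phi)

# Example: "always, Access(0, 0)" over the unbounded interval [0, infinity).
acc = ('atom', ('Access', 0, 0))
always_acc = always((0, None), acc)
```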

**Example 1.** Suppose a data collection centre (DCC) *collect*s and *access*es personal data information with three requirements: req0, stating that no data is allowed to be accessed before the data ID has been collected for 15 days (360 hours); req1: data can only be updated after having been collected or last updated more than a week (168 hours) ago; and req2: a data value can only be accessed if the value has been collected or updated within a week (168 hours). The signature Sdata for DCC contains three binary relations (Rdata): *Collect*, *Update*, and *Access*, such that *Collect*(d, v), *Update*(d, v) and *Access*(d, v) hold at a given time point if and only if data at id d is collected, updated, and accessed with value v at this time point, respectively. The MFOTL formulas for P1, req0, req1 and req2 are shown in Fig. 1. For instance, the formula req0 specifies that if a data value stored at id d is accessed, then some data must have been collected and stored at id d at least 360 hours ago ([360,)).

**Semantics.** A first-order (FO) structure D over the signature S = (C, R, ι) comprises a non-empty domain *dom*(D) ≠ ∅ and interpretations c^D ∈ *dom*(D) and r^D ⊆ *dom*(D)^ι(r) for each c ∈ C and r ∈ R. The semantics of MFOTL formulas is defined over a sequence of FO structures D̄ = (D0, D1, ...) and a sequence of natural numbers representing time τ̄ = (τ0, τ1, ...), where (a) τ̄ is a monotonically increasing sequence; (b) *dom*(Di) = *dom*(Di+1) for all i ≥ 0 (all Di have a fixed domain, referred to as *dom*(D̄)); and (c) each constant symbol c ∈ C has the same interpretation across D̄ (i.e., c^Di = c^Di+1). Property (a) ensures that time never decreases as the sequence progresses, and (b) ensures that the domain is fixed. D̄ is similar to timed words in metric temporal logic (MTL), but instead of associating a set of propositions with each time point, MFOTL uses a structure Di to interpret the symbols in the signature S. The semantics of MFOTL is defined over a trace of timed first-order structures σ = (D̄, τ̄), where every structure Di ∈ D̄ specifies the set of tuples r^Di that hold for every relation r at time τi ∈ τ̄. We call σ = (D̄, τ̄) an MFOTL trace.

**Fig. 3.** MFOTL semantics.

**Example 2.** Consider the signature Sdata in the DCC example. Let τ1 = 0 and τ2 = 361, and let D1 and D2 be two first-order structures in which only *Collect*(0, 0) and only *Access*(0, 0) hold, respectively. The trace σ1 = ((D1, D2), (τ1, τ2)), shown in Fig. 2, is a valid trace representing two timed relations: (1) data value 0 is collected and stored at id 0 at hour 0, and (2) data value 0 is read by accessing id 0 at hour 361.

A *valuation function* v : *Var* → *dom*(D̄) maps the set *Var* of variables to their interpretations in the domain *dom*(D̄). For vectors x̄ = (x1, ..., xn) and d̄ = (d1, ..., dn) ∈ *dom*(D̄)^n, the *update operation* v[x̄ → d̄] produces a new valuation function v′ s.t. v′(xi) = di for 1 ≤ i ≤ n, and v′(x′) = v(x′) for every x′ ∉ x̄. For any constant c, v(c) = c^D. Let D̄ be a sequence of FO structures over signature S = (C, R, ι) and τ̄ be a sequence of natural numbers. Let φ be an MFOTL formula over S, v be a valuation function, and i ∈ ℕ. A fragment of the relation (D̄, τ̄, v, i) |= φ is defined in Fig. 3.

The operators ●_I, ◯_I, U_I and S_I are augmented with an interval I ∈ I which defines the satisfaction of the formula within a time range specified by I relative to the current time at step i, i.e., τi.
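To make the interval-relative semantics concrete, here is a toy evaluator for a small ground (quantifier-free) fragment with the past operator ⧫_I, over a trace representation in the spirit of Example 2. The encoding is our own simplification, not the paper's implementation:

```python
def holds(phi, trace, i):
    """Check (trace, i) |= phi for a tiny ground MFOTL fragment.
    trace: list of (timestamp, set of ground atoms), timestamps non-decreasing.
    phi: nested tuples -- ('atom', a), ('not', p), ('and', p, q),
         ('once', (lo, hi), p), where hi = None encodes infinity."""
    kind = phi[0]
    tau_i = trace[i][0]
    if kind == 'atom':
        return phi[1] in trace[i][1]
    if kind == 'not':
        return not holds(phi[1], trace, i)
    if kind == 'and':
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if kind == 'once':  # once_I(p): p held at some j <= i with tau_i - tau_j in I
        lo, hi = phi[1]
        return any(lo <= tau_i - trace[j][0]
                   and (hi is None or tau_i - trace[j][0] < hi)
                   and holds(phi[2], trace, j)
                   for j in range(i + 1))
    raise ValueError(kind)

# Sigma1 from Example 2: Collect(0,0) at hour 0, Access(0,0) at hour 361.
sigma1 = [(0, {('Collect', 0, 0)}), (361, {('Access', 0, 0)})]
# A ground instance of req0's obligation at the access position:
# Collect(0,0) happened at least 360 hours before the access.
req0_at_access = ('once', (360, None), ('atom', ('Collect', 0, 0)))
```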

**Definition 1 (MFOTL Satisfiability).** *An MFOTL formula* φ *is* satisfiable *if there exists a sequence of FO structures* D¯ *and natural numbers* τ¯*, and a valuation function* <sup>v</sup> *such that* (D, ¯ τ, v, ¯ 0) <sup>|</sup><sup>=</sup> <sup>φ</sup>*.* <sup>φ</sup> *is* unsatisfiable *otherwise.*

**Example 3.** In the DCC example, the MFOTL formula req0 is *satisfiable* because (D̄, τ̄, v, 0) |= req0, where σ1 = (D̄, τ̄) is shown in Fig. 2. Let req′0 be another MFOTL formula: ♦[0,359] ∃j.(*Access*(0, j)). The formula req′0 ∧ req0 is *unsatisfiable* because if data stored at id 0 is accessed between 0 and 359 hours, then it is impossible to collect the data at least 360 hours prior to its access.

## **3 Bounded Satisfiability Checking Problem**

The satisfiability of MFOTL properties is generally undecidable since MFOTL is expressive enough to describe the blank tape problem [31] (which has been shown to be undecidable). Despite the undecidability result, we can derive a bounded version of the problem, *bounded satisfiability checking* (BSC), for which a sound and complete decision procedure exists. When facing a hard instance for satisfiability checking, the solution to BSC provides bounded guarantees (i.e., whether a solution exists within a given bound). In this section, we first define satisfiability checking and then the BSC problem for MFOTL formulas.

*Satisfiability checking* [32] is a verification technique that extends model checking by replacing a state transition system with a set of temporal logic formulas. In the following, we define satisfiability checking of MFOTL formulas.

**Definition 2 (Satisfiability Checking of MFOTL Formulas).** *Let* P *be an MFOTL formula over a signature* S = (C, R, ι)*, and let Reqs be a set of MFOTL requirements over* S*. Reqs* complies with P *(denoted Reqs* ⇒ P*) iff* ⋀<sub>ψ∈*Reqs*</sub> ψ ∧ ¬P *is* unsatisfiable*. We call a solution to* ⋀<sub>ψ∈*Reqs*</sub> ψ ∧ ¬P*, if one exists, a* counterexample *to Reqs* ⇒ P*.*

**Example 4.** Consider our DCC system requirements and the privacy data property P1, stating that if personal health information is not accurate or not up-to-date, it should not be accessed (see Fig. 1). P1 is not respected by the set of DCC requirements {req<sub>0</sub>, req<sub>1</sub>, req<sub>2</sub>} because ¬P1 ∧ req<sub>0</sub> ∧ req<sub>1</sub> ∧ req<sub>2</sub> is *satisfiable*. The counterexample σ<sub>2</sub> (shown in Fig. 2) indicates that data can be re-collected, and the re-collection is not subject to the same time restriction as the updates. If a fourth policy requirement req<sub>3</sub> (Fig. 1) is added to prohibit re-collection of collected data, then property P1 is respected (i.e., {req<sub>0</sub>, req<sub>1</sub>, req<sub>2</sub>, req<sub>3</sub>} ⇒ P1).

**Definition 3 (Finite trace and bounded trace).** *Given a trace* σ = (D̄, τ̄, v)*, we use* vol(σ) *(the* volume *of* σ*) to denote the total number of times that any relation holds across all FO structures in* D̄ *(i.e.,* Σ<sub>r∈R</sub> Σ<sub>D<sub>i</sub>∈D̄</sub> |r<sup>D<sub>i</sub></sup>|*). The trace* σ *is* finite *if* vol(σ) *is finite. The trace is* bounded by volume vb ∈ ℕ *if and only if* vol(σ) ≤ vb*.*

**Example 5.** The volume of trace σ<sub>3</sub> in Fig. 2 is vol(σ<sub>3</sub>) = 3, since three tuples hold across the relations: *Collect*(1, 15), *Update*(1, 0), and *Access*(1, 15). Note that the volume is the total number of tuples that hold for any relation across all time points; multiple tuples may thus hold, for multiple relations, at a single time point.
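
The volume measure of Definition 3 can be sketched in a few lines of Python, using our own toy trace representation (one dictionary per time point, mapping each relation name to the set of tuples holding there); the representation is ours, not the paper's:

```python
def volume(trace):
    """vol(sigma): total number of tuples holding for any relation
    across all FO structures (time points) in the trace."""
    return sum(len(tuples)
               for structure in trace             # one structure per time point
               for tuples in structure.values())  # tuple sets per relation

# Trace sigma_3 from Example 5: three tuples in total.
sigma3 = [
    {"Collect": {(1, 15)}},  # Collect(1, 15)
    {"Update": {(1, 0)}},    # Update(1, 0)
    {"Access": {(1, 15)}},   # Access(1, 15)
]
assert volume(sigma3) == 3
```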

**Definition 4 (Bounded satisfiability checking of MFOTL properties).** *Let* P *be an MFOTL property, Reqs be a set of MFOTL requirements, and* vb *be a natural number. The* bounded satisfiability checking problem *determines the existence of a counterexample* σ *to Reqs* ⇒ P *such that* vol(σ) ≤ vb*.*

## **4 Checking Bounded Satisfiability**

In this section, we present an overview of the bounded satisfiability checking (BSC) process, which translates MFOTL formulas into *first-order logic with relational objects* (FOL∗) and then searches for a satisfying solution for the resulting FOL∗ formulas. We then present the translation of MFOTL formulas to FOL∗ and discuss the complexity of the process.

**Fig. 4.** Overview of the naive and our incremental (IBS) MFOTL bounded satisfiability checking approaches. Solid boxes and arrows are shared between the two approaches. The blue dashed arrow is specific to the naive approach. The red dotted arrows and the additional red output in brackets are specific to IBS. (Color figure online)

#### **4.1 Overview of BSC for MFOTL Formulas**

We aim to address the bounded satisfiability checking problem (Definition 4), looking for a satisfying run σ within a given volume bound vb that limits the number of relation tuples in σ. First, we translate the MFOTL formulas into FOL<sup>∗</sup> formulas. The constraints considered in the formulas include those of the system requirements and the legal property, as well as *optional* data constraints that restrict the values of a datatype. A data constraint can be defined as a range, a "small" data set, or the union/intersection of other data constraints. If no data constraints are specified, then the data values range over the domain ℤ. Note that the optional data constraints do not affect the complexity of BSC, but they do help prune unrealistic counterexamples. Second, we search for a satisfying solution to the FOL<sup>∗</sup> formula; an SMT solver is used here to determine the satisfiability of the FOL<sup>∗</sup> constraints together with the data domain constraints. The answer from the SMT solver is analyzed to produce an answer to the satisfiability checking problem (a counterexample σ, or "bounded-UNSAT").

#### **4.2 Translation of MFOTL to First-Order Logic**

In this section, we describe the translation target FOL∗ and the translation rules, and prove the correctness of the translation.

**FOL with Relational Objects (FOL\*).** We start by introducing the syntax of FOL∗. A *signature* S is a tuple (C, R, ι), where C is a set of constants, R is a set of relation symbols, and ι : R → ℕ is a function that maps a relation to its arity. We assume that the domain of the constants C is ℤ, which matches the one for MFOTL, where the theory of linear integer arithmetic (LIA) holds. Let *Var* be a set of variables over the domain ℤ. A *relational object* o of class r ∈ R (denoted o : r) is an object with ι(r) regular attributes and two special attributes, where every attribute is a variable. We assume that all regular attributes are ordered and write o[i] for the ith attribute of o. Some attributes are named, and o.x refers to o's attribute with the name 'x'. Each relational object o has two special attributes, o.ext and o.time. The former is a Boolean variable indicating whether o exists in a solution, and the latter is a variable representing the occurrence time of o. For convenience, we define a function cls(o) that returns the relational object's class. An FOL<sup>∗</sup> *term* t is defined inductively as t : c | v | o[k] | o.x | t + t′ | c × t for any constant c ∈ C, any variable v ∈ *Var*, any relational object o : r, any index k ∈ [1, ι(r)] and any valid attribute name x. Given a signature S, the syntax of FOL<sup>∗</sup> formulas is defined as follows: *(1)* ⊤ and ⊥, representing values "true" and "false"; *(2)* t = t′ and t > t′, for terms t and t′; *(3)* φ<sub>f</sub> ∧ ψ<sub>f</sub> and ¬φ<sub>f</sub>, for FOL<sup>∗</sup> formulas φ<sub>f</sub> and ψ<sub>f</sub>; *(4)* ∃o : r · (φ<sub>f</sub>) for an FOL<sup>∗</sup> formula φ<sub>f</sub> and a class r; *(5)* ∀o : r · (φ<sub>f</sub>) for an FOL<sup>∗</sup> formula φ<sub>f</sub> and a class r. The quantifiers in FOL<sup>∗</sup> formulas are limited to relational objects, as shown by rules (4) and (5).
Operators ∨ and ∀ can be defined in FOL<sup>∗</sup> as follows: φ<sub>f</sub> ∨ ψ<sub>f</sub> = ¬(¬φ<sub>f</sub> ∧ ¬ψ<sub>f</sub>) and ∀o : r · φ<sub>f</sub> = ¬∃o : r · ¬φ<sub>f</sub>. We say that an FOL<sup>∗</sup> formula is in *negation normal form* (NNF) if negations (¬) appear only in front of atomic formulas, i.e., not in front of ∧, ∨, ∃ or ∀. For the rest of the paper, we assume that every FOL<sup>∗</sup> formula φ is in NNF.
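
The NNF transformation can be illustrated on a toy formula AST (nested tuples; the encoding is ours, not the paper's): negations are pushed inward by De Morgan's laws and quantifier duality until they rest only on atoms.

```python
def nnf(f, neg=False):
    """Push negations inward (De Morgan + quantifier duality) so that
    they remain only on atoms. Formulas are nested tuples; atoms are
    strings; 'neg' tracks a pending negation from above."""
    if isinstance(f, str):                      # atom
        return ("not", f) if neg else f
    op = f[0]
    if op == "not":
        return nnf(f[1], not neg)               # double negation cancels
    if op in ("and", "or"):
        dual = {"and": "or", "or": "and"}[op]
        return (dual if neg else op, nnf(f[1], neg), nnf(f[2], neg))
    if op in ("exists", "forall"):              # ("exists", "o", "r", body)
        dual = {"exists": "forall", "forall": "exists"}[op]
        return (dual if neg else op, f[1], f[2], nnf(f[3], neg))
    raise ValueError(op)

# not(exists o:r . (p and not q))  ==>  forall o:r . ((not p) or q)
f = ("not", ("exists", "o", "r", ("and", "p", ("not", "q"))))
assert nnf(f) == ("forall", "o", "r", ("or", ("not", "p"), "q"))
```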

Given a signature S, *a domain* D is a finite set of relational objects. An FOL<sup>∗</sup> formula *grounded* in the domain D (denoted φ<sub>D</sub>) is a quantifier-free FOL formula that eliminates the quantifiers on relational objects using the following rules: (1) ∃o : r · (φ<sub>f</sub>) becomes ⋁<sub>o′:r∈D</sub>(o′.ext ∧ φ<sub>f</sub>[o ← o′]), and (2) ∀o : r · (φ<sub>f</sub>) becomes ⋀<sub>o′:r∈D</sub>(o′.ext ⇒ φ<sub>f</sub>[o ← o′]). An FOL<sup>∗</sup> formula φ<sub>f</sub> is *satisfiable in* D if there exists a variable assignment v that evaluates φ<sub>D</sub> to ⊤ according to the standard semantics of FOL. An FOL<sup>∗</sup> formula φ<sub>f</sub> is *satisfiable* if there exists a finite domain D such that φ<sub>f</sub> is satisfiable in D. We call σ = (D, v) *a satisfying solution to* φ<sub>f</sub>, denoted σ |= φ<sub>f</sub>. Given a solution σ = (D, v), we say that a relational object o is in σ, denoted o ∈ σ, if o ∈ D and v(o.ext) is true. The *volume of the solution*, denoted vol(σ), is |{o | o ∈ σ}|.

**Example 6.** Let a be a relational object of class *A* with an attribute named val. The formula ∀a : A · (∃a′ : A · (a.val < a′.val) ∧ ∃a′ : A · a′.val = 0) has no satisfying solution in any finite domain. On the other hand, the formula ∀a : A · (∃a′, a″ : A · (a.val = a′.val + a″.val) ∧ ∃a′ : A · a′.val = 5) has a solution σ = (D, v) of volume 2, with the domain D = (a<sub>1</sub>, a<sub>2</sub>) and the value function v(a<sub>1</sub>.val) = 5, v(a<sub>2</sub>.val) = 0: if a ← a<sub>1</sub>, then the formula is satisfied by assigning a′ ← a<sub>1</sub>, a″ ← a<sub>2</sub>; and if a ← a<sub>2</sub>, then the formula is satisfied by assigning a′ ← a<sub>2</sub>, a″ ← a<sub>2</sub>.
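
The grounding rules above can be checked by brute force on the second formula of Example 6, using a toy encoding (ours, not the paper's) in which relational objects are dictionaries with `ext` implicit as a flag and quantified bodies are Python callables:

```python
def exists(domain, cls, body):
    # 'exists o:cls . body(o)' grounds to a disjunction of
    # (o'.ext and body(o')) over all o' of class cls in the domain.
    return any(o["ext"] and body(o) for o in domain if o["cls"] == cls)

def forall(domain, cls, body):
    # 'forall o:cls . body(o)' grounds to a conjunction of
    # (o'.ext implies body(o')) over all o' of class cls in the domain.
    return all((not o["ext"]) or body(o) for o in domain if o["cls"] == cls)

# Example 6's second formula, checked in the domain D = (a1, a2) with
# v(a1.val) = 5 and v(a2.val) = 0:
D = [{"cls": "A", "ext": True, "val": 5},
     {"cls": "A", "ext": True, "val": 0}]
phi = forall(D, "A", lambda a:
             exists(D, "A", lambda a1:
                    exists(D, "A",
                           lambda a2: a["val"] == a1["val"] + a2["val"]))
             and exists(D, "A", lambda b: b["val"] == 5))
assert phi
```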

**From MFOTL Formulas to FOL\* Formulas.** We now discuss the translation rules from MFOTL formulas to FOL<sup>∗</sup> formulas. Recall that MFOTL semantics is defined for a time point i on a trace σ = (D̄, τ̄, v, i), where D̄ = (D<sub>1</sub>, D<sub>2</sub>, ...) is a sequence of FO structures and τ̄ = (τ<sub>1</sub>, τ<sub>2</sub>, ...) is a sequence of time values. The time value of the time point i is given by τ<sub>i</sub>, and if i is not specified, then i = 1. The semantics of FOL<sup>∗</sup> formulas is defined for a domain D where the time information is associated with the relational objects in the domain. Therefore, the time point i (and its time value τ<sub>i</sub>) must be taken into account during the translation from MFOTL to FOL<sup>∗</sup>, since the same MFOTL formula at different time points represents different constraints on the trace σ. Formally, our translation function translate, abbreviated as T, translates an MFOTL formula φ into a function f : τ → φ<sub>f</sub>, where τ ∈ ℕ and φ<sub>f</sub> is an FOL<sup>∗</sup> formula. The translation rules are given in Fig. 5.

$$\begin{array}{ll}
T(t = t', \tau_i) & \to\; t = t' \\
T(t > t', \tau_i) & \to\; t > t' \\
T(r(t_1, \ldots, t_{\iota(r)}), \tau_i) & \to\; \exists o : r \cdot \bigwedge_{j=1}^{\iota(r)} (o[j] = t_j) \land (\tau_i = o.\mathit{time}) \\
T(\neg \phi, \tau_i) & \to\; \neg T(\phi, \tau_i) \\
T(\phi \land \psi, \tau_i) & \to\; T(\phi, \tau_i) \land T(\psi, \tau_i) \\
T(\exists x \cdot (r(\ldots, x, \ldots) \land \phi), \tau_i) & \to\; \exists o : r \cdot T((r(\ldots, x, \ldots) \land \phi)[x \leftarrow o[k]], \tau_i) \\
T(\circ_I\, \phi, \tau_i) & \to\; \exists o : \mathsf{TP} \cdot \mathsf{Next}(o.\mathit{time}, \tau_i) \land T(\phi, o.\mathit{time}) \land (o.\mathit{time} - \tau_i) \in I \\
T(\bullet_I\, \phi, \tau_i) & \to\; \exists o : \mathsf{TP} \cdot \mathsf{Prev}(o.\mathit{time}, \tau_i) \land T(\phi, o.\mathit{time}) \land (\tau_i - o.\mathit{time}) \in I \\
T(\phi\, \mathcal{U}_I\, \psi, \tau_i) & \to\; \exists o : \mathsf{TP} \cdot (o.\mathit{time} \ge \tau_i \land (o.\mathit{time} - \tau_i) \in I \land T(\psi, o.\mathit{time}) \\
& \qquad \land\; \forall o' : \mathsf{TP} \cdot (\tau_i \le o'.\mathit{time} < o.\mathit{time} \Rightarrow T(\phi, o'.\mathit{time}))) \\
T(\phi\, \mathcal{S}_I\, \psi, \tau_i) & \to\; \exists o : \mathsf{TP} \cdot (o.\mathit{time} \le \tau_i \land (\tau_i - o.\mathit{time}) \in I \land T(\psi, o.\mathit{time}) \\
& \qquad \land\; \forall o' : \mathsf{TP} \cdot (o.\mathit{time} < o'.\mathit{time} \le \tau_i \Rightarrow T(\phi, o'.\mathit{time})))
\end{array}$$

**Fig. 5.** Translation rules from MFOTL to FOL∗. TP is an internal class of relational objects used to represent time values at different time points. The predicate Next(t<sub>1</sub>, t<sub>2</sub>) (resp. Prev(t<sub>1</sub>, t<sub>2</sub>)) asserts that t<sub>1</sub> is the next (resp. previous) time value relative to t<sub>2</sub>.

Representing time points in FOL∗. Since FOL<sup>∗</sup> quantifiers are limited to relational objects, to quantify over time points (which is necessary to capture the semantics of MFOTL temporal operators such as U), the translated FOL<sup>∗</sup> formulas use a special *internal* class of relational objects TP (e.g., <sup>∃</sup><sup>o</sup> : TP). Relational objects of class TP capture all possible time points in a trace, and they have two attributes, ext and time, to record the existence and the value of the time point, respectively. To ensure that every time value in a solution is represented by some relational object of TP, we introduce the *time coverage* FOL<sup>∗</sup> axiom.

**Axiom 1** (Time coverage)**.** Let φ<sub>f</sub> be an FOL<sup>∗</sup> formula and let σ be its solution. For every relational object o ∈ σ, there exists an object o′ of class TP s.t. o and o′ share the same time value. Formally, ∀o · (∃o′ : TP · o.time = o′.time).

The translation of ○<sub>I</sub> φ uses the function Next(t<sub>1</sub>, t<sub>2</sub>) to assert that t<sub>1</sub> is the next time value after t<sub>2</sub>. Formally, Next(t<sub>1</sub>, t<sub>2</sub>) = t<sub>1</sub> > t<sub>2</sub> ∧ ∀o : TP · (o.time > t<sub>2</sub> ⇒ t<sub>1</sub> ≤ o.time). The function Prev(t<sub>1</sub>, t<sub>2</sub>), used for the translation of ●<sub>I</sub> φ, is defined similarly.
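
The Next predicate can be sanity-checked by brute force over a finite set of TP time values (the encoding below is ours):

```python
def next_time(t1, t2, tp_times):
    """Next(t1, t2): t1 lies strictly after t2, and no TP time value
    strictly after t2 precedes t1 (forall o:TP, o.time > t2 implies
    t1 <= o.time), i.e., t1 is the earliest TP time after t2."""
    return t1 > t2 and all(t1 <= t for t in tp_times if t > t2)

tp_times = [0, 5, 12]
assert next_time(5, 0, tp_times)       # 5 immediately follows 0
assert not next_time(12, 0, tp_times)  # 5 lies between 0 and 12
assert not next_time(0, 5, tp_times)   # 0 is not after 5 at all
```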

**Definition 5 (Mapping from MFOTL trace to FOL**<sup>∗</sup> **trace).** *Let an MFOTL trace* (D̄, τ̄) *and a valuation function* v *be given. A function* M((D̄, τ̄), v) → (D, v′) *is* a mapping between an *MFOTL* trace and an *FOL*<sup>∗</sup> trace *if* M *satisfies the following rules: (1) for every* τ<sub>i</sub> ∈ τ̄*, there exists a relational object* o : TP ∈ D *such that* τ<sub>i</sub> = v′(o.time)*; (2) for every structure* D<sub>i</sub> ∈ D̄*, if a tuple* t̄ *holds for a relation* r *(i.e.,* t̄ ∈ r<sup>D<sub>i</sub></sup>*), then there exists a relational object* o : r *such that for every* j ∈ [1, ι(r)]*,* t̄[j] = v′(o[j]) *and* v′(o.time) = τ<sub>i</sub> ∧ v′(o.ext) = ⊤*; (3) for every term* t *defined for* v*,* v(t) = v′(T(t, τ<sub>i</sub>))*.*

The inverse of M, denoted M<sup>−1</sup>, is defined as follows: (1) τ̄ = sort({v′(o.time) | o : TP ∈ D · v′(o.ext)}) and (2) for every relational object o : r, if v′(o.ext), then (v′(o[1]), ..., v′(o[ι(r)])) ∈ r<sup>D<sub>i</sub></sup>, where i is the index of the time value v′(o.time) in τ̄.

**Lemma 1.** *Given an MFOTL formula* φ*, an MFOTL trace* (D̄, τ̄)*, a valuation function* v*, and a time point* i*, the relation* (D̄, τ̄, v, i) |= φ *holds iff there exists a satisfying trace* σ = (D, v′) *for the formula* T(φ, τ<sub>i</sub>)*.*

*Proof Sketch.* In the proof, we use M and M−1 (see Definition 5) to transform an MFOTL solution into an FOL<sup>∗</sup> trace, and show that it is a solution to the translated FOL<sup>∗</sup> formula (and vice versa).

⇒: if (D̄, τ̄, v, i) |= φ, then it is sufficient to show that (D, v′) ← M(D̄, τ̄, v) is an FOL<sup>∗</sup> solution. To prove that (D, v′) is a solution to T(φ, τ<sub>i</sub>), we consider all the translation rules in Fig. 5. The translated FOL<sup>∗</sup> formula matches the semantics (Fig. 3) of MFOTL except for the translation of the temporal operators (e.g., T(○<sub>I</sub> φ, τ<sub>i</sub>) and T(φ U<sub>I</sub> ψ, τ<sub>i</sub>)), where instead of quantifying over time points (e.g., ∃j and ∀k), internal relational objects of class TP (o, o′ : TP) are quantified over. By rule (1) of Def. 5, every time point and its time value are mapped to some relational object of class TP. Therefore, the quantifiers over time points can be translated into quantifiers over the relational objects of TP. The mapped solution (D, v′) also satisfies Axiom 1 because if a tuple t̄ holds for some relation r at some time τ in the MFOTL trace (D̄, τ̄), then there exists a time point i ∈ [1, |τ̄|] such that τ<sub>i</sub> = τ. Therefore, by rule (1) of M, τ<sub>i</sub> is represented by some o : TP.

⇐: if (D, v′) |= T(φ, τ<sub>i</sub>), then it is sufficient to show that the MFOTL trace (D̄, τ̄, v) ← M<sup>−1</sup>(D, v′) satisfies φ at point i (i.e., (D̄, τ̄, v, i) |= φ). To prove (D̄, τ̄, v, i) |= φ, we consider all the translation rules in Fig. 5. The translated FOL<sup>∗</sup> formula matches the semantics of MFOTL (Fig. 3) except for the difference between time points and the relational objects of class TP. By Axiom 1, every relational object's time is captured by some time point, and by rule (2) of M<sup>−1</sup>, every relational object is mapped onto some structure D<sub>i</sub> at some time τ<sub>i</sub>. Therefore, (D̄, τ̄, v, i) |= φ.

**Theorem 1 (Translation Correctness).** *Given an MFOTL formula* φ *and an MFOTL trace* σ*, let* M(σ) *be the FOL*<sup>∗</sup> *solution mapped from* σ *using the function* M *(Definition 5). Then (1)* σ |= φ *if and only if* M(σ) |= T(φ)*, and (2)* vol(σ) = vol(M(σ)) − |{o : TP ∈ M(σ)}|*, where* |{o : TP ∈ M(σ)}| *is the number of relational objects of the internal class* TP *in the solution* M(σ)*.*

*Proof.* Statement (1) of Thm. 1 is a direct consequence of Lemma 1. Statement (2) is the result of rule (2) in Definition 5 because every relational object in the FOL<sup>∗</sup> solution, except for the internal ones, i.e., o : TP, has a one-to-one correspondence to tuples that hold for some relation in the MFOTL solution.

For the rest of the paper, we assume that the internal relational objects of class TP do not count toward the volume of the FOL<sup>∗</sup> solution, i.e., vol(σ) = vol(M(σ)).

**Example 7.** Consider the formula exp = ∀d · (A(d) ⇒ ♦<sub>[5,10]</sub> B(d)), where A and B are unary relations. The translated FOL<sup>∗</sup> formula T(exp) is: ∀o : TP · ∀a : A · (o.time = a.time ⇒ ∃o′ : TP · ∃b : B · o′.time = b.time ∧ a[1] = b[1] ∧ o.time + 5 ≤ o′.time ≤ o.time + 10). Since o.time = a.time and o′.time = b.time, we can substitute o.time and o′.time with a.time and b.time in T(exp), respectively. Then the formula contains no reference to o and o′, and we can safely drop the quantified o and o′ (the existentially quantified TP relational object can be dropped because of the time coverage axiom). The simplified formula is: ∀a : A · ∃b : B · a[1] = b[1] ∧ a.time + 5 ≤ b.time ≤ a.time + 10.
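
The simplified formula of Example 7 can be evaluated by brute force over a toy solution (the dictionary representation and the concrete data values are ours):

```python
def satisfies(As, Bs):
    """forall a:A . exists b:B . a[1] = b[1]
    and a.time + 5 <= b.time <= a.time + 10"""
    return all(any(a["val"] == b["val"] and
                   a["time"] + 5 <= b["time"] <= a["time"] + 10
                   for b in Bs)
               for a in As)

# A(7) at time 0 is matched by B(7) at time 8, inside the window [5, 10]:
assert satisfies([{"val": 7, "time": 0}], [{"val": 7, "time": 8}])
# Violated when the only B(7) occurs too late (outside the window):
assert not satisfies([{"val": 7, "time": 0}], [{"val": 7, "time": 20}])
```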

Given an MFOTL property P, a set Reqs of MFOTL requirements, and a volume bound vb, the BSC problem can be solved by searching for a satisfying solution v to the FOL<sup>∗</sup> formula T(¬P) ∧ ⋀<sub>ψ∈Reqs</sub> T(ψ) in a domain D with at most vb relational objects. Solving this problem is important for designing system requirements that comply with LPs.

#### **4.3 Checking MFOTL Satisfiability: A Naive Approach**

Below, we define a naive procedure NBS (shown in Fig. 4) for checking satisfiability of MFOTL formulas translated into FOL∗. We then discuss the complexity of this naive procedure. Even though we do not use NBS in this paper, its complexity constitutes an upper bound for our approach proposed in Sect. 5.

**Searching for a satisfying solution.** Let φ<sub>f</sub> be an FOL<sup>∗</sup> formula translated from an MFOTL formula φ, and let vb be the volume bound. NBS solves φ<sub>f</sub> via quantifier elimination. The number of relational objects in any satisfying solution of φ<sub>f</sub> is at most vb. Therefore, NBS grounds the FOL<sup>∗</sup> formula in a domain of vb relational objects (see Sect. 4.2) and then uses an SMT solver to check the satisfiability of the grounded formula. If the domain has multiple classes of relational objects, we can unify them by introducing a "superposition" class whose attributes are the union of the attributes of all classes, plus a special "name" attribute to indicate the class represented by the superposition.

**Complexity.** The size of the quantifier-free grounded formula is O(vb<sup>k</sup>), where k is the maximum depth of quantifier nesting. Since the background theory used in φ is restricted to linear integer arithmetic, solving the grounded formula is NP-hard [29]. Because T (Fig. 5) is linear in the size of the formula φ, NBS is NP-complete w.r.t. the size of the grounded formula, vb<sup>k</sup>.

### **5 Incremental Search for Bounded Counterexamples**

The naive BSC approach (NBS) proposed in Sect. 4.3 is inefficient for solving the translated FOL<sup>∗</sup> formulas for a large bound vb due to the size of the grounded formula. Moreover, NBS can neither detect unbounded unsatisfiability nor provide optimality guarantees on the volume of counterexamples, which are important for establishing proofs of unbounded correctness and for localizing faults [15], respectively. In this section, we propose an incremental procedure, IBS, which can detect unbounded unsatisfiability and provides volume-minimal counterexamples. An overview of IBS is given in Fig. 4.

IBS maintains an under-approximation of the search domain and the FOL<sup>∗</sup> constraints. It uses the search domain to ground the FOL<sup>∗</sup> constraints, and an SMT solver to determine the satisfiability of the grounded constraints. It analyzes the SMT result and accordingly either expands the search domain, refines the FOL∗ constraints, or returns an answer to the satisfiability checking problem (a counterexample σ, "bounded-UNSAT", or "UNSAT"). The procedure continues until an answer is obtained (σ or UNSAT), or until the domain exceeds the bound vb, in which case a "bounded-UNSAT" answer is returned.

In the following, we describe IBS in more detail. We explain the key component of IBS, computing over- and under-approximation queries, in Sect. 5.1. We discuss the algorithm itself in Sect. 5.2 and illustrate it in Sect. 5.3. We prove its soundness, completeness, and solution optimality in the extended version [11].

#### **5.1 Over- and Under-Approximation**

NBS grounds the input FOL<sup>∗</sup> formulas in a fixed domain D (fixed by the bound vb). Instead, IBS under-approximates <sup>D</sup> to <sup>D</sup><sup>↓</sup> such that <sup>D</sup><sup>↓</sup> <sup>⊆</sup> <sup>D</sup>. With <sup>D</sup>↓, we can create an over- and an under-approximation query to the bounded satisfiability checking problem. Such queries are used to check the satisfiability of FOL<sup>∗</sup> formulas with domain <sup>D</sup>↓. IBS starts with a small domain <sup>D</sup><sup>↓</sup> and gradually expands it until either SAT or UNSAT is returned, or the domain size exceeds some limit (bounded-UNSAT).

**Over-approximation.** Let φ<sub>f</sub> be an FOL<sup>∗</sup> formula, and let D↓ be a domain of relational objects. The procedure Ground, G(φ<sub>f</sub>, D↓), encodes φ<sub>f</sub> into a quantifier-free FOL formula φ<sub>g</sub> s.t. the unsatisfiability of φ<sub>g</sub> implies the unsatisfiability of φ<sub>f</sub>. We call φ<sub>g</sub> an *over-approximation* of φ<sub>f</sub>. The procedure G (Algorithm 2) recursively traverses the syntax tree of the input FOL<sup>∗</sup> formula from top to bottom.

To eliminate the existential quantifier in ∃o : r · φ′<sub>f</sub> (L:1), G creates a new relational object o′ of class r (L:2) and replaces o with o′ in φ′<sub>f</sub> (L:3). To eliminate the universal quantifier in ∀o : r · φ′<sub>f</sub> (L:4), G grounds the formula in D↓. More specifically, G expands the quantifier into a conjunction of clauses where each clause is o′.ext ⇒ φ′<sub>f</sub>[o ← o′] (i.e., o is replaced by o′ in φ′<sub>f</sub>) for each relational object o′ of class r in D↓ (L:5). Intuitively, an existentially quantified relational object is instantiated with a new relational object, and a universally quantified relational object is instantiated with every existing relational object of the same class in D↓, which does not include the ones instantiated during G.

**Lemma 2 (Over-approximation Query).** *For an FOL*<sup>∗</sup> *formula* φ<sub>f</sub> *and a domain* D↓*, if* φ<sub>g</sub> = G(φ<sub>f</sub>, D↓) *is UNSAT, then so is* φ<sub>f</sub>*.*

**Under-Approximation.** Let φ<sub>f</sub> be an FOL<sup>∗</sup> formula, and let D↓ be a domain. The over-approximation φ<sub>g</sub> = G(φ<sub>f</sub>, D↓) contains a set of new relational objects introduced by G (L:2), denoted NewRs. Let NoNewR(NewRs, D↓) be a constraint enforcing that every new relational object o<sub>1</sub> in NewRs is semantically equivalent to some relational object o<sub>2</sub> in D↓. Formally, the predicate NoNewR(NewRs, D↓) is defined as ⋀<sub>o<sub>1</sub>∈NewRs</sub> ⋁<sub>o<sub>2</sub>∈D↓</sub> (o<sub>1</sub> ≡ o<sub>2</sub>), where the semantic equivalence between o<sub>1</sub> and o<sub>2</sub> (i.e., o<sub>1</sub> ≡ o<sub>2</sub>) is defined as cls(o<sub>1</sub>) = cls(o<sub>2</sub>) ∧ ⋀<sup>ι(cls(o<sub>1</sub>))</sup><sub>i=1</sub> (o<sub>1</sub>[i] = o<sub>2</sub>[i]) ∧ o<sub>1</sub>.ext = o<sub>2</sub>.ext ∧ o<sub>1</sub>.time = o<sub>2</sub>.time (where cls(o) returns the class of o). Let φ<sup>⊥</sup><sub>g</sub> = φ<sub>g</sub> ∧ NoNewR(NewRs, D↓). If φ<sup>⊥</sup><sub>g</sub> has a satisfying solution, then there is a solution to φ<sub>f</sub>. We call φ<sup>⊥</sup><sub>g</sub> an *under-approximation* of φ<sub>f</sub> and denote the procedure for computing it by UnderApprox(φ<sub>f</sub>, D↓).
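
The NoNewR constraint amounts to the following check, sketched here over a dictionary representation of relational objects (the representation and the concrete objects are ours):

```python
def equiv(o1, o2):
    """o1 = o2: same class, same regular attributes, same ext and time."""
    return (o1["cls"] == o2["cls"]
            and o1["attrs"] == o2["attrs"]
            and o1["ext"] == o2["ext"]
            and o1["time"] == o2["time"])

def no_new_r(new_rs, domain):
    """NoNewR(NewRs, D): every new relational object introduced by
    grounding must coincide with some existing object of the domain."""
    return all(any(equiv(o1, o2) for o2 in domain) for o1 in new_rs)

update1 = {"cls": "Update", "attrs": (1, 0), "ext": True, "time": 5}
access1 = {"cls": "Access", "attrs": (1, 0), "ext": True, "time": 6}
update2 = {"cls": "Update", "attrs": (1, 0), "ext": True, "time": 5}
assert no_new_r([update2], [access1, update1])  # update2 matches update1
assert not no_new_r([{"cls": "Collect", "attrs": (1, 0), "ext": True,
                      "time": 0}], [access1, update1])  # no Collect object
```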

**Lemma 3 (Under-Approximation Query).** *For an FOL*<sup>∗</sup> *formula* φ<sub>f</sub> *and a domain* D↓*, let* φ<sub>g</sub> = G(φ<sub>f</sub>, D↓) *and* φ<sup>⊥</sup><sub>g</sub> = UnderApprox(φ<sub>f</sub>, D↓)*. If* σ *is a solution to* φ<sup>⊥</sup><sub>g</sub>*, then there exists a solution to* φ<sub>f</sub>*.*

#### **Algorithm 1.** IBS: search for a bounded (by vb) solution to T(¬P) ∧ ⋀<sub>ψ∈*Reqs*</sub> T(ψ).

**Input** an MFOTL formula ¬P, and MFOTL requirements *Reqs* = {ψ<sub>1</sub>, ψ<sub>2</sub>, ...}. **Optional Input** vb, the volume bound, and data constraints T<sub>*data*</sub>. **Output** a counterexample σ, UNSAT or bounded-UNSAT.


#### **Algorithm 2.** G: ground an NNF FOL<sup>∗</sup> formula φ<sub>f</sub> in a domain D↓.

**Input** an FOL<sup>∗</sup> formula φ<sub>f</sub> in NNF, and a domain of relational objects D↓. **Output** a grounded quantifier-free formula φ<sub>g</sub> over relational objects.


The proofs of Lemma 2 and 3 are in the extended version [11].

Suppose, for some domain D↓, that an over-approximation query φ<sub>g</sub> for an FOL<sup>∗</sup> formula φ<sub>f</sub> is satisfiable while the under-approximation query φ<sup>⊥</sup><sub>g</sub> is UNSAT. Then the solution to φ<sub>g</sub> provides hints on how to expand D↓ to potentially obtain a satisfying solution for φ<sub>f</sub>, as captured in Corollary 1.

**Corollary 1 (Necessary relational objects).** *For an FOL*<sup>∗</sup> *formula* φ<sub>f</sub> *and a domain* D↓*, let* φ<sub>g</sub> *and* φ<sup>⊥</sup><sub>g</sub> *be the over- and under-approximation queries of* φ<sub>f</sub> *based on* D↓*, respectively. If* φ<sub>g</sub> *is satisfiable and* φ<sup>⊥</sup><sub>g</sub> *is UNSAT, then every solution to* φ<sub>f</sub> *contains some relational object that appears in the formula* φ<sub>g</sub> *but not in* D↓*.*

#### **5.2 Counterexample-Guided Constraint Solving Algorithm**

Let an MFOTL formula ¬P (to find a counterexample to P), a set of MFOTL requirements *Reqs*, an optional volume bound vb, and an optional set of FOL<sup>∗</sup> data domain constraints T<sub>data</sub> be given. IBS, shown in Algorithm 1, searches for a solution σ to ¬P ∧ ⋀<sub>ψ∈*Reqs*</sub> ψ (with respect to T<sub>data</sub>) bounded by vb, as a counterexample to ⋀<sub>ψ∈*Reqs*</sub> ψ ⇒ P (Definition 2). If no such solution exists regardless of the bound, IBS returns UNSAT. If no solution can be found within the given bound, but a solution may exist for a larger bound, then IBS returns bounded-UNSAT. If vb is not specified, IBS performs the search unboundedly until a solution or UNSAT is returned.

IBS first translates ¬P and every ψ ∈ *Reqs* into FOL<sup>∗</sup> formulas, denoted ¬P<sub>f</sub> and ψ<sub>f</sub>, respectively; the translated requirements are collected in *Reqs*<sub>f</sub>. Then IBS searches for a satisfying solution to ¬P<sub>f</sub> ∧ ⋀<sub>ψ<sub>f</sub>∈*Reqs*<sub>f</sub></sub> ψ<sub>f</sub> in a domain D whose volume is at most vb. Instead of searching in D directly, IBS searches for a solution to ¬P<sub>f</sub> ∧ ⋀<sub>ψ<sub>f</sub>∈*Reqs*↓</sub> ψ<sub>f</sub> in D↓ (denoted φ↓), where *Reqs*↓ ⊆ *Reqs*<sub>f</sub> and D↓ ⊆ D. IBS initializes *Reqs*↓ and D↓ as empty sets (LL:3-4). Then, for the FOL<sup>∗</sup> formula φ↓, IBS creates an over- and an under-approximation query, φ<sub>g</sub> (L:7) and φ<sup>⊥</sup><sub>g</sub> (L:8), respectively (described in Sect. 5.1). IBS first solves the over-approximation query φ<sub>g</sub> by querying an SMT solver (L:9). If φ<sub>g</sub> is unsatisfiable, then φ↓ is unsatisfiable (Lemma 2), and IBS returns UNSAT (L:10).

If φ<sub>g</sub> is satisfiable, then IBS solves the under-approximation query φ<sup>⊥</sup><sub>g</sub> (L:11). If φ<sup>⊥</sup><sub>g</sub> is unsatisfiable, then the current domain D↓ is too small, and IBS expands it (LL:12-18). This is because the satisfiability of φ<sub>g</sub> indicates the possibility of finding a satisfying solution after adding to D↓ at least one of the new relational objects in the solution to φ<sub>g</sub> (Corollary 1). The domain D↓ is expanded by adding all relational objects o in the minimum (in terms of volume) solution σ<sub>min</sub> to φ<sub>g</sub> (L:13). To obtain σ<sub>min</sub>, we follow MaxRes [28] methods: we analyze the UNSAT core of φ<sup>⊥</sup><sub>g</sub> and incrementally weaken φ<sup>⊥</sup><sub>g</sub> towards φ<sub>g</sub> (i.e., the weakened query φ<sup>⊥′</sup><sub>g</sub> is an "over-under approximation" satisfying φ<sup>⊥</sup><sub>g</sub> ⇒ φ<sup>⊥′</sup><sub>g</sub> ⇒ φ<sub>g</sub>) until a satisfying solution σ<sub>min</sub> is obtained for the weakened query. However, if the volume of σ<sub>min</sub> exceeds vb (L:16), then bounded-UNSAT is returned (L:17). UNSAT-core-guided domain expansion has also been explored for unfolding the definitions of recursive functions [30,37].

On the other hand, if φ<sup>⊥</sup><sub>g</sub> yields a solution σ, then σ is checked against *Reqs*<sub>f</sub> (L:19). If σ satisfies every ψ<sub>f</sub> in *Reqs*<sub>f</sub>, then σ is returned (L:20). If σ violates some requirement in *Reqs*<sub>f</sub>, then the violated requirement is added, as a *lesson*, to *Reqs*↓ to be considered in the search for subsequent solutions (L:23).

If IBS neither finds a solution nor returns UNSAT, then no solution was found because D↓ is too small or *Reqs*↓ is too weak. IBS then restarts with the expanded domain D↓ or the refined set of requirements *Reqs*↓. It computes the over- and under-approximation queries (φ<sub>g</sub> and φ<sup>⊥</sup><sub>g</sub>) again and repeats the steps. See Sect. 5.3 for an illustration of IBS.
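
Since Algorithm 1's pseudocode is not reproduced here, the loop just described can be summarized by the following control-flow sketch; all solver steps are injected as callables, and the toy stubs at the bottom (names and behaviors ours) exercise only the control flow, not a real SMT encoding:

```python
def ibs(reqs_f, neg_p, ground, under_approx, solve, min_solution, check,
        vb=None):
    """Control-flow sketch of the IBS loop: expand the domain when the
    under-approximation is UNSAT, refine the requirement set when a
    candidate solution violates some requirement."""
    reqs_down, dom_down = set(), set()       # Reqs_down and D_down (LL:3-4)
    while True:
        phi = (neg_p, frozenset(reqs_down))
        phi_g = ground(phi, dom_down)        # over-approximation query (L:7)
        if not solve(phi_g):
            return "UNSAT"                   # UNSAT of phi_g lifts (Lemma 2)
        phi_u = under_approx(phi, dom_down)  # under-approximation query (L:8)
        sol = solve(phi_u)
        if not sol:
            sigma_min = min_solution(phi_g)  # minimum solution to phi_g (L:13)
            if vb is not None and len(sigma_min) > vb:
                return "bounded-UNSAT"       # bound exceeded (L:17)
            dom_down |= set(sigma_min)       # expand the domain (L:15)
            continue
        violated = check(sol, reqs_f)        # check sol against Reqs_f (L:19)
        if not violated:
            return sol                       # counterexample found (L:20)
        reqs_down |= set(violated)           # add violated lessons (L:23)

# Toy stubs: the over-approximation is always "SAT"; the
# under-approximation is SAT once the domain is non-empty.
ground = lambda phi, dom: ("G", phi, frozenset(dom))
under_approx = lambda phi, dom: ("U", phi, frozenset(dom))

def solve(query):
    kind, phi, dom = query
    if kind == "G":
        return True
    return ("sol", phi[1]) if dom else None  # solution records Reqs_down

result = ibs(reqs_f={"req2"}, neg_p="negP1",
             ground=ground, under_approx=under_approx, solve=solve,
             min_solution=lambda phi_g: ["access1", "update1"],
             check=lambda sol, reqs: reqs - sol[1],  # violated requirements
             vb=4)
assert result == ("sol", frozenset({"req2"}))
```

On the toy stubs, the loop first expands the empty domain, then learns the violated requirement, and finally returns a solution that respects it, mirroring the iterations illustrated in Sect. 5.3.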

*Remark 1.* IBS finds the optimal solution because it looks for the minimum solution σ<sub>min</sub> to the over-approximation query φ<sub>g</sub> (L:13) and uses it for domain expansion (L:15). However, looking for σ<sub>min</sub> adds cost. If solution optimality is not required, IBS can be configured to heuristically find a solution σ to φ<sub>g</sub> such that vol(σ) ≤ vb. The *greedy best-first* search (gBFS) finds a solution to φ<sub>g</sub> that minimizes the number of relational objects that are not already in D↓, and then uses it to expand D↓. We configured a non-optimal version of IBS (nop) that uses the gBFS heuristic and evaluated its performance in Sect. 6.

#### **5.3 Illustration of IBS**

Suppose a data collection centre (DCC) *collect*s and *access*es personal data with two requirements: *req*<sub>1</sub>: a data value can only be updated after having been collected or last updated more than a week (168 hours) ago; and *req*<sub>2</sub>: data can only be accessed if it has been collected or updated within a week (168 hours). The signature S<sub>*data*</sub> for the DCC contains three binary relations (R<sub>*data*</sub>): *Collect*, *Update*, and *Access*, such that *Collect*(d, v), *Update*(d, v) and *Access*(d, v) hold at a given time point if and only if the data with ID d is collected, updated, and accessed with value v at this time point, respectively. The MFOTL formulas for P1, *req*<sub>1</sub> and *req*<sub>2</sub> are shown in Fig. 1. Suppose IBS is invoked to find a counterexample for property P1 (shown in Fig. 1) subject to the requirements *Reqs* = {*req*<sub>1</sub>, *req*<sub>2</sub>} with the bound vb = 4. IBS translates the requirements and the property to FOL<sup>∗</sup> and initializes *Reqs*↓ and D↓ to empty sets. For each iteration, we use φ<sub>g</sub> and φ<sup>⊥</sup><sub>g</sub> to denote the over- and under-approximation queries computed on LL:7-8, respectively.

1st iteration: D↓ = ∅ and *Reqs*↓ = ∅. Three new relational objects are introduced to φg (due to ¬P1): *access*1, *collect*1, and *update*1 such that: (C1) *access*1 occurs after *collect*1 and *update*1; (C2) *access*1.d = *collect*1.d = *update*1.d; (C3) *access*1.v ≠ *collect*1.v ∧ *access*1.v ≠ *update*1.v; and (C4) either *collect*1 or *update*1 must be in the solution. φg is satisfiable, but φ⊥g is UNSAT since D↓ is an empty set. We assume D↓ is expanded by adding *access*1 and *update*1.

2nd iteration: D↓ = {*access*1, *update*1} and *Reqs*↓ = ∅. The over-approximation φg stays the same, but φ⊥g becomes satisfiable since *access*1 and *update*1 are in D↓. Suppose the solution is σ4 (see Fig. 2). However, σ4 violates *req*2, so *req*2 is added to *Reqs*↓.

3rd iteration: D↓ = {*access*1, *update*1} and *Reqs*↓ = {*req*2}. Two new relational objects are introduced in φg (due to *req*2): *collect*2 and *update*2 such that (C5) *collect*2.time ≤ *access*1.time ≤ *collect*2.time + 168; (C6) *update*2.time ≤ *access*1.time ≤ *update*2.time + 168; (C7) *access*1.d = *collect*2.d = *update*2.d; (C8) *access*1.v = *collect*2.v = *update*2.v; and (C9) *collect*2 or *update*2 is in the solution. The new φg is satisfiable, but φ⊥g is UNSAT because *update*2 ∉ D↓ and *update*1 ≠ *update*2 (C8 conflicts with C3). Therefore, D↓ needs to be expanded. Assume *collect*2 is added to D↓.

4th iteration: D↓ = {*access*1, *update*1, *collect*2} and *Reqs*↓ = {*req*2}. The over-approximation φg stays the same, but φ⊥g becomes satisfiable since *collect*2 is in D↓. Suppose the solution is σ3 (see Fig. 2). Since σ3 violates *req*1, *req*1 is added to *Reqs*↓.

5th iteration: D↓ = {*access*1, *update*1, *collect*2} and *Reqs*↓ = {*req*1, *req*2}. The following constraint is added to φg (due to *req*1): (C10) ¬(*update*2.time − 168 ≤ *collect*1.time ≤ *update*2.time). Since (C10) conflicts with (C8), (C7) and (C1), *collect*1 and *update*2 cannot both be in the solution to φg. The over-approximation φg is satisfiable if *collect*1 (introduced in the 1st iteration) or *update*2 (3rd iteration) is in the solution. However, φ⊥g is UNSAT since D↓ contains neither *collect*1 nor *update*2. Thus, D↓ is expanded. Assume *update*2 is added to D↓.

6th iteration: D↓ = {*access*1, *update*1, *collect*2, *update*2}, *Reqs*↓ = {*req*1, *req*2}. The following constraints are added to φg: (C11) *update*2.time ≥ *update*1.time + 168 (due to *req*1) and (C12) *update*2.time ≤ *update*1.time (due to ¬P1). Since (C11) conflicts with (C12), *update*2 cannot be in the solution to φg. Thus, φg is satisfiable only if *collect*1 is in the solution. However, φ⊥g is UNSAT because *collect*1 ∉ D↓. Therefore, D↓ is expanded by adding *collect*1.

Final iteration: D↓ = {*access*1, *update*1, *collect*2, *update*2, *collect*1} and *Reqs*↓ = {*req*1, *req*2}. The under-approximation φ⊥g becomes satisfiable and yields the solution σ5 in Fig. 2, which satisfies both *req*1 and *req*2.
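For intuition about *req*1 and *req*2 themselves, independent of the search, the following toy trace checker tests both requirements over an explicit timed trace. This is illustrative only — the paper encodes the requirements as SMT queries over relational objects rather than monitoring traces — and all names here are invented.

```python
WEEK = 168  # hours

def check_reqs(trace):
    """Check req1/req2 over a trace of (time_in_hours, action, d, v) events."""
    last_write = {}  # data ID -> time of its last collect/update
    for time, action, d, v in sorted(trace):
        if action == "update":
            # req1: update only if collected/updated more than a week ago
            if d not in last_write or time - last_write[d] <= WEEK:
                return False
        if action == "access":
            # req2: access only if collected/updated within the last week
            if d not in last_write or time - last_write[d] > WEEK:
                return False
        if action in ("collect", "update"):
            last_write[d] = time
    return True
```

For instance, accessing a value eight days after it was collected violates *req*2, and updating it after only two days violates *req*1.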

#### **6 Evaluation**

To evaluate our approach, we developed a prototype tool, called LEGOS, that implements our MFOTL bounded satisfiability checking algorithm, IBS (Algorithm 1). It includes a Python API for specifying system requirements and MFOTL safety properties. We use pySMT [14] to formulate SMT queries and Z3 [8] to check their satisfiability. The implementation and the evaluation artifacts are included in the supplementary material [12]. In this section, we evaluate the effectiveness of our approach using five case studies, aiming to answer the following research question: *How effective is our approach at determining the bounded satisfiability of MFOTL formulas?* We measure effectiveness in terms of the ability to determine satisfiability (i.e., the satisfying solution and its volume, UNSAT, or bounded UNSAT) and performance, i.e., time and memory usage.

**Case studies.** The five case studies considered in this paper are summarized below: (1) PHIM (derived from [1,10]): a computer system for keeping track of personal health information with cost management; (2) CF@H<sup>1</sup>: a system for monitoring COVID patients at home and enabling doctors to monitor patient data; (3) PBC [4]: an approval policy for publishing business reports within a company; (4) BST [4]: a banking system that processes customer transactions; and (5) NASA [26]: an automated air-traffic control system design that aims to avoid aircraft collisions.<sup>2</sup> Table 1 gives their statistics. For each case study, we

<sup>1</sup> https://covidfreeathome.org/.

<sup>2</sup> The requirements and properties for the NASA case study are originally expressed in LTL, which is subsumed by MFOTL.

record the number of requirements, relations, relation arguments, and properties, denoted as #reqs, #rels, #args, and #props, respectively. Additionally, Table 1 shows the initial configurations used in our experiments, with the number of custodians (#c), patients (#p), and data (#d) for PHIM; the number of users (#u) and data (#d) for CF@H and PBC; the number of employees (#e), customers (#c), and transactions (#t), and the maximum amount of a transaction (*sup*) for BST; and the number of ground-separated (#GSEP) and self-separating (#SSEP) aircraft for NASA.


**Table 2.** Performance comparison between IBS and nuXmv.

**Table 1.** Case study statistics.

Case studies were selected for (i) comparison with existing works (i.e., NASA); (ii) checking whether our approach scales on case studies involving data/time constraints (PBC, BST, PHIM and CF@H); or (iii) evaluating the applicability of our approach to real-world case studies (CF@H and NASA). In addition to prior case studies, we include PHIM and CF@H, which have complex data/time constraints. The number of requirements for the five case studies ranges between ten (BST) and 194 (NASA). The number of relations present in the MFOTL requirements ranges from three (BST) to 28 (CF@H), and the number of arguments in these relations ranges from one (PHIM, PBC, and BST) to 79 (NASA).

**Experimental setup.** Given a set of requirements, data constraints and properties of interest for each case study, we measured the run-time (time) and peak memory usage (mem.) of performing bounded satisfiability checking of MFOTL properties, and the volume volσ (the number of relational objects) of the solution (σ) with (op) and without (nop) the optimality guarantee (see Remark 1 for finding non-optimal solutions). We conduct two experiments: the first evaluates the efficiency and scalability of our approach; the second compares our approach with existing work on satisfiability checking. Since there is no existing work for checking MFOTL satisfiability, we compare with LTL satisfiability checking because MFOTL subsumes LTL. To study the scalability of our approach, our first experiment considers four different configurations obtained by increasing the data constraints of the case-study requirements. The initial configuration (small) is described in Table 1, and the initial bound is 10. The medium and large configurations are obtained by multiplying the initial data constraints and volume bound

**Table 3.** Run-time performance for four case studies and 18 properties. We record the outcome (out.) of the algorithm with (op) or without (nop) the optimal-solution guarantee: UNSAT (U), bounded-UNSAT (b-U), or the volume of the counterexample σ (a natural number, corresponding to volσ). We consider four different configurations: small (see Table 1), medium (×10), large (×100), and unbounded (∞) data domain constraints and volume bound. Volume differences between op and nop are bolded.


by ten and one hundred, respectively. The last (unbounded) configuration bounds neither the data domain nor the volume. As we noted earlier in Sect. 4, the purpose of adding data constraints is to avoid unrealistic counterexamples. For example, the NASA case study uses a data set for specifying the possible system control modes and uses data ranges to restrict the possible measurements from the aircraft (e.g., the aircraft's trajectory). In the other case studies, data constraints are realistic data ranges (e.g., a patient's account balance should be non-negative). To study the performance of our approach relative to existing work, our second experiment considers two configurations of the NASA case study verified in [24] using the state-of-the-art symbolic model checker nuXmv [6]<sup>3</sup>. We compare our approach's results against the reproduced results of nuXmv verification. For both experiments, we report the analysis outcomes, i.e., the volume of the satisfying solution (if one exists), UNSAT, or bounded UNSAT; and performance, i.e., time and memory usage. The experiments were conducted using a ThinkPad X1 Carbon with an Intel Core i7 1.80 GHz processor and 8 GB of RAM, running 64-bit Ubuntu GNU/Linux.

**Results of the first experiment** are summarized in Table 3. Out of the 72 trials, our approach found 31 solutions, returned five bounded-UNSAT answers, and returned 36 UNSAT answers. The results show that our approach is effective in checking the satisfiability of case studies of different sizes. More precisely,

<sup>3</sup> LEGOS solved all configurations from the NASA case study; see the results in [12]. For comparison, we report only on the configurations that are explicitly supported by nuXmv.

we observe that it takes under three seconds to return UNSAT and between 0.04 seconds (bs2:medium) and 32 minutes (ph7:medium:op) to return a solution. In the worst case, op took 32 minutes to check ph7, where the property and requirements contain complex constraints. Effectively, ph7 requires the deletion of the data stored at ID 10, while the cost of deletion increases over time under PHIM's requirements. Therefore, the user has to perform a number of actions to obtain a sufficient balance to delete the data. Additionally, each action that increases the user's balance has its own preconditions, effects, and time cost, making the process of choosing a sequence of actions to meet the increasing deletion cost non-trivial.

We can see a difference in time between cf2 'large' and 'unbounded': the domain expansion followed two different paths, and one produced significantly easier SMT queries. Since our approach is guided by counterexamples (i.e., the path is guided by the solution from the SMT solver (Algorithm 1, L:13)), it does not have direct control over the exact path selection. In future work, we aim to add optimizations to avoid, or backtrack from, hard paths.

We observe that the data-domain constraint and volume bound used in the different configurations do not affect the performance of IBS when the satisfiability of the instances does not depend on them, which is the case for all instances except ph6−7:small, cf1−3:small, and bs3:small. As mentioned in Sect. 4, the data-domain constraint ensures that satisfying solutions have realistic data values. For ph1−ph4, the bound used in the small, medium and large configurations creates additional constraints in the SMT queries for each relational object, and therefore results in a larger peak memory usage than the unbounded configuration.

Finding the optimal solution (by op), in contrast to finding a satisfying solution without the optimality guarantee (by nop), imposes a substantial computational cost while rarely achieving a volume reduction. The non-optimal heuristic nop often outperformed the optimal approach on satisfiable instances. Out of 31 satisfiable instances, nop solved 12 instances three times faster, ten instances ten times faster, and seven instances 20 times faster than op. Compared to the non-optimal solution, the optimal solution reduced the volume for only two instances, ph7:large and ph7:unbounded, by one (3%) and three (9%), respectively. On all other satisfiable instances, op and nop both found optimal solutions. When there is no solution, op and nop are equally efficient.

**Results of the second experiment** are summarized in Table 2. Our approach and nuXmv both correctly verified that all six properties are UNSAT in both NASA configurations. We observe that the performance of our approach is comparable to that of nuXmv for the first configuration, with 0.10 to 0.20 seconds of difference on average. For the second configuration, however, our approach terminates in less than 0.20 seconds, whereas nuXmv takes 1.50 seconds on average. We conclude that our approach's performance is comparable to that of nuXmv for LTL satisfiability checking even though our approach is not specifically designed for LTL.

**Summary.** We have demonstrated that our approach is effective at determining the bounded satisfiability of MFOTL formulas using case studies of different sizes and from different application domains. When restricted to LTL, our approach is at least as effective as existing work on LTL satisfiability checking, which uses a state-of-the-art symbolic model checker. Importantly, IBS can often determine the satisfiability of instances without reaching the volume bound, and its performance is not sensitive to the data domain. On the other hand, IBS's optimality guarantee imposes a substantial computational cost while rarely achieving a volume reduction over the non-optimal solutions obtained by nop. We still need to investigate the trade-off between optimality and efficiency, as well as evaluate the performance of IBS on a broader range of benchmarks.

## **7 Related Work**

Below, we compare our work with existing approaches that address satisfiability checking for temporal and first-order logics.

**Satisfiability checking of temporal properties.** Temporal logic satisfiability checking has been studied for the verification of system designs. Satisfiability checking for Linear Temporal Logic (LTL) can be performed by reducing the problem to model checking [35], by applying automata-based techniques [25], or by SAT solving [5,21–23]. Satisfiability checking for metric temporal logic (MTL) [32] and its variants, e.g., mission-time LTL [24] and signal temporal logic [2], has been studied for the verification of real-time system designs. These existing techniques are inadequate for our needs: LTL and MTL cannot effectively capture the quantified data constraints commonly used in legal properties. MFOTL does not have this limitation, as it extends MTL and LTL with first-order quantifiers, thereby supporting the specification of data constraints.

**Finite model finding for first-order logic.** Finite-model finders [7,33] look for a model by checking universal quantifiers exhaustively over candidate models with progressively larger domains; we look for finite-volume solutions using a similar approach. In contrast, we consider an explicit bound on the volume of the solution, and are able to find the solution with the smallest volume. SMT solvers support quantifiers with quantifier-instantiation heuristics [16,17] such as E-matching [9,27] and conflict-based instantiation [34]. Quantifier-instantiation heuristics are nonetheless generally incomplete, whereas, in our approach, we obtain completeness by bounding the volume of the satisfying solution.

#### **8 Conclusion**

In this paper, we proposed an incremental bounded satisfiability checking approach, called IBS, aimed at enabling the verification of legal properties, expressed in MFOTL, against system requirements. IBS first translates MFOTL formulas to first-order logic with relational objects (FOL∗) and then searches for a satisfying solution to the translated FOL∗ formulas in a bounded search space by deriving over- and under-approximating SMT queries. IBS starts with a small search space and incrementally expands it until an answer is returned or the bound is exceeded. We implemented IBS on top of the SMT solver Z3. Experiments with five case studies showed that our approach is effective at identifying errors in requirements from different application domains. Our approach is currently limited to verifying safety properties. In the future, we plan to extend it to handle a broader spectrum of property types, including liveness and fairness. IBS's performance and scalability depend crucially on how the domain of relational objects is maintained and expanded. As future work, we would like to study the effectiveness of other heuristics for improving IBS's scalability (e.g., random restarts and expansion with domain-specific heuristics). We also aim to study how to learn/infer MFOTL properties during the search to further improve the efficiency of our approach.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Formula Normalizations in Verification**

Simon Guilloud(B), Mario Bucev, Dragana Milovančević, and Viktor Kunčak

School of Computer and Communication Sciences, EPFL, Station 14, 1015 Lausanne, Switzerland

{simon.guilloud,mario.bucev,dragana.milovancevic,viktor.kuncak}@epfl.ch

**Abstract.** We apply and evaluate polynomial-time algorithms to compute two different normal forms of propositional formulas arising in verification. One of the normal form algorithms is presented for the first time. The algorithms compute normal forms and solve the word problem for two different subtheories of Boolean algebra: orthocomplemented bisemilattice (OCBSL) and ortholattice (OL). Equality of normal forms decides the word problem and is a sufficient (but not necessary) check for equivalence of propositional formulas. Our first contribution is a quadratic-time OL normal form algorithm, which induces a coarser equivalence than the OCBSL normal form and is thus a more precise approximation of propositional equivalence. The algorithm is efficient even when the input formula is represented as a directed acyclic graph. Our second contribution is the evaluation of OCBSL and OL normal forms as part of a verification condition cache of the Stainless verifier for Scala. The results show that both normalization algorithms substantially increase the cache hit ratio and improve the ability to prove verification conditions by simplification alone. To gain further insights, we also compare the algorithms on hardware circuit benchmarks, showing that normalization reduces circuit size and works well in the presence of sharing.

#### **1 Introduction**

Algorithms and techniques to solve and reduce formulas in propositional logic (and its generalizations) are a major field of study. They have prime relevance in SAT and SMT solving algorithms [2,8,31], in optimization of logical circuit size in hardware [25], in interactive theorem proving where propositional variables can represent assumptions and conclusions of theorems [23,35,43], for decision procedures in automated theorem proving [13,26,37,41,42], and in every subfield of formal verification in general [27]. The propositional problem of satisfiability is NP-complete, whereas validity and equivalence are coNP-complete. While heuristic techniques give useful results in practice, in this paper we investigate guaranteed worst-case polynomial-time deterministic algorithms. Such algorithms can serve as building blocks of more complex functionality, without creating an unpredictable dependency.

Recently, researchers proposed the use of certain non-distributive complemented lattice-like structures to compute normal forms of formulas [20]. These results appear to have practical potential, but they have not been experimentally evaluated. Moreover, the proposed completeness characterization is in

c The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 398–422, 2023. https://doi.org/10.1007/978-3-031-37709-9\_19

terms of "orthocomplemented bisemilattices" (OCBSL), which have a number of counterintuitive properties. For example, the structure is not a lattice and does not satisfy the absorption laws x ∧ (x ∨ y) = x and x ∨ (x ∧ y) = x. As a consequence, there is no natural semantic ordering on formulas corresponding to implication, with x ∧ y = x and x ∨ y = y inducing two different relations.

Inspired by these limitations, we revisit results on *lattices*, which are much better-behaved structures. We strengthen the OCBSL structure with the absorption laws to obtain the class of *ortholattices*, as summarized in Table 1. Ortholattices (OL) have a natural partial order for which ∧ and ∨ act as the greatest lower bound and the least upper bound. They also satisfy De Morgan's laws, allowing one of the binary connectives to be eliminated in terms of the other and negation. On the other hand, ortholattices do not, in general, satisfy the distributivity law, which sets them apart from Boolean algebras.

We present a new algorithm that computes a normal form for OL in quadratic time. The normal form is strictly stronger than the one for OCBSL: there are terms in the language {∧, ∨, ¬} that are distinct in OCBSL but equal in OL. Checking equality of OL normal forms thus approximates propositional formula equivalence more precisely. Both normal forms can be thought of as strengthenings of negation normal form.

**Table 1.** Laws of algebraic structures with signature (*S,*∧*,* <sup>∨</sup>*,* <sup>0</sup>*,* <sup>1</sup>*,* <sup>¬</sup>). Structures satisfying laws L1–L8 and L1'–L8' were called *orthocomplemented bisemilattices* (OCBSL) in [20]. Those OCBSL that additionally satisfy L9 and L9' are *ortholattices* (OL).

$$\begin{array}{llll} \text{L1:} & x \lor y = y \lor x & \text{L1':} & x \land y = y \land x\\ \text{L2:} & x \lor (y \lor z) = (x \lor y) \lor z & \text{L2':} & x \land (y \land z) = (x \land y) \land z\\ \text{L3:} & x \lor x = x & \text{L3':} & x \land x = x\\ \text{L4:} & x \lor 1 = 1 & \text{L4':} & x \land 0 = 0\\ \text{L5:} & x \lor 0 = x & \text{L5':} & x \land 1 = x\\ \text{L6:} & \neg \neg x = x & \text{L6':} & \text{same as L6}\\ \text{L7:} & x \lor \neg x = 1 & \text{L7':} & x \land \neg x = 0\\ \text{L8:} & \neg (x \lor y) = \neg x \land \neg y & \text{L8':} & \neg (x \land y) = \neg x \lor \neg y\\ \text{L9:} & x \lor (x \land y) = x & \text{L9':} & x \land (x \lor y) = x \end{array}$$
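As a quick sanity check of Table 1 (illustrative code, not part of the paper), one can verify exhaustively that the two-element Boolean algebra satisfies laws L1–L9; the primed laws follow by swapping ∧/∨ and 0/1. This algebra additionally satisfies distributivity, which general ortholattices do not.

```python
from itertools import product

join = lambda x, y: x | y      # ∨
meet = lambda x, y: x & y      # ∧
neg  = lambda x: 1 - x         # ¬

laws = [
    lambda x, y, z: join(x, y) == join(y, x),                    # L1
    lambda x, y, z: join(x, join(y, z)) == join(join(x, y), z),  # L2
    lambda x, y, z: join(x, x) == x,                             # L3
    lambda x, y, z: join(x, 1) == 1,                             # L4
    lambda x, y, z: join(x, 0) == x,                             # L5
    lambda x, y, z: neg(neg(x)) == x,                            # L6
    lambda x, y, z: join(x, neg(x)) == 1,                        # L7
    lambda x, y, z: neg(join(x, y)) == meet(neg(x), neg(y)),     # L8
    lambda x, y, z: join(x, meet(x, y)) == x,                    # L9
]
# exhaustive check over all valuations in {0, 1}
assert all(law(x, y, z) for law in laws for x, y, z in product((0, 1), repeat=3))
```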

**Example 1.** Consider the formula x ∧ (y ∨ z). An OCBSL algorithm finds it equivalent to

$$x \land \neg(\neg y \land \neg z) \land x$$

but it will consider these two formulas non-equivalent to

$$x \land (u \lor x) \land (y \lor z)$$

The OL algorithm will identify the equivalence of all three formulas, thanks to the laws (L9, L9'). It will nonetheless consider them non-equivalent to

$$(x \land y) \lor (x \land z)$$

which a complete but worst-case exponential-time algorithm for Boolean algebra equalities, such as one implemented in SAT solvers, will identify as equivalent.
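To make Example 1 executable, here is a miniature rewriter (illustrative only: it implements just the rewrites needed for this example — negation normal form via double negation and De Morgan, flattening, idempotence, and absorption — and is not the paper's complete OL normal form algorithm). Variables are strings; compound terms are tuples.

```python
def nnf(t, pos=True):
    # push negations to the leaves (double negation + De Morgan)
    if isinstance(t, str):                      # variable
        return t if pos else ("not", t)
    op, args = t[0], t[1:]
    if op == "not":
        return nnf(args[0], not pos)
    dual = {"and": "or", "or": "and"}
    return (op if pos else dual[op],) + tuple(nnf(a, pos) for a in args)

def norm(t):
    if isinstance(t, str) or t[0] == "not":
        return t
    op = t[0]
    kids = []
    for a in map(norm, t[1:]):
        # flatten nested same-operator nodes (associativity)
        kids.extend(a[1:] if isinstance(a, tuple) and a[0] == op else [a])
    kids = set(kids)                            # commutativity + idempotence
    dual = "or" if op == "and" else "and"
    # absorption (L9/L9'): drop a dual child that contains a sibling
    kids = {k for k in kids
            if not (isinstance(k, tuple) and k[0] == dual
                    and any(o in k[1:] for o in kids))}
    if len(kids) == 1:
        return next(iter(kids))
    return (op,) + tuple(sorted(kids, key=repr))
```

Running `norm(nnf(.))` on the four formulas of Example 1 identifies the first three and distinguishes the fourth, matching the OL verdicts on this example.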

A major practical question is the usefulness of such O(n log²(n)) (OCBSL) and O(n²) (OL) algorithms in verification. Are they as predictably efficient as the theoretical analysis suggests? What benefits do they provide as a component of verification tools? To answer these questions, we implement both the OCBSL and OL algorithms on directed acyclic graph representations of formulas. We deploy the algorithms in tools that manipulate formulas, most notably verification conditions in a program verifier, as well as combinational Boolean circuits.

**Contributions.** We make the following contributions:

	- evaluation of the OCBSL and OL normal form algorithms' behavior on randomly generated formulas;
	- scalability evaluation on normalizing circuits of size up to 10<sup>8</sup> gates;
	- normalization for simplification and caching of verification conditions when using the Stainless verifier, with both hard benchmarks (such as a compression algorithm) and collections of student submissions for programming assignments.

We show that OCBSL and OL both have notable potential in practice.

#### **1.1 Related Work**

The overarching perspective behind our paper is understanding polynomial-time normalization of Boolean algebra terms. Given the (co)NP-hardness of problems related to Boolean algebras, we look at subtheories given by a subset of the Boolean algebra axioms, including structures such as lattices. Lattices themselves have many uses in program abstraction, including abstract interpretation [11] and model checking [14,18]. The word problem for lattices was already studied by Whitman [44], who proposed a quadratic solution for the word problem for free *lattices*. Lattices alone do not incorporate the notion of a complement (negation). Whitman's algorithm has been adapted and extended to finitely presented lattices [17] and other variants, and then to free ortholattices by Bruns [7]. We extend this last result to not only decide equality but also compute a *normal form* for free ortholattices, and to the *circuit* (DAG) representation of terms. An efficient normal form does not follow from efficient equivalence checking, as there are many formulas in the same equivalence class. A normal form is particularly useful in applications such as formula caching, which we evaluate in Sect. 6. For the weaker theory of OCBSL, a normal form algorithm was introduced in [20], without any experimental evaluation. The theory of ortholattices, even if it adds only one more axiom, is notably stronger and better understood. The underlying lattice structure makes it possible to draw on the body of work on using lattices to abstract systems and enable algorithmic verification. The support for graphs (instead of only terms) as a representation is of immense practical relevance, because expanding circuits into trees without the use of auxiliary variables creates structures of astronomical size (Sect. 6).
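Whitman's decision procedure for ≤ on free-lattice terms, mentioned above, is short enough to sketch (an illustrative rendering restricted to binary ∧/∨ on term trees, not the DAG-based algorithms discussed in this paper):

```python
# Whitman's test for s <= t on free-lattice terms: variables are strings,
# ("and", l, r) and ("or", l, r) are meets and joins.
def leq(s, t):
    if isinstance(s, tuple) and s[0] == "or":        # s1 v s2 <= t
        return leq(s[1], t) and leq(s[2], t)
    if isinstance(t, tuple) and t[0] == "and":       # s <= t1 ^ t2
        return leq(s, t[1]) and leq(s, t[2])
    if isinstance(s, str) and isinstance(t, str):    # two variables
        return s == t
    if isinstance(s, str):                           # x <= t1 v t2
        return leq(s, t[1]) or leq(s, t[2])
    if isinstance(t, str):                           # s1 ^ s2 <= y
        return leq(s[1], t) or leq(s[2], t)
    # Whitman's condition: s1 ^ s2 <= t1 v t2
    return (leq(s[1], t) or leq(s[2], t)
            or leq(s, t[1]) or leq(s, t[2]))
```

Two terms are then equal in the free lattice iff `leq` holds in both directions; note that x ∧ (y ∨ z) ≤ (x ∧ y) ∨ (x ∧ z) fails, reflecting the absence of distributivity.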

A notable normal form that decides equality for propositional logic (thus also accounting for the distributivity law) is given by reduced ordered binary decision diagrams (ROBDDs) [9]. ROBDDs are of great importance in verification, but can be exponential in the size of the initial formula. Circuit synthesis and verification tools such as ABC [6] use SAT solvers to optimize sub-circuits [45], which is one way to trade off the completeness and cost of exponential-time algorithms. Boolean algebras are in correspondence with Boolean rings, which replace the least-upper-bound operation ∨ with the symmetric difference ⊕ (defined as (p ∧ ¬q) ∨ (¬p ∧ q) and satisfying x ⊕ x = 0, corresponding to the *exclusive or* in the two-element case). There have been proposals to exploit the Boolean ring structure in verification [12]. Polynomials over rings can also be used to obtain a normal form, but the polynomial canonical forms that we are aware of are exponential-sized. SMT solvers [2,34] extend SAT solvers, which makes them worst-case exponential (at best). We expect that our approach and algorithms could be used for preprocessing or representation, especially in non-clausal variants of SMT solvers [24,39]. In our evaluation, we apply formula normal forms to the problem of caching verification conditions. Caching is often used in verification tools, including Dafny [28] and Stainless [22]. Our caching works on formulas and preserves the API of a constraint solver. It is thus fine-grained and can be added to a program verifier or analyzer, regardless of whether it uses any other, domain-specific, forms of caching [29].

## **2 Preliminaries**

We present definitions and results necessary for the presentation of the ortholattice (OL) normal form algorithm. We assume familiarity with term rewriting and the representation of terms as trees and directed acyclic graphs [15,20]. We use first-order logic with equality (whose symbol is =). We write A |= F to mean that a first-order logic formula F is a consequence of (and thus provable from) the set of formulas A.

**Definition 1 (Terms).** *Consider an algebraic signature* <sup>S</sup>*. We use* <sup>T</sup>*S*(X) *to denote the set of terms over* S *with variables in* X *(typically an arbitrary countably infinite set, unless specified otherwise). Terms are constructed inductively as trees. Leaves are labeled with constant symbols or variables. Nodes are labeled with function symbols. If the label of a node is a commutative function, the children of the node are considered as a set (non-ordered) and otherwise as a list (ordered). We assume that commutative symbols are denoted as such in the signature.*

**Definition 2 (The Word Problem).** *Consider an algebraic signature* S *and a set of equational axioms* E *on* S *(for example, the theory of lattices or ortholattices). The word problem for* E *is the problem of determining, given two terms* t1, t2 ∈ T*S*(X)*, whether* E |= t1 = t2*.*

**Definition 3 (Normal Form).** *Consider an algebraic signature* S *and a set of equational axioms* E *on* S*. A function* f : T*S*(X) → T*S*(X) *produces a normal form for* E *iff:* ∀t1, t2 ∈ T*S*(X)*,* E |= t1 = t2 *is equivalent to* f(t1) = f(t2)*.*

For Z an arbitrary non-empty set and (∼) ⊆ Z × Z an equivalence relation on Z, we use the common notation: if x ∈ Z then [x]∼ = {y ∈ Z | x ∼ y}. Let Z/∼ = {[x]∼ | x ∈ Z}.

We now briefly review key concepts of free algebras. Let S be a signature and E an equational theory over this signature. Consider the equivalence relation on terms p ∼*E* q ⇐⇒ (E |= p = q), and note that T*S*(X)/∼*E* is itself an E-algebra. A **freely generated** E**-algebra**, denoted F*E*(X), is an algebra generated by the variables in X and isomorphic to T*S*(X)/∼*E*, i.e., one in which *only* the laws common to all E-algebras hold. There is always a homomorphism from a freely generated E-algebra to any other E-algebra over X.

The set of terms $T_S(X)$ is also called the **term algebra** over $S$. It is the algebra of all terms, satisfying no identity other than syntactic equality. Given a (possibly free) algebra $A$ over $S$ generated by $X$, there is a natural homomorphism $\kappa_A$, in a sense an evaluation function, from $T_S(X)$ to $A$. The word problem for a theory $E$ then consists in, given $p, q \in T_S(X)$, deciding whether $E \models p = q$, that is, whether $\kappa_{F_E}(p) = \kappa_{F_E}(q)$.

In the sequel, we continue to use = to denote the equality symbol inside formulas as well as the usual identity of mathematical objects. We use == to specifically denote the computer-performed operation of *structural* equality on trees and sets, whereas === denotes *reference* equality of objects, meaning that a === b if and only if a and b denote the same object in memory. The distinction between == and === is relevant because == is a larger relation but may take linear or worse time to compute, whereas we assume === is constant time.

**Lattices.** Lattices [4] are well-studied structures with signature $(\wedge, \vee)$ satisfying laws L1–L3, L9, L1'–L3' and L9' from Table 1. In particular, they do not have a complement operation $\neg$ in the signature. Lattices can also be viewed as a special kind of partially ordered set, with the order relation defined by $(a \le b) \iff (a \wedge b = a)$, where the last condition is also equivalent to $(a \vee b = b)$ given the axioms of lattices. When applied to the two-element Boolean algebra, this order relation corresponds to logical implication in propositional logic. A **bounded lattice** is a lattice with maximal and minimal elements 1 and 0. The word problem for *lattices* was solved by Whitman [44] through an algorithm deciding the $\le$ relation, based on the following properties of free lattices:

$$\begin{array}{l} \text{(1) } s_1 \vee \ldots \vee s_m \le t \iff \forall i.\ s_i \le t \\ \text{(2) } s \le t_1 \wedge \ldots \wedge t_n \iff \forall j.\ s \le t_j \\ \text{(3) } s_1 \wedge \ldots \wedge s_m \le y \iff \exists i.\ s_i \le y \\ \text{(4) } x \le t_1 \vee \ldots \vee t_n \iff \exists j.\ x \le t_j \end{array}$$

$$\begin{array}{l} s \le t \iff (\exists i.\ s_i \le t) \vee (\exists j.\ s \le t_j), \\ \text{with } s = (s_1 \wedge \ldots \wedge s_m) \text{ and } t = (t_1 \vee \ldots \vee t_n) \end{array} \tag{w}$$

where $x$ and $y$ denote variables and $s$ and $t$ terms. The first four properties are direct consequences of the axioms of lattices. Property (w) above is the *Whitman property*; it holds in free lattices (but not in all lattices). Applying the above rules recursively decides the $\le$ relation.
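The recursive procedure induced by rules (1)–(4) and (w) can be sketched directly. The following Python code is illustrative only (the paper's implementation is in Scala); terms are nested tuples with variables as plain strings, and the bounds 0 and 1 are omitted for brevity.

```python
# Whitman's decision procedure for <= in free lattices (binary terms).
# Representation (ours, illustrative): ("and", l, r), ("or", l, r), or a
# variable name as a string.

def leq(s, t):
    """Decide s <= t in the free lattice via Whitman's properties (1)-(4), (w)."""
    # (1) a join is below t iff every joinand is
    if isinstance(s, tuple) and s[0] == "or":
        return leq(s[1], t) and leq(s[2], t)
    # (2) s is below a meet iff it is below every meetand
    if isinstance(t, tuple) and t[0] == "and":
        return leq(s, t[1]) and leq(s, t[2])
    # here s is a variable or a meet, and t is a variable or a join
    if isinstance(s, str) and isinstance(t, str):
        return s == t
    # (3)/(w): a meet is below t if some meetand is
    if isinstance(s, tuple) and (leq(s[1], t) or leq(s[2], t)):
        return True
    # (4)/(w): s is below a join if it is below some joinand
    if isinstance(t, tuple) and (leq(s, t[1]) or leq(s, t[2])):
        return True
    return False
```

For example, `leq(("and", "x", "y"), ("or", "x", "z"))` holds because the meetand `x` is below the join.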

**Orthocomplemented Bisemilattices (OCBSL).** OCBSL [20] are also a weakening of Boolean algebras (and, in fact, a subtheory of ortholattices). They satisfy laws L1–L8, L1'–L8' but not the absorption law (L9, L9'). This implies in particular that OCBSL do not have a canonical order relation as lattices do, but rather have two, in general distinct, relations:

$$\begin{array}{l} a \le b \iff a \land b = a\\ a \sqsubseteq b \iff a \lor b = b \end{array}$$

If we add the absorption axioms, $a \wedge b = a$ implies $a \vee b = (a \wedge b) \vee b = b$ (and dually), so the structure becomes a lattice. The algorithm presented in [20] does not rely on lattice properties. Instead, it is proven that the axioms of OCBSL can be extended to a term rewriting system which is confluent and terminating, and hence admits a normal form. Using variants of algorithms on labelled trees to handle commutativity, this normal form can be computed in quasilinear time $O(n \log^2(n))$. In contrast, for free *lattices* there exists no confluent and terminating term rewriting system [16].

### **3 Deriving an Ortholattice Normal Form Algorithm**

Ortholattices [3, Chapter II.1] are structures satisfying laws L1–L9, L1'–L9' of Table 1. An ortholattice (OL) need not be a Boolean algebra, nor an orthomodular lattice; the smallest such OL is "Benzene" (O6), with elements $\{0, a, b, \neg b, \neg a, 1\}$ where $a \le b$ [5]. The word problem for free ortholattices, i.e. deciding whether a given equation holds, has been shown to be solvable in quadratic time by Bruns [7]. In this section, we go further by presenting an efficient computation of *normal forms*, which reduces the word problem to syntactic equality. In addition, unlike the equality procedure itself, normal forms can be efficiently used for formula simplification and caching.

**Definition 4.** *For a set of variables* $X$*, we define a disjoint set* $X'$ *of the same cardinality with a bijective function* $(\cdot)' : X \to X'$*. Denote by* $L$ *the theory of bounded lattices and* $OL$ *the theory of ortholattices. Define* $F_L$*,* $F_{OL}$ *to be their free algebras and* $T_L$ *and* $T_{OL}$ *to be the sets of terms over their respective signatures. Define* $\le_L$ *as the relation on* $T_L$ *such that* $s \le_L t \iff \kappa_{F_L}(s) \le \kappa_{F_L}(t)$*, and* $\le_{OL}$ *analogously by* $s \le_{OL} t \iff \kappa_{F_{OL}}(s) \le \kappa_{F_{OL}}(t)$*, where* $\kappa$ *denotes the natural homomorphisms introduced in the previous section.*

Note: $p \le_{OL} q \iff (E_{OL} \models p \wedge q = p)$, where $E_{OL}$ is the set of axioms of Table 1.

## **3.1 Deciding** *≤OL* **by Reduction to Bounded Lattices**

We consider $T_L(X \cup X')$ as a subset of $T_{OL}(X)$ via the injective inclusion mapping $x \mapsto x$ and $x' \mapsto \neg x$. We also define a function $\delta : T_{OL}(X) \to T_L(X \cup X')$ as the transformation into negation normal form, using laws L6 (double negation elimination) and L8, L8' (De Morgan's laws).
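The transformation $\delta$ can be sketched as a single recursive pass carrying a negation flag. The following Python code is an illustration (the tuple representation and names are ours, not the paper's); a `("not", x)` node on a variable plays the role of a primed variable in $X'$.

```python
# Sketch of delta: negation normal form over (and, or, not), pushing
# negations to the variables with L6 (double negation) and L8/L8' (De Morgan).

def delta(t, negate=False):
    if isinstance(t, str):                    # variable
        return ("not", t) if negate else t
    if t[0] == "not":                         # L6: flip the negation flag
        return delta(t[1], not negate)
    op = {"and": "or", "or": "and"}[t[0]] if negate else t[0]   # L8/L8'
    return (op, delta(t[1], negate), delta(t[2], negate))
```

For example, `delta(("not", ("and", "x", ("not", "y"))))` yields `("or", ("not", "x"), "y")`, i.e. $\delta(\neg(x \wedge \neg y)) = \neg x \vee y$.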

We define a set $R \subseteq T_L(X \cup X')$ of terms reduced with respect to the contradiction laws (L7 and L7'). These imply that, e.g., given a term $a \vee b$, if $\neg b \le (a \vee b)$, then since also $b \le a \vee b$, we have $1 = b \vee \neg b \le (a \vee b)$. The following inductive definition induces an algorithm to check $x \in R$, meaning that such reductions do not apply inside $x$:

$$\begin{array}{l} 0,\ 1,\ x,\ x' \in R \quad (\text{for } x \in X) \\ a \vee b \in R \iff a \in R,\ b \in R,\ \delta(\neg a) \not\le_L a \vee b,\ \delta(\neg b) \not\le_L a \vee b \\ a \wedge b \in R \iff a \in R,\ b \in R,\ \delta(\neg a) \not\ge_L a \wedge b,\ \delta(\neg b) \not\ge_L a \wedge b \end{array}$$

Above, $\le_L$ is the order relation on lattices, $x \ge_L y$ denotes $y \le_L x$, and $\not\le_L$, $\not\ge_L$ are the negations of those conditions: $x \not\le_L y$ iff not $x \le_L y$, whereas $x \not\ge_L y$ iff not $y \le_L x$.

We also define $\beta : T_L(X \cup X') \to R$ by:

$$\begin{cases} \beta(0) = 0, \beta(1) = 1, \beta(x) = x, \beta(x') = x' \text{ (for } x \in X) \\ \beta(a \lor b) = \begin{cases} \beta(a) \lor \beta(b) & \text{if } \beta(a) \lor \beta(b) \in R \\ 1 & \text{otherwise} \end{cases} \\ \beta(a \land b) = \begin{cases} \beta(a) \land \beta(b) & \text{if } \beta(a) \land \beta(b) \in R \\ 0 & \text{otherwise} \end{cases} \end{cases}$$

**Example 2.** We have $\beta((x \wedge \neg y) \vee (\neg x \vee y)) = 1$ because $\delta(\neg(x \wedge \neg y)) = \neg x \vee y$ and $\neg x \vee y \le_L (x \wedge \neg y) \vee \neg x \vee y$.

Note that it is generally not sufficient to check only $\delta(\neg a) \not\le_L b$ for larger examples. In particular, if $\delta(\neg a)$ is itself a conjunction, by Whitman's property, the condition $\delta(\neg a) \not\le_L (a \vee b)$ is not in general equivalent to having either $\delta(\neg a) \not\le_L b$ or $\delta(\neg a) \not\le_L a$.
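The mutually recursive $R$-membership check and $\beta$ can be sketched as follows. This Python code is illustrative (our names and term representation, not the paper's); it reuses a Whitman-style $\le$ test extended with the bounds, with variables, negated variables, 0 and 1 as atoms, and checks membership in $R$ inline when rebuilding a node.

```python
def is_op(t, op):
    return isinstance(t, tuple) and t[0] == op

def leq(s, t):
    """Whitman-style test for s <= t, extended with the bounds 0 and 1."""
    if s == "0" or t == "1":
        return True
    if is_op(s, "or"):
        return leq(s[1], t) and leq(s[2], t)
    if is_op(t, "and"):
        return leq(s, t[1]) and leq(s, t[2])
    if not is_op(s, "and") and not is_op(t, "or"):
        return s == t                      # atoms: x, ("not", x), 0, 1
    if is_op(s, "and") and (leq(s[1], t) or leq(s[2], t)):
        return True
    if is_op(t, "or") and (leq(s, t[1]) or leq(s, t[2])):
        return True
    return False

def neg(t):
    """delta(not t) for t already in negation normal form."""
    if is_op(t, "and"):
        return ("or", neg(t[1]), neg(t[2]))
    if is_op(t, "or"):
        return ("and", neg(t[1]), neg(t[2]))
    if isinstance(t, tuple):               # ("not", x) -> x
        return t[1]
    if t in ("0", "1"):
        return "1" if t == "0" else "0"
    return ("not", t)

def beta(t):
    if not (is_op(t, "and") or is_op(t, "or")):
        return t
    a, b = beta(t[1]), beta(t[2])
    c = (t[0], a, b)
    if t[0] == "or":
        # a or b is in R unless delta(not a) <= a or b (dually for b)
        return "1" if leq(neg(a), c) or leq(neg(b), c) else c
    # a and b is in R unless a and b <= delta(not a) (dually for b)
    return "0" if leq(c, neg(a)) or leq(c, neg(b)) else c
```

On Example 2, `beta` applied to $(x \wedge \neg y) \vee (\neg x \vee y)$ returns `"1"`.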

We next reformulate the theorem from Bruns [7]. A key construction from the proof is the following lemma.

**Lemma 1.** $R/{\sim_L}$ *is an ortholattice isomorphic to* $F_{OL}(X)$*.*

**Theorem 1.** *Let* $s, t \in T_{OL}(X)$*. Then,* $s \le_{OL} t \iff \beta(\delta(s)) \le_L \beta(\delta(t))$*.*

*Proof.* We sketch and adapt the original proof. Intuitively, computing $\beta(\delta(s)) \le_L \beta(\delta(t))$ should be sufficient to compute the $\le_{OL}$ relation: $\delta$ reduces terms to normal forms modulo rules L6 (double negation elimination) and L8, L8' (De Morgan's laws), and then $\beta$ takes care of rule L7 (contradiction). The only rules left are those of (bounded) lattices, which should be dealt with by $\le_L$. From Lemma 1, the fact that $\beta$ factors through the evaluation function $\kappa_{F_{OL}}$ (i.e. is equivalence preserving) and properties of free algebras, it can be shown that $\kappa_{F_{OL}} = \gamma \circ N_{\sim_L} \circ \beta \circ \delta$, where $N_{\sim_L}(x) = [x]_{\sim_L}$ and $\gamma : R/{\sim_L} \to F_{OL}(X)$ is an isomorphism. Hence

$$\kappa_{F_{OL}}(s) \le \kappa_{F_{OL}}(t) \iff [\beta(\delta(s))]_{\sim_L} \le [\beta(\delta(t))]_{\sim_L}$$

which is equivalent to $s \le_{OL} t \iff \beta(\delta(s)) \le_L \beta(\delta(t))$.

#### **3.2 Reduction to Normal Form**

To obtain a normal form for $T_{OL}(X)$, we compose $\delta$ and $\beta$ with a normal form function for $T_L(X \cup X')$. A disjunction $a = a_1 \vee \ldots \vee a_m$ (and dually for a conjunction) is in normal form for $\le_L$ if and only if the following two properties hold [15, p. 17]:

1. if $a_i = (a_{i1} \wedge \ldots \wedge a_{in})$, then for all $j$, $a_{ij} \not\le a$
2. $(a_1, \ldots, a_m)$ forms an antichain (if $i \ne j$ then $a_i \not\le a_j$)

We now show how to reduce a term in $R$ so that it satisfies both properties, using a function $\zeta$ that enforces property 1, and then a function $\eta$ that additionally enforces property 2. The functions operate dually on $\wedge$ and $\vee$; we specify them only on the $\vee$ cases for brevity.

**Enforcing Property 1.** Define $\zeta : R \to R$ recursively such that:

$$\zeta(a_1 \vee \ldots \vee a_m) = \begin{cases} \zeta(a_1 \vee \ldots \vee a_{ij} \vee \ldots \vee a_m) & \text{if } a_i = (a_{i1} \wedge \ldots \wedge a_{in}) \\ & \quad \text{and } a_{ij} \le_L a_1 \vee \ldots \vee a_m \\ \zeta(a_1) \vee \ldots \vee \zeta(a_m) & \text{otherwise} \end{cases}$$

(dually for $\wedge$). It follows that $s \sim_L \zeta(s)$ for every term $s$, because $a_{ij} \le_L a_1 \vee \ldots \vee a_m$ implies $a_1 \vee \ldots \vee a_m = a_1 \vee \ldots \vee a_m \vee a_{ij}$, and $a_i \vee a_{ij} = a_{ij}$ by absorption.

**Enforcing Property 2 (Antichain).** Define $\eta : R \to R$ such that

$$\eta(a_1 \vee \ldots \vee a_m) = \begin{cases} \eta(a_1 \vee \ldots \vee a_{i-1} \vee a_{i+1} \vee \ldots \vee a_m) & \text{if } a_i \le_L a_j,\ i \ne j \\ \eta(a_1) \vee \ldots \vee \eta(a_m) & \text{otherwise} \end{cases}$$

We have $s \sim_L \eta(s)$ for every term $s$ because $a_i \le_L a_j$ means $a_i \vee a_j = a_j$.

**Example 3.** We have $\eta(\zeta([(a \vee b) \wedge (a \vee c)] \vee b)) = \eta((a \vee b) \vee b) = a \vee b$. Indeed, the first equality follows from

$$(a \vee b) \le_L [(a \vee b) \wedge (a \vee c)] \vee b$$

and the second from $b \le_L (a \vee b)$.
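The two reductions can be sketched on flattened n-ary terms. The following Python code is an illustration (representation and names are ours, not the paper's Scala code): `zeta` replaces a dual child by a suitable grandchild and flattens same-operator nesting, and `eta` drops children dominated by a sibling.

```python
# Terms: ("or", [c1, ...]) / ("and", [c1, ...]), variables as strings.

def leq(s, t):
    """Whitman's test for <= in free lattices, n-ary version."""
    if isinstance(s, tuple) and s[0] == "or":
        return all(leq(c, t) for c in s[1])
    if isinstance(t, tuple) and t[0] == "and":
        return all(leq(s, c) for c in t[1])
    if isinstance(s, str) and isinstance(t, str):
        return s == t
    if isinstance(s, tuple) and any(leq(c, t) for c in s[1]):
        return True
    if isinstance(t, tuple) and any(leq(s, c) for c in t[1]):
        return True
    return False

DUAL = {"or": "and", "and": "or"}

def zeta(t):
    """Property 1: replace a dual child by one of its own children when
    that grandchild is already comparable with the whole term."""
    if isinstance(t, str):
        return t
    op, ch = t
    for i, c in enumerate(ch):
        if isinstance(c, tuple) and c[0] == DUAL[op]:
            for gc in c[1]:
                if leq(gc, t) if op == "or" else leq(t, gc):
                    return zeta((op, ch[:i] + [gc] + ch[i + 1:]))
    flat = []                      # recurse, flattening nested same-op nodes
    for c in ch:
        r = zeta(c)
        flat += r[1] if isinstance(r, tuple) and r[0] == op else [r]
    return (op, flat)

def eta(t):
    """Property 2: keep only an antichain of the children."""
    if isinstance(t, str):
        return t
    op, ch = t
    for i, c in enumerate(ch):
        if any(i != j and (leq(c, d) if op == "or" else leq(d, c))
               for j, d in enumerate(ch)):
            return eta((op, ch[:i] + ch[i + 1:]))
    ch = [eta(c) for c in ch]
    return ch[0] if len(ch) == 1 else (op, ch)
```

On Example 3, `eta(zeta(...))` applied to $[(a \vee b) \wedge (a \vee c)] \vee b$ yields the term $a \vee b$.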

Denote by $R^1$ the subset of $R$ containing the terms satisfying property 1, and by $R^2$ the subset of $R^1$ of terms satisfying property 2. It is easy to see that $\zeta$ in fact maps $R$ to $R^1$ and that $\eta$ can be restricted to $R^1 \to R^2$. Moreover, $s, t \in R^2$ and $s \sim_L t$ implies $s = t$. Recall that $\forall w \in T_{OL}(X).\ \beta(\delta(w)) \in R$. Since $\beta$ and $\delta$ are equivalence preserving, $\forall w_1, w_2 \in T_{OL}(X)$:

$$w_1 \sim_{OL} w_2 \iff \beta(\delta(w_1)) \sim_{OL} \beta(\delta(w_2))$$

Moreover, since (by Lemma 1) $R/{\sim_L}$ is an ortholattice, we have

$$\beta(\delta(w_1)) \sim_{OL} \beta(\delta(w_2)) \iff \beta(\delta(w_1)) \sim_L \beta(\delta(w_2))$$

i.e. on $R$, $\sim_{OL}$ and $\sim_L$ coincide. Then,

$$\beta(\delta(w_1)) \sim_L \beta(\delta(w_2)) \iff \eta(\zeta(\beta(\delta(w_1)))) \sim_L \eta(\zeta(\beta(\delta(w_2))))$$

and since both $\eta(\zeta(\beta(\delta(w_1)))) \in R^2$ and $\eta(\zeta(\beta(\delta(w_2)))) \in R^2$:

$$\eta(\zeta(\beta(\delta(w_1)))) = \eta(\zeta(\beta(\delta(w_2))))$$

We finally conclude:

**Theorem 2.** $NF_{OL} = \eta \circ \zeta \circ \beta \circ \delta$ *is a computable normal form function for ortholattices.*

#### **3.3 Complexity and Normal Form Size**

Before presenting the algorithm in more detail, we argue why the normal form function from the previous section can be computed efficiently. We assume a RAM model and hence that creating new nodes in the tree representation of terms can be done in constant time.

Note that the size of the output of each of $\delta$, $\beta$, $\zeta$ and $\eta$ is linearly bounded by the size of its input. Thus, the asymptotic runtime of the composition is the sum of the runtimes of these functions. Recall that $\delta$ (negation normal form) is computable in linear time, and that $\zeta$ and $\eta$ are both computable in worst-case quadratic time, plus the time needed to compute $\le_L$. Then, $\beta$, the $R$-membership check and $\le_L$ are each computable in constant time plus the time needed for the mutually recursive calls. While a direct recursive implementation would be exponential, observe that the computation time of the $R$-membership check and $\beta$ is proportional to the total number of calls made to them. If we store (memoize) the results of these functions for each distinct input, this number can be bounded by the total number of distinct subnodes that are part of the input or that we create during the algorithm's execution. Similarly, $\le_L$ needs to be applied to, at worst, every pair of such subnodes. Consequently, if we memoize the result of each of these functions at all their calls, we may expect at most quadratic time to compute them on all the subnodes of a formula.

The above argument is, however, not entirely sufficient, because checking $a \wedge b \in R$ requires creating the new nodes $\neg a$ and $\neg b$ and then computing their negation normal forms, which again creates new nodes. Indeed, note that, for memoization, we need to rely on *reference* (pointer) equality, as *structural* equality would take a linear amount of time to compute (for a total cubic time). Hence, to obtain quadratic time and space, we need to be able to negate a node in negation normal form without creating too many new nodes in memory. To do so, define $op : T_L(X \cup X') \to T_L(X \cup X')$ by

$$\begin{array}{ll} op(x) = x' & op(a \wedge b) = op(a) \vee op(b) \\ op(x') = x & op(a \vee b) = op(a) \wedge op(b) \end{array}$$

$op(a)$ is functionally equal to $\delta(\neg a)$, but has the crucial property that

children(op(τ)) === op[children(τ)]

where τ denotes a formal conjunction or disjunction and children(τ) is the set of children of τ as a tree. op can be efficiently memoized. Moreover, it can be *bijectively* memoized: if op(a) = b, we also store op(b) = a. We thus obtain op(children(op(τ))) === children(τ). In this approach we are guaranteed never to instantiate any node beyond the $n$ subnodes of the original formula (in negation normal form) and their opposites, for a total of $2n$ nodes. Hence, we only ever need to call op, the $R$-membership check and $\beta$ on up to $2n$ different inputs, and $\le$ on up to $4n^2$ different inputs, guaranteeing a final quadratic running time.
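The bijective memoization of op can be sketched as follows. This Python code is illustrative (class and field names are assumed, not the paper's); the text's `===` corresponds to Python's `is`.

```python
# Sketch of op with bijective memoization: negating a node creates at most
# one new node ever, and op(op(t)) is the very same object.

class Node:
    def __init__(self, kind, children=(), name=None):
        self.kind = kind              # "and", "or", "var" or "nvar"
        self.children = children
        self.name = name
        self.inv = None               # memoized opposite node

DUAL = {"and": "or", "or": "and", "var": "nvar", "nvar": "var"}

def op(t):
    if t.inv is None:
        t.inv = Node(DUAL[t.kind], tuple(op(c) for c in t.children), t.name)
        t.inv.inv = t                 # bijective: store both directions
    return t.inv

x, y = Node("var", name="x"), Node("var", name="y")
t = Node("and", (x, y))
```

Here `op(op(t)) is t` holds, and every child of `op(t)` is itself the memoized opposite of a child of `t`.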

**Minimal Size.** Finally, as none of $\delta$, $\beta$, $\zeta$ and $\eta$ ever increases the size of the formula (in terms of the number of literals, conjunctions and disjunctions), neither does $NF_{OL}$. Consequently, for any term $w$, $NF_{OL}(w)$ is one of the smallest terms equivalent to $w$. Indeed, let $w_{min} \sim_{OL} w$ be a term of smallest size in the equivalence class of $w$. In particular, $NF_{OL}(w_{min})$ cannot be smaller than $w_{min}$ (because $w_{min}$ is minimal in the class) nor larger (because $NF_{OL}$ is size non-increasing). Since $NF_{OL}(w) = NF_{OL}(w_{min})$, $NF_{OL}(w)$ is of minimal size.

**Theorem 3.** *The normal form from Theorem 2 can be computed by an algorithm running in time and space* $O(n^2)$*. Moreover, the resulting normal form is guaranteed to be smallest in the equivalence class of the input term.*

### **4 Algorithm with Memoization and Structure Sharing**

To obtain a practical realization of Theorem 3, we need to address two main challenges. First, as explained in the previous section, we need to memoize the result of some functions to avoid exponential blowup. Second, we want to make the procedure compatible with structure sharing, since it is an important feature for many applications.

By *memoization* we mean modifying a function so that it saves the result of each call, so that later calls with the same argument require no recomputation. Results of function calls can be stored in a map. For single-argument functions, we find it is typically more efficient to introduce a field in each object that holds the result of calling the function on it. By *structure sharing* we understand the possibility of reusing subformulas multiple times in the description of a logical expression. In the case of the signature $(\wedge, \vee, \neg)$, such expressions can be viewed as combinational Boolean circuits. We represent such terms using directed acyclic graph (DAG) reference structures instead of tree structures.
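As a small illustration of why field-based memoization matters under structure sharing, the following Python sketch (ours, not the paper's code) computes the size of the fully expanded formula tree of a shared circuit: the DAG below has only 201 nodes, but its expanded tree has $2^{201} - 1$ nodes, and the memoized field lets the traversal visit each object exactly once.

```python
class Gate:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.normal = None            # field-based memo, as described above

def size(g):
    """Size of the fully expanded formula tree of the circuit g."""
    if g is None:
        return 0
    if g.normal is None:              # computed once per object, not per path
        g.normal = 1 + size(g.left) + size(g.right)
    return g.normal

g = Gate()
for _ in range(200):                  # each new gate shares the previous one twice
    g = Gate(g, g)
```

Without the `normal` field, `size(g)` would make one recursive call per path and never terminate in practice; with it, 201 calls suffice.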

Circuits can be exponentially more succinct than equivalent formulas, but not all formula rewrites are efficient in the presence of structure sharing (consider, for example, rules with substitution such as $x \wedge F \rightsquigarrow x \wedge F[x := 1]$, where $F$ may also be referred to somewhere else). Structure sharing is thus non-trivial to maintain throughout all representations and transformations. Indeed, a naive recursive modification of a circuit will unfold the DAG into a tree, often causing an exponential increase in space. Doing this optimally also requires the use of memoization. Moreover, the choice of representations and data structures is critical.

We show that it is possible to make both algorithms fully compatible with structure sharing without ever creating duplicate nodes. The algorithm ensures that the resulting circuits contain a smaller number of subnodes, preserve equivalence, and that two circuits have the same representation if and only if they describe the same term (by the laws of OL).


#### **Algorithm 2:** Computing Negations

```
1 def inverse(τ)  // AIGFormula -> AIGFormula
2   if isDefined(τ.inverse) then
3     return τ.inverse
4   else
5     τ̄ ← τ.copy(polarity = !τ.polarity)
6     τ.inverse ← τ̄
7     τ̄.inverse ← τ
8     return τ̄
```



**Pseudocode.** Algorithms 1, 2, 3 and 4 present a pseudocode implementation of the normal form function from Theorem 2. To more easily maintain structure sharing and gain performance, we move away from the *negation normal form* representation and instead use a representation of formulas similar to an AIG (And-Inverter Graph), where a formula is either a Conjunction, a Variable or a Literal, and contains a Boolean value telling whether the formula is positive or negative (see Algorithm 1). This implies that $\delta$ needs to transform arbitrary Boolean formulas into AIGFormulas instead of negation normal forms. Fortunately, AIGFormulas can be efficiently translated to NNF (and back), so we can view them as an alternative representation of terms in $T_L(X \cup X')$. For the sake of space, we do not show the reduction from general formula trees over the signature $(\wedge, \vee, \neg)$ and work directly with AIGFormulas, but the implementation needs memoization to avoid exponential duplication in the presence of structure sharing.

Recall that computing the $R$-membership check requires taking the negation of some formulas and projecting them back into $T_L(X \cup X')$ with $\delta$. Using *AIGFormula* makes it possible to always take the negation of a formula in constant time and space. The corresponding function *inverse(τ)* is shown in Algorithm 2 and corresponds to the op function from the previous section. The memoization ensures that for all τ, *inverse*(*inverse*(τ)) === τ, and our choice of data structure ensures that *children*(*inverse*(τ)) === *children*(τ). These two properties guarantee that any sequence of accesses to children and inverses of τ will always yield a formula object within the original DAG or its single inverse copy. In particular, regardless of structure sharing in the input, we never need to store in memory more than twice the total number of formula nodes of the input. As explained in Sect. 3.3, a similar condition could be made to hold with NNF, but we believe it would be more complicated and less efficient to implement.

The function ≤ in Algorithm 3 is based on Whitman's algorithm, adapted to AIGFormulas. For memoization, because the function takes two arguments, we store in each node two sets: the nodes it is known to be smaller than, and those it is known not to be smaller than. Note that storing and accessing values in a set (even a hash set) is only as efficient as computing the equality relation on two objects. Because structural equality == takes linear time to compute, we use referential equality via the *uniqueId* of each formula (declared in Algorithm 1). We found that using sparse bit sets yields the best performance.
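The two-argument memoization described above can be sketched as follows (illustrative Python; class and field names are assumed, and plain Python sets stand in for the sparse bit sets): each node keeps the uniqueIds of the nodes it is known to be below, and of those it is known not to be below.

```python
import itertools

_ids = itertools.count()

class F:
    def __init__(self, kind, children=(), name=None):
        self.uid = next(_ids)                        # referential identity
        self.kind, self.children, self.name = kind, children, name
        self.le_true, self.le_false = set(), set()   # memoized <= answers

def leq(s, t):
    if t.uid in s.le_true:
        return True
    if t.uid in s.le_false:
        return False
    if s.kind == "or":
        r = all(leq(c, t) for c in s.children)
    elif t.kind == "and":
        r = all(leq(s, c) for c in t.children)
    elif s.kind == "var" and t.kind == "var":
        r = s.name == t.name
    else:                      # Whitman's property for a meet vs a join
        r = (s.kind == "and" and any(leq(c, t) for c in s.children)) or \
            (t.kind == "or" and any(leq(s, c) for c in t.children))
    (s.le_true if r else s.le_false).add(t.uid)      # memoize the answer
    return r

x, y = F("var", name="x"), F("var", name="y")
m = F("and", (x, y))
```

A second call to `leq(m, x)` is then answered from the memo without recursion.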

The *simplify* function in Algorithm 4 makes a one-level simplification of a conjunction node, assuming that its children have already been simplified. We present the case where τ is positive. It works in three steps. The subfunction *zeta* corresponds to the ζ function from the previous section: it both flattens consecutive positive conjunctions and applies a transformation based on a strengthened version of the absorption law. Then, at line 13, we filter out the nodes that are greater than some other node; for example, if $c \le b$ then $a \wedge b \wedge c$ becomes $a \wedge c$. This corresponds to the function η. Finally, line 16 applies the contradiction law: if $a \wedge b \wedge c \le \neg a$ then $a \wedge b \wedge c$ becomes 0. Note again that checking only whether $b \le \neg a$ or $c \le \neg a$ holds is not sufficient (consider for example the case $a = (\neg b \vee \neg c)$). This corresponds to the β function. The correspondence with the three functions ζ, η and β is not exact; all computations are done in a single traversal of the formula, rather than in separate passes as the composition ◦ of functions in Theorem 2 might suggest.

**Importance of Structure Sharing.** As detailed in Sect. 6, our implementation finishes in a few tenths of a second on circuits containing approximately $10^5$ And gates, but whose expanded formulas would have size over $10^{2000}$, demonstrating the compatibility of the algorithm with structure sharing. For this, we must ensure, at every phase and for every intermediate representation, from the parsing of the input to the exporting of the solution, that no duplicate node is ever created. This is achieved, again, using memoization. The complete and testable implementation of both the OL and OCBSL algorithms in Scala is available at https://github.com/epfl-lara/lattices-algorithms.

## **5 Application to More Expressive Logics**

This section outlines how we use the OCBSL and OL algorithms in program verification. Boolean algebra is not only relevant for pure propositional logic; it is also the core of more complex logics, such as the ones used for verification of software.

```
Algorithm 4: Computing normal form
1  def simplify(τ)  // Conjunction -> AIGFormula
     // Assume τ is positive
     // (In negative cases, some nodes must be inverted and ≤ reversed.)
2    newChildren ← List()
3    def zeta(child)
4      match child:
5        case PositiveConjunction:
6          newChildren.add(child.children)
7        case child: NegativeConjunction:
8          gc ← child.children.find(gc ↦ τ ≤ gc)
9          if isDefined(gc) then zeta(gc)
10         else newChildren.add(child)
11   for child ← τ.children do
12     zeta(child)
13   children' ← // filter out redundant children greater than another child
14   if children'.size == 0 then return Literal(True)
15   else if children'.size == 1 then return children'.head
16   else if ∃ c ∈ children'. τ ≤ inverse(c) then return Literal(False)
17   else return Conjunction(newChildren)
18
19 def NFOL(τ)  // AIGFormula -> AIGFormula
20   if isDefined(τ.normal) then return τ.normal
21   else
22     τ.normal ← match τ:
23       case Variable(id, True): τ
24       case Variable(id, False): inverse(NFOL(inverse(τ)))
25       case Conjunction(children, polarity): simplify(Conjunction(children map NFOL, polarity))
26     return τ.normal
```
Propositional terms appear as subexpressions of the program (as members of the Boolean type), but also in verification conditions corresponding to correctness properties. This section highlights key aspects of such a deployment.

We consider programs containing let bindings, pattern matching, algebraic data types, and theories including numbers and arrays. Let bindings typically arise when a variable is set in a program, but is also introduced in program transformations to prevent exponential increase in the size of program trees. Since OCBSL and OL are compatible with a DAG representation—fulfilling a similar role to let bindings—they can similarly "see through" bindings without breaking them or duplicating subexpressions.

If-then-else and pattern matching conditions can be analyzed and used by the algorithms, possibly leading to dead-branch removal or condition simplification. Extending OCBSL and OL to reason about ADT sorts further increases the simplification potential for pattern matching. For instance, given assumptions $\varphi$, a scrutinee $s$ and an ADT constructor identifier $id$ of sort $S$, we are interested in determining whether $s$ is an instance of the constructor $id$. A trivial case is when the form of $s$ makes this syntactically evident. Otherwise, we can run OCBSL or OL to check whether $\varphi \implies (s\ \textit{is}\ id)$ holds. If this fails, we instead test whether $\varphi \implies \neg(s\ \textit{is}\ id')$ for all $id' \ne id \in S$. We may also negatively answer the query if $\varphi \implies (s\ \textit{is}\ id')$ for some $id' \ne id \in S$.

The original OCBSL algorithm presented in [20] achieves quasilinear time complexity by assigning codes to subnodes such that equivalent nodes (by the laws of OCBSL) have the same codes. This is not required for the OL algorithm, as it is quadratic anyway, but can still be done to enable common subexpression elimination. This is similar to hash-consing, but more powerful, as it also eliminates expressions that are equivalent with respect to OCBSL or OL.
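As an illustration of such code assignment, the following Python sketch is a deliberately simplified variant: it handles only commutativity (by sorting child codes), not the other OCBSL laws, so it amounts to hash-consing modulo argument order.

```python
# Assign small integer codes to subterms; structurally equivalent subterms,
# up to commutativity of the binary connectives, receive the same code.

codes = {}

def code(t):
    # variables are strings; ("and"/"or", left, right) are binary nodes
    if isinstance(t, str):
        key = ("var", t)
    else:
        op, l, r = t
        key = (op,) + tuple(sorted((code(l), code(r))))  # commutative: sort
    if key not in codes:
        codes[key] = len(codes)       # fresh code for a new equivalence class
    return codes[key]
```

After coding, two subterms are equivalent (in this weak sense) iff their integer codes are equal, an O(1) check.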

Of particular relevance is the inclusion of underlying theories such as numbers or arrays. OL has an advantage over OCBSL in terms of extensibility. Namely, OL makes it possible to implement more properties of theories by expanding its $\le_{OL}$ relation (Algorithm 3) with inequalities between syntactically distinct *atomic* formulas. For example, if $<_I$ and $\le_I$ are relations on mathematical integers in the theory of the SMT solver, our implementation deduces that $(x <_I y) \le_{OL} (x \le_I y)$ using the rule $z + a <_I 0 \implies z + b \le_I 0$ when $b \le_I a + 1$, instantiated with $z = x - y$ and $a = b = 0$. In one of our benchmarks, this simple rule led OL to simplify a verification condition (VC) of the form $\neg(x <_I y \wedge \varphi_1 \wedge x >_I y \wedge \varphi_2)$ to true, which was of interest because $\varphi_1, \varphi_2$ were large. This simplification is performed at line 16 of Algorithm 4 with $\tau = x <_I y \wedge x >_I y \wedge \varphi$, where we have $c = x >_I y$ because $\tau \le_{OL} (x \le_I y) \impliedby (x <_I y) \le_{OL} (x \le_I y)$. In contrast, OCBSL was not able to perform the simplification, because it cannot systematically check for inequalities between subterms.
For arrays, our implementation also checks for the property (i ≠ j) ≤*OL* (a[i := v](j) = a(j)). Combined with two other rules related to congruence, OL performs particularly well on array-intensive benchmarks such as SortedArray. Note that in OCBSL we may encode a weak form of implication by specifying (giving the same code to) φ ∧ ψ = φ or φ ∨ ψ = ψ, but unlike the OL encoding, this does not even allow simplifying formulas such as φ ∧ τ ∧ ¬ψ without a specific check, which would require quadratic time in general.

**Other Extensions.** Beyond program verification, we expect OL- or OCBSL-based techniques to be extensible to applications such as type checkers, interactive and automated theorem provers using first-order, higher-order, temporal and modal logics, SMT solvers, or lattice problems in abstract interpretation. Unidirectional rules that may be particularly relevant for automated theorem proving include [x = y] ≤*OL* [f(x) = f(y)], [∀x. P(x)] ≤*OL* P(t), and P ≤*OL* Q when P → Q is a known theorem. In the context of quantified logics and lambda calculus, both algorithms are compatible with the de Bruijn index representation of bound variables. Both algorithms can be used as a partial simplification before or while applying more powerful but possibly incomplete heuristic simplification methods, such as the rewrite rule x ∧ F[x] ⇝ x ∧ F[x := 1] (which, if viewed as an equality axiom, turns OL into Boolean algebra).

## **6 Evaluation**

Our experimental evaluation comprises three parts. First, we analyze the behavior of the OL and OCBSL algorithms on large random formulas, to understand the feasibility of using them for normalization. Second, we evaluate the algorithms on combinatorial circuits [1]. Third and most importantly, we show their impact through a new simplifier for verification conditions of the Stainless [22] verifier. The goal of the simplifier is to avoid the need to invoke a solver for some of the formulas by reducing them to True, as well as to normalize them before storing them in a persistent cache file. The cache avoids the need to repeatedly prove previously proven verification conditions; by improving normalization, we improve the cache hit rate. We conduct all experiments on a server with 2× Intel Xeon E5-2680 v2 CPUs at 2.80 GHz (40 cores including hyperthreading) and 64 GB of memory.
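The interaction between normalization and the persistent cache can be sketched as follows; `normalize` is a stand-in for the OCBSL/OL algorithms (here it only applies commutativity and idempotence of conjunction), and the cache layout is hypothetical, not Stainless's actual format:

```python
# Sketch of a normalization-keyed VC cache: the key is a digest of the VC's
# normal form, so any two VCs with the same normal form share an entry and
# only the first one reaches the solver.
import hashlib
import json

def normalize(vc):
    return tuple(sorted(set(vc)))   # stand-in for OCBSL/OL normalization

cache = {}                          # digest of normal form -> verdict

def prove(vc, solver):
    key = hashlib.sha256(json.dumps(normalize(vc)).encode()).hexdigest()
    if key not in cache:            # cache miss: fall back to the solver
        cache[key] = solver(vc)
    return cache[key]
```

A stronger `normalize` maps more equivalent VCs to the same key, which is exactly why better normalization raises the hit rate.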

#### **6.1 Randomly Generated Propositional Formulas**

We first evaluate the two algorithms on randomly generated formulas. We measure the running time and the reduction in formula size. We build the random formulas as follows.

**Definition 5.** *A random formula is parameterized by a size* s *and a set of available variables* X = {x1, ..., x*n*}*. Given a size* s*, if* s ≤ 1 *then pick uniformly at random a variable from* X *or its negation and return it. Otherwise, pick* t *such that* 0 < t < s − 1 *and generate two formulas* φ1 *and* φ2 *of sizes* t *and* s − 1 − t*. Return uniformly at random And*(φ1, φ2) *or Or*(φ1, φ2)*.*
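A direct reading of Definition 5 can be sketched as follows; our handling of the boundary case s = 2 (where no t with 0 < t < s − 1 exists) is an assumption:

```python
# Random formula generator following Definition 5 (illustrative sketch).
import random

def random_formula(s, variables, rng=random):
    if s <= 1:  # base case: a variable or its negation, chosen uniformly
        v = rng.choice(variables)
        return v if rng.random() < 0.5 else ("Not", v)
    t = rng.randint(1, s - 2) if s > 2 else 1   # split the remaining size s - 1
    return (rng.choice(["And", "Or"]),
            random_formula(t, variables, rng),
            random_formula(s - 1 - t, variables, rng))

def num_connectives(f):  # And/Or nodes, the size measure compared in Sect. 6.1
    if isinstance(f, str) or f[0] == "Not":
        return 0
    return 1 + num_connectives(f[1]) + num_connectives(f[2])
```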

**Running Time.** We show in Fig. 1a the approximate running time of both algorithms for various formula sizes. We ran the experiment 21 times for each formula size category and took the median. For comparison with a theoretically linear-time process, we also give the running time of the corresponding negation normal form transformation. These implementations do not include low-level optimizations; they are intended to demonstrate practical usability, not to serve as a competitive performance indicator.

**Fig. 1.** (a) Median running time of NNF and the two algorithms (log-log scale). (b) Median size of the normalized formulas relative to the original in NNF. <sup>|</sup>*X*<sup>|</sup> = 50 variables.

**Size Reduction.** For a fairer comparison, we apply a basic simplification (flattening and transformation into negation normal form) to random formulas before computing their size. We compare the number of connectives before and after simplification for both algorithms. We show the relative improvements of the OL and OCBSL algorithms compared to the original formulas for various formula sizes and 50 variables. We ran both algorithms 21 times and report the median results in Fig. 1b.

It is interesting to note that the OL normal form is consistently and significantly smaller than the OCBSL normal form, i.e., the absorption law enables non-trivial reductions in size. This confirms that, in general, the two algorithms trade speed against simplification strength.

### **6.2 Computing Normal Forms for Hardware Circuits**

Moving towards more realistic formulas, we assess the scalability of OCBSL and OL on the EPFL Combinatorial Benchmark [1] comprising 10 arithmetic circuits designed to challenge optimization tools, with up to 10<sup>8</sup> gates.

**Table 2.** Results on the EPFL Combinatorial Benchmark. OL times-out for hyp after 1h.


We run the experiment five times. We report the median running time and the relative size after optimization in Table 2. We observe that the OCBSL algorithm is nearly as good as the OL algorithm in all cases, and, moreover, that it is very time-efficient even for problems with hundreds of millions of gates. The OL algorithm sometimes performs slightly better and is about as time-efficient on moderately sized inputs, but becomes significantly more time-consuming for inputs with more than approximately 10<sup>6</sup> gates. These results suggest, on one hand, that OCBSL may be the more suitable reduction technique for some applications with very large formulas, depending on their internal structure. They also suggest that both algorithms work well in practice on Boolean circuits that make heavy use of structure sharing. Indeed, the expanded form of, for example, the adder circuit would have about 2<sup>2000</sup> nodes.
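The effect of structure sharing can be illustrated on a toy chain circuit in which every level reuses the previous level twice: the DAG has linearly many nodes while the expanded tree is exponential, mirroring the adder circuit mentioned above (this is an illustration, not one of the EPFL benchmark circuits):

```python
# Toy illustration of structure sharing: linear DAG, exponential tree.

def build_chain(n):
    node = "x"                          # level 0: a single input
    for _ in range(n):
        node = ("and", node, node)      # each level reuses the previous twice
    return node

def dag_size(root):
    seen = set()
    def walk(n):
        if id(n) not in seen:
            seen.add(id(n))
            if isinstance(n, tuple):
                walk(n[1])
                walk(n[2])
    walk(root)
    return len(seen)                    # number of distinct (shared) nodes

def tree_size(n):                       # size of the fully expanded tree
    if isinstance(n, str):
        return 1
    return 1 + tree_size(n[1]) + tree_size(n[2])
```

An algorithm must operate on the DAG directly, since even building the expanded tree is infeasible for deep circuits.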

#### **6.3 Caching Verification Conditions in Stainless**

We implement the approach described in Sect. 5 by modifying the Stainless verifier [22,40]<sup>1</sup>, a publicly available tool for building formally verified Scala programs.

<sup>1</sup> https://github.com/epfl-lara/stainless/.

Our implementation adds two new simplifiers to Stainless: OCBSL-backed and OL-backed. They are part of Stainless release v0.9.8<sup>2</sup> and are selectable by the command line options --simplifier=ocbsl and --simplifier=ol respectively. For the OL simplifier, we have extended the ≤*OL* relation with 12 simple arithmetic and array rules.

We experimentally compare the two new simplifiers to the existing one (which we denote Old). We use two groups of benchmarks: (1) six Stainless case studies from the Bolts repository<sup>3</sup> that take a significant amount of time to verify, and (2) nine benchmark sets from automated grading of student assignments. Together, this constitutes around 84'000 lines of Scala code, specifications, and auxiliary assertions. We report the following metrics: the size of the VCs after simplification, the number of cache hits, the number of VCs simplified to 1, the wall-clock time, and the cumulative solving time. The wall-clock time covers the full Stainless pipeline, from parsing the program to outputting the result, including solver calls and VC simplification.

**Fig. 2.** VCs (tree) size scatter plot from all benchmarks for Old, OCBSL and OL.

**Evaluation on Bolts Case Studies.** We consider the following case studies from the mentioned Bolts repository:


<sup>2</sup> https://github.com/epfl-lara/stainless/releases/tag/v0.9.8.

<sup>3</sup> https://github.com/epfl-lara/bolts.

– ConcRope (408 VCs, 621 LOC), a Conc-Tree rope [36], supporting amortized constant-time append and prepend operations, based on a Leon formalization [30].

We report the VC size measurements in Fig. 2, where we aggregate the results from all benchmarks. Figure 2a reveals a couple of VCs with an increased size. Inspection of these VCs shows that the increase is due to the new simplifiers always inlining "simple expressions", such as field projections on free variables, instead of keeping them bound. On average, OCBSL and OL decrease the size of the VCs by 37% compared to Old. OL reduces the size of the VCs slightly further compared to OCBSL (Fig. 2b).

**Fig. 3.** Old, OCBSL and OL results for cache hits, VCs reduced to 1, solving and running time. (c), (d) are normalized with respect to Old. In (c), the gray boxes represent the time spared due to extra cache hits and VCs reduced to 1 compared to Old.

In Fig. 3a, we report the cache hit ratio. For the new simplifiers, reducing the formula size has the desired effect of noticeably increasing the hit ratio, especially for 4 out of 6 benchmarks. The additional power of OL helps for System F and SortedArray.

We report in Fig. 3c not only the solving time for the two simplifiers (normalized with respect to Old), but also the solving time saved thanks to additional cache hits and VCs simplified to 1. ConcRope and RedBlack do not benefit from the new simplifiers, while the other benchmarks do to varying degrees. For LongMap, adding the two ratios yields a ratio of ≈ 1, implying that the reduced solving time is due to extra caching; the solver did not benefit from the new simplifiers on non-cached VCs. The System F benchmark shows a ratio exceeding 1, meaning that OCBSL and OL did not help the solver more than the extra time they took to run. For QOI and SortedArray, the combined ratio is less than 1: the new simplifiers helped the solver on non-cached VCs. OL performs significantly better than OCBSL on the SortedArray benchmark, thanks to the extension of the ≤*OL* relation with array rules. We note that 25% of QOI VCs have a size of more than 880, against 480 for the second benchmark (SortedArray) and 450 for the third (LongMap).

Turning our attention to Fig. 3d, we note that on three of the benchmarks the time saved on solver calls is essentially offset by the additional work done by the new simplifiers. Moreover, LongMap, SortedArray and especially QOI show a net benefit over Old.

OCBSL and OL simplifiers show the greatest improvement on large VCs. Note that the outcome of a Stainless run depends strongly on user-provided assertions, which were hand-tuned under the Old simplifier. It is thus possible that the new simplifiers are at a disadvantage because they were not available during the original verification process. The additional power they provide may make writing such intermediate assertions easier and faster, so we expect their full advantage to show in newly developed verified software.


**Table 3.** Results on programming assignments

**Evaluation on Programming Assignments.** We additionally evaluate our approach on benchmarks consisting of many student solutions to several programming assignments. We consider benchmarks from [32,33], obtained by translation of student solutions in OCaml [38]. In this evaluation, we only prove termination of all student solutions, which is one of the bottlenecks when proving correctness of student solutions. We annotated all benchmarks with explicit decreasing measures. Stainless generates verification conditions that require the measure to decrease at recursive calls. Caching is particularly desirable in this scenario, with many programs and a high degree of similarity. Table 3 shows our evaluation results, comparing the two new simplifiers (OCBSL and OL) to the old one.

First, we note that moving from Old to OCBSL to OL reduces the number of calls to the solver. Furthermore, many new VCs are proven valid by normalization alone (reduced to 1). The largest benefit of OL is in the sigma benchmark, where the subsumption of linear arithmetic literals in the simplifier substantially increases the number of formulas proven by normalization: from 6 (0.4%) in OCBSL to 794 (52%) for OL.

The new simplifiers improve the number of cache hits, even if not as much as for the Bolts case studies. The smaller improvement is because there is a high degree of similarity across the submissions, so the Old simplifier already achieves a large percentage of cache hits. Note also that the smaller number of cache hits in the sigma benchmark is because many of the VCs are proven valid by the simplifier, avoiding the need to consult the cache or the solver in the first place.

Second, we notice a slight reduction in the overall VC size, with a couple of exceptions where OCBSL resulted in a size increase due to inlining. Thanks to formulas proven by normalization and improved cache hits, the overall solving time decreases on several benchmarks. The wall-clock running time is approximately unchanged, but we expect such benefits to appear in the future.

## **7 Conclusion**

We proposed a new approach to simplifying and reasoning about formulas, based on algorithms that are sound and complete for the normal form problem (and the word problem) of two subtheories of Boolean algebra. These algorithms are sound but incomplete for Boolean algebras (and thus for the two-element Boolean algebra of propositional logic). We introduced and proved the correctness of a new algorithm to compute normal forms in a theory of *ortholattices*, which do not enforce the distributivity law but only its weaker variation, absorption. Our algorithm runs in time O(n<sup>2</sup>). A weaker subtheory, OCBSL, gives up the absorption law. The disadvantage of OCBSL is a weaker normal form; the advantage is that we know of an algorithm running in subquadratic time, O(n log(n)<sup>2</sup>). We evaluated both algorithms, using them to reduce the size of large random formulas and combinatorial circuits, showing that they work well with structure sharing. We also implemented the algorithms in the Stainless verifier, where computing normal forms reduced the size of formulas given to the solver and improved the cache hit ratio. Our experimental evaluation confirmed that the trade-off between normal form strength and asymptotic complexity remains visible in practice. We found both algorithms useful in practice. OCBSL normalization has excellent running time even for very large circuits, so we believe it can replace the simpler negation normal form and syntactic equality checking at low cost in essentially all applications. The quadratic cost of the OL algorithm is prohibitive on circuits with over 10<sup>7</sup> gates. However, this was not a problem for its application to verification conditions in Stainless, where its added precision and its ability to compare atomic formulas made it more effective at normalizing certain formulas to True and increasing cache hits.
In some of the most difficult case studies, such as Quite OK Image Format [10], these improvements translated into substantial reduction of the wall clock time. Such measurable improvements, combined with theoretical guarantees, make the OL and OCBSL algorithms an appealing building block for verification systems.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Kratos2: An SMT-Based Model Checker for Imperative Programs

Alberto Griggio<sup>1(B)</sup> and Martin Jonáš<sup>1,2</sup>

<sup>1</sup> Fondazione Bruno Kessler, Trento, Italy
griggio@fbk.eu
<sup>2</sup> Masaryk University, Brno, Czechia
martin.jonas@mail.muni.cz

Abstract. This paper describes Kratos2, a tool for the verification of imperative programs. Kratos2 operates on an intermediate verification language called K2, with a formally-specified semantics based on smt, allowing the specification of both reachability and liveness properties. It integrates several state-of-the-art verification engines based on sat and smt. Moreover, it provides additional functionalities such as a flexible Python api, a customizable C front-end, generation of counterexamples, support for simulation and symbolic execution, and translation into multiple low-level verification formalisms. Our experimental analysis shows that Kratos2 is competitive with state-of-the-art software verifiers on a large range of programs. Thanks to its flexibility, Kratos2 has already been used in various industrial projects and academic publications, both as a verification back-end and as a benchmark generator.

## 1 Introduction

We present Kratos2, a tool for the verification of real-world imperative programs. Kratos2 is a complete rewrite and redesign of Kratos [17], improving and extending it in multiple directions. First, Kratos2 introduces a simple yet expressive intermediate language called K2, with a formally-specified semantics based on Satisfiability Modulo Theories (smt), which is parametric on the underlying smt theory. K2 is expressive enough to capture most of the features of real-world C programs, such as pointers, dynamic memory allocation, floating-point data types, and bit-precise semantics of bounded integers, which the old version of the tool could not handle (being limited to C programs without pointers and recursion, and in which C integers were interpreted as mathematical integers). Kratos2 comes with a separate C front-end c2Kratos that can translate C programs to K2. Second, Kratos2 includes a variety of state-of-the-art verification back-ends based on either symbolic model checking or symbolic execution with sat and smt solvers. Besides reachability properties, Kratos2 also supports various forms of

A. Griggio has been partly supported by the project "AI@TN" funded by the Autonomous Province of Trento and by the PNRR project FAIR - Future AI Research (PE00000013), under the NRRP MUR program funded by the NextGenerationEU. M. Jonáš has been partly supported by the Czech Science Foundation grant GA23-06506S.

*liveness* properties, which can be used to encode termination and more complex linear-time temporal properties. Third, Kratos2 implements an interactive interpreter, which can simulate K2 programs using non-deterministic inputs provided either by the user or by external oracles. Kratos2 also supports counterexample reconstruction, another feature not available in the original Kratos.

The new intermediate language K2 enables modular translation of C programs into various verification languages. Namely, Kratos2 can be used for translating C programs into nuXmv [14], vmt [20], aiger [9], Btor2 [31], Constrained Horn Clauses (chcs) [11], or Boogie [29] formats. Additionally, Kratos2 comes with a Python api for construction and manipulation of K2 programs, which the users can leverage to implement custom front-ends and generators of K2 programs and also additional translators from K2 to other formalisms.

Although Kratos2 has not been described in a publication until now, it has already been successfully used in several research and industrial projects. In particular, Kratos2 has been used as a back-end for the verification of automotive software in the context of the autosar platform [15,16]; of C code automatically generated from aadl specifications by the taste development environment [12]; and for verification of C code for railway interlocking systems automatically generated from the specifications in a controlled natural language [1]. Kratos2 has also been used as a benchmark generator to produce symbolic transition systems from C programs [30].

The rest of the paper is structured as follows. The functionalities offered by Kratos2 from the user perspective are described in Sect. 2; Sect. 3 introduces K2, describing its syntax and formal semantics. The internal architecture of Kratos2, with details about its main components, is presented in Sect. 4; implementation notes and experimental evaluation on C programs from the annual software verification competition sv-comp are provided in Sect. 5. Finally, Sect. 6 concludes the paper and presents directions for future developments.

### 2 Functional View

In this section we provide a high-level overview of the functionalities available in Kratos2. More details will then be provided in the following sections.

An Intermediate Language for Imperative Programs. The core of Kratos2 is built around an idealized language for imperative programs called K2. Unlike common high-level real-world programming languages, K2 has a simple and clean semantics based on first-order logic modulo theories that is fully formally specified. The K2 language, similar in spirit to other intermediate verification languages proposed in the literature such as Boogie [29] or Why3 [26] (although less feature rich than the two), is at the same time simple enough to be easily manipulated and translated into formalisms used by sat-based and smt-based verification back-ends on one hand, and expressive enough to efficiently capture a significant subset of C on the other, as demonstrated also by our experimental results on standard sv-comp benchmarks (see Sect. 5).

Verification of Safety and Liveness with Multiple Back-Ends. Kratos2 implements multiple state-of-the-art verification algorithms based on sat and smt, supporting both bit-precise reasoning over machine integers and floating-point numbers as well as higher-level reasoning based on, e.g., mathematical integers, real numbers, and uninterpreted functions, depending on the combination of theories used in the input K2 program under analysis. Moreover, Kratos2 supports not only the verification of safety properties (via a reduction to reachability of designated "error" program locations), but also liveness properties, such as proving that a specific program location is reached a finite number of times in all executions, or that it is visited infinitely often in all infinite executions.

A Python api for Program Manipulations. Kratos2 provides a rich and flexible Python api for parsing, printing, and manipulating K2 programs and expressions, which can be used to implement converters from high-level languages to K2 or to directly generate K2 programs from user-specific applications.

A Customizable C Front-End. Kratos2 comes with a front-end for C programs which supports a wide range of customization options for controlling the translation from C to K2. These range from the choice of theories to use to encode C data types (e.g., bit-vectors or unbounded integers), to the use of customized program transformations or the injection of new built-in functions with special meaning (such as special assume, malloc, or memset built-ins). Thanks to its plug-in architecture, the front-end can be easily customized for domain-specific subsets of C, for example to implement special optimization passes that are safe only in the given context, or to automatically inject properties to the code based on specification files (as is, e.g., the case in sv-comp [3]).

Encoding into Multiple Formalisms. Kratos2 can be used as an encoder or benchmark generator because it can translate imperative programs written in C or in K2 into other formalisms, including symbolic transition systems in nuXmv [14], vmt [20], aiger [9] or Btor2 [31] formats, Constrained Horn Clauses (chcs) [11], or other intermediate verification languages like Boogie [29].

Simulation and Symbolic Execution. Finally, Kratos2 can be used as an interpreter, allowing an (interactive) simulation of K2 programs and their symbolic execution, as an alternative to the verification back-ends based on model checking.

## 3 The K2 Language

In this section we introduce K2, the intermediate verification language used by Kratos2. We present its abstract syntax, formally define its semantics, and discuss its support for safety and liveness properties.

Fig. 1. Abstract syntax of K2 statements and expressions.

Fig. 2. Abstract syntax of K2 programs.

Abstract Syntax. We denote lists of elements with an overbar. If ā is a list, |ā| is its length, and if i is a natural number, āᵢ is the i-th element of ā. If e is an element, ā · e is the list obtained by appending e at the end of ā.

Definition 1 (Variables and Functions). *A* variable *is a symbol with an associated sort, as in multi-sorted first-order logic. A* function *is a tuple* ⟨f, a, r, l, σ⟩*, where:*


Given a list of variables v̄, we define syms(v̄) as the corresponding set of symbols. Given a function ⟨f, a, r, l, σ⟩, we denote with syms(f) the set syms(a) ∪ syms(r) ∪ syms(l). We extend the definition to lists of statements σ̄ in the natural way. We now describe K2 programs, whose abstract syntax is shown in Fig. 2.

Definition 2 (Programs). *A* program P *is a tuple* ⟨g, F, ι, e⟩*, where:*


Semantics. We use the standard notions of theory, interpretation, model, and satisfaction from many-sorted first-order logic and smt [2]. In the following, we assume that we have fixed a theory T with equality that contains at least the sort Bool. Given an interpretation μ that is a model for T, we define the *evaluation* of an expression e (generated by the grammar of Fig. 1) under μ, denoted μ[e], as μ[e] = μ(v) for e = var v, and μ[e] = μ(o)(μ[p1], ..., μ[pn]) for e = op o p̄ with n = |p̄|. We denote with μ[v → e] the interpretation that maps v to e and agrees with μ everywhere else, and with μ[\v] *any* interpretation that agrees with μ on all symbols *except* v. Finally, if e is of sort Bool, we write μ |= e to denote that e evaluates to true under μ.
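The evaluation rules can be transcribed directly; the tuple encoding of expressions is our own illustration, not Kratos2's concrete syntax:

```python
# Evaluation of K2 expressions under an interpretation mu, read off from the
# semantics above: mu[var v] = mu(v) and
# mu[op o (p1 ... pn)] = mu(o)(mu[p1], ..., mu[pn]).
# Here mu is a dict mapping variable and operator symbols to values/functions.

def evaluate(mu, e):
    kind = e[0]
    if kind == "var":
        return mu[e[1]]
    if kind == "op":
        _, o, params = e
        return mu[o](*(evaluate(mu, p) for p in params))
    raise ValueError("unknown expression kind: %r" % kind)
```

The updated interpretation μ[v → e] then corresponds to evaluating under a copy of the dict with one binding changed.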

Definition 3 (Program states). *Pairs* ⟨f, i⟩ *where* f *is a function name and* i *is a natural number are called* program locations*. A* state *of a program* P *is a pair* s = ⟨G, C⟩ *where:*


*A state* s *is* initial *if and only if* G |= ι, |C| = 1, *and* C1 = ⟨e, 1, L⟩ *for some* L*. Given a state* s *with* C|C| = ⟨f, i, L⟩*, we define the* current interpretation μ *for* s *as* μ(v) = G(v) *for* v ∈ syms(g) *and* μ(v) = L(v) *otherwise.*

We define the semantics of programs as a set of transition rules of the form s −σ→ s′, where s and s′ are states and σ is a statement. We then call a *path* of a program P any (possibly infinite) sequence of transitions s0 −σ0→ ... −σi→ si+1 ... that complies with the transition rules and where s0 is an initial state.

The rules are shown in Fig. 3. In the definitions, we fix a program P = ⟨g, F, ι, e⟩ and use the following convenience functions, where f is a function name and i a natural number: arg(f, i) returns the variable ai of the function F(f); ret(f, i) returns the variable ri of the function F(f); stmt(f, i) returns the statement σi of F(f); stmts(f) returns the list of statements σ̄ of F(f).

Reachability and Liveness. We then say that a state s is *reachable* in P iff there exists a finite path s0 −σ0→ ... −σn→ s that ends in s. Similarly, a *program location* ⟨f, i⟩ *is reachable* iff there exists such a path in which σn = stmt(f, i).<sup>1</sup>

<sup>1</sup> Note that here we assume w.l.o.g. that all statements in a program are different, even when they are structurally equal, so the above definition is unambiguous.

Fig. 3. Transition rules. In all the rules, *<sup>µ</sup>* denotes the current interpretation for the left-hand state of the rule.

Conversely, if no such path exists, then ⟨f, i⟩ is *unreachable*. The location ⟨f, i⟩ is *infinitely-often reachable* iff there exists an infinite path s0 −σ0→ ... −σi→ si+1 ... in which for every index j there exists an index k > j such that σk = stmt(f, i). If no such path exists, then ⟨f, i⟩ is *eventually unreachable*. Finally, we say that ⟨f, i⟩ is *live* iff it is infinitely-often reachable *in all infinite paths of* P*.*

In K2, queries about reachability or liveness of program locations are expressed via *annotations* of label statements. Annotations are metadata that are attached to statements, in the form of key-value pairs, which do not affect the semantics of the program, but are meant to provide additional information that can be used by tools that manipulate the K2 program. Specifically, Kratos2 uses the following annotations to define properties:

- error **<id>**: holds iff all labels annotated with the same <id> are unreachable;
- notlive **<id>**: holds iff all labels annotated with the same <id> are eventually unreachable;
- live **<id>**: holds iff all labels annotated with the same <id> are live.

These basic properties can be easily used to represent more common higher-level properties of programs, such as assertions and termination. For example, assertions can be reduced to reachability with a combination of assume and jump statements, whereas termination can be checked by adding a final self loop over a label with an attached live annotation. Finally, eventual unreachability can be used to encode arbitrary ltl properties using the standard automata-theoretic approach combined with a symbolic encoding of the accepting automaton such as [22].<sup>2</sup>

#### 3.1 Example

We conclude this section with a simple example of a C program and its equivalent formulation in K2. Both versions are shown in Fig. 4. Most of the code is translated in a fairly direct way (with conditional statements and structured loops translated into nondeterministic jumps constrained by assumptions). However, since in K2, unlike in C, global variables are uninitialized by default, the K2 program contains an additional setup function (called init\_and\_main in the example) that sets glbl to zero before calling the original main. Another point to highlight is the use of the :error annotation (highlighted in bold) to model the C assertion.

## 4 Architectural View

This section describes the main components of Kratos2 and the flow of information among them. From the high-level point of view, Kratos2 is composed of the front-end c2Kratos, which converts the input C program to the K2 language, and of the core Kratos2, which is responsible for parsing, simplifications, transformations, and verification of K2 code. This separation helps to keep the core Kratos2 simple, as it does not have to handle the complex semantic nuances of C. Moreover, it makes it easy to add front-ends for new languages by writing a separate translator from the language in question to K2.

The front-end c2Kratos reads the input C file, builds its abstract syntax tree (ast) and then builds the corresponding K2 code in two passes. In the first pass, it converts the ast to an *extended* K2. Compared to the standard K2, the extended K2 also has primitives for pointers, records, complex loops, and compound instructions. These are removed in the second pass, by converting pointers to operations over maps, records to multiple variables, complex loops to sequences of assignments, jump instructions, and assumptions, and compound instructions to sequences of basic assignments to auxiliary variables.
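The pointers-to-maps conversion in the second pass can be illustrated with a deliberately tiny Python model. The names `store`/`select` are just the usual SMT array operations, not c2Kratos internals: memory becomes a map from addresses to values, and pointer writes and reads become map updates and lookups.

```python
def store(mem, addr, val):
    """Functional map update, mirroring an SMT array store."""
    new = dict(mem)
    new[addr] = val
    return new

def select(mem, addr):
    """Map lookup, mirroring an SMT array select."""
    return mem[addr]

# The C fragment "*p = 5; x = *p;" under the maps encoding,
# with the (hypothetical) address 100 playing the role of pointer p:
mem = store({}, 100, 5)
x = select(mem, 100)
```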

The core Kratos2 consists of several components, whose relationships are visualized in Fig. 5:

<sup>2</sup> In the case of ltl properties, the question arises as to what to consider as an atomic step of the program. This is both crucial and application-dependent: for example, in embedded software consisting of a "transition function" that is executed periodically, it might make sense to consider each call to such function as one step, whereas in other contexts a more fine-grained notion of step might be needed. K2 (and Kratos2) makes no commitment about this, providing only the support for eventual unreachability of label statements, which can always be defined unambiguously.


Fig. 4. Example C program and its K2 translation.

cfg builder and simplifier reads the input K2 file and builds the corresponding interprocedural control flow graph (cfg). It then performs several simplifications of the cfg, such as constant propagation and lightweight slicing. The result can be used either by the interpreter, symbolic executor, or one of the encoders. The simplified cfg can also be converted back into a K2 representation.

Interpreter interprets the cfg using the externally provided inputs to guide the execution. The inputs contain new values for all havoc commands and also destination labels for all nondeterministic jump commands. The inputs can be provided by the user, a random generator, or by one of the verification engines. The last option is used for counterexample reconstruction and validation.
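A minimal sketch of such an input-driven interpreter, assuming a toy CFG representation (label → statement list) rather than Kratos2's actual data structures: the input stream supplies one value per havoc command and one destination label per nondeterministic jump.

```python
def run(cfg, inputs):
    # cfg: label -> list of statements; inputs: resolved nondeterministic choices
    env, pc, it = {}, "entry", iter(inputs)
    while True:
        jumped = False
        for stmt in cfg[pc]:
            op = stmt[0]
            if op == "havoc":
                env[stmt[1]] = next(it)          # input supplies the value
            elif op == "assign":
                env[stmt[1]] = stmt[2](env)
            elif op == "assume":
                if not stmt[1](env):
                    return ("blocked", env)      # infeasible path
            elif op == "jump":
                pc = next(it)                    # input picks the target label
                jumped = True
                break
            elif op == "halt":
                return ("done", env)
        if not jumped:
            return ("done", env)

cfg = {
    "entry": [("havoc", "x"), ("jump", ("a", "b"))],
    "a":     [("assign", "y", lambda e: e["x"] + 1), ("halt",)],
    "b":     [("assign", "y", lambda e: 0), ("halt",)],
}
status, env = run(cfg, [7, "a"])   # havoc x := 7, then jump to label "a"
```

Replaying a counterexample from a verification engine is exactly this: the engine's model supplies the input sequence.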

Transition system encoder encodes the cfg to a symbolic transition system over a suitable theory. The encoder first inlines all function calls in the program. It then encodes the resulting inlined program using *large block encoding* [4], which allows encoding larger acyclic subgraphs of the cfg by a single transition formula. The resulting transition system can be verified by one of the available verification back-ends, or converted to a textual representation in one of the available output formats (vmt [20], nuXmv [14], Btor2 [31], or aiger [9]).<sup>3</sup>

<sup>3</sup> Depending on the features of the input K2 program, some of the verification backends or output formats might not be available. E.g., sat-based engines are not available if the K2 program contains some infinite-state variables.

Fig. 5. Architecture of Kratos2.

chc encoder converts the cfg to a set of Constrained Horn Clauses [11]. In contrast to the transition system encoder, the chc encoder supports interprocedural analysis and recursive functions, encoded as a set of *non-linear* chc*s* as described, e.g., in [28].

Symbolic executor implements a classical symbolic execution algorithm with iterative deepening to avoid getting stuck in long uninteresting branches. It supports (possibly recursive) K2 programs over arbitrary combinations of integers, reals, bit-vectors, floats, and arrays.

smt-based engines encompass several smt-based verification algorithms of symbolic transition systems. For reachability properties, Kratos2 implements standard bounded model checking (bmc) [7], k-induction [32], and IC3 with implicit predicate abstraction [18]. For liveness properties, we use a procedure combining liveness-to-safety reduction with ranking functions synthesis [23].
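For intuition, here is an explicit-state analogue of bmc in Python. The real engines work symbolically, asking an smt solver whether a bad state is reachable within k unrollings of the transition formula, but the unroll-and-check shape is the same.

```python
def bmc(init, step, bad, k):
    """Search for a state satisfying `bad` within k steps of the system."""
    frontier = list(init)
    for depth in range(k + 1):
        for s in frontier:
            if bad(s):
                return ("unsafe", depth, s)      # counterexample of length depth
        frontier = [t for s in frontier for t in step(s)]
    return ("unknown", k, None)                  # no counterexample up to the bound

# A counter modulo 8 starting at 0: can it reach the value 5?
result = bmc([0], lambda s: [(s + 1) % 8], lambda s: s == 5, 10)
```

As in real bmc, an "unknown" verdict says nothing beyond the bound; proving safety requires an unbounded method such as k-induction or IC3.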

sat-based engines encompass several verification algorithms of finite-state symbolic transition systems. Namely, for transition systems over the theory of bit-vectors and floats, Kratos2 offers bmc, k-induction, and different variants of IC3 [13], working over the bit-blasted Boolean transition system, for both reachability and liveness properties. Additionally, Kratos2 implements a dedicated engine for reachability properties in transition systems over the theory of bit-vectors, floats, and arrays similar to [10,30].

## 5 Implementation and Experimental Evaluation

Implementation. Core Kratos2 is implemented in C++ on top of the Math-SAT5 [19] smt solver and the nuXmv [14] symbolic model checker. The sat-based verification engine additionally makes use of the MiniSat [25] and CaDiCaL [8] sat solvers. The front-end c2Kratos is implemented in Python and relies on pycparser for parsing of the input C program. Kratos2 is freely available for non-commercial purposes from https://kratos.fbk.eu.


Table 1. Solved benchmarks by the three compared tools. Column U shows the number of solved *unsafe* benchmarks, S of *safe* benchmarks, and W of *wrong* results.

Experimental Setup. We performed an experimental evaluation to answer two research questions: Is the K2 language expressive enough to efficiently represent realistic C programs? Do the engines implemented in Kratos2 offer reasonable performance on realistic verification tasks? To this end, we considered all the C programs from the *ReachSafety* category of the 2022 edition of the annual software verification competition sv-comp [3]. The category consists of 5400 C programs divided into 12 benchmark families. We compared Kratos2 with VeriAbs 1.4.2 [24] and CPAchecker 2.2 [5], respectively the winner and runner-up of the *ReachSafety* category of sv-comp 2022. Similarly to the approach used by CPAchecker, we executed Kratos2 in *sequential portfolio* mode, which successively runs symbolic execution, smt-based IC3, sat-based IC3, and smt-based bmc with predetermined time-outs for each of the engines.

The experiments were performed on several identical PCs equipped with an Intel Core i7-8700 cpu @ 3.20 GHz and 32 GiB of ram. Each execution was limited to use a single cpu core, 15 min of cpu time, and 8 GiB of ram. For reliable benchmarking, all experiments were executed using BenchExec [6]. A replication package describing the details of the setup is available at https://doi.org/10.5281/zenodo.7890411.

Results. To answer the first research question, we observe that from the total 5400 benchmarks, only 56 were not converted to K2 by c2Kratos due to unsupported floating point built-ins or features such as variable length arrays.

To answer the second research question, Table 1 shows the numbers of solved benchmarks by the individual tools and quantile plots in Fig. 6 show their running times. The results show that Kratos2 is competitive with CPAchecker on all benchmark families except for eca. It is also competitive with VeriAbs on most benchmark families. There are 23 benchmarks uniquely solved by Kratos2, 48 by CPAchecker, and 1039 by VeriAbs. Moreover, both Kratos2 and VeriAbs produced no wrong results, unlike most other participants of sv-comp.

Fig. 6. Quantile plots of solved benchmarks for all three compared tools in individual benchmark families. The plot shows the number of benchmarks (*y*-axis) that were solved within the given number of seconds (*x*-axis).

We remark that CPAchecker is an established and optimized software verifier that regularly scores high in software verification competitions, and that VeriAbs implements algorithm selection heuristics, using both its own custom engines and external state-of-the-art verifiers. As such, it is not surprising that it performs much better than Kratos2 and CPAchecker on some of the families.

We conclude that the K2 language is expressive enough to efficiently capture a significant subset of C used in realistic programs. Furthermore, the verification engines implemented in Kratos2 mostly offer a performance comparable with state-of-the-art software verifiers.

#### 6 Conclusions and Future Work

We have described Kratos2, a mature software verifier for imperative programs written in K2, a new intermediate verification language with a formal semantics based on smt. Kratos2 is a complete rewrite of the original Kratos tool, offering significant extensions in functionalities and performance. The tool has already been successfully applied in various contexts, both industrial and academic.

As future work, we will consolidate the (currently alpha-quality) implementation of the esst algorithm of the original Kratos [21] to handle multithreaded programs with cooperative scheduling. We will also investigate a tighter integration with chc solvers to better handle recursive programs, as well as improved techniques to handle arrays and pointers such as [27,33]. On the language side, we plan to add support for contracts and pre-/post-conditions via annotations.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Making IP = PSPACE Practical: Efficient Interactive Protocols for BDD Algorithms

Eszter Couillard<sup>1</sup>, Philipp Czerner<sup>1(B)</sup>, Javier Esparza<sup>1</sup>, and Rupak Majumdar<sup>2</sup>

<sup>1</sup> Technical University of Munich, Munich, Germany {coillar,czerner,esparza}@in.tum.de <sup>2</sup> Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany rupak@mpi-sws.org

Abstract. We show that interactive protocols between a prover and a verifier, a well-known tool of complexity theory, can be used in practice to certify the correctness of automated reasoning tools.

Theoretically, interactive protocols exist for all PSPACE problems. The verifier of a protocol checks the prover's answer to a problem instance in probabilistic polynomial time, with polynomially many bits of communication, and with exponentially small probability of error. (The prover may need exponential time.) Existing interactive protocols are not used in practice because their provers use naive algorithms, inefficient even for small instances, that are incompatible with practical implementations of automated reasoning.

We bridge the gap between theory and practice by means of an interactive protocol whose prover uses BDDs. We consider the problem of counting the number of assignments to a QBF instance (#CP), which has a natural BDD-based algorithm. We give an interactive protocol for #CP whose prover is implemented on top of an extended BDD library. The prover has only a linear overhead in computation time over the natural algorithm.

We have implemented our protocol in blic, a certifying tool for #CP. Experiments on standard QBF benchmarks show that blic is competitive with state-of-the-art QBF-solvers. The run time of the verifier is negligible. While the loss of absolute certainty can be concerning, the error probability in our experiments is at most 10<sup>−10</sup> and reduces to 10<sup>−10k</sup> by repeating the verification k times.

This work was supported by an ERC Advanced Grant (787367: PaVeS), by the Deutsche Forschungsgemeinschaft project 389792660 TRR 248—CPEC, and by the Research Training Network of the Deutsche Forschungsgemeinschaft (DFG) (378803395: ConVeY).

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 437–458, 2023. https://doi.org/10.1007/978-3-031-37709-9\_21

#### 1 Introduction

Automated reasoning tools often underlie our assertions about the correctness of critical hardware and software components. In recent years, the scope and scalability of these techniques have grown significantly.

Automated reasoning tools are not immune to bugs. If we are to trust their verdict, it is important that they provide evidence of their correct behaviour. A substantial amount of research has gone into proof-producing automated reasoning tools [4,14,16,22,23]. These works define a notion of "correctness certificate" suitable for the reasoning problem at hand, and adapt the reasoning engine to produce independently checkable certificates. For example, SAT solvers produce either a satisfying assignment or a proof of unsatisfiability in some proof system, e.g. resolution (see [16] for a survey). Extending such certificates beyond boolean SAT is an active area of current research [3,4,18,24,29].

In the worst case, the size of certificates grows exponentially in the size of the input, even for boolean unsatisfiability (unless NP = coNP). If users have limited computational or communication resources, transferring and checking large certificates becomes a burden. Large certificates are not just a theoretical curiosity. In practice, resolution proofs for complex SAT problems may run to petabytes [15]. Ideally, we would prefer "small" certificates (polynomial in the size of the input) which can be checked independently in polynomial time.

The IP = PSPACE theorem proves that certification with polynomial verification time is possible for any problem in PSPACE, provided one trades off absolute certainty for certainty with high probability [27]. The complexity class IP consists of those languages for which there is a polynomial-round, complete and sound *interactive protocol* [1,2,13,20]—a sequence of interactions between a (computationally unbounded) prover and a (computationally bounded) verifier after which the verifier decides whether the prover correctly performed a computation. The protocol is complete if, whenever an input belongs to the language, there is an *honest prover* who can convince a polynomial-time randomised verifier in a polynomial number of rounds. The protocol is sound if, whenever an input does not belong to the language, the Verifier will reject the input with high probability—no matter what certificates are provided to the Verifier. That is, a "Prover" cannot fool the certification process.

Since every language in PSPACE has an interactive protocol, there are interactive protocols for UNSAT, QBF, *counting* QBF, safety verification of concurrent state machines, etc. Observe that the prover of a protocol may perform exponential time computations (which is unavoidable unless P = PSPACE), but the verifier only requires polynomial time in the original input.

If interactive protocols provide a foundation for small and efficiently verifiable certificates (at least for problems in PSPACE), why are they not in widespread practice? We believe the reason to be the following: for asymptotic complexity purposes, it suffices to use honest provers with best-case exponential complexity that naively enumerate all possibilities. Such provers are incompatible with automated reasoning tools, which use more sophisticated data structures and heuristics to scale to real-world examples. So we need to make *practical algorithms* for automated reasoning *efficiently certifying*. We call an algorithm *efficiently certifying* if, in addition to computing the output, it can execute the steps of an honest prover in an interactive protocol with only polynomial overhead over its running time.

In this paper, we show that algorithms using reduced ordered binary decision diagrams (henceforth called BDDs) [9] can be made efficiently certifying. We consider #CP, the problem of computing the number of satisfying assignments of a *circuit with partial evaluation (CP)*. Besides boolean nodes, a CP contains *partial evaluation* nodes π[x:=false] (resp., π[x:=true]) that take a boolean predicate as input, say ϕ, and output the result of setting x to false (resp., true) in ϕ. #CP generalises SAT, QBF, and *counting* SAT (#SAT), and has a natural algorithm using BDDs: compute BDDs for each node of the circuit in topological order, and count the accepting paths of the final BDD.
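The accepting-path count can be made concrete with a minimal Python sketch of model counting on a hand-built BDD (a hypothetical tuple encoding, not an actual BDD library), using the ordering x0 < x1 < x2. Variables skipped along an edge contribute a factor of 2 each, since they may take either value.

```python
# Nodes are (var_index, low_child, high_child); terminals are True/False.
N = 3  # number of variables x0, x1, x2

def count(node, level=0):
    """Number of satisfying assignments of the predicate rooted at node."""
    if node is True:
        return 2 ** (N - level)        # remaining variables are unconstrained
    if node is False:
        return 0
    var, lo, hi = node
    skipped = 2 ** (var - level)       # variables jumped over take any value
    return skipped * (count(lo, var + 1) + count(hi, var + 1))

# BDD for x0 AND x2 (x1 unconstrained): root tests x0, then x2.
bdd = (0, False, (2, False, True))
```

Here `count(bdd)` gives 2, matching the two assignments x0 = x2 = 1 with x1 free.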

The theoretical part of the paper proceeds in two steps. First, we present CPCertify, a complete and sound interactive protocol for #CP. CPCertify is similar to the SumCheck protocol [20]. It involves encoding boolean formulas as polynomials over a finite field. The prover is responsible for producing certain polynomials from the original circuit and evaluating them at points of the field chosen by the verifier. These polynomials are either multilinear (all exponents are at most 1) or quadratic (at most 2).

Second, we show that an honest prover in CPCertify can be implemented on top of a suitably extended BDD library. The run times of the certifying BDD algorithms are only a constant overhead over the computation time without certification—they depend linearly on the total number of nodes of the intermediate BDDs computed by the prover to solve the #CP instance. We use two key insights. The first is an encoding of multilinear polynomials as BDDs; we show that the intermediate BDDs represent all the multilinear polynomials a prover needs during the run of CPCertify. The second shows that the quadratic polynomials correspond to *intermediate steps* during the computation of the intermediate BDDs. We extend BDDs with additional "book-keeping" nodes that allow the prover to also compute the quadratic polynomials while solving the problem. So computing the polynomials required by CPCertify has *zero* additional cost; the only overhead is the cost of evaluating the polynomials at the field points chosen by the verifier.

We have implemented a certifying #CP solver based on our extended BDD library. Our experiments show that the solver is competitive with state-of-the-art non-certifying QBF solvers, and can outperform certifying QBF solvers based on BDDs. The number of bytes exchanged between the prover and the verifier is an order of magnitude smaller, and Verifier's run time several orders of magnitude smaller, than current encodings of QBF proofs, while bounding the error probability to below 10<sup>−10</sup>. Thus, our results open the way for practically efficient, probabilistic certification of automated reasoning problems using interactive protocols.

Additional Related Work. Proof systems for SAT and QBF remain an active area of research—both in theoretical proof complexity and in practical tool development. Jussila, Sinz, and Biere [17,28] showed how to extract extended resolution proofs from BDD operations. This is the basis for proof-producing SAT and QBF solvers based on BDDs [6–8]. As in our work, the proof uses intermediate nodes produced in the construction of the BDD operations. We focus on interactive certification instead of extended resolution proofs, which can be exponentially larger than the input formula.

Recently, Luo et al. [21] consider the problem of providing *zero-knowledge* proofs of unsatisfiability, a motivation similar but not equal to ours. Their techniques require the verifier to work in time polynomial in the proof, which can be exponentially bigger than the input formula. In contrast, the verifier of CPCertify runs in polynomial time in the input. Since any language in PSPACE has a zero knowledge proof [5], our protocol can in principle be made zero knowledge. Whether that system scales in practice is left for future work.

Full Version. Detailed proofs can be found in the full version of the paper [11].

### 2 Preliminaries

The Class IP. An *interactive protocol* between a *Prover* and a *Verifier* consists of a sequence of interactions in which a Verifier asks questions to a Prover, receives responses to the questions, and must ultimately decide if a common input x belongs to a language. The computational power of the Prover is unbounded but the Verifier is a randomised, polynomial-time algorithm.

Formally, let P, V denote (deterministic) Turing machines.

We say that (r; m_1, ..., m_{2k}) is a k*-round interaction*, with r, m_1, ..., m_{2k} ∈ {0, 1}<sup>∗</sup>, if m_{i+1} = V(r, m_1, ..., m_i) for even i and m_{i+1} = P(m_1, ..., m_i) for odd i. We think of r as an additional sequence of bits given to Verifier V that is chosen randomly. The *output* out(P, V)(x, r, k) is defined as m_{2k}, where (r; m_1, ..., m_{2k}) is the unique k-round interaction with m_1 = x.

A language L belongs to IP if there are V, P_H and polynomials p_1, p_2, p_3, s.t. V(r, x, m_2, ..., m_i) runs in time p_1(|x|) for all r, x, m_2, ..., m_i, and, for each x and an r ∈ {0, 1}<sup>p_2(|x|)</sup> chosen uniformly at random:


Intuitively, in an interactive protocol, a computationally unbounded Prover interacts with a randomised polynomial-time Verifier for k rounds. In each round, Verifier sends probabilistic "challenges" to Prover, based on the input and the answers to prior challenges, and receives answers from Prover. At the end of k rounds, Verifier decides to accept or reject the input. The completeness property ensures that if the input belongs to the language L, then there is an "honest" Prover P<sup>H</sup> who can always convince Verifier that indeed x ∈ L. If the input does not belong to the language, then the soundness property ensures that Verifier rejects the input with high probability no matter how a (dishonest) Prover tries to convince them.

It is known that IP = PSPACE [20,27], that is, every language in PSPACE has a polynomial-round interactive protocol. The proof exhibits an interactive protocol for the language QBF of true quantified boolean formulae; in particular, the honest Prover is a polynomial space, exponential time algorithm that uses a truth table representation of the formula to implement the protocol.

Polynomials. Interactive protocols make extensive use of polynomials over some prime finite field F.

Let X be a finite set of variables. We use x, y, z, ... for variables and p, q, ... for polynomials. When we write a polynomial explicitly, we write it in brackets, e.g. [3xy − z<sup>2</sup>]. We write **1** and **0** for the polynomials [1] and [0], respectively. We use the following operations on polynomials:


A *(partial) assignment* is a (partial) mapping σ : X → F. We write Π_σ p for π[x_1:=σ(x_1)] ... π[x_k:=σ(x_k)] p, where x_1, ..., x_k are the variables for which σ is defined. Additionally, we call σ *binary* if σ(x) ∈ {0, 1} for each x ∈ X.

Binary and Multilinear Polynomials. A polynomial p is *multilinear in* x if the degree of x in p is 0 or 1. A polynomial is *multilinear* if it is multilinear in all its variables. For example, [xy − y<sup>2</sup>] is multilinear in x but not in y, and [3xy − 2zy] is multilinear. A polynomial p is *binary* if Π_σ p ∈ {**0**, **1**} for every binary assignment σ. Two polynomials p, q are *binary equivalent*, denoted p ≡_b q, if Π_σ p = Π_σ q for every binary assignment σ. (Note that non-binary polynomials can be binary equivalent.)

## 3 Circuits with Partial Evaluation

We introduce circuits with partial evaluation (CP), a compact representation of quantified boolean formulae, and formulate #CP, the problem of counting the number of satisfying assignments of a CP. #CP generalises QBF, the satisfiability problem for quantified boolean formulas. Figure 1 shows an example of a CP. Informally, it is a directed acyclic graph whose nodes are labelled with variables, boolean operators, or *partial evaluation operators* π[x:=b]. Intuitively, π[x:=b]ϕ sets the variable x to the truth value b in the formula ϕ. In this way, each node of a circuit stands for a boolean function, and the complete circuit stands for the boolean function of the root. Figure 1 shows the formulae represented by each node.

Definition 1. *Let* X *denote a finite set of* variables *and* S ⊆ X*. A* circuit with partial evaluation and variables in S *(*S*-CP) has the form*


*The set of* free variables *of an* S*-CP* ϕ *is* free(ϕ) := S*. The* children *of a CP are inductively defined as follows:* true*,* false*, and* x *have no children; the children of* ϕ ∧ ψ *and* ϕ ∨ ψ *are* ϕ *and* ψ*; and the only child of* ¬ϕ *and* π[y:=b] ϕ *is* ϕ*. The set of* descendants *of* ϕ *is the smallest set* M *containing* ϕ *and all children of every element of* M*. The* size *of* ϕ *is* |ϕ| := |M|*.*

We represent a CP ϕ as a directed acyclic graph. The nodes of the graph are the descendants of ϕ. A CP ϕ encodes a boolean predicate P_ϕ, which maps assignments σ : free(ϕ) → {false, true} to a truth value P_ϕ(σ) ∈ {false, true}. It does so in the obvious manner, e.g., P_x(σ) := σ(x), P_{ϕ∧ψ}(σ) := P_ϕ(σ) ∧ P_ψ(σ), etc. We use π[x:=b] as partial evaluation operator, so P_{π[x:=b]ϕ}(σ) := P_ϕ(σ ∪ {x → b}). Intuitively, π[x:=b] ϕ replaces each occurrence of x in ϕ by b. An assignment σ *satisfies* ϕ if P_ϕ(σ) = true. We define the macros

$$\begin{aligned} \forall\_x \varphi &:= \pi\_{[x:=0]} \,\varphi \wedge \pi\_{[x:=1]} \,\varphi\\ \exists\_x \varphi &:= \pi\_{[x:=0]} \,\varphi \vee \pi\_{[x:=1]} \,\varphi \end{aligned}$$

Figure 1 shows a CP for the quantified boolean formula ∀y (¬x ∨ (x ∧ y)).

We consider the following problem:

#CP
Input: A CP ϕ.
Output: The number of satisfying assignments of ϕ.

Given a quantified boolean formula, we can use the macros for quantifiers to construct in linear time an equivalent CP, i.e., a CP with the same satisfying assignments. Similarly, #SAT instances can also be reduced to #CP.

Structure of the Rest of the Paper. In Sect. 4, we give an interactive protocol for #CP called CPCertify. In Sect. 5, we implement an honest Prover for CPCertify on top of an extended BDD-based algorithm for #CP. The prover runs in time polynomial in the size of the largest BDD for any of the subcircuits of the initial circuit. Together, these results yield our main result, Theorem 1, showing that any BDD-based algorithm can be modified to run an interactive protocol with small polynomial overhead. Finally, Sect. 6 presents empirical results.

Fig. 1. A CP (Sect. 3), the boolean functions represented by each node (in boxes), and the arithmetisation of the formulae (Sect. 4.1).

## 4 An Interactive Protocol for #CP

In this section we describe an interactive protocol for #CP, following the Sum-Check protocol of [20]. Section 4.1 introduces arithmetisation, a technique to transform #CP into an equivalent problem about polynomials. Section 4.2 shows how to transform #CP into an equivalent problem about evaluating polynomials *of low degree*. Finally, Sect. 4.3 presents an interactive protocol for this problem.

#### 4.1 Arithmetisation

We define a mapping [[·]] that assigns to each CP ϕ a polynomial [[ϕ]] over the variables free(ϕ), called the *arithmetisation* of ϕ:

$$\begin{aligned} [[\mathrm{true}]] &:= \mathbf{1} & [[\neg\varphi]] &:= [1] - [[\varphi]] \\ [[\mathrm{false}]] &:= \mathbf{0} & [[\varphi \wedge \psi]] &:= [[\varphi]] \cdot [[\psi]] \\ [[x]] &:= [x] & [[\varphi \vee \psi]] &:= [1] - ([1]-[[\varphi]])\cdot([1]-[[\psi]]) \\ && [[\pi_{[x:=b]}\,\varphi]] &:= \pi_{[x:=b]}\,[[\varphi]] \end{aligned}$$

Figure 1 also shows the polynomials corresponding to the nodes of the CP.

Let F be a fixed prime finite field. Given an arbitrary truth assignment σ : X → {true, false}, let σ̄ : X → F be the binary assignment given by σ̄(x) = 1 if σ(x) = true and σ̄(x) = 0 if σ(x) = false, where 0 and 1 denote the additive and multiplicative identities in F. The mapping [[·]] is defined to satisfy the following property, whose proof is immediate:

Proposition 1. *Let* ϕ *be an* S*-CP encoding some boolean predicate* P_ϕ*. Then* P_ϕ(σ) = Π_{σ̄} [[ϕ]] *for every truth assignment* σ *to* S*.*

So, intuitively, the polynomial [[ϕ]] is a conservative extension of the predicate P_ϕ: it returns the same values for all binary assignments. Accordingly, in the rest of the paper we abuse language and write σ instead of σ̄ for the binary assignment corresponding to the truth assignment σ.

Observe that #CP can be reformulated as follows: given a CP ϕ, compute the number of binary assignments σ s.t. Π_σ [[ϕ]] = **1**.
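This reformulation is easy to check by brute force for a tiny formula. The sketch below assumes the usual SumCheck-style arithmetisation (conjunction as product, negation as 1 − ·, disjunction via De Morgan) over a small prime field; the formula AST is a hypothetical tuple encoding.

```python
P = 97  # a small prime field F_97

def arith(phi, sigma):
    """Evaluate the arithmetisation of a tiny formula AST in F_P."""
    op = phi[0]
    if op == "var":
        return sigma[phi[1]] % P
    if op == "not":
        return (1 - arith(phi[1], sigma)) % P
    if op == "and":
        return (arith(phi[1], sigma) * arith(phi[2], sigma)) % P
    if op == "or":
        a, b = arith(phi[1], sigma), arith(phi[2], sigma)
        return (1 - (1 - a) * (1 - b)) % P
    raise ValueError(op)

# phi = (x AND y) OR NOT z, which has 5 satisfying assignments
phi = ("or", ("and", ("var", "x"), ("var", "y")), ("not", ("var", "z")))
num_sat = sum(
    arith(phi, {"x": x, "y": y, "z": z}) == 1
    for x in (0, 1) for y in (0, 1) for z in (0, 1)
)
```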

#### 4.2 Degree Reduction

Given a CP ϕ, its associated polynomial can have degree exponential in the height of ϕ. Since we are ultimately interested in evaluating polynomials over binary assignments, and since x<sup>2</sup> = x for x ∈ {0, 1}, we can convert polynomials to low degree without changing their behaviour on binary assignments.

For this, we use a *degree-reduction* operator δ_x for every variable x. The operator δ_x p reduces the exponent of all powers of x in p to 1. For example, δ_x [x<sup>2</sup>y + 3xy<sup>2</sup> − 2x<sup>3</sup>y<sup>2</sup> + 4] = [xy + 3xy<sup>2</sup> − 2xy<sup>2</sup> + 4]. Observe that δ_x p ≡_b p. Instead of working on the input CP directly, we first convert it into a *circuit with partial evaluation and degree reduction* by inserting degree-reduction operators after binary operations. This ensures all intermediate polynomials obtained by arithmetisation have low degree.
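The operator δ_x is straightforward to implement on a sparse monomial representation. The sketch below (a hypothetical representation: exponent tuples over a fixed variable order, not the paper's data structures) reproduces the worked example above.

```python
def delta(p, i):
    """Degree reduction: cap every positive power of variable i at 1."""
    out = {}
    for mono, c in p.items():
        m = list(mono)
        if m[i] > 1:
            m[i] = 1
        key = tuple(m)
        out[key] = out.get(key, 0) + c          # like terms merge here
    return {m: c for m, c in out.items() if c != 0}

# x^2*y + 3*x*y^2 - 2*x^3*y^2 + 4, variables ordered (x, y):
p = {(2, 1): 1, (1, 2): 3, (3, 2): -2, (0, 0): 4}
q = delta(p, 0)   # x*y + 3*x*y^2 - 2*x*y^2 + 4, i.e. x*y + x*y^2 + 4
```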

Definition 2. *A* circuit with partial evaluation and degree reduction over the set S of variables (S-CPD) *is defined in the same manner as an* S*-CP, extended as follows:*


*For an* S*-CPD* ϕ *we define* free(ϕ)*,* |ϕ|*, children, descendants, and the graphical representation as for* S*-CPs.*

We convert a CP ϕ into a CPD conv(ϕ) by adding a degree-reduction operator for each free variable before any binary operation.

Definition 3. *Given a CP* ϕ *with* free(ϕ) = {x_1, ..., x_k}*, its associated CPD* conv(ϕ) *is inductively defined as follows:*


Figure 2 shows the CPD conv(ϕ) for the CP ϕ of Fig. 1, together with the polynomials corresponding to each node.

We collect some basic properties of CPDs:

Lemma 1. *Let* ϕ *be a CP.*


CPDs have another useful property. Recall that given a CP ϕ we are interested in its number of satisfying assignments. The next lemma shows that this number can be computed by evaluating the polynomial [[conv(ϕ)]] on *a single input*.

Fig. 2. CPD and polynomials for the CP of Fig. 1.

Lemma 2. *A CP* ϕ *with* n *free variables has* m < |F| *satisfying assignments iff* Π_σ [[conv(ϕ)]] = m · 2<sup>−n</sup>*, where* σ *is the assignment satisfying* σ(x) := 2<sup>−1</sup> *in the field* F *for every variable* x*.*<sup>1</sup>

<sup>1</sup> Any prime field F with |F| > 2 has an element c such that 2c = 1.
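The identity behind Lemma 2 can be checked on a toy example. The sketch below is our own code (the function names are ours): it arithmetises (x ∧ y) ∨ z over Z_p with p = 2^61 − 1 and confirms that evaluating the multilinear polynomial at x = y = z = 2^−1 yields m · 2^−3 for m = 5 satisfying assignments:

```python
# Illustrative check of Lemma 2 on a toy formula (not the paper's code):
# for a multilinear polynomial agreeing with a boolean function on {0,1}^n,
# evaluating it at x_i = 2^{-1} (mod p) equals m * 2^{-n}, where m is the
# number of satisfying assignments.
from itertools import product

P = 2**61 - 1              # a prime field Z_p
HALF = pow(2, P - 2, P)    # 2^{-1} mod p, via Fermat's little theorem

def arith_or(a, b):        # arithmetisation of OR: a + b - a*b
    return (a + b - a * b) % P

def arith_and(a, b):       # arithmetisation of AND: a * b
    return (a * b) % P

def phi(x, y, z):          # multilinear arithmetisation of (x AND y) OR z
    return arith_or(arith_and(x, y), z)

m = sum(phi(*bits) for bits in product((0, 1), repeat=3))   # count models
lhs = phi(HALF, HALF, HALF)                                 # Pi_sigma value
rhs = (m * pow(HALF, 3, P)) % P                             # m * 2^{-n}
assert m == 5 and lhs == rhs
```

Here the sum over all binary inputs counts models directly, whereas the lemma obtains the same number from a single field evaluation.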

#### 4.3 CPCertify: An Interactive Protocol for #CP

We describe an interactive protocol, called CPCertify, for a CP ϕ with n free variables. Let X denote the variables used in ϕ. Prover and Verifier fix a finite field with at least m + 1 elements, where m is an upper bound on the number of assignments (e.g. m = 2^n). Prover tries to convince Verifier that Π_σ[[conv(ϕ)]] = K for some K ∈ F.

In the protocol, Verifier challenges Prover to compute polynomials of the form Π_σ([[ψ]]), where ψ is a node of the CPD conv(ϕ) and σ : free(ψ) → F is a (non-binary!) assignment; we call the expression Π_σ[[ψ]] a *challenge*. Observe that all assignments are chosen by Verifier. Prover answers with some k ∈ F. We call the expression Π_σ[[ψ]] = k a *claim*, or the *answer* to the challenge Π_σ[[ψ]].

CPCertify consists of an initialisation and a number of rounds, one for each descendant of conv(ϕ). Rounds are executed in topological order, starting at the root, i.e. at conv(ϕ) itself. The structure of a round for a node ψ of conv(ϕ) depends on whether ψ is an internal node (including the root) or a leaf.

At each point, Verifier keeps track of a set C of claims that must be checked.

Initialisation. Verifier sends Prover the challenge Π_σ[[conv(ϕ)]], where σ(x) := 2^−1 for every x ∈ free(ϕ). Prover returns the claim Π_σ[[conv(ϕ)]] = K for some K ∈ F. (By Lemma 2, this amounts to claiming that ϕ has K · 2^n satisfying assignments.) Verifier initialises C := {Π_σ[[conv(ϕ)]] = K}.

#### Round for an Internal Node. A round for an internal node ψ runs as follows:


Observe that, since a node ψ can be a child of several nodes, Verifier may collect multiple claims for ψ, one for each parent node.

Round for a Leaf. If ψ is a leaf, then ψ = x for a variable x, or ψ ∈ {true, false}. Verifier removes all claims Π_σi[[ψ]] = k_i (i = 1, ..., m) from C, computes the values c_i := Π_σi[[ψ]], and rejects if k_i ≠ c_i for any i.

<sup>2</sup> The precise bound on the failure probability will be given in Proposition 2.

Observe that if all claims made by Prover about leaves are true, then very likely Prover's initial claim is also true.

Description of Step (b). Let Π_σi[[ψ]] = k_i (i = 1, ..., m) be the claims in C relating to node ψ. Verifier and Prover conduct step (b) as follows:

	- (b.1.1) For every i ∈ {1, ..., m}, let σ′_i denote the partial assignment which is undefined on x and otherwise matches σ_i. Verifier sends the challenges Π_σ′i[[ψ]] (i = 1, ..., m) to Prover. Prover answers with claims Π_σ′i[[ψ]] = p_i (i = 1, ..., m). Note that p_1, ..., p_m are univariate polynomials with free variable x.
	- (b.1.2) Verifier checks whether k_i = π[x := σ_i(x)] p_i holds for each i. If not, Verifier rejects. Otherwise, Verifier picks r ∈ F uniformly at random and updates σ_i(x) := r and k_i := π[x := r] p_i for every i ∈ {1, ..., m}.
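One variable-fixing iteration of step (b) can be sketched in Python as follows (a simplification, not the paper's implementation; the coefficient-list representation of polynomials and all names are ours):

```python
# Hedged sketch of one round of step (b): Prover sends the restriction of the
# claimed polynomial to the single free variable x as a coefficient list;
# Verifier checks it matches the standing claim at sigma(x), then
# re-randomises the claim at a fresh random point r.
import random

P = 2**61 - 1

def poly_eval(coeffs, x):
    """Evaluate a univariate polynomial (constant term first) by Horner."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def fix_variable(claim_k, sigma_x, prover_coeffs):
    """(b.1.2)-style check: returns the updated (r, k) or raises on mismatch."""
    if poly_eval(prover_coeffs, sigma_x) != claim_k:
        raise ValueError("Verifier rejects: claim inconsistent")
    r = random.randrange(P)                  # fresh random challenge
    return r, poly_eval(prover_coeffs, r)

# Honest Prover for the claim p(x) = 3x^2 + x + 4 at sigma(x) = 2:
coeffs = [4, 1, 3]                           # constant term first
k = poly_eval(coeffs, 2)                     # standing claim: p(2) = 18
r, k2 = fix_variable(k, 2, coeffs)
assert k2 == poly_eval(coeffs, r)            # updated claim stays consistent
```

A dishonest Prover who sent a polynomial disagreeing with the standing claim would be rejected immediately; one who sent a consistent but wrong polynomial survives this round only if the random point r happens to hit a root of the difference, which is the Schwartz-Zippel argument used in the soundness proof.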

*Example 1.* Consider the case in which X = {x}, and Prover has made two claims, Π_σ1[[ψ]] = k_1 and Π_σ2[[ψ]] = k_2 with σ_1(x) = 1 and σ_2(x) = 2. In step (b.1.1) we have σ′_1 = σ′_2 (both are the empty assignment), and so Verifier sends the challenge [[ψ]] to Prover twice, who answers with claims [[ψ]] = p_1 and [[ψ]] = p_2. In step (b.1.2) Verifier checks that p_1(1) = k_1 and p_2(2) = k_2 hold, picks a random number r, and updates σ_1(x) := σ_2(x) := r and k_1 := p_1(r), k_2 := p_2(r). Now the condition of the while loop fails, so Verifier moves to (b.2) and checks k_1 = k_2.

Description of Step (c). Let Π_σ[[ψ]] = k be the claim computed by Verifier in step (b). Verifier removes this claim from C and replaces it by claims about the children of ψ, depending on the structure of ψ:


This concludes the description of the interactive protocol. We now show CPCertify is complete and sound.

Proposition 2 (CPCertify is complete and sound). *Let* ϕ *be a CP with* n *free variables. Let* Π_σ[[conv(ϕ)]] = K *be the claim initially sent by Prover to Verifier. If the claim is true, then Prover has a strategy to make Verifier accept. If not, for every Prover, Verifier accepts with probability at most* 4n|ϕ|/|F|*.*

If the original claim is correct, Prover can answer every challenge truthfully, and all claims then pass all of Verifier's checks, so Verifier accepts. If the claim is not correct, we proceed round by round. Using the Schwartz-Zippel lemma, we bound the probability that Verifier is tricked in a single step by 2/|F|. We then bound the number of such steps by 2n|ϕ| and apply a union bound.

#### 5 A BDD-Based Prover

We assume familiarity with *reduced ordered binary decision diagrams* (BDDs) [9]. We use BDDs over X = {x_1, ..., x_n}. We fix the variable order x_1 < x_2 < ... < x_n, i.e. the root node decides based on the value of x_n.

Definition 4. *BDDs are defined inductively as follows:*


*The level of a BDD* w *is denoted* ℓ(w)*. The set of* descendants *of* w *is the smallest set* S *with* w ∈ S *and* u, v ∈ S *for all* ⟨x, u, v⟩ ∈ S*. The* size |w| *of* w *is the number of its descendants.*

*The* arithmetisation *of a BDD* w *is the polynomial* [[w]] *defined as follows:* [[true]] := **1***,* [[false]] := **0***, and* [[⟨x, u, v⟩]] := [1 − x] · [[u]] + [x] · [[v]]*.*
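As an illustration, the arithmetisation can be evaluated directly from this inductive definition. The following Python sketch uses our own node encoding (a leaf is a boolean, an inner node a triple (x, u, v)) and is not the paper's code:

```python
# Minimal sketch of BDD arithmetisation: a node is either a boolean leaf or a
# triple (x, u, v); its polynomial is evaluated by the rule
# [[(x, u, v)]] = (1 - x) * [[u]] + x * [[v]].

def arith(node, assignment):
    """Evaluate [[node]] at a total assignment (a dict variable -> value)."""
    if node is True:
        return 1
    if node is False:
        return 0
    x, u, v = node
    s = assignment[x]
    return (1 - s) * arith(u, assignment) + s * arith(v, assignment)

# BDD for x AND y: decide x first, then y.
bdd = ("x", False, ("y", False, True))
assert arith(bdd, {"x": 1, "y": 1}) == 1   # the only satisfying assignment
assert arith(bdd, {"x": 1, "y": 0}) == 0
```

On binary assignments the polynomial agrees with the boolean function; on field values it is the multilinear extension used by the protocol.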

Figure 3 shows a BDD for the boolean function ϕ(x, y, z) = (x ∧ y ∧ ¬z) ∨ (¬x ∧ y ∧ z) ∨ (x ∧ ¬y ∧ z) and the arithmetisation of each node.

BDDSolver: A BDD-Based Algorithm for #CP. An instance ϕ of #CP can be solved using BDDs. Starting at the leaves of ϕ, we iteratively compute a BDD for each node ψ of the circuit, encoding the boolean predicate P_ψ. At the end of this procedure we obtain a BDD for P_ϕ. The number of satisfying assignments of ϕ is the number of accepting paths of the BDD, which can be computed in time linear in the size of the BDD.

For a node ψ = ψ_1 ⊙ ψ_2, given BDDs representing the predicates P_ψ1 and P_ψ2, we compute a BDD for the predicate P_ψ := P_ψ1 ⊙ P_ψ2 using the Apply_⊙ operator on BDDs. We name this algorithm for solving #CP "BDDSolver".
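The final counting step can be sketched as follows. This is our own illustration, with nodes encoded as (level, low-child, high-child) triples; an edge that skips k levels accounts for 2^k assignments per model of the child:

```python
# Hedged sketch: counting the satisfying assignments of a reduced ordered BDD
# in a single memoized bottom-up pass. A node is True/False or a triple
# (level, low, high), where the level is the number of variables decided at
# or below the node (leaves have level 0).

def count_models(node, n):
    def level(nd):
        return 0 if isinstance(nd, bool) else nd[0]

    memo = {}
    def count(nd):                       # models over the level(nd) lowest vars
        if isinstance(nd, bool):
            return int(nd)
        if id(nd) in memo:
            return memo[id(nd)]
        i, lo, hi = nd
        result = (count(lo) << (i - 1 - level(lo))) + \
                 (count(hi) << (i - 1 - level(hi)))
        memo[id(nd)] = result
        return result

    return count(node) << (n - level(node))

x_and_y = (2, False, (1, False, True))   # x2 AND x1
x_or_y = (2, (1, False, True), True)     # x2 OR x1
assert count_models(x_and_y, 2) == 1
assert count_models(x_or_y, 2) == 3
```

Each node is visited once thanks to memoization, giving the linear-time bound mentioned above.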

From BDDSolver to CPCertify. Our goal is to modify BDDSolver to play the role of an honest Prover in CPCertify with minimal overhead. In CPCertify, Prover repeatedly performs the same task: evaluate polynomials of the form Π_σ[[ψ]], where ψ is a descendant of the CPD conv(ϕ), and σ assigns values to all free variables of ψ except possibly one. Therefore, the polynomials have at most one free variable and, as we have seen, degree at most 2.

Before defining the concepts precisely, we give a brief overview of this section.

– First (Proposition 3), we show that BDDs correspond to binary multilinear polynomials. In particular, BDDs allow for efficient evaluation of the polynomial. As

Fig. 3. A BDD and its arithmetisation. For ⟨x, u, v⟩, we denote the link from x to v with a solid edge and the link from x to u with a dotted edge. We omit links to false.

argued in Lemma 1(a), for every descendant ψ of ϕ, the CPD conv(ψ) (which is a descendant of conv(ϕ)) evaluates to a multilinear polynomial. In particular, Prover can use standard BDD algorithms to calculate the corresponding polynomials Π_σ[[ψ]] for all descendants ψ of conv(ϕ) that are neither binary operators nor degree reductions.

– Second (the rest of the section), we prove a surprising connection: the intermediate results obtained while executing the BDD algorithms (with slight adaptations) correspond precisely to the remaining descendants of conv(ϕ).

The following proposition proves that BDDs represent exactly the binary multilinear polynomials.

Proposition 3. *(a) For a BDD* w*,* [[w]] *is a binary multilinear polynomial. (b) For a binary multilinear polynomial* p *there is a unique BDD* w *s.t.* p = [[w]]*.*

#### 5.1 Extended BDDs

During the execution of CPCertify for a given CPD conv(ϕ), Prover sends to Verifier claims of the form Π_σ[[ψ]], where ψ is a descendant of conv(ϕ), and σ : X → F is a partial assignment. While all polynomials computed by CPCertify are binary, not all are multilinear: some polynomials have degree 2. For these polynomials, we introduce *extended BDDs* (eBDDs) and give eBDD-based algorithms for the following two tasks:


Computing eBDDs for CPDs: Informal Introduction. Consider a CP ϕ and its associated CPD conv(ϕ). Each node of ϕ induces a chain of nodes in conv(ϕ), consisting of degree-reduction nodes δ_x1, ..., δ_xn, followed by the node itself (see Fig. 4). Given BDDs u and v for the children of the node in the CP, we can compute a BDD for the node itself using a well-known BDD algorithm Apply_⊙(u, v), parametric in the boolean operation ⊙ labelling the node [9]. Our goal is to transform Apply into an algorithm that computes eBDDs *for all nodes in the chain*, i.e. eBDDs for all the polynomials p_0, p_1, ..., p_n of Fig. 4.

Fig. 4. A node of a CP (⊙) gets a chain of degree-reduction nodes in the associated CPD.

Roughly speaking, Apply_⊙(u, v) recursively computes BDDs w_0 = Apply_⊙(u_0, v_0) and w_1 = Apply_⊙(u_1, v_1), where u_b and v_b are the b-children of u and v, and then returns the BDD with w_0 and w_1 as 0- and 1-child, respectively.<sup>3</sup>

Most importantly, we modify Apply to run in breadth-first order. Figure 5 shows a graphical representation of a run of Apply_∨(u, v), where u and v are the two BDD nodes labelled by x. Square nodes represent pending calls to Apply_∨. Initially there is only one square call Apply_∨(u, v) (Fig. 5, top left). Apply_∨ calls itself recursively for u_0, v_0 and u_1, v_1 (Fig. 5, top right). Each of the two calls splits again into two; however, the first three are identical (Fig. 5, bottom left), and so reduce to two. These two calls can now be resolved directly; they return nodes true and false, respectively. At this point, the children of Apply_∨(u, v) become ⟨y, true, true⟩ = true and ⟨y, true, false⟩, which exists already as well (Fig. 5, bottom right).

We look at the diagrams of Fig. 5 not as a visualisation aid, but as graphs with two kinds of nodes: standard BDD nodes, represented as circles, and *product* nodes, represented as squares. We call them *extended BDDs*. Each node of an extended BDD is assigned a polynomial in the expected way: the polynomial [[u]] of a standard BDD node u with variable x is x · [[u_1]] + (1 − x) · [[u_0]], the polynomial [[v]] of a square ∧-node v is [[v_0]] · [[v_1]], etc. In this way we assign to each eBDD a polynomial. In particular, we obtain the intermediate polynomials p_0, p_1, p_2, p_3 of the figure, one for each level in the recursion. In the rest of the section we show that these are *precisely* the polynomials p_0, p_1, ..., p_n of Fig. 4.

Thus, in order to compute eBDDs for all nodes of a CPD conv(ϕ), it suffices to compute BDDs for all nodes of the CP ϕ. Since we need to do this anyway to solve #CP, the polynomial certification does not incur any overhead.

<sup>3</sup> In fact, this is only true when u and v are nodes at the same level and Apply_⊙(u_0, v_0) ≠ Apply_⊙(u_1, v_1), but at this point we only want to convey some intuition.

Fig. 5. Run of Apply∨(u, v), but with recursive calls evaluated in breadth-first order. All missing edges go to node false.

Extended BDDs. As for BDDs, we define eBDDs over X = {x_1, ..., x_n} with the variable order x_1 < x_2 < ... < x_n.

Definition 5. *Let* ⊙ *be a binary boolean operator. The set of eBDDs (for* ⊙*) is inductively defined as follows:*


*The set of* descendants *of an eBDD* w *is the smallest set* S *with* w ∈ S *and* u, v ∈ S *for all* ⟨u ⊙ v⟩, ⟨x, u, v⟩ ∈ S*. The* size *of* w *is its number of descendants. For* u, v ∈ {true, false} *we identify* ⟨u ⊙ v⟩ *with* true *or* false *according to the result of* ⊙*, e.g.* ⟨true ∨ false⟩ = true*, as* true ∨ false = true*. The* arithmetisation *of an eBDD for a boolean operator* ⊙ ∈ {∧, ∨} *is defined as for BDDs, with the extensions* [[⟨u ∧ v⟩]] = [[u]] · [[v]] *and* [[⟨u ∨ v⟩]] = [[u]] + [[v]] − [[u]] · [[v]]*.*

*Example 2.* The diagrams in Fig. 5 are eBDDs for ⊙ := ∨. Nodes of the form ⟨x, u, v⟩ and ⟨u ∨ v⟩ are represented as circles and squares, respectively. Consider the top-left diagram. Abbreviating x ⊕ y := (x ∧ ¬y) ∨ (¬x ∧ y), we get [[Apply_∨(u, v)]] = [[(x ⊕ y) ∧ (x ∧ y)]] = [[x ⊕ y]] · [[x ∧ y]] = (x(1 − y) + (1 − x)y − xy(1 − x)(1 − y)) · xy, which is the polynomial p_0 shown in the figure.

Table 1. On the left: algorithm computing eBDDs for the sequence [[w]], δ_xn[[w]], δ_xn−1 δ_xn[[w]], ..., δ_x1 ··· δ_xn[[w]] of polynomials. On the right: recursive algorithm to evaluate the polynomial represented by an eBDD at a given partial assignment. P(w) is a mapping used to memoize the polynomials returned by recursive calls.

```
ComputeEBDD(w)
Input: eBDD w
Output: sequence w0, ..., wn of eBDDs
w0 := w; output w0
for i = 0, ..., ℓ(w) − 1 do
  wi+1 := wi
  for every node ⟨u ⊙ v⟩ of wi at level n − i do
    for b ∈ {0, 1} do
      ub := π[xn−i := b] u
      vb := π[xn−i := b] v
      tb := ⟨ub ⊙ vb⟩
    wi+1 := wi+1[ ⟨u ⊙ v⟩ / ⟨xn−i, t0, t1⟩ ]
  output wi+1
```
```
EvaluateEBDD(w, σ) =: Eσ(w)
Input: eBDD w; assignment σ : X → F
Output: Πσ[[w]]
if P(w) is defined return P(w)
if w ∈ {true, false} return [[w]]
if w = ⟨u ∧ v⟩    P(w) := Eσ(u) · Eσ(v)
if w = ⟨u ∨ v⟩    P(w) := Eσ(u) + Eσ(v) − Eσ(u) · Eσ(v)
if w = ⟨x, u, v⟩ and σ(x) undefined   P(w) := [1 − x] · Eσ(u) + [x] · Eσ(v)
if w = ⟨x, u, v⟩ and σ(x) = s ∈ F     P(w) := [1 − s] · Eσ(u) + [s] · Eσ(v)
return P(w)
```
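A direct transliteration of EvaluateEBDD into Python might look as follows. This is a sketch under our own node encoding (tags "and", "or", "ite"), not the paper's implementation; polynomials are represented as coefficient lists, which suffices because at most one variable is free:

```python
# Sketch of EvaluateEBDD: eBDD nodes are True/False, ("and", u, v),
# ("or", u, v), or ("ite", x, u, v). With at most one unassigned variable the
# result is a low-degree univariate polynomial, stored as a coefficient list
# (constant term first). Results are memoized per top-level call.
P = 2**61 - 1

def pmul(a, b):                                   # polynomial product mod P
    out = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] = (out[i + j] + ca * cb) % P
    return out

def padd(a, b, sign=1):                           # polynomial sum/difference
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a)); b = b + [0] * (n - len(b))
    return [(x + sign * y) % P for x, y in zip(a, b)]

def evaluate_ebdd(w, sigma, memo=None):
    memo = {} if memo is None else memo
    if id(w) in memo:
        return memo[id(w)]
    if isinstance(w, bool):
        return [int(w)]
    if w[0] == "and":                             # [[u]] * [[v]]
        r = pmul(evaluate_ebdd(w[1], sigma, memo),
                 evaluate_ebdd(w[2], sigma, memo))
    elif w[0] == "or":                            # [[u]] + [[v]] - [[u]]*[[v]]
        a = evaluate_ebdd(w[1], sigma, memo)
        b = evaluate_ebdd(w[2], sigma, memo)
        r = padd(padd(a, b), pmul(a, b), sign=-1)
    else:                                         # ("ite", x, u, v)
        _, x, u, v = w
        a = evaluate_ebdd(u, sigma, memo)
        b = evaluate_ebdd(v, sigma, memo)
        if x in sigma:                            # [1 - s]*a + [s]*b
            s = sigma[x]
            r = padd(pmul([(1 - s) % P], a), pmul([s], b))
        else:                                     # [1 - x]*a + [x]*b
            r = padd(padd(a, pmul([0, 1], b)), pmul([0, 1], a), sign=-1)
    memo[id(w)] = r
    return r

# (x AND y) with y := 1 leaves the linear polynomial x:
w = ("ite", "x", False, ("ite", "y", False, True))
assert evaluate_ebdd(w, {"y": 1}) == [0, 1]
```

The memo table plays the role of the mapping P(w) in Table 1, so every descendant is processed at most once.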

Computing eBDDs for CPDs. Given a node of a CP corresponding to a binary operator ⊙, Prover has to compute the polynomials p_0, δ_x1 p_0, ..., δ_xn ··· δ_x1 p_0 corresponding to the nodes of the CPD shown on the right. We show that Prover can compute these polynomials by representing them as eBDDs. Table 1 describes an algorithm that gets as input an eBDD w of level n and outputs a sequence w_0, w_1, ..., w_n+1 of eBDDs such that w_0 = w; [[w_i+1]] = δ_xn−i [[w_i]] for every 0 ≤ i ≤ ℓ(w) − 1; and w_n+1 is a BDD. Interpreted as a sequence of eBDDs, Fig. 5 shows a run of this algorithm.

*Notation.* Given an eBDD w and eBDDs u, v such that ℓ(u) ≥ ℓ(v), we let w[u/v] denote the result of replacing u by v in w. For an eBDD w = ⟨x_i, w_0, w_1⟩ and b ∈ {0, 1} we define π[x_i := b] w := w_b, and for j > i we set π[x_j := b] w := w. (Note that [[π[x_j := b] w]] = π[x_j := b][[w]] holds for any j where it is defined.)

Proposition 4. *Let* ψ_1, ψ_2 *denote CPs and* u_1, u_2 *BDDs with* [[u_i]] = [[ψ_i]]*,* i ∈ {1, 2}*. Let* w := ⟨u_1 ⊙ u_2⟩ *denote an eBDD. Then* ComputeEBDD(w) *satisfies* [[w_0]] = [[ψ_1 ⊙ ψ_2]] *and* [[w_i+1]] = δ_xn−i [[w_i]] *for every* 0 ≤ i ≤ n − 1*; moreover,* w_n *is a BDD with* w_n = Apply_⊙(u_1, u_2)*. Finally, the algorithm runs in time* O(T)*, where* T ∈ O(|u_1| · |u_2|) *is the time taken by* Apply_⊙(u_1, u_2)*.*

Evaluating Polynomials Represented as eBDDs. Recall that Prover must evaluate expressions of the form Π_σ[[ψ]] for some CPD ψ, where σ assigns values to all variables of ψ except for possibly one. We give an algorithm to evaluate arbitrary expressions Π_σ[[w]], where w is an eBDD, and show that if there is at most one free variable then the algorithm takes time linear in the size of w. The algorithm is shown on the right of Table 1. It has the standard structure of BDD procedures: it recurses on the structure of the eBDD, memoizing the results of recursive calls so that the algorithm is called at most once with a given input.

Proposition 5. *Let* w *denote an eBDD,* σ : X → F *a partial assignment, and* k *the number of variables assigned by* σ*. Then* EvaluateEBDD *evaluates the polynomial* Π_σ[[w]] *in time* O(poly(2^n−k) · |w|)*.*

#### 5.2 Efficient Certification

In the CPCertify algorithm, Prover must (a) compute polynomials for all nodes of the CPD, and (b) evaluate them on assignments chosen by Verifier. In the last section we have seen that ComputeEBDD (for binary operations of the CP), combined with standard BDD algorithms (for all other operations), yields eBDDs representing all these polynomials, at no additional overhead compared to a BDD-based implementation. This covers part (a). Regarding (b), recall that all polynomials computed in (a) have at most one free variable. Therefore, using EvaluateEBDD we can evaluate each polynomial in time linear in the size of the eBDD representing it.

The Verifier side of CPCertify is implemented in a straightforward manner. As Verifier runs in time polynomial in the size of the CP (and not in the size of the computed BDDs, which may be exponentially larger), incurring overhead there is less of a concern.

Theorem 1 (Main Result). *If* BDDSolver *solves an instance* ϕ *of* #CP *with* n *variables in time* T*, with* T >n|ϕ|*, then*


As presented above, EvaluateEBDD incurs a factor-of-n overhead, as every node of the CPD must be evaluated. In our implementation, we use a caching strategy to reduce the complexity of Theorem 1(b) to O(T).

Note that the bounds above assume a uniform cost model. In particular, operations on BDD nodes and finite field arithmetic are assumed to be O(1). This is a reasonable assumption, as for a constant failure probability log |F| ≈ log n, hence the finite field remains small. (It is possible to verify the number of assignments even if it exceeds |F|; see below.)

#### 5.3 Implementation Concerns

We list a number of points that are not described in detail in this paper, but need to be considered for an efficient implementation.

Finite Field Arithmetic. It is not necessary to use large finite fields. In particular, one can avoid the overhead of arbitrary-size integers. For our implementation we fix the finite field F := Z_p, with p = 2^61 − 1 (the largest Mersenne prime that fits in 64 bits).
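The attraction of a Mersenne prime is that modular reduction needs only shifts and masks. The following is a sketch of the standard folding trick (our own illustration; blic's actual routines may differ):

```python
# Reduction modulo the Mersenne prime p = 2^61 - 1 without division:
# since 2^61 ≡ 1 (mod p), the high bits of a product can simply be folded
# (added) onto the low bits. Two folds plus one conditional subtraction
# fully reduce any product of two 61-bit values.
P = 2**61 - 1

def reduce_mersenne(x):
    """Reduce a value x < 2^122 modulo 2^61 - 1 using shifts and masks."""
    x = (x & P) + (x >> 61)     # first fold: result fits in 62 bits
    x = (x & P) + (x >> 61)     # second fold handles the remaining carry
    return x if x < P else x - P

a = 123456789123456789 % P
b = 987654321987654321 % P
assert reduce_mersenne(a * b) == (a * b) % P
```

In a 64-bit C or Rust implementation the product would be taken in 128-bit arithmetic; Python's unbounded integers let us show only the folding itself.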

Incremental eBDD Representation. Algorithm ComputeEBDD computes a sequence of eBDDs. These should not be stored explicitly, as doing so incurs a space overhead. Instead, we only store the last eBDD as well as the differences between subsequent elements of the sequence. To evaluate an earlier eBDD, we revert to the corresponding state by applying the differences appropriately.

Evaluation Order. It simplifies the implementation if Prover only needs to evaluate nodes of the CPD in some (fixed) topological order. CPCertify can easily be adapted to guarantee this, by picking the next node appropriately in each iteration, and by evaluating only one child of a binary operator ψ_1 ⊙ ψ_2. The value of the other child can then be derived by solving a linear equation.
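For illustration, here is how the value of the other child of an ∨-node could be recovered (our own sketch; the function name is hypothetical): from the claim k = [[u ∨ v]] = a + b − a·b and the evaluated child a, the other value b follows by inverting 1 − a, provided a ≠ 1:

```python
# Sketch of the linear-equation trick for an OR node over Z_p: given the
# parent value k = a + b - a*b and one child value a, solve for b.
P = 2**61 - 1

def other_child_or(k, a):
    """Solve k = a + b - a*b for b over Z_p (assumes a != 1)."""
    inv = pow((1 - a) % P, P - 2, P)      # (1 - a)^{-1} by Fermat
    return ((k - a) * inv) % P

a, b = 1234567, 7654321
k = (a + b - a * b) % P
assert other_child_or(k, a) == b          # b is recovered from k and a
```

For an ∧-node the analogous equation k = a·b is solved by a single field division.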

Efficient Evaluation. As stated in Theorem 1, using EvaluateEBDD Prover needs Ω(nT) time to respond to Verifier's challenges. In our implementation we instead use a caching strategy that reduces this time to O(T). Essentially, we exploit the special structure of conv(ϕ): Verifier sends a sequence of challenges Π_σ0 δ_x1 ··· δ_xn w, Π_σ1 δ_x2 ··· δ_xn w, ..., Π_σn w, where assignments σ_i and σ_i+1 differ only in variables x_i and x_i+1. The corresponding eBDDs likewise change only at levels i and i + 1. We cache the linear coefficients of eBDD nodes that contribute to the arithmetisation of the root top-down, and the arithmetised values of nodes bottom-up. As a result, only levels i and i + 1 need to be updated.

Large Numbers of Assignments. If the number of satisfying assignments of a CP exceeds |F|, Verifier would not be able to verify the count accurately. Instead of choosing |F| ≥ 2^n, which incurs a significant overhead, Verifier can query the precise number of assignments and then choose |F| randomly. This introduces another possibility of failure, but (roughly speaking) it suffices to double log |F| for the additional failure probability to match the existing one. Our implementation does not currently support this technique.

## 6 Evaluation

We have implemented an eBDD library, blic (BDD Library with Interactive Certification)<sup>4</sup>, that is a stand-in replacement for BDDs but additionally performs the role of Prover in the CPCertify protocol. We have also implemented a client that executes the protocol as Verifier. The eBDD library is about 900 lines of C++ code and the CPCertify protocol is about 400 lines. We have built a prototype certifying QBF solver in blic, totalling about 2600 lines of code. We aim to answer the following questions in our evaluation:


<sup>4</sup> https://gitlab.lrz.de/i7/blic.

Fig. 6. (a) Time taken on instances (dashed lines are <sup>y</sup> = 100<sup>x</sup> and <sup>y</sup> = 0.01x), (b) Cost of generating a certificate over computing the solution, (c) Time to verify the certificate, (d) Size of certificates

RQ1: Performance of blic. We compare blic with CAQE, DepQBF, and PGBDDQ, three state-of-the-art QBF solvers. CAQE [10,29] does not provide any certificates in its most recent version. DepQBF [12,19] is a certifying QBF solver. PGBDDQ [7,25] is an independent implementation of a BDD-based QBF solver. Both DepQBF and PGBDDQ provide specialised checkers for their certificates, though PGBDDQ can also produce proofs in the standard QRAT format. Note that PGBDDQ is written in Python and generates proofs in an ASCII-based format, incurring overhead compared to the other tools.

We take 172 QBF instances (all unsatisfiable) from the *Crafted Instances* track of the QBF Evaluation 2022.<sup>5</sup> The *Prenex CNF* track of the QBF competition is not evaluated here: it features instances with a large number of variables, and BDD-based solvers perform poorly under these circumstances without additional optimisations. Our overall goal is not to propose a new approach for

<sup>5</sup> CAQE and DepQBF were the winner and runner-up in this category. The configuration we used differs from the competition, as described in the full version of the paper [11].

Table 2. Comparison of certificate generation, bytes exchanged between prover and verifier, and time taken to verify the certificate on a set of QBF benchmarks from [7]. "Solve time" is time taken to solve the instance and to generate a certificate (seconds), "Certificate" is the size of proof encoding for PGBDDQ, and bytes exchanged by CPCertify for blic, and "Verifier time" is time to verify the certificate (Verifier's run time for blic and time taken by qchecker).


solving QBF, but rather to certify a BDD-based approach, so we wanted to focus on cases where the existing BDD-based approaches are practical.

We ran each benchmark with a 10 min timeout; all tools other than CAQE were run with certificate production. All times were obtained on a machine with an Intel Xeon E7-8857 CPU and 1.58 TiB RAM<sup>6</sup> running Linux. See the full version of the paper [11] for a detailed description. blic solved 96 out of 172 benchmarks, CAQE solved 98, DepQBF solved 87, and PGBDDQ solved 91. Figure 6(a) shows the run times of blic compared to the other tools. The plot indicates that blic is competitive on these instances, with a few cases, mostly from the Lonsing family of benchmarks, where blic is slower than DepQBF by an order of magnitude. Figure 6(b) shows the overhead of certification: for each benchmark (that finishes within the 10 min timeout), we plot the ratio of the time to compute the answer to the time it takes to run Prover in CPCertify. The dotted regression line shows CPCertify has a 2.8× overhead over computing BDDs. For this set of examples, the error probability never exceeds 10^−8.9 (10^−11.6 when Lonsing examples are excluded); running the verifier k times reduces it to 10^−8.9k.

RQ2: Communication Cost of Certification and Verifier Time. We explore RQ2 by comparing the number of bytes exchanged between Prover and Verifier and the time needed for Verifier to execute CPCertify with the number of bytes in a QBF proof and the time required to verify the proofs produced by DepQBF and PGBDDQ, for which we use QRPcheck [24,26] and qchecker [7,25], respectively. Note that the latter is written in Python.

<sup>6</sup> blic uses at most 60 GiB on the shown benchmarks, 5 GiB when excluding timeouts.

We show that the overhead of certification is low. Figure 6(c) shows the run time of Verifier; this is generally negligible for blic, except for the Lonsing and KBKF families, which have a large number of variables but very small BDDs. Figure 6(d) shows the total number of bytes exchanged between Prover and Verifier in blic against the size of the proofs generated by PGBDDQ and DepQBF. For large instances, the number of bytes exchanged by blic is significantly smaller than the size of the proofs. The exceptions are again the Lonsing and KBKF families of instances. For both plots, the dotted line results from a log-linear regression.

In addition to the Crafted Instances, we compare against PGBDDQ on a challenging family of benchmarks used in the PGBDDQ paper (matching the parameters of [7, Table 3]); these are QBF encodings of a linear domino placing game.<sup>7</sup> Our results are summarised in Table 2. The upper bound on Verifier error is 10^−9.22. We show that blic outperforms PGBDDQ both in the overall cost of computing the answer and the certificates, as well as in the number of bytes communicated and the time used by Verifier.

Our results indicate that giving up absolute certainty through interactive protocols can lead to an order of magnitude smaller communication cost and several orders of magnitude smaller checking costs for the verifier.

## 7 Conclusion

We have presented a solver that combines BDDs with an interactive protocol. blic can be seen as a self-certifying BDD library able to certify the correctness of arbitrary sequences of BDD operations. In order to trust the result, a user must only trust the verifier (a straightforward program that poses challenges to the prover). We have shown that blic (including certification time) is competitive with other solvers, and Verifier's time and error probabilities are negligible.

Our results show that IP = PSPACE can become an important result not only in theory but also in the practice of automatic verification. From this perspective, our paper is a first step towards practical certification based on interactive protocols. While we have focused on BDDs, we can ask the more general question: which practical automated reasoning algorithms can be made efficiently certifying? For example, whether there is an interactive protocol and an efficient certifying version of modern SAT solving algorithms is an interesting open challenge.

## References

1. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge (2006). https://theory.cs.princeton.edu/complexity/ book.pdf

<sup>7</sup> DepQBF only solved 1 of 10 instances within 120 min, and is thus not compared.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Ownership Guided C to Rust Translation**

Hanliang Zhang<sup>1</sup>, Cristina David<sup>1</sup>, Yijun Yu<sup>2</sup>, and Meng Wang<sup>1(B)</sup>

<sup>1</sup> University of Bristol, Bristol, UK {pd21541,cristina.david,meng.wang}@bristol.ac.uk
<sup>2</sup> The Open University, Milton Keynes, UK yijun.yu@open.ac.uk

**Abstract.** Dubbed a safer C, Rust is a modern programming language that combines memory safety and low-level control. This interesting combination has made Rust very popular among developers and there is a growing trend of migrating legacy codebases (very often in C) to Rust. In this paper, we present a C to Rust translation approach centred around static ownership analysis. We design a suite of analyses that infer ownership models of C pointers and automatically translate the pointers into safe Rust equivalents. The resulting tool, Crown, scales to real-world codebases (half a million lines of code in less than 10 s) and achieves a high conversion rate.

## **1 Introduction**

Rust [33] is a modern programming language which features an exciting combination of memory safety and low-level control. In particular, Rust takes inspiration from ownership types to restrict the mutation of shared state. The Rust compiler is able to statically verify the corresponding ownership constraints and consequently guarantee memory and thread safety. This distinctive advantage of provable safety makes Rust a very popular language, and the prospect of migrating legacy codebases in C to Rust is very appealing.

In response to this demand, automated tools translating C code to Rust have emerged from both industry and academia [17,26,31]. Among them, the industrial-strength translator C2Rust [26] rewrites C code into Rust syntax while preserving the original semantics. The translation does not synthesise an ownership model and thus cannot do more than replicate the unsafe use of pointers in C. Consequently, the Rust code must be labelled with the unsafe keyword, which allows certain actions that are not checked by the compiler. More recent work focuses on reducing this unsafe labelling. In particular, the tool Laertes [17] aims to rewrite the (unsafe) code produced by C2Rust by searching the solution space guided by the type error messages from the Rust compiler. This is impressive, as for the first time proper Rust code beyond a line-by-line direct conversion from the original C source may be synthesised. On the other hand, the limit of the trial-and-error approach is also clear: the system does not support reasoning about the generation process, nor does it create any new understanding of the target code (other than the fact that it compiles successfully).

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13966, pp. 459–482, 2023. https://doi.org/10.1007/978-3-031-37709-9_22

In this paper, we take a more principled approach by developing a novel ownership analysis of pointers that is efficient (scaling to half a million LOC in less than 10 s), sophisticated (handling nested pointers and inductively defined data structures), and precise (field- and flow-sensitive). Our ownership analysis makes a strengthening assumption about the Rust ownership model, which obviates the need for an aliasing analysis. While this assumption excludes a few safe Rust uses (see discussion in Sect. 5), it ensures that the ownership analysis is both scalable and precise, which is subsequently reflected in the overall scalability and precision of the C to Rust translation.

The primary goal of this analysis is of course to facilitate the C to Rust translation. Indeed, as we will see in the rest of the paper, an automated translation system is built to encode the ownership models in the generated Rust code, which is then proven safe by the Rust compiler. However, in contrast to relying on trial-and-error with the Rust compiler, as is common in existing approaches [17,31], this analysis approach actually extracts new knowledge about ownership from code, which may lead to other future utilities, including preventing memory leaks (currently allowed in safe Rust), identifying inherently unsafe code fragments, and so on. Our current contributions are:


## **2 Background**

We start by giving a brief introduction to Rust, in particular its ownership system and the use of pointers, as they are central to memory safety.

#### **2.1 Rust Ownership Model**

Ownership in Rust denotes a set of rules that govern how the Rust compiler manages memory [33]. The idea is to associate each value with a *unique* owner. This feature is useful for memory management. For example, when the owner goes out of scope, the memory allocated for the value can be automatically recycled.
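Such an ownership transfer can be sketched as follows (the String type and the values are illustrative, not part of the original listing):

```rust
fn main() {
    let v = String::from("data"); // v owns the heap-allocated String
    let u = v;                    // ownership moves from v to u
    // println!("{v}");           // error[E0382]: borrow of moved value: `v`
    println!("{u}");              // fine: u is now the unique owner
}
```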


In the above snippet, the assignment of v to u also transfers ownership, after which it is illegal to access v until it is assigned a new value.

This permanent transfer of ownership gives strong guarantees but can be cumbersome to manage in programming. In order to allow sharing of values between different parts of the program, Rust uses the concept of *borrowing*, which refers to creating a *reference* (marked by an ampersand). A reference allows referring to some value without taking ownership of it. Borrowing gives the temporary right to read and, potentially, uniquely mutate the referenced value.
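A minimal sketch of borrowing (variable names and types are illustrative):

```rust
fn main() {
    let mut n = String::from("list");
    let r: &mut String = &mut n; // mutable borrow: no ownership transfer
    r.push_str("-node");         // temporary right to mutate through r
    assert_eq!(n, "list-node");  // n still owns the value afterwards
}
```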

This temporary nature of borrowing creates another dimension of ownership management, known as *lifetime*. For mutable references (marked by mut in the above examples), only one reference is allowed at a time. For immutable references (those without the mut marking), multiple references can coexist as long as there is no mutable reference at the same time. As one can expect, the interaction of mutable and immutable references and their lifetimes is highly non-trivial. In this paper, we focus on analysing mutable references.
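These borrow rules can be sketched as follows (a minimal illustration; the variable names are ours):

```rust
fn main() {
    let mut x = 5;
    let a = &x;              // immutable borrows:
    let b = &x;              // several may coexist...
    assert_eq!(*a + *b, 10);
    let m = &mut x;          // ...but a mutable borrow is only allowed
    *m += 1;                 // once a and b are no longer used
    assert_eq!(x, 6);
}
```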

#### **2.2 Pointer Types in Rust**

Rust has a richer pointer system than C. The primitive C-style pointers (written as \*const T or \*mut T) are known as *raw pointers*; they are ignored by the Rust compiler for ownership and lifetime checks. Raw pointers are a major source of unsafe Rust (more below). Idiomatic Rust instead advocates *box pointers* (written as Box<T>) as owning pointers that uniquely own heap allocations, as well as *references* (written as &mut T or &T, as discussed in the previous subsection) as non-owning pointers used to access values owned by others. Rust also offers smart pointers for which the borrow rules are checked at runtime (e.g. RefCell<T>). We aim for our translation to preserve CPU time, without additional runtime overhead, and therefore we do not refactor raw pointers into RefCell<T>s.
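The three pointer kinds can be illustrated side by side (a minimal sketch; names are ours):

```rust
fn main() {
    let b: Box<i32> = Box::new(1); // box pointer: uniquely owns the allocation
    let r: &i32 = &*b;             // reference: non-owning, borrow-checked
    assert_eq!(*r, 1);
    let p: *const i32 = &*b;       // raw pointer: ignored by ownership checks
    unsafe { assert_eq!(*p, 1); }  // dereferencing it requires unsafe
}
```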

C-style array pointers are represented in Rust as references to arrays and slice references, with array bounds known at compile time and runtime, respectively. The creation of meta-data such as array bounds is beyond the scope of ownership analysis. In this work, we keep array pointers as raw pointers in the translated code.

#### **2.3 Unsafe Rust**

As a pragmatic design, Rust allows programs to contain features that cannot be verified by the compiler as memory safe. This includes dereferencing raw pointers, calling low level functions, and so on. Such uses must be marked with the unsafe keyword and form fragments of *unsafe Rust*. It is worth noting that unsafe does not turn off all compiler checks; safe pointers are still checked.
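Both points can be sketched in a few lines (names are ours): a raw-pointer dereference must sit inside unsafe, while ownership rules keep applying there.

```rust
fn main() {
    let x = 42;
    let p = &x as *const i32;
    let y = unsafe { *p };   // raw-pointer deref must be inside unsafe
    assert_eq!(y, 42);
    unsafe {
        let b = Box::new(1);
        let c = b;           // ownership rules still apply inside unsafe:
        // let _ = *b;       // error[E0382]: `b` was moved into `c`
        assert_eq!(*c, 1);
    }
}
```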

Unsafe Rust is often used to implement data structures with complex sharing, overcome incompleteness issues of the Rust compiler, and support low-level systems programming [2,18]. But it can also be used for other reasons. For example, c2rust [26] directly translates C pointers into raw pointers. Without unsafe Rust, the generated code would not compile.

## **3 Overview**

In this section, we present an overview of Crown via two examples. The first example provides a detailed description of the push method for a singly-linked list, whereas the second shows a snippet from a real-world benchmark.

**Fig. 1.** Pushing into a singly-linked list

#### **3.1 Pushing into a Singly-Linked List**

The C code of function push in Fig. 1a allocates a new node where it stores the data received as argument. The new node subsequently becomes the head of list. This code is translated by c2rust to the Rust code in Fig. 1b. Notably, the c2rust translation is syntax-based and simply changes all the C pointers to \*mut raw pointers. Given that dereferencing raw pointers is considered an unsafe operation in Rust (e.g. the dereferencing of new\_node at line 16 in Fig. 1b), the push method must be annotated with the unsafe keyword (alternatively, it could be placed inside an unsafe block). Additionally, c2rust introduces two directives for the two struct definitions, #[repr(C)] and #[derive(Copy, Clone)]. The former keeps the data layout the same as in C for possible interoperation, and the latter instructs that the corresponding type can only be duplicated through copying.

While c2rust uses raw pointers in the translation, the ownership scheme in Fig. 1b obeys the Rust ownership model, meaning that the raw pointers could be translated to safe ones. A pointer to a newly allocated node is assigned to new\_node at line 15. This allows us to infer that the ownership of the newly allocated node belongs to new\_node. Then, at line 18, the ownership is transferred from new\_node to (\*list).head. Additionally, if (\*list).head owns any memory object prior to line 17, then its ownership is transferred to (\*new\_node).next at line 17. This ownership scheme corresponds to safe pointer use: (i) each memory object is associated with a unique owner and (ii) it is dropped when its owner goes out of scope. As an illustration for (i), when the ownership of the newly allocated memory is transferred from new\_node to (\*list).head at line 18, (\*list).head becomes the unique owner, whereas new\_node is made invalid and it is no longer used. For (ii), given that argument list of push is an output parameter (i.e. a parameter that can be accessed from outside the function), we assume that it must be owning on exit from the method. Thus, no memory object is dropped in the push method, but rather returned to the caller.
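As a concrete rendering of this ownership scheme, the following simplified safe-Rust sketch (ours, not Crown's actual output, which wraps pointers in Option and rewrites dereferences accordingly) implements push with Box-owned nodes:

```rust
struct Node {
    data: i32,
    next: Option<Box<Node>>, // owning pointer to the rest of the list
}

struct List {
    head: Option<Box<Node>>,
}

fn push(list: &mut List, data: i32) {
    // new_node becomes the unique owner of the fresh allocation.
    let mut new_node = Box::new(Node { data, next: None });
    // Ownership of the old head (if any) moves into the new node...
    new_node.next = list.head.take();
    // ...and ownership of the new node moves into the list.
    list.head = Some(new_node);
}

fn main() {
    let mut l = List { head: None };
    push(&mut l, 1);
    push(&mut l, 2);
    assert_eq!(l.head.as_ref().map(|n| n.data), Some(2));
}
```

Note how Option::take models the ownership transfer out of (\*list).head, and the final assignment models the transfer into it; no memory object ever has two owners.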

Crown infers the ownership information of the code translated by c2rust, and uses it to translate the code to safer Rust in Fig. 1c. As explained next, Crown first retypes raw pointers into safe pointers based on the ownership information, and then rewrites their uses.

**Retyping Pointers in Crown**. If a pointer owns a memory object at *any point within its scope*, Crown retypes it into a Box pointer. For instance, in Fig. 1c, local variable new\_node is retyped to be Option<Box<Node>> (safe pointer types are wrapped into Option to account for null pointer values). Variable new\_node is non-owning upon function entry, becomes owning at line 13 and ownership is transferred out again at line 16.

For struct fields, Crown considers all the code in the scope of the struct declaration. If a struct field owns a memory object at *any point within the scope of its struct declaration*, then it is retyped to Box. In Fig. 1b, fields next and head are accessed via access paths (\*new\_node).next and (\*list).head, and given ownership at lines 17 and 18, respectively. Consequently, they are retyped to Box at lines 4 and 9 in Fig. 1c, respectively.

A special case is that of output parameters, e.g. list in our example. For such parameters, although they may be owning, Crown retypes them to &mut in order to enable borrowing. In push, the input argument list is retyped to Option<&mut List> .

**Rewriting Pointer Uses in Crown.** After retyping pointers, Crown rewrites their uses. The rewrite process takes into consideration both their new type and the context in which they are being used. Due to the Rust semantics, the rewrite rules are slightly intricate (see Sect. 6). For instance, the dereference of new\_node at line 14 is rewritten to (\*new\_node).as\_deref\_mut().unwrap() as it needs to be mutated and the optional part of the Box needs to be unwrapped. Similarly, at line 15, (\*list).head is rewritten to be ((\*list.as\_deref\_mut()).unwrap()).head.take() as the LHS of the assignment expects a Box pointer.

After the rewrite performed by Crown, the unsafe block annotation is not needed anymore. However, Crown does not attempt to remove such annotations. Notably, safe pointers are always checked by the Rust compiler, even inside unsafe blocks.

#### **3.2 Freeing an Argument List in bzip2**

We next show the transformation of a real-world code snippet with a loop structure: a piece of code in bzip2 that frees argument lists. bzip2 defines a singly-linked-list-like structure, Cell, that holds a list of argument names. In Fig. 2, we extract from the source code a snippet that frees the argument lists. Here, the local variable argList is an already constructed argument list, and Char is a type alias for C-style characters. As a note, Cell in Figs. 2b and 2c does not refer to Rust's std::cell::Cell.

**Fig. 2.** Freeing an argument list

Crown accurately infers an ownership scheme for this snippet. Firstly, ownership of argList is transferred to aa, which is to be freed in the subsequent loop. Inside the loop, ownership of link accessed from aa is first transferred to aa2, then ownership of name accessed from aa is released in a call to free. After the conditional, ownership of aa is also released. Last of all, aa regains ownership from aa2.

**Handling of Loops.** For loops, Crown only analyses their body once as that will already expose all the ownership information. For inductively defined data structures such as Cell, while further unrolling of loop bodies explores the data structures deeper, it does not expose any new struct fields: pointer variables and pointer struct fields do not change ownership between loop iterations. Additionally, Crown emits constraints that equate the ownership of all local pointers at the loop entry and exit. For example, the ownership statuses of aa and aa2 at loop entry are made equal with those at loop exit, and inferred to be owning and non-owning, respectively.

**Handling of Null Pointers.** It is a common C idiom for pointers to be checked against null after malloc or before free: if !p.is\_null() { free(p); }. This could be problematic since the then-branch and the else-branch would have conflicting ownership statuses for p. We adopt a similar solution to [24]: we insert an explicit null assignment in the null branch: if !p.is\_null() { free(p); } else { p = ptr::null\_mut(); }. As we treat null pointers as both owning and non-owning, the ownership of p will be dictated by the non-null branch, enabling Crown to infer the correct ownership scheme.
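In safe Rust, the same idiom is typically expressed with Option, which is what the inferred scheme amounts to after translation (a hand-written sketch, not Crown's output; names are illustrative):

```rust
fn main() {
    let mut p: Option<Box<i32>> = Some(Box::new(5));
    // C idiom `if (!p.is_null()) free(p); else p = ptr::null_mut();`
    // after the normalisation: both branches leave p in the same state.
    if p.is_some() {
        drop(p.take()); // take() moves the Box out and frees it; p becomes None
    } else {
        p = None;       // explicit "null assignment" branch
    }
    assert!(p.is_none());
}
```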

**Translation.** With the above ownership scheme, Crown performs the rewrites as in Fig. 2c. Note that we do not attempt to rewrite name since it is an array pointer (see Sect. 7 for limitations).

## **4 Architecture**

In this section, we give a brief overview of Crown's architecture. Crown takes as input a Rust program with unsafe blocks, and outputs a safer Rust program, where a portion of the raw pointers have been retyped as safe ones (in accordance with the Rust ownership model), and their uses modified accordingly. In this paper, we focus on applying our technique to programs automatically translated by c2rust, which maintain a high degree of similarity to the original C ones, where the C syntax is replaced by Rust syntax.

Crown applies several static analyses on the MIR of Rust to infer properties of pointers:


The results of these analyses are summarised as type qualifiers [21]. A type qualifier is an atomic property (i.e., ownership, mutability, and fatness) that 'qualifies' the standard pointer type. These qualifiers are then utilised for pointer retyping. For example, an owning, non-array pointer is retyped to Box. After pointers have been retyped, Crown rewrites their usages accordingly.

## **5 Ownership Analysis**

The goal of our ownership analysis is to compute an ownership scheme for a given program that obeys the Rust ownership model, if such a scheme exists. The ownership scheme contains information about whether pointers in the program are owning or non-owning at particular program locations. At a high-level, our analysis works by generating a set of ownership constraints (Sect. 5.2), which are then solved by a SAT solver (Sect. 5.3). A satisfying assignment for the ownership constraints is an ownership scheme that obeys the Rust semantics.

Our ownership analysis is flow- and field-sensitive, where the latter enables inferring ownership information for pointer struct fields. To support field sensitivity, we track ownership information for *access paths* [10,14,29]. An access path represents a memory location by the way it is accessed from an initial, base variable, and comprises the base variable and a sequence of field selection operators. For the program in Fig. 1b, some example access paths are new\_node (consisting only of the base variable), (\*new\_node).next, and (\*list).head. Our analysis associates an ownership variable with each access path, e.g. p has the associated ownership variable O<sub>p</sub>, and (\*p).next has the associated ownership variable O<sub>(\*p).next</sub>. Each ownership variable can take the value 1 if the corresponding access path is owning, or 0 if it is non-owning. By the ownership of an access path we mean the ownership of the field (or, more generally, pointer) accessed last through the access path, e.g. the ownership of (\*new\_node).next refers to the ownership of field next.

#### **5.1 Ownership and Aliasing**

One of the main challenges of designing an ownership analysis is the interaction between ownership and aliasing. To understand the problem, let us consider the pointer assignment at line 3 in the code listing below. We assume that the lines before the assignment allow inferring that q, (\*q).next and r are owning, whereas p and (\*p).next are non-owning. Additionally, we assume that the lines after the assignment require (\*p).next to be owning (e.g. (\*p).next is being explicitly freed). From this, an ownership analysis could reasonably conclude that ownership transfer happens at line 3 (such that (\*p).next becomes owning), and the inferred ownership scheme obeys the Rust semantics.

```
1 let p, r, q : *mut Node;
2 // p and (*p).next non-owning; q, (*q).next and r owning
3 (*p).next = r;
4 // (*p).next must have ownership
```
Let's now also consider aliasing. A possible assumption is that, just before line 3, p and q alias, meaning that (\*p).next and (\*q).next also alias. Then, after line 3, (\*p).next and (\*q).next will still alias (pointing to the same memory object). However, according to the ownership scheme above, both (\*p).next and (\*q).next are owning, which is not allowed in Rust, where a memory object must have a unique owner. This discrepancy was not detected by the ownership analysis mimicked above. The issue is that the ownership analysis ignored aliasing. Indeed, ownership should not be transferred to (\*p).next if there exists an owning alias that, after the ownership transfer, continues to point to the same memory object as (\*p).next.

Precise aliasing information is very difficult to compute, especially in the presence of inductively defined data structures. In the current paper, we alleviate the need to check aliasing by making a strengthening assumption about the Rust ownership model: we restrict the way in which pointers can acquire ownership along an access path, thus limiting the interaction between ownership and aliasing. In particular, we introduce a novel concept of *ownership monotonicity*. This property states that, along an access path, the ownership values of pointers can only decrease (see Definition 1, where is\_prefix(a, b) returns true if access path a is a prefix of b, and false otherwise, e.g. is\_prefix(p, (\*p).next) = true). Going back to the previous code listing, ownership monotonicity implies that, for access path (\*p).next, we have O<sub>p</sub> ≥ O<sub>(\*p).next</sub>, and for access path (\*q).next, we have O<sub>q</sub> ≥ O<sub>(\*q).next</sub>. This means that, if (\*p).next is allowed to take ownership, then p must already be owning. Consequently, all aliases of p must be non-owning, which means that all aliases of (\*p).next, including (\*q).next, are non-owning.

**Definition 1 (Ownership monotonicity).** *Given two access paths a and b, if is\_prefix(a, b), then O<sub>a</sub> ≥ O<sub>b</sub>.*

Ownership monotonicity is stricter than the Rust semantics, causing our analysis to reject two scenarios that would otherwise be accepted by the Rust compiler (see discussion in Sect. 5.4). In this work, we made the design decision to use ownership monotonicity over aliasing analysis as it allows us to retain more control over the accuracy of the translation. Conversely, using an aliasing analysis would mean that the accuracy of the translation is directly dictated by the accuracy of the aliasing analysis (i.e. false alarms from the aliasing analysis [23, 40] would result in Crown not translating pointers that are actually safe). With ownership monotonicity, we know exactly what the rejected valid ownership schemes are, and we can explicitly enable them (again, see discussion in Sect. 5.4).
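The notions of access path, is\_prefix and the monotonicity check can be rendered as a toy program (the representation is ours, not Crown's):

```rust
use std::collections::HashMap;

// An access path: a base variable plus the sequence of selected fields.
#[derive(Clone, PartialEq, Eq, Hash)]
struct Path {
    base: &'static str,
    fields: Vec<&'static str>,
}

// is_prefix(a, b): a and b share a base and a's selectors lead into b's.
fn is_prefix(a: &Path, b: &Path) -> bool {
    a.base == b.base && b.fields.starts_with(&a.fields)
}

// Ownership monotonicity: along any access path, ownership only decreases.
fn monotone(ownership: &HashMap<Path, u8>) -> bool {
    ownership.iter().all(|(a, &oa)| {
        ownership.iter().all(|(b, &ob)| !is_prefix(a, b) || oa >= ob)
    })
}

fn main() {
    let p = Path { base: "p", fields: vec![] };
    let p_next = Path { base: "p", fields: vec!["next"] }; // (*p).next
    assert!(is_prefix(&p, &p_next));
    // O_p = 1, O_(*p).next = 0 satisfies monotonicity.
    let own = HashMap::from([(p, 1u8), (p_next, 0u8)]);
    assert!(monotone(&own));
}
```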

#### **5.2 Generation of Ownership Constraints**

During constraint generation, we assume a given k denoting the length of the longest access path used in the code. This enables us to capture the ownership of all the access paths exposed in the code. Later in this section, we will discuss the handling of loops, which may expose longer access paths.

Next, we denote by P the set of all access paths in a program; base\_var(a) returns the base variable of access path a, and |a| computes the length of access path a in terms of the field selection operators applied from the base variable. In the context of the previous code listing, base\_var((\*p).next) = p, base\_var(p) = p, |p| = 1 and |(\*p).next| = 2. Then, we define ap(v, lb, ub) to return the set of access paths with base variable v and length between lower bound lb and upper bound ub: ap(v, lb, ub) = {a ∈ P | base\_var(a) = v ∧ lb ≤ |a| ≤ ub}. For illustration, we have ap(p, 1, 2) = {p, (\*p).next}.

**Ownership Transfer.** The program instructions where ownership transfer can happen are (pointer) assignment and function call. Here we discuss assignment and, due to space constraints, we leave the rules for interprocedural ownership analysis in the extended version [41]. Our rule for ownership transfer at assignment site follows Rust's Box semantics: when a Box pointer is moved, the

$$\frac{\begin{array}{c}
v = base\_var(\texttt{p}) \qquad w = base\_var(\texttt{q}) \\
a \in ap(v, |\texttt{p}|, k) \qquad b \in ap(w, |\texttt{q}|, k) \qquad c \in ap(v, 1, |\texttt{p}|-1) \qquad d \in ap(w, 1, |\texttt{q}|-1) \\
|a| - |\texttt{p}| = |b| - |\texttt{q}| \qquad |c| = |d| \\
is\_prefix(\texttt{p}, a) \qquad is\_prefix(\texttt{q}, b) \qquad is\_prefix(c, \texttt{p}) \qquad is\_prefix(d, \texttt{q}) \\
C' = C \cup \{\mathbb{O}_a = 0 \;\wedge\; \mathbb{O}_{a'} + \mathbb{O}_{b'} = \mathbb{O}_b \;\wedge\; \mathbb{O}_{c'} = \mathbb{O}_c \;\wedge\; \mathbb{O}_{d'} = \mathbb{O}_d\}
\end{array}}{C \;\vdash\; \texttt{p = q;} \;\Rightarrow\; C'}\;(\textsc{Assign})$$

**Fig. 3.** Rule ASSIGN for ownership transfer at pointer assignment

object it points to is moved as well. For instance, in the following Rust pseudocode snippet:

```
1 let p,q: Box<Box<i32>>;
2 p = q; // ownership transfer occurs
3 // the use of q and *q is disallowed
```
when ownership is transferred from q to p, \*q also loses ownership. Except for reassignment, the use of a Box pointer after it lost its ownership is disallowed, hence the use of q or \*q is forbidden at line 3.

Consequently, we enforce the following *ownership transfer rule*: if ownership transfer happens for a pointer variable (e.g. p and q in the example), then it must happen for all pointers reachable from that pointer (e.g. \*p and \*q). The ownership of pointer variables from which the pointer under discussion is reachable remains the same (e.g. if ownership transfer happens for some assignment \*p = \*q in the code, then q and p retain their respective previous ownership values).

*Possible Ownership Transfer at Pointer Assignment:* The ownership transfer rule at a pointer assignment site is captured by rule ASSIGN in Fig. 3. The judgement C ⊢ p = q; ⇒ C' denotes the fact that the assignment is analysed under the set of constraints C, and generates C'. We use prime notation to denote variables after the assignment. Given the pointer assignment p = q, a and b range over all the access paths starting from p and q, respectively, whereas c and d denote the access paths from the base variables of p and q that reach p and q, respectively. Then, the equality O<sub>a'</sub> + O<sub>b'</sub> = O<sub>b</sub> captures the possibility of ownership transfer for all access paths originating at p and q: (i) if transfer happens, then the ownership of b transfers to a (O<sub>a'</sub> = O<sub>b</sub> and O<sub>b'</sub> = 0); (ii) otherwise, the ownership values are left unchanged (O<sub>a'</sub> = O<sub>a</sub> and O<sub>b'</sub> = O<sub>b</sub>). The last two equalities, O<sub>c'</sub> = O<sub>c</sub> ∧ O<sub>d'</sub> = O<sub>d</sub>, denote the fact that, in both (i) and (ii), pointers on access paths c and d retain their previous ownership. Note that "+" is interpreted as the usual arithmetic operation over ℕ, where we impose an implicit constraint 0 ≤ O ≤ 1 for every ownership variable O.

*C Memory Leaks:* In the ASSIGN rule, we add the constraint O<sub>a</sub> = 0 to C in order to force a to be non-owning before the assignment. Conversely, having a owning before being reassigned via the assignment under analysis signals a memory leak in the original C program. Given that in Rust memory is automatically returned, allowing the translation to happen would change the semantics of the original program by fixing the memory leak. Instead, our design choice is to disallow the ownership analysis from generating such a solution. As we will explain in Sect. 8, we intend for our translation to preserve memory usage (including possible memory leaks).

*Simultaneous Ownership Transfer Along an Access Path:* One may observe that the constraints generated by ASSIGN do not fully capture the stated ownership transfer rule. In particular, they do not ensure that, whenever ownership transfer occurs from q to p, it also transfers for all pointers on the access paths a and b. Instead, this is implicitly guaranteed by the ownership monotonicity rule, as stated in Theorem 1.

**Theorem 1 (Ownership transfer).** *If ownership is transferred from* q *to* p*, then, by the ASSIGN rule and ownership monotonicity, ownership also transfers between corresponding pointers on all access paths* a *and* b*:* O<sub>a'</sub> = O<sub>b</sub> *and* O<sub>b'</sub> = 0*. (proof in the extended version [41])*

*Ownership and Aliasing:* We saw in Sect. 5.1 that aliasing may cause situations in which, after ownership transfer, the same memory object has more than one owner. Theorem 2 states that this is not possible under ownership monotonicity.

**Theorem 2 (Soundness of pointer assignment under ownership monotonicity).** *Under ownership monotonicity, if all allocated memory objects have a unique owner before a pointer assignment, then they will also have a unique owner after the assignment. (proof in the extended version [41])*

Intuitively, Theorem 2 enables a pointer to acquire ownership without having to consider aliases: after ownership transfer, this pointer will be the unique owner. The idea resembles that of strong updates [30].

*Additional Access Paths:* As a remark, it is possible for p and q to be accessible from other base variables in the program. In such cases, given that those access paths are not explicitly mentioned at the location of the ownership transfer, we do not generate new ownership variables for them. Consequently, their current ownership variables are left unchanged by default.

**Ownership Transfer Example.** To illustrate the ASSIGN rule, we use the singly-linked list example below, where we assume that p and q are both of type \*mut Node. Therefore, we have to consider the following four access paths: p, q,

(\*p).next, (\*q).next. In SSA-style, at each line in the example, we generate new ownership variables (by incrementing their subscript) for the access paths mentioned at that line. For the first assignment, ownership transfer can happen between p and q, and (\*p).next and (\*q).next, respectively. For the second assignment, ownership can be transferred between (\*p).next and (\*q).next, while p and q must retain their previous ownership.

```
1 p = q;                  // O_p1 = 0 ∧ O_p2 + O_q2 = O_q1 ∧
2                         // O_(*p1).next = 0 ∧ O_(*p2).next + O_(*q2).next = O_(*q1).next
3 (*p).next = (*q).next;
4                         // O_p3 = O_p2 ∧ O_q3 = O_q2 ∧
5                         // O_(*p2).next = 0 ∧ O_(*p3).next + O_(*q3).next = O_(*q2).next
```
Besides generating ownership constraints for assignments, we must model the ownership information for commonly used C standard library functions such as malloc, calloc, realloc, free, strcmp, memset, etc. Due to space constraints, more details about these, as well as the rules for ownership monotonicity and interprocedural ownership analysis, are provided in the extended version [41].

**Handling Conditionals and Loops.** As mentioned in Sect. 3.2, we only analyse the body of loops once as it is sufficient to expose all the required ownership variables. For inductively defined data structures, while further unrolling of loop bodies increases the length of access paths, it does not expose any new struct fields (struct fields do not change ownership between loop iterations).

To handle join points of control paths, we apply a variant of the SSA construction algorithm [6], where different paths are merged via φ nodes. The value of each ownership variable must be the same on all joined paths, or otherwise the analysis fails.

#### **5.3 Solving Ownership Constraints**

The ownership constraint system consists of a set of 3-variable linear constraints of the form O<sub>v</sub> = O<sub>w</sub> + O<sub>u</sub>, and 1-variable equality constraints O<sub>v</sub> = 0 and O<sub>v</sub> = 1.

**Definition 2 (Ownership constraint system).** *An* ownership constraint system (P, Δ, Σ, Σ<sub>¬</sub>) *consists of a set of ownership variables* P *that can take either value 0 or 1, a set of 3-variable equality constraints* Δ ⊆ P × P × P*, and two sets of 1-variable equality constraints,* Σ, Σ<sub>¬</sub> ⊆ P*. The equalities in* Σ *are of the form* x = 1*, whereas the equalities in* Σ<sub>¬</sub> *are of the form* x = 0*.*

**Theorem 3 (Complexity of ownership constraint solving).** *Deciding the satisfiability of the ownership constraint system in Definition 2 is NP-complete. (proof in the extended version [41])*

We solve the ownership constraints by calling a SAT solver. The ownership constraints may have no solution. This happens when there is no ownership scheme that obeys the Rust ownership model and the ownership monotonicity property (which is stricter than the Rust model for some cases), or the original C program has a memory leak. In the case where the ownership constraints have more than one solution, we consider the first assignment returned by the SAT solver.
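To make the constraint system of Definition 2 concrete, the following toy exhaustive solver decides such 0/1 systems (Crown itself calls a SAT solver; this sketch, with our own encoding of variables as indices, is only for illustration):

```rust
// Constraints: `sums` holds triples (v, w, u) meaning O_v = O_w + O_u;
// `ones` forces O_x = 1; `zeros` forces O_x = 0. Variables are indices 0..n.
fn solve(
    n: usize,
    sums: &[(usize, usize, usize)],
    ones: &[usize],
    zeros: &[usize],
) -> Option<Vec<u8>> {
    // Enumerate all 0/1 assignments; return the first satisfying one.
    for mask in 0u32..(1u32 << n) {
        let val = |i: usize| ((mask >> i) & 1) as u8;
        let ok = ones.iter().all(|&i| val(i) == 1)
            && zeros.iter().all(|&i| val(i) == 0)
            && sums.iter().all(|&(v, w, u)| val(v) == val(w) + val(u));
        if ok {
            return Some((0..n).map(val).collect());
        }
    }
    None
}

fn main() {
    // O_0 = O_1 + O_2 with O_1 = 1 forces the scheme O_0 = 1, O_2 = 0.
    assert_eq!(solve(3, &[(0, 1, 2)], &[1], &[]), Some(vec![1, 1, 0]));
    // O_0 = O_1 + O_2 with all three forced to 1 is unsatisfiable,
    // mirroring the "no valid ownership scheme" outcome discussed above.
    assert_eq!(solve(3, &[(0, 1, 2)], &[0, 1, 2], &[]), None);
}
```

The second call also illustrates why a first-returned model is an arbitrary choice: when several assignments satisfy the system, any of them is a valid ownership scheme.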

Due to the complex Rust semantics, we do not formally prove that a satisfying assignment obeys the Rust ownership model. Instead, this check is performed after the translation by running the Rust compiler.

#### **5.4 Discussion on Ownership Monotonicity**

As mentioned earlier in Sect. 5, ownership monotonicity is stricter than the Rust semantics, causing our analysis to potentially reject some ownership schemes that would otherwise be accepted by the Rust compiler. We identified two such scenarios:

*(i) Reference output parameter:* This denotes a reference passed as a function parameter, which acts as an output as it can be accessed from outside the function (e.g. list in Fig. 1a). For such parameters, the base variable is non-owning (as it is a reference) and mutable, whereas the pointers reachable from it may be owning (see example in Fig. 1c, where (\*node).head gets assigned a pointer to a newly allocated node). We detect such situations and explicitly enable them. In particular, we explicitly convert owning pointers p to &mut(\*p) at the translation stage.
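In (safe) Rust terms, scenario (i) looks roughly as follows (the types and names are our own illustration, not the code of Fig. 1): the base parameter is a non-owning mutable reference, while the pointer reachable from it is owning:

```rust
/// Sketch of a reference output parameter after translation: `list`
/// itself is a non-owning `&mut`, but the `head` pointer reachable
/// from it owns a heap allocation (a Box).
struct List {
    head: Option<Box<Node>>,
}

struct Node {
    next: Option<Box<Node>>,
}

fn push(list: &mut List) {
    // a newly allocated node is assigned through the non-owning base,
    // so the field becomes owning even though the base is not
    list.head = Some(Box::new(Node { next: None }));
}
```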

*(ii) Local borrows:* The code below involving a mutable local borrow is not considered valid by Crown as it disobeys the ownership monotonicity: after the assignment, local\_borrow is non-owning, whereas \*local\_borrow is owning.

```
let local_borrow = &mut n;
*local_borrow = Box::new(1);
```
While we could explicitly handle the translation to local borrows, doing so soundly would require reasoning about lifetime information (e.g. Crown would have to check that the lifetimes of different mutable references to the same object do not overlap). In this work, we chose not to do this and instead leave it as future work (as also mentioned under limitations in Sect. 7). It was observed in [13] that scenario (i) is much more prevalent than scenario (ii). Additionally, we observed in our benchmarks that output parameters account for 93% of mutable references (hence the inclusion of a special case enabling the translation of scenario (i) in Crown).

## **6 C to Rust Translation**

Crown uses the results of the ownership, mutability and fatness analyses to perform the actual translation, which consists of retyping pointers (Sect. 6.1) and rewriting pointer uses (Sect. 6.2).

#### **6.1 Retyping Pointers**

As mentioned in Sect. 2.2, we do not attempt to translate array pointers to safe pointers. In the rest of the section, we focus on mutable, non-array pointers.

The translation requires a global view of pointers' ownership, whereas the information inferred by the ownership analysis refers to individual program locations. For the purpose of translation, given that we refactor owning pointers into box pointers, a pointer is considered (globally) owning if it owns a memory object at any program location within its scope. Otherwise, it is (globally) non-owning. When retyping pointer fields of structs, we must consider the scope of the struct declaration, which generally spans the whole program. Within this scope, each field is usually accessed from several base variables, all of which must be taken into consideration. For instance, consider the List declaration in Fig. 1b and two variables l1 and l2 of type \*mut List. Then, in order to determine the ownership status of the field next, we have to consider all the access paths to next originating from both base variables l1 and l2.

The next table shows the retyping rules for mutable, non-array pointers, where we wrap safe pointer types into Option to account for null pointer values:


The non-owning pointers that are kept as raw pointers \*mut T correspond to mutable local borrows. As explained in Sects. 5.4 and 7, Crown does not currently handle the translation to mutable local borrows because we lack a lifetime analysis. Notably, this restriction does not apply to output parameters (which cover the majority of mutable references), which we translate to mutable references. The lack of a lifetime analysis also means that we cannot handle immutable local borrows, hence our translation's focus on mutable pointers.
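As a rough illustration of these retyping outcomes (the field names and the grouping into one struct are our own, not Crown's output), the three cases look as follows in Rust:

```rust
/// Hedged sketch of the retyping rules described above: owning
/// pointers become safe Box pointers wrapped in Option (to account
/// for null), output-parameter references become mutable references,
/// and mutable local borrows are left as raw pointers.
struct Example<'a> {
    owning: Option<Box<i32>>,        // globally owning pointer
    output_ref: Option<&'a mut i32>, // non-owning output parameter
    local_borrow: *mut i32,          // mutable local borrow: kept raw
}
```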

#### **6.2 Rewriting Pointer Uses**

The rewrite of a pointer expression depends on its new type and the context in which it is used. For example, when rewriting q in p=q, the context will depend on the new type of p. Based on this new type, we can have four contexts: BoxCtxt which requires Box pointers, MutCtxt which requires &mut references, ConstCtxt which requires & references, and RawCtxt which requires raw pointers. For example, if p above is a Box pointer, then we rewrite q in a BoxCtxt.
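For instance, when both p and q have been retyped to Option&lt;Box&lt;T&gt;&gt; (so q is rewritten in a BoxCtxt), a plausible rewrite transfers ownership via Option::take (our illustrative stand-in, not necessarily Crown's exact rewrite):

```rust
/// Sketch of rewriting `p = q` when both sides are Option<Box<T>>:
/// Option::take moves the box out of `q`, leaving None behind, which
/// mirrors the C pointer losing ownership of the pointee.
fn assign<T>(p: &mut Option<Box<T>>, q: &mut Option<Box<T>>) {
    *p = q.take();
}
```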

Then, the rewrite takes place according to the following table, where columns correspond to the new type of the pointer to be rewritten, and rows represent possible contexts<sup>1</sup>.


Our translation uses functions from the Rust standard library, as follows:


<sup>1</sup> The cell marked as ⊥ is not applicable due to our treatment of output parameters.


We also define the helper function to\_raw that transforms safe pointers into raw pointers:

```
use std::ptr::null_mut;

fn to_raw<T>(b: &mut Option<Box<T>>) -> *mut T {
    b.as_deref_mut().map(|b| b as *mut T).unwrap_or(null_mut())
}
```
Here, we explain to\_raw for a Box argument (the explanation for &mut is the same because of the polymorphic nature of as\_deref\_mut):


*Dereferences:* When a pointer p is dereferenced as part of a larger expression (e.g. (\*p).next), we need an additional unwrap().
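As an illustration (the struct and the exact method chain are our own, assuming the fields have been retyped to Option&lt;Box&lt;...&gt;&gt;), a C access like (\*p).next acquires additional unwrap() calls:

```rust
/// Sketch of the dereference rewrite: with `next` retyped to
/// Option<Box<Node>>, navigating two levels deep requires unwrapping
/// each Option along the access path.
struct Node {
    next: Option<Box<Node>>,
    val: i32,
}

fn second_val(p: &mut Option<Box<Node>>) -> i32 {
    // roughly the rewrite of: (*(*p).next).val
    p.as_mut().unwrap().next.as_mut().unwrap().val
}
```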

*Box pointers check:* Rust disallows the use of Box pointers after they have lost their ownership. As this rule cannot be captured by the ownership analysis, such situations are detected at the translation stage, and the culpable Box pointers are reverted to raw pointers.

For brevity, we omitted the slightly different treatment of struct fields that are not of pointer type.

## **7 Challenges of Handling Real-World Code**

We designed Crown to be able to analyse and translate real-world code, which poses significant challenges. In this section, we discuss some of the engineering challenges of Crown and its current limitations.

#### **7.1 Preprocessing**

During the transpilation of C libraries, c2rust treats each file as a separate compilation unit, which gets translated into a separate Rust module. Consequently, struct definitions are duplicated, and available function definitions are put in extern blocks [17]. We apply a preprocessing step similar to the resolve-imports tool of Laertes [17] that links those definitions across files.

### **7.2 Limitations of the Ownership Analysis**

There are a few C constructs and idioms that are not fully supported by our implementation, for which Crown generates partial ownership constraints. Crown's translation will attempt to rewrite a variable as long as there exists a constraint involving it. As a result, the translation is, in theory, neither *sound* nor *complete*: it may generate code that does not compile (though we have not observed this in practice for the benchmarks where Crown produces a result – see Sect. 8), and it may leave some pointers as raw pointers, resulting in a less than optimal translation. We list below the cases when such a scenario may happen.

*Certain Unsafe C Constructs.* For type casts, we only generate ownership transfer constraints for head pointers; for unions we assume that they contain no pointer fields and consequently, we generate no constraints; similarly, we generate no constraints for variadic arguments. We noticed that unions and variadic arguments may cause our tool to crash (e.g. three of the benchmarks in [17], as mentioned in Sect. 8). Those crashes happen when analysing access paths that contain dereferences of union fields (where we assumed no pointer fields), and when analysing calls to functions with variadic arguments where a pointer is passed as argument.

*Function Pointers.* Crown does not generate any constraints for them.

*Non-standard Memory Management in C Libraries.* Certain C libraries wrap malloc and free, often with static function pointers (pointers to allocator/deallocator are stored in static variables), or function pointers in structs. Crown does not generate any constraints in such scenarios. In C, it is also possible to use malloc to allocate a large piece of memory, and then split it into several sub-regions assigned to different pointers. In our ownership analysis, only one pointer can gain ownership of the memory allocated by a call to malloc. Another C idiom that we don't fully support occurs when certain pointers can point to either heap allocated objects, or statically allocated stack arrays. Crown generates ownership constraints only for the heap and, consequently, those variables will be left under-constrained.

#### **7.3 Other Limitations of Crown**

*Array Pointers.* For array pointers, although Crown infers the correct ownership information, it does not generate the metadata required to synthesise Rust code.

*Mutable Local Borrows.* As explained in the last paragraph of Sect. 6.1, Crown does not translate mutable non-owning pointers to local mutable references as this requires dedicated analysis of lifetimes. Note that Crown does however generate mutable references for output parameters.

*Access Paths that Break Ownership Monotonicity.* As discussed in Sect. 5.4, ownership monotonicity may be stricter in certain cases than Rust's semantics.

## **8 Experimental Evaluation**

We implement Crown on top of the Rust compiler, version nightly-2023-01-26. We use c2rust version 0.16.0. For the SAT solver, we rely on a Rust binding of z3 [20], version 0.11.2. We run all our experiments on a MacBook Pro with an Apple M1 chip, with 8 cores (4 performance and 4 efficiency). The computer has 16 GB RAM and runs macOS Monterey 12.5.1.

**Benchmark Selection.** To evaluate the utility of Crown, we collected a benchmark suite of 20 programs (Table 1). These include benchmarks from Laertes [17]'s accompanying artifact [16] (marked by \* in Table 1)<sup>2</sup>, and additionally 8 real-world projects (binn, brotli, buffer, heman, json.h, libtree, lodepng, rgba) together with 4 commonly used data structure libraries (avl, bst, ht, quadtree).

**Functional and Non-functional Guarantees.** With respect to functional properties, we want the original program and the refactored program to be observationally equivalent, i.e. for each input they produce the same output. We empirically validated this using all the available test suites (i.e. for libtree, rgba, quadtree, urlparser, genann, buffer in Table 1). All the test suites continue to pass after the translation. For non-functional properties, we intend to preserve memory usage and CPU time, i.e. we don't want our translation to introduce runtime overhead. We also validated this using the test suites.


**Table 1.** Benchmarks information

#### **8.1 Research Questions**

We aim at answering the following research questions.

<sup>2</sup> We excluded json-c, optipng, tinycc where Crown crashes because of the uses of unions and variadic arguments as discussed in Sect. 7. Additional programs (qsort, grabc, xzoom, snudown, tmux, libxml2) are mentioned in the paper [17] but are either missing or incomplete in the artifact [16].

RQ1. How many raw pointers/pointer uses can Crown translate to safe pointers/pointer uses?

RQ2. How does Crown's result compare with the state-of-the-art [17]?

RQ3. What is the runtime performance of Crown?

**RQ 1: Unsafe pointer reduction.** In order to judge Crown's efficacy, we measure the reduction rate of raw pointer declarations and uses. This is a direct indication of the improvement in safety, as safe pointers are always checked by the Rust compiler (even inside unsafe regions). As previously mentioned, we focus on mutable non-array pointers. The results are presented in Table 2, where #ptrs counts the number of raw pointer declarations in a given benchmark, #uses counts the number of times raw pointers are used, and the Laertes and Crown headers denote the reduction rates of the number of raw pointers and raw pointer uses achieved by the two tools, respectively. For instance, for benchmark avl, the rate of 100% means that all raw pointer declarations and their uses are translated into safe ones. Note that the "-" symbols on the row corresponding to robotfindskitten are due to the fact that the benchmark contains 0 raw pointer uses.

The median reduction rates achieved by Crown for raw pointers and raw pointer uses are 37.3% and 62.1%, respectively. Crown achieves a 100% reduction rate for many non-trivial data structures (avl, bst, buffer, ht), as well as for rgba. For brotli, a lossless data compression algorithm developed by Google, which is our largest benchmark, Crown achieves reduction rates of 21.4% and 20.9%, respectively. The relatively low reduction rates for brotli and a few other benchmarks (tulipindicators, lodepng, bzip2, genann, libzahl) are due to their use of non-standard memory management strategies (discussed in detail in Sect. 7).

Notably, all the translated benchmarks compile under the aforementioned Rust compiler version. As a check of semantics preservation, for the benchmarks that provide test suites (libtree, rgba, quadtree, urlparser, genann, buffer), our translated benchmarks pass all the provided tests.

**RQ 2: Comparing with state-of-the-art.** The comparison of Crown with Laertes [17] is also shown in Table 2, with bold font highlighting better results. The data on Laertes is either directly extracted from the artifact [16] or has been confirmed by the authors through private correspondence. We can see that Crown outperforms the state-of-the-art (often by a significant degree) in most cases, with lodepng being the only exception, where we suspect that the reason also lies with non-standard memory management strategies mentioned before. Laertes is less affected by this as it does not rely on ownership analysis.

**RQ 3: Runtime performance.** Although our analysis relies on solving a constraint satisfaction problem that is proven to be NP-complete, in practice Crown is consistently fast. The execution time of the analysis and the rewrite for the whole benchmark suite is within 60 s (the execution time for our largest benchmark, brotli, is under 10 s).


**Table 2.** Reduction of (mutable, non-array) raw pointer declarations and uses

## **9 Related Work**

**Ownership Discussion.** Ownership has been used in OO programming to enable controlled aliasing by restricting the object graphs underlying the runtime heap [11,12], with efforts made in the automatic inference of ownership information [1,4,39], and applications of ownership to memory management [5,42]. Similarly, the concept of ownership has also been applied to analyse C/C++ programs. Heine et al. [24] infer pointer ownership information for detecting memory leaks. Ravitch et al. [37] apply static analysis to infer ownership for automatic library binding generation. Given the different application domains, each of these works makes different assumptions. Heine et al. [24] assume that indirectly-accessed pointers (i.e. any pointer accessed through a path, like (\*p).next) cannot acquire ownership, whereas Ravitch et al. [37] assume that all struct fields are owning unless explicitly annotated. We took from [24] its handling of flow sensitivity, but enhanced it with the analysis of nested pointers and inductively defined data structures, which we found to be essential for translating real-world code. The analysis in [24] assigns a default "non-owning" status to all indirectly accessed pointers. This rules out many interesting data structures such as linked lists, trees, hash tables, etc., and commonly used idioms such as passing by reference. Conversely, in our work, we rely on a strengthening assumption about the Rust ownership model, which allows handling the aforementioned scenarios and data structures. Lastly, the idea of ownership is also broadly applied in concurrent separation logic [7–9,19,38]. However, these works are not aimed at automatic ownership inference.

**Rust Verification.** The separation logic based reasoning framework Iris [28] was used to formalise the Rust type system [27] and verify Rust programs [34]. While these works cover unsafe Rust fragments, they are not fully automatic. When restricting reasoning to only safe Rust, RustHorn [35] gives a first-order logic formulation of the behavior of Rust code, which is amenable to fully automatic verification, while Prusti [3] leverages Rust compiler information to generate separation logic verification conditions that are discharged by Viper [36]. In the current work, we provide an automatic ownership analysis for unsafe Rust programs.

**Type Qualifiers.** Type qualifiers are a lightweight, practical mechanism for specifying and checking properties not captured by traditional type systems. A general flow-insensitive type qualifier framework has been proposed [21], with subsequent applications analysing Java reference mutability [22,25] and C array bounds [32]. We adapted these works to Rust for our mutability and fatness analyses, respectively.

**C to Rust Translation.** We have already discussed c2rust [26], which is an industrial-strength tool that converts C to Rust syntax. c2rust does not attempt to fix unsafe features such as raw pointers, and the programs it generates are always annotated as unsafe. Nevertheless, it forms the basis of other translation efforts. CRustS [31] applies AST-based code transformations to remove superfluous unsafe labelling generated by c2rust, but it does not fix the unsafe features either. Laertes [17] is the first tool that is actually able to automatically reduce the presence of unsafe code. It uses the Rust compiler as a black-box oracle and searches for code changes that remove raw pointers, which differs from Crown's approach (see Sect. 8 for an experimental comparison). The subsequent work [15] develops an evaluation methodology for studying the limitations of existing techniques that translate unsafe raw pointers to safe Rust references. That work adopts a new concept of 'pseudo safety', under which semantics preservation of the original programs is no longer guaranteed. As explained in Sect. 8, in our work, we aim to maintain semantic equivalence.

## **10 Conclusion**

We devised an ownership analysis for Rust programs translated by c2rust that is scalable (handling half a million LOC in less than 10 s) and precise (handling inductive data structures) thanks to a strengthening of the Rust ownership model, which we call ownership monotonicity. Based on this new analysis, we prototyped a refactoring tool for translating C programs into Rust programs. Our experimental evaluation shows that the proposed approach handles real-world benchmarks and outperforms the state-of-the-art.

## **References**


Design and Implementation. PLDI '00, pp. 35–46. Association for Computing Machinery, New York, NY, USA (2000). https://doi.org/10.1145/349299.349309


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **R2U2 Version 3.0: Re-Imagining a Toolchain for Specification, Resource Estimation, and Optimized Observer Generation for Runtime Verification in Hardware and Software**

Chris Johannsen1(B) , Phillip Jones1, Brian Kempa1, Kristin Yvonne Rozier1, and Pei Zhang<sup>2</sup>

<sup>1</sup> Iowa State University, Ames, USA {cgjohann,phjones,bckempa,kyrozier}@iastate.edu <sup>2</sup> Google LLC, Sunnyvale, USA

**Abstract.** R2U2 is a modular runtime verification framework capable of monitoring sets of specifications in real time and in resource-constrained environments. Such environments demand that a runtime monitor be fast, easily integratable, accessible to domain experts, and have predictable resource requirements. Version 3.0 adds new features to R2U2 and its associated suite of tools that meet these needs including a new front-end compiler that accepts a custom specification language, a GUI for resource estimation, and improvements to R2U2's internal architecture.

## **1 Tool Overview**

R2U2 (Realizable Responsive Unobtrusive Unit) is a modular framework for hardware (FPGA) and software (C and C++) real-time runtime verification (RV). R2U2 runs *online*, during system execution, with minimal overhead. (It also runs *offline*, over simulated data streams or recorded data logs.) R2U2 is *stream-based*; given a runtime requirement ϕ and an input computation π of sensor and software values at each timestamp i, R2U2 returns the verdict (true or false) for all i as to whether π, i |= ϕ. We call this output stream an *execution sequence* [34]; it is a stream of two-tuples ⟨verdict, time⟩ for every time i. R2U2 encodes specifications as *observers* (a set of which we call a *configuration*) via an optimized algorithm with published proofs of correctness, time, and space [18,20,34].
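As a toy illustration of stream-based monitoring (a minimal sketch only; this is not R2U2's published observer algorithm, and all names are ours), the following observer for G[0,N] p consumes one input sample per call and emits a ⟨verdict, time⟩ two-tuple once enough of the stream has been seen:

```rust
use std::collections::VecDeque;

/// Sliding-window observer for the MLTL formula G[0,N] p: the verdict
/// at time i is true iff p holds at every timestamp in [i, i + N].
struct Globally {
    window: VecDeque<bool>,
    n: usize,
    next_out: usize, // time index of the next verdict to emit
}

impl Globally {
    fn new(n: usize) -> Self {
        Globally { window: VecDeque::new(), n, next_out: 0 }
    }

    /// Feed one sample; returns Some((verdict, time)) once the window
    /// [i, i + N] is fully observed, None while still buffering.
    fn step(&mut self, p: bool) -> Option<(bool, usize)> {
        self.window.push_back(p);
        if self.window.len() == self.n + 1 {
            let verdict = self.window.iter().all(|&b| b);
            self.window.pop_front();
            let t = self.next_out;
            self.next_out += 1;
            Some((verdict, t))
        } else {
            None
        }
    }
}
```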

Figure 1 depicts a standard R2U2 workflow. To integrate R2U2 into a target system, we first need a validated set of runtime requirements. Given the system's resource constraints, the Configuration Compiler for Property Organization (C2PO) creates an optimized encoding of the input set of requirements as an R2U2 configuration. Users can

This work was funded by NSF CAREER Award CNS-1552934, NASA-ECF NNX16AR57G, NASA Cooperative Agreement Grant #80NSSC21M0121, and NSF:CPS Award 2038903. Thanks to the NASA Lunar Gateway Vehicle System Manager team for novel feature requests.

**Fig. 1.** Workflow for verifying a specification using R2U2. Red shaded boxes denote runtime components and blue shaded boxes denote design-time components. Note that for validation, the runtime components can run offline, e.g., by replacing the data stream with a log file of simulated data. Users formalize their system requirements as MLTL formulas within a C2PO specification, use C2PO to generate an R2U2 configuration, then monitor the verdicts R2U2 outputs based on the configuration and data stream. (Color figure online)

swap configurations monitored by R2U2 at runtime, during system execution, based on system state, mission phase, or to upgrade the specification version – all without recompiling and redeploying the R2U2 engine, a key feature for systems that require onerous code change certifications, or e.g., systems that need to be launched into space and then dynamically updated as their hardware degrades.

R2U2 fills the unique gap in the RV community described by its name [39]:


there is sufficient information to evaluate π, i |= ϕ, thus monitoring integrity, safety, and security requirements in real-time. Since the monitor's response time is a function of the specification and known a priori, higher-level autonomous system health and decision-making controllers can rely on R2U2 verdicts to provide a tight bound on mitigation triggering or other reactive behaviors.

**UNOBTRUSIVENESS** R2U2's multi-architecture, multi-platform design enables effective runtime verification while respecting crucial unobtrusiveness properties of embedded systems, including functionality (no change in behavior), certifiability (bounded time and memory under safety cases), timing (no interference with timing guarantees), and tolerances (respect constraints on size, weight, power, bandwidth, and overhead). R2U2 obeys unobtrusiveness constraints, provably fitting into tight resource limits and operational constraints frequently encountered in space missions. It can operate without code instrumentation or insight into black-box subcomponents such as ITAR, restricted, or closed-source modules [29].

*User Base.* After an extensive survey of all currently-available verification tools, NASA's Lunar Gateway Vehicle System Manager (VSM) team selected R2U2 for operational verification [8–10]; R2U2 is currently operating in the NASA core Flight System/core Flight Executive (cFS/cFE) [28] VSM environment. R2U2 is embedded in the space left over on the FPGA controlling NASA's Robonaut2's knee to provide realtime fault disambiguation [18], interfacing via the Robot Operating System (ROS) [31]. R2U2 is running on a UAS Traffic Management (UTM) system [5], where it recently detected a flight-plan timing fault. JAXA is running R2U2 on a 2021 autonomous satellite mission with a requirement for a provable memory bound of 200KB [30]. R2U2 recently verified a CubeSat communications system [24], an open-source UAS [16], a sounding rocket [15], and a high-altitude balloon [23]. The CySat-I satellite uses R2U2 for autonomous fault recovery [2]. In the recent past, R2U2 was used in NASA's Autonomy Operating System (AOS) for UAS [22] (where it flew on NASA's S1000 octocopter [21]), the NASA Swift UAS [13,34,36,43], and the NASA DragonEye UAS [41,44]. R2U2 aided in NASA embedded system battery prognostics [42] and a case study on small satellites and landers [35]. R2U2 has also proven useful for monitoring and diagnosis of security threats on-board NASA UAS like the DragonEye [27,40]. R2U2 was cataloged by the user community in a 2018 taxonomy of RV tools [12,39], and appeared in a 2020 Institute of Information Security (ETH Zürich, Switzerland) case study [33]. R2U2 is open-source, dual licensed under MIT1 and Apache-2.0.<sup>2</sup>

### **2 Compiler and Specification Language**

Specification is a notoriously difficult aspect of RV [37]; verification results are only meaningful if the input specifications are correct and complete with respect to the system requirements. An RV engine is only usable if system engineers can *validate* that it monitors its given requirements as they expect, so they can clearly explain when and why different RV verdicts occur. In consultation with outside groups using R2U2 on

<sup>1</sup> https://choosealicense.com/licenses/mit/.

<sup>2</sup> https://choosealicense.com/licenses/apache-2.0/.


**Table 1.** Overview of changes to the R2U2 specification syntax for a basic temperature limit requirement, where *Temp* is located at index 0 of the input signal vector. This is not an exhaustive comparison but covers directly equivalent features, while Fig. 2 and the remainder of Sect. 2 detail new capabilities.


real systems [8,14,30], we developed a new specification language and an accompanying formula-set compiler. The language's and compiler's features make specifications easier to read and write, improving user productivity and easing validation to address the challenges of specification in RV.

#### **2.1 New Specification Language**

Previous versions of R2U2 used a specification language derived from the implementation of the hardware runtime engine. While sufficiently expressive for the creation of R2U2 configurations, it utilized a restricted syntax that supported only basic MLTL operators and single-operator expressions over non-Boolean data types. Writing specifications that are transparent and easy to validate could be difficult without in-depth knowledge of R2U2's architecture [17].

The new SMV-inspired [26] specification language allows users to write specifications more naturally, with support for compound expressions over complex data types (including sets and C-like structs) as well as sections for defining structs, variables, macros, and MLTL formulas. C2PO supports Boolean, struct, and parametric set types

**Fig. 2.** Sample C2PO specification file using structs (lines 2–3, 12–13), sets (lines 3, 15–16), and set aggregation operators (lines 22–23). The specification on lines 19–20 captures the English requirement, "The active times for *rq*<sup>0</sup> and *rq*<sup>1</sup> shall differ by no more than 10.0 s," and the specification on lines 22–26 captures the English requirement, "For each request *r* of each arbiter in *ArbSet*, *r*'s status shall be GRANT or REJECT within the next 5 s and until then shall be WAITING."

with configurable integer and floating point types. To run R2U2 in software, users select a C standard type for each of the integer and float types e.g., an unsigned 16-bit integer (uint16\_t) and double-precision floating point (double). If targeting hardware (FPGA implementation), users can configure integer and float types to a bit-width supported by the target system. Table 1 presents a comparison between the old [39] and new syntaxes and Fig. 2 presents a sample file for monitoring a request-handling system.

To create an R2U2 configuration, C2PO generates an Abstract Syntax Tree (AST) representation of the input, performs type checking, applies optimizations and rewriting rules, then outputs the corresponding R2U2 configuration. R2U2 does not use automata to encode temporal logic observers (as reported erroneously elsewhere [12]); instead, C2PO traverses the AST to produce assembly-like imperative evaluation instructions for the R2U2 monitor to execute at runtime.

In order to meet the demands of a wide range of systems, R2U2 Version 3.0 includes many optional features that are specific to one of the three implementations that can be enabled during system integration. For example, the Booleanizer module computes arbitrary non-Boolean expressions in the C implementation of R2U2, but this feature is not an option in the C++ or hardware implementations. C2PO allows users to enable or disable such features according to the capabilities of their target systems and chosen R2U2 implementation.

#### **2.2 Assume-Guarantee Contract Support**

Assume Guarantee Contracts (AGCs) provide a template for structuring and validating complex requirements in aerospace operational concepts [3]. AGCs feature a guard or trigger clause called the "assumption" and a system invariant called the "guarantee;" they have been used to structure both English and formal (e.g., temporal logic) requirements by projects including the NASA Lunar Gateway Vehicle System Manager [10]. R2U2 V3.0 now directly supports AGCs with an input syntax for expressing AGCs in C2PO and an output format for R2U2 that provides granular interpretation of verdicts, as presented in [17]. The input syntax for declaring an AGC is assumption => guarantee where the semantics for this logical implication provides three distinct cases: the AGC is "inactive" if the assumption is false, "true" if both the assumption and guarantee are true, and "false" otherwise. When the optional AGC feature is enabled, R2U2 produces three-valued verdicts to represent the state of the AGCs in a clear format; otherwise R2U2 interprets logical implications in the standard way (where false → true results in the verdict true rather than inactive).
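The three-valued AGC semantics described above can be sketched as follows (the enum and function names are illustrative, not R2U2's actual output format):

```rust
/// Three-valued verdict for an assume-guarantee contract
/// `assumption => guarantee`.
#[derive(Debug, PartialEq)]
enum AgcVerdict {
    Inactive, // assumption is false: the contract is not triggered
    True,     // assumption and guarantee both hold
    False,    // assumption holds but the guarantee is violated
}

fn agc_verdict(assumption: bool, guarantee: bool) -> AgcVerdict {
    match (assumption, guarantee) {
        (false, _) => AgcVerdict::Inactive,
        (true, true) => AgcVerdict::True,
        (true, false) => AgcVerdict::False,
    }
}
```

With the AGC feature disabled, the `Inactive` case would instead collapse to `True`, matching the standard semantics of logical implication.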

#### **2.3 Set Aggregation**

A common pattern in real-world specifications applies an identical formula to various input signals, such as testing all temperature sensors for an overheat condition. A naive encoding of these specifications in MLTL can be excessively large to the point of obscuring intent while providing ample opportunity for copy-paste errors, typos, or incomplete updates to variables – all of which are difficult for humans to spot during validation. C2PO mitigates this issue by supporting set aggregation operators that compactly encode these expressions as sets of streams with a predicate applied to each element [14].

To illustrate, consider the specification in Fig. 2. The direct encoding of this specification without the "foreach" operator is

```
(rq0.status == W) U[0,5] (rq0.status == G || rq0.status == R) &&
(rq1.status == W) U[0,5] (rq1.status == G || rq1.status == R) &&
(rq2.status == W) U[0,5] (rq2.status == G || rq2.status == R) &&
(rq3.status == W) U[0,5] (rq3.status == G || rq3.status == R)
```
Contrast this with the more compact encoding using the "foreach" operator on lines 22–26 in Fig. 2. The latter retains the intent of the English-level requirement while being semantically equivalent to the direct encoding. This concise representation both eases validation by improving readability and reduces the potential for errors by avoiding replicated values that must be updated in unison.
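The expansion a compiler performs for such an operator can be mimicked with simple template substitution (a sketch only; C2PO's actual implementation works on its AST, and the names here are illustrative):

```python
def expand_foreach(var: str, elements: list[str], template: str) -> str:
    """Expand a 'foreach'-style aggregate into an explicit conjunction
    by substituting each set element for the bound variable."""
    return " && ".join(template.replace(var, e) for e in elements)

template = "({v}.status == W) U[0,5] ({v}.status == G || {v}.status == R)"
# `expanded` is the four-conjunct formula written out above
expanded = expand_foreach("{v}", ["rq0", "rq1", "rq2", "rq3"], template)
```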

#### **2.4 Common Subexpression Elimination**

C2PO uses an AST as the intermediate representation of its input and can therefore apply optimization techniques common in compiler design, such as Common Subexpression Elimination (CSE) [6]. Similar to the isomorphism-elimination rule for Binary Decision Trees [4], CSE prunes all but one instance of any identical AST subtrees, reusing that subtree's result when monitoring multiple requirements rather than wasting memory and execution time on redundant copies. Applying CSE to randomly generated MLTL requirements yielded a 37% speed-up and 4.3% less memory usage [18]. We expect larger savings in human-authored requirement specifications, however, due to reuse of both common specification patterns and structures in the underlying system. For example, a non-trivial subexpression might represent a system's confidence in its navigational fix, and many specifications might depend on the navigation state, thus reusing this subexpression.
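The pruning itself amounts to hash-consing the AST: any subtree structurally equal to one already seen is replaced by the stored instance. A minimal sketch, with tuples standing in for C2PO's AST nodes:

```python
def cse(node, table):
    """Return a shared instance of `node`, reusing any structurally
    identical subtree already recorded in `table`. Interior nodes are
    (operator, child, ...) tuples; leaves are (name,) tuples."""
    op, *children = node
    rebuilt = tuple([op] + [cse(c, table) for c in children])
    return table.setdefault(rebuilt, rebuilt)

table = {}
# G (a && b) and F (a && b) share one copy of the conjunction:
t1 = cse(("G", ("&&", ("a",), ("b",))), table)
t2 = cse(("F", ("&&", ("a",), ("b",))), table)
```

After both calls, `t1[1]` and `t2[1]` are the very same object, so the conjunction would be evaluated and buffered only once.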

## **3 Resource Estimation GUI**

As R2U2's user base expands, so does the variance in specification authors' domain expertise; R2U2 V3.0 therefore enables resource-aware requirement specification by users without experience of the performance trade-offs between syntactically different but semantically equivalent temporal logic encodings. The R2U2 Configuration Explorer is a web application that provides visual feedback from C2PO about the resource costs of specifications, e.g., in the form of MLTL formulas; see Fig. 3. With a short feedback loop on critical parameters like execution time, memory, and relative formula size, users only need to know what resources are available on their target system (not R2U2's internals) to write performant specifications that fit within those resources.

**Fig. 3.** R2U2 Configuration Explorer web application: 1) C2PO specification input; 2) C2PO options; 3) C2PO output; 4) AST visualization; 5) AST node data; 6) R2U2 instruction; 7) C engine speed and memory calculator; 8) FPGA speed and size calculator; 9) FPGA design size vs maximum timestamp value.

#### **3.1 C2PO Feedback**

Feedback from C2PO (elements 1–6 in Fig. 3) allows users to visualize the intermediate representation of a given input specification as well as the effects of optimizations and options on their final R2U2 configurations. Properties such as the memory required to represent specifications with differently-sized temporal intervals, or syntactically different but functionally similar checks, can be unintuitive for users to compute on the fly. The AST visualization provides transparency into this process for users unfamiliar with R2U2's implementation via an interactive web-based interface suited to experimentation with different variations of a possible specification.

### **3.2 Software Resource Calculator**

The software resource calculator (element 7 in Fig. 3) provides users of the R2U2 software implementations with an estimate of the time and memory required to evaluate one time step of a specification in the worst case.

**Software Worst-Case Execution Time.** The highly optimized nature of R2U2's software implementations makes runtime performance highly dependent on the target platform's architecture, C/C++ compiler version, and build environment; e.g., even the length of the current working directory name can affect cache alignment. We use a simplified computing model that estimates computing speed from the number of CPU cycles required for each operation on the target platform. Users can edit these clock-cycle values in the GUI, e.g., to test for platform-specific latencies. The estimated worst-case execution time (WCET) in software W*sw* of an AST node g is:

$$W\_{sw}(g) = \sum\_{c \in \mathbb{C}\_g} (W\_{sw}(c)) + Cycles(g.type) \tag{1}$$

where C*g* is the set of child nodes of g and Cycles is a dictionary mapping AST node types to a corresponding number of clock cycles. For instance, Cycles(∧) = 10 cycles by default.
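Equation 1 is a straightforward bottom-up sum over the AST. A sketch, where the ∧ cost matches the stated default and the other cycle counts are illustrative placeholders:

```python
# "&&" matches the Cycles(∧) = 10 default; other values are illustrative
CYCLES = {"&&": 10, "U": 15, "atomic": 5}

def wcet_sw(node):
    """Eq. 1: WCET of an AST node = sum of the children's WCET plus
    the per-operation cycle cost of the node's own type."""
    op, *children = node
    return sum(wcet_sw(c) for c in children) + CYCLES[op]

# (a && b) U[0,5] c: three atomics (5 each) + one && (10) + one U (15)
formula = ("U", ("&&", ("atomic",), ("atomic",)), ("atomic",))
```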

**Software Memory Requirements.** R2U2 uses Shared Connection Queues (SCQs) to store verdict-timestamp pairs for each node in the AST. SCQs are single-writer, many-reader circular buffers that hold the results of dependent temporal expressions that might not be evaluated at the same timestamp. The total SCQ memory for a specification is the total number of SCQ slots it requires multiplied by the size of one slot. The required number of SCQ slots for a node g is:

$$size(g.Queue) = \max(\max\{s.wpd \mid \forall s \in \mathbb{S}\_g\} - g.bpd, 0) + 1\tag{2}$$

where g.Queue is the output SCQ of g, s.wpd is the worst-case propagation delay of node s, g.bpd is the best-case propagation delay of g, and S*g* is the set of sibling nodes of g. The propagation delays of a node represent the minimum and maximum number of time steps needed to evaluate it and are defined recursively in Definition 4 of [18]. Intuitively, a node requires enough memory that its results are not overwritten before they are consumed by a parent node. The total SCQ memory of an AST is the sum of the SCQ sizes of all its nodes.
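Equation 2 reads directly as code; the caller supplies g's best-case propagation delay and the worst-case propagation delays of its siblings:

```python
def scq_slots(bpd_g, sibling_wpds):
    """Eq. 2: slots needed so g's verdicts survive until its slowest
    sibling catches up; a node with no siblings needs a single slot."""
    if not sibling_wpds:
        return 1
    return max(max(sibling_wpds) - bpd_g, 0) + 1
```

For instance, a node with best-case delay 0 whose sibling may lag up to 5 time steps needs 6 slots, while a node that is never outpaced by its siblings needs only 1.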

SCQ memory is an estimate of the actual total memory usage, but it is typically the largest and most constraining memory type, e.g., compared to instruction or pointer memory. The R2U2 C implementation statically fixes all memory sizes in advance to avoid dynamic allocation, so the SCQ sizing feedback is useful for: (1) selecting an initial size based on expected usage; and (2) verifying that a configuration will fit on a deployed monitor with a fixed SCQ limit.

#### **3.3 Hardware Resource Calculator**

The hardware resource calculator (elements 8–9 in Fig. 3) provides estimates of the hardware WCET (W*hw*), total SCQ memory slots, and a graph visualizing estimated FPGA resource requirements: Look-Up Tables (LUTs) and Block RAMs (BRAMs). Required resources depend on the FPGA architecture, so the GUI accepts clock rate, LUT type, timestamp length, and node sizing as parameters to better match the estimate to a target platform. This approach was validated on Virtex-5 and Zynq-7000 FPGA platforms as well as the ACTEL ProASIC3L used for Robonaut2 in [18].

**Hardware Worst-Case Execution Time.** The GUI computes the estimated W*hw* using a more precise method than in Sect. 3.2 by taking into account SCQ usage during execution. The R2U2 hardware implementation's estimated worst-case execution time (W*hw*) of an AST node g is:

$$\begin{aligned} W\_{hw}(g) &= \sum\_{c \in \mathbb{C}\_g} (W\_{hw}(c)) + Latency\_{init}(g.type) \\ &+ Latency\_{eval}(g.type) \* \sum\_{c \in \mathbb{C}\_g} (size(c.Queue)) \end{aligned} \tag{3}$$

where Latency*init* and Latency*eval* are dictionaries mapping AST node types to microsecond latencies corresponding to a node's initialization and evaluation times, respectively. The multiplication accounts for evaluating each buffered input from the child nodes, up to the queue size in the worst case.
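Equation 3 can be sketched analogously to the software case, with each node carrying its SCQ size (the latency values below are illustrative placeholders, not measured figures):

```python
LAT_INIT = {"&&": 2.0, "atomic": 1.0}  # microseconds, illustrative
LAT_EVAL = {"&&": 0.5, "atomic": 0.2}

def wcet_hw(node):
    """Eq. 3: children's WCET, plus this node's initialization latency,
    plus its evaluation latency for every buffered child input."""
    children = node.get("children", [])
    return (sum(wcet_hw(c) for c in children)
            + LAT_INIT[node["type"]]
            + LAT_EVAL[node["type"]] * sum(c["scq"] for c in children))

a = {"type": "atomic", "scq": 1}
b = {"type": "atomic", "scq": 1}
conj = {"type": "&&", "scq": 3, "children": [a, b]}
```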

**Hardware Memory Requirements.** The hardware resource calculator computes the exact number of SCQ slots required by the collection of specifications in the specification set (aka the *configuration*) by applying Formula 2 and summing the sizes required for all AST nodes.

FPGAs implement an R2U2 monitor's SCQ memory with BRAMs, whose size and number of ports limit the achievable queue depth. To compute the required number of BRAMs, let d be the total SCQ size, w the bit width of each verdict-timestamp pair, w*max* the widest bit width a BRAM can accommodate, and D(w) the maximum queue depth of a BRAM at verdict-timestamp pair bit width w. The required number of cascaded BRAMs is:

$$N\_{BRAM}(w,d) = \lceil \frac{d}{D(w\_{max})} \rceil \* \lfloor \frac{w}{w\_{max}} \rfloor + \lceil \frac{d}{D(rem(w, w\_{max}))} \rceil \tag{4}$$
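In code, the cascade count splits the pair width into full w*max*-wide slices plus a remainder slice, each cascaded deep enough to hold d entries. The depth function D is supplied by the user for their BRAM primitive; the 36 Kb sizing below is purely illustrative:

```python
from math import ceil

def n_bram(w, d, w_max, depth):
    """Cascaded BRAMs for total SCQ depth d at pair width w: each full
    w_max-wide slice needs ceil(d / D(w_max)) BRAMs; a remaining slice
    of width w mod w_max needs ceil(d / D(w mod w_max))."""
    count = ceil(d / depth(w_max)) * (w // w_max)
    if w % w_max:
        count += ceil(d / depth(w % w_max))
    return count

# e.g., an idealized 36 Kb primitive: depth(width) = 36864 // width
depth_36kb = lambda width: 36864 // width
```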

**Hardware LUT Requirements.** Each R2U2 operator requires a constant number of comparator and adder/subtractor LUTs, configured by the user in the GUI. The GUI accounts for scaling based on the LUT type and uses the bit width w of each verdict-timestamp pair to estimate total LUT usage. The total numbers of required comparator LUTs (N*cmp*) and adder/subtractor LUTs (N*add*) are:

$$N\_{cmp}(w) = \begin{cases} 4 \ast w & \text{if LUT-3} \\ 2 \ast w & \text{if LUT-4} \\ w & \text{if LUT-6} \end{cases} \qquad N\_{add}(w) = \begin{cases} 2 \ast w & \text{if LUT-3 or LUT-4} \\ w & \text{if LUT-6} \end{cases}$$
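The case analysis above maps directly to a small lookup; a sketch mirroring the two formulas:

```python
def n_cmp(w, lut):
    """Comparator LUTs for a verdict-timestamp pair of bit width w."""
    return {"LUT-3": 4 * w, "LUT-4": 2 * w, "LUT-6": w}[lut]

def n_add(w, lut):
    """Adder/subtractor LUTs for the same pair width."""
    return {"LUT-3": 2 * w, "LUT-4": 2 * w, "LUT-6": w}[lut]
```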

## **4 Runtime Engine Improvements**

To better serve mission-critical systems that must satisfy strict flight certification requirements (such as NASA's VSM [8–10]), we have made a number of improvements to the internal architecture of the C version of R2U2 that provide memory assurances and flexibility as well as extended computational abilities. Figure 4 depicts this updated architecture.

**Static Memory Arenas.** The R2U2 V3.0 C version uses only statically-allocated memory. This avoids the many pitfalls of allocating memory (slow allocator calls, fragmentation, leaks, out-of-memory errors, etc.) and guarantees the amount of memory required for the entire execution of R2U2 up front. Additionally, many mission-critical systems either do not have or do not permit dynamic memory allocation, e.g., to satisfy requirements for flight certification [32]. R2U2 now runs unmodified on these platforms as well as traditional systems.

Each type of memory (yellow boxes of Fig. 4) has a predefined "arena" with a maximum size set during integration of the monitor with the target platform. When a user loads an R2U2 configuration, R2U2 fills the slots of these arenas in sequence until the arena is full.

**Monitor Type Parameterization.** Complementary to the switch to static memory, the internals of the reasoning engine are now fully parameterized. A single header file lets users adjust maximum values, bit widths, and even internal types. Proper tuning has performance benefits but, crucially, allows users to fit R2U2 to the exact resources available on a target system. For example, limiting the size of gaps between timestamps (e.g., when the specification will be reset frequently or evaluated infrequently) allows more SCQs to fit in the same amount of memory, permitting larger formula sets with functionally similar behavior.

**Fig. 4.** Internal architecture of an R2U2 monitor. Orange boxes are streams of data, yellow boxes are memory arenas, and blue boxes are modules. Arrows entering and exiting blue boxes denote read and write relationships, respectively. The red arrows denote relationships that are only active upon startup, i.e., when R2U2 populates instruction memory and configures SCQ memory. (Color figure online)

**Arbitrary Data Flow.** R2U2 initially worked as a stack of engines, at each timestamp passing results from the Atomic Checker (AT) to the Temporal Logic engine (TL), then passing the TL verdicts through the Bayesian Network (BN) layer to produce that timestamp's verdict [34]. Now, R2U2 can connect these engines in any order. This simplifies configuration generation from the perspective of C2PO, enabling arbitrary ordering of instructions. Atomic checker properties can now accept results of temporal logic formulas as input, for example, without adding a confusing step delay in the verdict stream.

**AT Checker Extended Mode.** The C version of the atomic checker has an extended mode allowing additional comparisons and filters beyond the standard hardware-compatible set. In extended mode, the atomic checker produces Boolean "atomics" from conditionals, where each conditional compares the result of a filter to either a constant or another input signal. Filters are predefined functions such as simple data-type casts (bool, int, float, etc.) or mathematical functions like rate, moving average, or absolute angle difference. For example:


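One such conditional, a moving-average filter compared against a constant threshold, behaves like the following sketch (the names, signal, and threshold are hypothetical, not R2U2 syntax):

```python
def moving_average(window):
    """Filter: arithmetic mean over the most recent samples."""
    return sum(window) / len(window)

def overheat_atomic(temp_window, threshold):
    """Extended-mode atomic: compare a filter's output to a constant,
    yielding a Boolean 'atomic' for the temporal logic engine."""
    return moving_average(temp_window) > threshold
```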
**Booleanizer.** The R2U2 V3.0 C implementation includes a new general-purpose computing module, the Booleanizer, which uses a three-address code representation [7] and can take the place of the AT checker. This module enables arbitrary expressions over non-Boolean data types using arithmetic, bitwise, and relational operators, as well as extended set-aggregation operators such as "forexactlyn" and "foratmostn".
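Three-address code flattens an expression tree into instructions with at most two operands each. A sketch of such a lowering (not C2PO's actual pass; the expression and names are illustrative):

```python
def lower(node, code):
    """Lower an expression tree into three-address code. Interior nodes
    are (op, left, right) tuples; leaves are signal names. Each emitted
    instruction is (dest, op, arg1, arg2)."""
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    a, b = lower(lhs, code), lower(rhs, code)
    dest = f"t{len(code)}"       # fresh temporary for this result
    code.append((dest, op, a, b))
    return dest

code = []
# (alt - target) < tolerance  becomes  t0 = alt - target; t1 = t0 < tolerance
lower(("<", ("-", "alt", "target"), "tolerance"), code)
```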

## **5 Discussion**

R2U2's toolchain now provides an effective means to formalize, validate, and verify system requirements in real time, giving users control over, and transparency into, the memory and feature set of their target-specific monitors. We have combined the collection of capabilities from previously published R2U2 case studies into one modular, centralized implementation that we have rigorously evaluated for correctness (e.g., using [19,38]).

C2PO and its new specification language enable higher-level abstractions for users that make the specification development process faster, more transparent, and less reliant on a deep understanding of R2U2's underlying algorithms. The new GUI frontend allows up-front specification design and resource usage estimation by system designers so that users can rapidly prototype specifications before downloading and using R2U2. These improvements make specifying, validating, and monitoring system requirements easier and more accessible to the systems that stand to benefit most from RV. Since specification is the biggest bottleneck to formal methods and autonomy [37], this is an important feature for an RV engine.

It is now much easier to integrate R2U2 into production environments, like NASA cFS/cFE [25,28] or ROS [31], due to the unified front end compiler, expanded engine capabilities, and better user tooling. Recently R2U2 has launched on several real-life, full-scale air and space missions, largely enabled by these advancements. This major upgrade lays a solid foundation for expanded RV capabilities and integration into a wider array of missions and embedded architectures.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **A**

Abdulla, Parosh Aziz I-184 Akshay, S. I-266, I-367, III-86 Albert, Elvira III-176 Alistarh, Dan I-156 Alur, Rajeev I-415 Amilon, Jesper III-281 Amir, Guy II-438 An, Jie I-62 Anand, Ashwani I-436 Andriushchenko, Roman III-113 Apicelli, Andrew I-27 Arcaini, Paolo I-62 Asada, Kazuyuki III-40 Ascari, Flavio II-41 Atig, Mohamed Faouzi I-184

#### **B**

Badings, Thom III-62 Barrett, Clark II-163, III-154 Bastani, Favyen I-459 Bastani, Osbert I-415, I-459 Bayless, Sam I-27 Becchi, Anna II-288 Beutner, Raven II-309 Bisping, Benjamin I-85 Blicha, Martin II-209 Bonchi, Filippo II-41 Bork, Alexander III-113 Braught, Katherine I-351 Britikov, Konstantin II-209 Brown, Fraser III-154 Bruni, Roberto II-41 Bucev, Mario III-398

#### **C**

Calinescu, Radu I-289 Češka, Milan III-113 Chakraborty, Supratik I-367 Chatterjee, Krishnendu III-16, III-86 Chaudhuri, Swarat III-213 Chechik, Marsha III-374 Chen, Hanyue I-40 Chen, Taolue III-255 Chen, Yu-Fang III-139 Choi, Sung Woo II-397 Chung, Kai-Min III-139 Cimatti, Alessandro II-288 Cosler, Matthias II-383 Couillard, Eszter III-437 Czerner, Philipp III-437

#### **D**

Dardik, Ian I-326 Das, Ankush I-27 David, Cristina III-459 Dongol, Brijesh I-206 Dreossi, Tommaso I-253 Dutertre, Bruno II-187

#### **E**

Eberhart, Clovis III-40 Esen, Zafer III-281 Esparza, Javier III-437

#### **F**

Farzan, Azadeh I-109 Fedorov, Alexander I-156 Feng, Nick III-374 Finkbeiner, Bernd II-309 Fremont, Daniel J. I-253 Frenkel, Hadar II-309 Fu, Hongfei III-16 Fu, Yu-Fu II-227, III-329

#### **G**

Gacek, Andrew I-27 Garcia-Contreras, Isabel II-64


Gastin, Paul I-266 Genaim, Samir III-176 Getir Yaman, Sinem I-289 Ghosh, Shromona I-253 Godbole, Adwait I-184 Goel, Amit II-187 Goharshady, Amir Kafshdar III-16 Goldberg, Eugene II-110 Gopinath, Divya I-289 Gori, Roberta II-41 Govind, R. I-266 Govind, V. K. Hari II-64 Griggio, Alberto II-288, III-423 Guilloud, Simon III-398 Gurfinkel, Arie II-64 Gurov, Dilian III-281

#### **H**

Hahn, Christopher II-383 Hasuo, Ichiro I-62, II-41, III-40 Henzinger, Thomas A. II-358 Hofman, Piotr I-132 Hovland, Paul D. II-265 Hückelheim, Jan II-265

#### **I**

Imrie, Calum I-289

#### **J**

Jaganathan, Dhiva I-27 Jain, Sahil I-367 Jansen, Nils III-62 Jeż, Artur II-18 Johannsen, Chris III-483 Johnson, Taylor T. II-397 Jonáš, Martin III-423 Jones, Phillip III-483 Joshi, Aniruddha R. I-266 Jothimurugan, Kishor I-415 Junges, Sebastian III-62, III-113

#### **K**

Kang, Eunsuk I-326 Karimi, Mahyar II-358 Kashiwa, Shun I-253 Katoen, Joost-Pieter III-113 Katz, Guy II-438 Kempa, Brian III-483 Kiesl-Reiter, Benjamin II-187 Kim, Edward I-253 Kirchner, Daniel III-176 Kokologiannakis, Michalis I-230 Kong, Soonho II-187 Kori, Mayuko II-41 Koval, Nikita I-156 Kremer, Gereon II-163 Křetínský, Jan I-390 Krishna, Shankaranarayanan I-184 Kueffner, Konstantin II-358 Kunčak, Viktor III-398

#### **L**

Lafortune, Stéphane I-326 Lahav, Ori I-206 Lengál, Ondřej III-139 Lette, Danya I-109 Li, Elaine III-350 Li, Haokun II-87 Li, Jianwen II-288 Li, Yangge I-351 Li, Yannan II-335 Lidström, Christian III-281 Lin, Anthony W. II-18 Lin, Jyun-Ao III-139 Liu, Jiaxiang II-227, III-329 Liu, Mingyang III-255 Liu, Zhiming I-40 Lopez, Diego Manzanas II-397 Lotz, Kevin II-187 Luo, Ziqing II-265

#### **M**

Maayan, Osher II-438 Macák, Filip III-113 Majumdar, Rupak II-187, III-3, III-437 Mallik, Kaushik II-358, III-3 Mangal, Ravi I-289 Marandi, Ahmadreza III-62 Markgraf, Oliver II-18 Marmanis, Iason I-230 Marsso, Lina III-374 Martin-Martin, Enrique III-176 Mazowiecki, Filip I-132 Meel, Kuldeep S. II-132 Meggendorfer, Tobias I-390, III-86 Meira-Góes, Rômulo I-326 Mell, Stephen I-459 Mendoza, Daniel II-383

Metzger, Niklas II-309 Meyer, Roland I-170 Mi, Junri I-40 Milovančević, Dragana III-398 Mitra, Sayan I-351

#### **N**

Nagarakatte, Santosh III-226 Narayana, Srinivas III-226 Nayak, Satya Prakash I-436 Niemetz, Aina II-3 Nowotka, Dirk II-187

#### **O**

Offtermatt, Philip I-132 Opaterny, Anton I-170 Ozdemir, Alex II-163, III-154

#### **P**

Padhi, Saswat I-27 Păsăreanu, Corina S. I-289 Peng, Chao I-304 Perez, Mateo I-415 Preiner, Mathias II-3 Prokop, Maximilian I-390 Pu, Geguang II-288

#### **R**

Reps, Thomas III-213 Rhea, Matthew I-253 Rieder, Sabine I-390 Rodríguez, Andoni III-305 Roy, Subhajit III-190 Rozier, Kristin Yvonne III-483 Rümmer, Philipp II-18, III-281 Rychlicki, Mateusz III-3

#### **S**

Sabetzadeh, Mehrdad III-374 Sánchez, César III-305 Sangiovanni-Vincentelli, Alberto L. I-253 Schapira, Michael II-438 Schmitt, Frederik II-383 Schmuck, Anne-Kathrin I-436, III-3 Seshia, Sanjit A. I-253 Shachnai, Matan III-226 Sharma, Vaibhav I-27

Sharygina, Natasha II-209 Shen, Keyi I-351 Shi, Xiaomu II-227, III-329 Shoham, Sharon II-64 Siegel, Stephen F. II-265 Sistla, Meghana III-213 Sokolova, Maria I-156 Somenzi, Fabio I-415 Song, Fu II-413, III-255 Soudjani, Sadegh III-3 Srivathsan, B. I-266 Stanford, Caleb II-241 Stutz, Felix III-350 Su, Yu I-40 Sun, Jun II-413 Sun, Yican III-16

#### **T**

Takhar, Gourav III-190 Tang, Xiaochao I-304 Tinelli, Cesare II-163 Topcu, Ufuk III-62 Tran, Hoang-Dung II-397 Tripakis, Stavros I-326 Trippel, Caroline II-383 Trivedi, Ashutosh I-415 Tsai, Ming-Hsien II-227, III-329 Tsai, Wei-Lun III-139 Tsitelov, Dmitry I-156

#### **V**

Vafeiadis, Viktor I-230 Vahanwala, Mihir I-184 Veanes, Margus II-241 Vin, Eric I-253 Vishwanathan, Harishankar III-226

#### **W**

Waga, Masaki I-3 Wahby, Riad S. III-154 Wang, Bow-Yaw II-227, III-329 Wang, Chao II-335 Wang, Jingbo II-335 Wang, Meng III-459 Watanabe, Kazuki III-40 Wehrheim, Heike I-206 Whalen, Michael W. I-27 Wies, Thomas I-170, III-350

Wolff, Sebastian I-170 Wu, Wenhao II-265

#### **X**

Xia, Bican II-87 Xia, Yechuan II-288

#### **Y**

Yadav, Raveesh I-27 Yang, Bo-Yin II-227, III-329 Yang, Jiong II-132 Yang, Zhengfeng I-304 Yu, Huafeng I-289 Yu, Yijun III-459 Yue, Xiangyu I-253

#### **Z**

Zdancewic, Steve I-459 Zelazny, Tom II-438 Zeng, Xia I-304 Zeng, Zhenbing I-304 Zhang, Hanliang III-459 Zhang, Li I-304 Zhang, Miaomiao I-40 Zhang, Pei III-483 Zhang, Yedi II-413 Zhang, Zhenya I-62 Zhao, Tianqi II-87 Zhu, Haoqing I-351 Žikelić, Đorđe III-86 Zufferey, Damien III-350